Querying Intervals

Finding the row with the interval containing X

The Problem
No Surprise
The Cause
Solutions

The Problem

Background: Each row will contain a range, a lower bound and an upper bound, and a piece of data associated with the range:
Row: ID, lower, upper, more columns...

The problem is simple- given a number X, we want to find the ID of the row where:
lower <= X <= upper
Possibly no row will match.

Our simple query is:

    SELECT stuff FROM ranges WHERE (lo <= ? AND hi >= ?);

The table "ranges" was made as follows, and to ensure good performance, the designers added an index on each of the bounds. They trust Oracle will figure out the best way to use them.

    CREATE TABLE ranges(
        range_id NUMBER(15),
        lo NUMBER(15),
        hi NUMBER(15),
        about 4 more columns...
    );
    CREATE INDEX ranges_lo ON ranges(lo);
    CREATE INDEX ranges_hi ON ranges(hi);

The real problem is that the customer called back to say a statspack says this query is taking 22% of the CPU time!!!

No Surprise

Of course part of making a query should be:
- running an explain plan
- better, producing an Oracle trace
- unit-testing the query on a large data set plus generating a statspack
- notifying QA to test this on a large amount of data

Often, development organizations don't know to do these, or how to do them, or don't budget the time, or the query is deceptively simple (like this one), so they think it's not necessary.

But we should be suspect, because Oracle hasn't a clue that our table contains ranges. Oracle doesn't know:
- lo <= hi
- the ranges do not overlap.
- the answer will be one or zero rows

In hindsight, it's not too surprising an error like this got through.

Lessons:

Don't depend on Oracle being smart
Use explain-plan, trace, test, and notify QA
Remember that often simple queries are expensive
If there are helpful relationships or constraints, ensure your query uses them explicitly.

The Cause

With hindsight, now that we know it's a problem, we can make sense out of it. Say the table has N rows.

Since Oracle knows nothing about the relationship between the two columns, it thinks they're two unrelated sets of random integers. It expects on average, about half, or N/2 rows of the table will satisfy the first condition (lo < ?) and a different half will satisfy the second (? < hi).

So Oracle thinks to itself: "If I use an index, I'll have to read in about N/2 index entries, and then fetch those N/2 rows of the table to see if they satisfy the second condition. If I'm going to read in N/2 index entries and then another N/2 rows, I might as well simplify it and just scan the whole table of N rows."

Similarly, it might consider: "Maybe I could change the AND to a JOIN and fetch those N/2 rows that satisfy Lo<X and join them with the about N/2 rows that satisfy X<hi, sort and merge them and find the intersection. No- then I'll be copying about N rows PLUS sorting..."

So it cleverly realizes the fastest thing to do is skip the indexes and scan the whole table (and proudly believe it has a really smart optimizer...)

Solutions

Here are 6 ideas. (If you're in a hurry, skip to the answer.) One can be implemented in production, without changing the code. Two are probably just wishes. The others require code changes.

Not changing query: A bit of help- making a 2-column index
Not changing query: Wish- will an ANALYZE help?
Not changing query: Wish- we could add a constraint
Changing the query: A bit of help- using BETWEEN
Changing the query: Solution- Ask for exactly what we want
Changing the query: Similarly, but if ranges can nest

A bit of help- making a 2-column index

One way to improve this is to make an index on BOTH columns. (I don't know if there's a word for this kind of index- Oracle doesn't give it a term, perhaps "composite" or "multi-column" index.) This way, Oracle can find the rows it needs just by accessing the index. So it thinks to itself, "I can scan in about N/2 rows of the index, and find and return exactly those rows that satisfy the condition."

    create index ranges_lohi on ranges(lo,hi);
    drop index ranges_lo;  -- no longer needed

This will make the query a bit more than twice as fast, on average. (A bit more because the index takes up less memory than a table row- about 1/5th.) Note if the range we're looking for is high, Oracle will still look through most of the table. This is an improvement, but it's still very inefficient compared to asking Oracle for just the correct row.

Wish- will an ANALYZE help

I don't have the time to look this up now. We could run on production:

    ANALYZE ranges COMPUTE STATISTICS;

But I very much doubt it will look for, or discover, relationships between two columns...

Wish- we could add a constraint

Can we add a constraint? This might indeed help- but, in 9.2.0.4 at least (I think) we can't even tell oracle that
lo <= hi

And this wouldn't help much. What we really want is to say the ranges are unique and Oracle doesn't know what a range is. (It knows what a date interval is, but that's a length, not two end-points...)

A bit of help- using BETWEEN

One obvious problem here is that we have TWO parameters when they're actually both the same. Often we can't tell Oracle they're both the same, but in this case we can:

    SELECT stuff FROM ranges WHERE ? BETWEEN lo AND hi;

This is cleaner, but it doesn't solve the problem. Oracle still does not know that [lo..hi] is a range- it thinks these are just random numbers. This would only help if Oracle can learn more about the table. For instance, if Oracle realizes the average of each column is about the same, it could use index on the Hi column if X is larger than average and the Lo index if X is smaller than average. Then on average, it would be looking at N/4 index entries and rows. But Oracle is not this smart...

Solution- Ask for exactly what we want

To make a better query, let's be explicit about the relationships. What's missing from our query is the knowledge that no two ranges overlap. So we really just need the single largest lo <= X. Thus we can write the query:

    SELECT range_id FROM ranges WHERE
      lo = (SELECT MAX(lo) FROM ranges WHERE lo <= X) AND
      X <= hi;

Oracle will try to satisfy the '=' condition first, and if Oracle is smart enough to optimize the inner query, it can make it be a single look-up, so this query will take very little time- just one lookup in the index, then one lookup in the table to read that row, then see if this row satisfies the second query.

Unfortunately, the explain plan isn't convincing:

      1 SELECT STATEMENT
      2 TABLE ACCESS     BY INDEX ROWID   RANGES     
      3 INDEX            RANGE SCAN       RANGES_LOHI
      4 SORT             AGGREGATE                   
      5 INDEX            RANGE SCAN       RANGES_LOHI

Why is Oracle doing the sort? Can it not figure out that the highest one of the range is the one it wants? Sigh...

So the "answer" is to tell Oracle to return one row that's always there, and to give Oracle a special hint about how to find it:

    SELECT range_id FROM 
       ( SELECT /* INDEX_DESC(ranges ranges_lo) */
           lo, high, range_id                          <--- BEST QUERY!
         FROM ranges WHERE lo <= X and rownum=1 )
    WHERE X <= hi;

Notice the inner query- we're telling Oracle to use the index on Lo starting from the highest values (the INDEX_DESC means search "descending" from high to low). So it'll first find the largest value of Lo where Lo<X. Then the "rownum=1" condition tells it to look no further! So the inner query returns the values from this one row using one lookup. Then the outer query returns the row if X is less than the high value, otherwise it returns no rows.

This is confirmed by the explain plan (again, #5 is the part of the inner query which is done first):

      1 SELECT STATEMENT
      2 VIEW
      3 COUNT            STOPKEY
      4 TABLE ACCESS     BY INDEX ROWID   RANGES
      5 INDEX            RANGE SCAN       RANGES_LO

Don't believe an Explain Plan completely

A fast-looking explain plan does NOT prove the query is fast.

First, keep in mind that explain Plan ignores hints! If we used INDEX_ASC instead of INDEX_DESC, the explain plan would look the same- it would just always test the lowest range in the table and thus almost always be wrong. Testing is key! (With Oracle tracing turned on.)

Second, consider the following query. It's Explain Plan leads us to believe it's fast, but it's much slower than the query above.

    SELECT /* INDEX_DESC(ranges ranges_lohi) */
           range_id FROM ranges WHERE x BETWEEN lo AND hi AND rownum=1;

This seems to do it- it uses the index, starting at the highest value possible, to find the range where lo <= x. Since it fetches these rows in descending order (highest first), it finds what WE know is the only possible row first. By restricting the search to 1 row, we get the single row with the largest lo value. This is confirmed by the explain plan, below.

      1 SELECT STATEMENT
      2 COUNT            STOPKEY     
      3 TABLE ACCESS     BY INDEX ROWID   RANGES
      4 INDEX            RANGE SCAN       RANGES_LOHI

The problem is that if NO row satisfies the conditions, it'll keep moving down the index, till it runs out of rows. So it's fast if there's a range, but slow if there's none, even though the plan looks good.

What if the ranges nest?

If the ranges nest, the "fastest" solution depends on how your data is organized and what kind of output you want.

For instance, if at most 2 ranges nest, you can add a nesting level to each row, (which needs to be maintained during insert and delete, adding to the complexity), and add a multi-column index on Level,Lo, and simply run the fast query twice, once for each level (you can pass 2 queries to Oracle using "union".)

Also consider a different solution. Say you have lots of nesting with relatively random distribution and an order-N search isn't good enough- consider bucket-sorting. Say the range boundaries are 32-bit numbers and very few ranges contain more than 65000 numbers between Lo and Hi. Add a Lo65k column to contain the high 16-bits of Lo. Each row with fewer than 65k entries will either fit into a 65k bucket or two, in which case you can split the row into two. For the few rows with larger ranges, we'll just set its "bucket" to 70k. Then we can just do an order-N search of all the rows with bucket = (Lo/65k) or bucket = 70k.

The bottom line is that are many, many possible problems and many require proficiency in Oracle and/or problem-solving to craft a solution. For those without expertise: test and run Oracle trace and learn at least enough to interpret the results. And if you run into a tough problem, hire a consultant (or contact me.)

To reach me, use the form at the bottom of my home page.
-r