MATCH_RECOGNIZE: Can I use MATCH_NUMBER() as a filter?

Espresso Machine

Recently I spotted a post on OTN that asked the question: Can MATCH_RECOGNIZE skip out of partition? This requires a bit more detail because it raises all sorts of additional questions. Fortunately the post included more information which went something like this:
after a match is found I would like match_recognize to stop searching - I want at most one match per partition. I don’t want to filter by MATCH_NUMBER() in an outer query - that is too wasteful (or, in some cases, I may know in advance that there is at most one match per partition, and I don’t want match_recognize to waste time searching for more matches which I know don't exist).
Can MATCH_RECOGNIZE do this? Short answer is: NO.
Long answer is: Still NO.

Going back to the original question… you could interpret it as asking “is it possible to only return the first match”? The answer to this question is YES, it is possible.

There are a couple of different ways of doing it. Let’s use our good old “TICKER”  data set. The point of this exercise is to simply show that there are different ways to achieve the same result but in the end you need to look beyond what is passed back from MATCH_RECOGNIZE to understand what is going on as we process the rows from our TICKER table…

In simple terms, I only want my query to return the first match. Here is my starting query:
SELECT 
 symbol,
 tstamp,
 price,
 match_number,
 classifier,
 first_x,
 last_y 
FROM ticker
MATCH_RECOGNIZE (
 PARTITION BY symbol
 ORDER BY tstamp
 MEASURES
   FIRST(x.tstamp) AS first_x,
   LAST(y.tstamp) AS last_y,
   MATCH_NUMBER() AS match_number,
 CLASSIFIER() AS classifier
 ALL ROWS PER MATCH WITH UNMATCHED ROWS
 AFTER MATCH SKIP PAST LAST ROW
 PATTERN (strt X+ Y)
 DEFINE 
   X AS (price <= PREV(price)),
   Y AS (price >= PREV(price))
)
ORDER BY symbol, match_number, tstamp asc;
This is the output from the above query which shows that for each symbol we have multiple matches of our pattern:

Starting Query


Returning only the 1st match

If we want to return just the first match it is simply a matter of using applying a filter on MATCH_NUMBER() as follows:

SELECT 
 symbol,
 tstamp,
 price,
 match_number,
 classifier,
 first_x,
 last_y 
FROM ticker
MATCH_RECOGNIZE (
 PARTITION BY symbol
 ORDER BY tstamp
 MEASURES
   FIRST(x.tstamp) AS first_x,
   LAST(y.tstamp) AS last_y,
   MATCH_NUMBER() AS match_number,
   CLASSIFIER() AS classifier
 ALL ROWS PER MATCH WITH UNMATCHED ROWS
 AFTER MATCH SKIP PAST LAST ROW
 PATTERN (strt X+ Y)
 DEFINE 
  X AS (price <= PREV(price)),
  Y AS (price >= PREV(price))
)
WHERE match_number = 1
ORDER BY symbol, match_number, tstamp asc;
which returns the desired results:
Filter Query

BUT have we saved any processing? That is to say: did MATCH_RECOGNIZE stop searching for matches after the first match was found? NO! Checking the explain plan we can see that all 60 rows from our table where processed:
Explain Plan

Anyway the original post pointed out that simply filtering was not what they wanted so we can discount using MATCH_NUMBER within the WHERE clause. Although it does sort of achieve the result we wanted.

Let’s try an alternative approach. Can we limit the number of rows that are processed by using the exclude syntax within the PATTERN clause?

SELECT 
 symbol,
 tstamp,
 price,
 match_number,
 classifier,
 first_x,
 last_y 
FROM ticker
MATCH_RECOGNIZE (
 PARTITION BY symbol
 ORDER BY tstamp
 MEASURES
   FIRST(x.tstamp) AS first_x,
   LAST(y.tstamp) AS last_y,
   MATCH_NUMBER() AS match_number,
   CLASSIFIER() AS classifier
 ALL ROWS PER MATCH
 AFTER MATCH SKIP PAST LAST ROW
 PATTERN (strt X+ Y c*)
 DEFINE 
   X AS (price <= PREV(price)),
   Y AS (price >= PREV(price))
)
ORDER BY symbol, match_number, tstamp asc;
I have added another pattern variable “c” but made it always true by not providing a definition within the DEFINE clause.

Single Match for each symbol
This is getting close to what we might need because now we have only one match for each symbol. Therefore, if we now use the exclude syntax around the pattern variable c we should be able to remove all matches except the first!
SELECT 
 symbol,
 tstamp,
 price,
 match_number,
 classifier,
 first_x,
 last_y 
FROM ticker
MATCH_RECOGNIZE (
 PARTITION BY symbol
 ORDER BY tstamp
 MEASURES
   FIRST(x.tstamp) AS first_x,
   LAST(y.tstamp) AS last_y,
   MATCH_NUMBER() AS match_number,
   CLASSIFIER() AS classifier
 ALL ROWS PER MATCH
 AFTER MATCH SKIP PAST LAST ROW
 PATTERN (strt X+ Y {-c*-})
 DEFINE 
   X AS (price <= PREV(price)),
   Y AS (price >= PREV(price))
)
ORDER BY symbol, match_number, tstamp asc;

which does in fact return exactly the same result as our second query where we applied a filter on the column MATCH_NUMBER:

Filter Query

but if we check the explain plan we can see that yet again all 60 rows were processed.

Explain Plan

Therefore, we have got the right result but we have not been able to actually halt the MATCH_RECOGNIZE processing after the first match has been found.

Returning only the 2nd match

What if we wanted to return only the 2nd match? Well for this use case the exclude syntax is not going to work. The only viable solution in this situation would be to using the match_number column and apply a filter to find the required match. However, all rows from input table will be processed!

…and the final answer is: Enhancement Request

Let’s start with the simple answer to our original problem: after a match is found I would like match_recognize to stop searching
Alas, there is definitely no way to stop MATCH_RECOGNIZE processing all  the rows passed to it. To make this happen we would need to extend the AFTER MATCH SKIP TO syntax to include phrases that let us call a halt to the pattern matching process. What we need is something like “AFTER MATCH SKIP TO END”, however, this assumes that only the first match is important.

What if you wanted the first and the second or maybe it’s the second match that’s of most interest. What we really need then is something like the following: “AFTER MATCH ’N SKIP TO END  where ’N’ indicates the maximum number of matches that you want to process before jumping to the end of the partition.

Assuming I can find enough valid use cases I will put this on the “enhancement” list for MATCH_RECOGNIZE. If you some great use cases for this scenario then please send me the details (keith.laker@oracle.com).

Technorati Tags: , , ,

Comments

Popular posts from this blog

The Complete Guide To Data Warehousing and Big Data at Oracle OpenWorld 2016

Thursday&#39;s Top Picks at OpenWorld for Data Warehousing and Big Data

Your Essential Online Session and Hands-on Lab Calendars for #oow16