MATCH_RECOGNIZE - What should I include in the MEASURES clause?

Image courtesy of wikipedia

This post is the result of reviewing a post on stackoverflow.com: http://stackoverflow.com/questions/41649178/getting-error-ora-00918-when-using-match-recognize. Here is my version of the code which includes the same issues/errors as the original, however, I am using the TICKER schema table that I always use for the tutorials that I post on liveSQL :

SELECT symbol, tstamp, price 
FROM ticker
MATCH_RECOGNIZE(
PARTITION BY symbol
ORDER BY symbol, tstamp
MEASURES
a.symbol AS a_symbol,
a.tstamp AS a_date,
a.price AS a_price
ALL ROWS PER MATCH
PATTERN(A B*)
DEFINE
B AS (price < PREV(price))
);

The above example will not run because of the following error:

ORA-00918: column ambiguously defined
00918. 00000 - "column ambiguously defined"
*Cause:
*Action:
Error at Line: 1 Column: 8

So what is wrong with our code? As MATHGUY pointed out in his reply on stackoverflow.com - quite a lot actually! Let’s start by differentiating between “won’t run” and “wrong”. The ORA-918 error is easy to resolve if you stare long enough at the ORDER BY clause! It’s pointless to include the SYMBOL column as the partition by key and the order by key. If we change the ORDER BY clause as shown here then the code will run:

SELECT symbol, tstamp, price 
FROM ticker
MATCH_RECOGNIZE(
PARTITION BY symbol
ORDER BY tstamp
MEASURES
a.symbol AS a_symbol,
a.tstamp AS a_date,
a.price AS a_price
ALL ROWS PER MATCH
PATTERN(A B*)
DEFINE
B AS (price < PREV(price))
); 

which returns the following resultset (all 60 rows from our source ticker table):

All rows from the ticker table

 

No MEASURE clause

Okay so our code is running now what? If you look at the output you will notice that it contains the same rows and columns as the source table. What happens if we omit the MEASURES clause? Well it’s optional so the code should still run…

SELECT symbol, tstamp, price 
FROM ticker
MATCH_RECOGNIZE(
PARTITION BY symbol
ORDER BY tstamp
ALL ROWS PER MATCH
PATTERN(A B*)
DEFINE
B AS (price < PREV(price))
); 

and sure enough we get the same resultset (all 60 rows from our source ticker table):

All rows from the ticker table

 

ALL ROWS PER MATCH vs. ONE ROW

from the above we can assume that there is no need to list the source columns from your input table in the MEASURE clause because they are automatically included in the output. BUT this is ONLY true when you use ALL ROWS PER MATCH. If we change the output control to ONE ROW PER MATCH:

SELECT symbol, tstamp, price
FROM ticker
MATCH_RECOGNIZE(
PARTITION BY symbol
ORDER BY tstamp
ONE ROW PER MATCH
PATTERN(A B*)
DEFINE
B AS (price < PREV(price))
); 

you will now get an error:

ORA-00904: "PRICE": invalid identifier

00904. 00000 - "%s: invalid identifier"
*Cause:
*Action:
Error at Line: 19 Column: 24

 

because when using ONE ROW PER MATCH the only columns that are automatically returned are those listed in the PARTITION BY clause. Therefore, we need to use either “SELECT * FROM …..” or “SELECT symbol FROM…” and list all the columns we want to be returned to get a working version of our code. Using “SELECT * FROM…” as follows:

SELECT * 
FROM ticker
MATCH_RECOGNIZE(
PARTITION BY symbol
ORDER BY tstamp
ONE ROW PER MATCH
PATTERN(A B*)
DEFINE
B AS (price < PREV(price))
); 

 

actually returns only one column (symbol) from the ticker table:

 

Only partition by column returned

So what should we include in the MEASURE clause?

Based on the query that was in the original post I think the following syntax would make it easier to understand what is happening within the pattern matching process and provide useful information about the data that matches the pattern:

SELECT symbol, tstamp, price, first_date, last_date, first_price, last_price, m_n, classi 
FROM ticker
MATCH_RECOGNIZE(
PARTITION BY symbol
ORDER BY tstamp
MEASURES
FIRST(b.tstamp) AS first_date,
LAST(b.tstamp) AS last_date,
FIRST(b.price) AS first_price,
LAST(b.price) AS last_price,
match_number() AS m_n,
classifier() AS classi
ALL ROWS PER MATCH
PATTERN(A B*)
DEFINE
B AS (price < PREV(price))
); 

which gives us the following output based on ALL ROWS PER MATCH:

Output from Amended Measure Clause

 

and if we want to switch to using ONE ROW PER MATCH then we need to remove references to the columns tstamp and price and replace them with references to the pattern variable specific versions, or we can just remove the references all together. In this case as we only have two pattern variables we can NVL the references to return the required data:

SELECT symbol, o_tstamp, o_price, first_date, last_date, first_price, last_price, m_n, classi 
FROM ticker
MATCH_RECOGNIZE(
PARTITION BY symbol
ORDER BY tstamp
MEASURES
nvl(a.tstamp, b.tstamp) as o_tstamp,
nvl(a.price, b.price) as o_price,
FIRST(b.tstamp) as first_date,
LAST(b.tstamp) as last_date,
FIRST(b.price) as first_price,
LAST(b.price) as last_price,
match_number() as m_n,
classifier() as classi
ONE ROW PER MATCH
PATTERN(A B*)
DEFINE
B AS (price < PREV(price))
);

 

the above query generates slightly fewer rows compared with the previous statement because we are only returning one for each match. However, note that this time we are referencing specific instances of tstamp and price.

New Measures For TSTAMP and PRICE

 

Summary

What have we learned:

Point 1: Check your PARTITION BY and ORDER BY clauses to ensure they make sense!

Point 2: there is no need to list the source columns from your input table in the MEASURE clause because they are automatically included BUT ONLY when you use ALL ROWS PER MATCH.

Point 3: Decide on your output method and match the columns listed in the SELECT clause with those returned by either ALL ROWS PER MATCH or ONE ROW PER MATCH.

Point 4: Always a good idea to check your pattern is being applied correctly by using the built-in  MATCH_NUMBER and CLASSIFIER() measures.

 

For more general information about how to get started with MATCH_RECOGNIZE follow these links to previous blog posts:

and checkout the growing library of tutorials on liveSQL.oracle.com. Hope this helps to throw some light on the workings of the MEASURE clause within MATCH_RECOGNIZE.

Technorati Tags: , , , ,

Comments

Popular posts from this blog

The Complete Guide To Data Warehousing and Big Data at Oracle OpenWorld 2016

Your Essential Online Session and Hands-on Lab Calendars for #oow16

Thursday's Top Picks at OpenWorld for Data Warehousing and Big Data