Friday, 21 February 2014

Oracle Technology Network's FREE Virtual Database Developer Day in EMEA on March 4 @9:00am GMT

otn virtual dvlper day

Oracle Technology Network's FREE Virtual Developer Day is coming to EMEA on March 4 starting at 9:00am GMT. This fantastic online event for DBAs and Database Developers is focused on building and deploying Oracle Database apps using modern techniques. My SQL pattern matching workshop will be at 11:45am GMT and lasts for 45 minutes. This is the same event that we ran in the US earlier this month and the response was amazingly positive from all those people that joined us.
To make things amazingly easy for you and to make sure that you get the best experience from this event we are providing a pre-built virtual machine which contains all the software pre-installed and configured along with all the workshops for the two Database 12c tracks: DBA and Developer. You will find the links to download the VM on the event registration page.
The 12c DBA track will be running the following sessions:
09:00 AM - Oracle VM performance with Oracle Database 12c
10:15 AM - Automatic Data Optimization (ADO) features of Oracle 12c
11:00 AM - RAC vs. Multitenant
11:45 AM - Pattern matching in SQL
12:30 PM - Database as a Service in a DBA's World
The 12c Developer track will run the following sessions:
09:00 AM - SQL Data Modeler 4.0
10:15 AM - SQL Developer 4.0
11:00 AM - XML Database
11:45 AM - Building Mobile Web Applications with Oracle Application Express 4.2
12:30 PM - Best Practices for Performance, Scalability, & Availability with Oracle Database12c
If you are interested in big data there are two tracks for big data developers and DBAs that cover all the basic concepts of Hadoop, how to process data (social media streams) on Hadoop and then push data to Oracle Database.
Of course you can mix and match topics across all four of these streams and don't worry if you end-up with wanting to attend two sessions running at the same time because you won't miss out on any of these sessions because all the workshops are included in the VM. To register for this event please follow this link: https://oracle.6connex.com/portal/database2014EMEA/login?langR=en_US&mcc=socspk.
During my SQL pattern matching workshop (@11:45am, March 4) I will be on the live Q&A chat to answer any questions so mark March 4 in your diaries and I hope to see you online during the event.

Technorati Tags: , , , , ,

Thursday, 6 February 2014

Sessionization with 12c SQL pattern matching is super fast

To help us evaluate our new 12c pattern matching feature our development team ran a series of performance tests using the TPC-H schema. The data set contained approximately seven years of data which consumed about 1TB of space. As part of the tests we ran two different searches:
the first search was based on a sessionization pattern using the orders table and this was compared to using a window function
the second search was based on a W-pattern using the line-item table and this was also compared to using a window function.
Let's examine these two use cases in more detail.

Sessionization

To create the first part of the sessionization workflow we took the original source data and used the USER_ID as the PARTITION BY  key and the timestamp for the ORDER BY clause. In this example we defined a session as a sequence of one or more events with the same partition key (USER_ID) where the gap between the timestamps is less than 10 seconds - obviously the figure for the gap is completely arbitrary and could be set to any number as required. The objective for this first step is to detect the various sessions and assign a surrogate session id to each session within each partition (USER_ID).
This creates an output result set that delivers a simplified sessionization data set as shown here:

NewImage

The SQL to create the initial result set is as follows:
SELECT user_id, session_id start_time, no_of_events, duration
FROM Events MATCH_RECOGNIZE
 (PARTITION BY User_ID ORDER BY Time_Stamp 
  MEASURES match_number() session_id, 
           count(*) as no_of_events,
           first(time_stamp) start_time, 
           last(time_stamp) - first(time_stamp) duration 
  ONE ROW PER MATCH 
  PATTERN (b s*) 
  DEFINE 
       s as (s.Time_Stamp - prev(s.Time_Stamp) <= 10) 
 )
;
as a comparison for how to achieve the above using analytical window functions
CREATE VIEW Sessionized_Events as
SELECT Time_Stamp, User_ID,
 Sum(Session_Increment) 
 over (partition by User_ID order by Time_Stampasc) Session_ID
FROM ( SELECT Time_Stamp, User_ID,
 CASE WHEN (Time_Stamp - 
 Lag(Time_Stamp) over (partition by User_ID 
 order by Time_Stampasc)) < 10
 THEN 0 ELSE 1 END Session_Increment
 FROM Events)
;
SELECT User_ID, Min(Time_Stamp) Start_Time,
 Count(*) No_Of_Events, Max(Time_Stamp) -Min(Time_Stamp) Duration
FROM Sessionized_Events
GROUP BY User_ID, Session_ID;

As you can see the non-12c approach is a little more complex to understand but it produces the same output - i.e. our initial sessionized data set.
However, to get business value from this derived data set we need to do some additional processing.  Typically, with this kind of analysis the business value of within the data emerges only after aggregation which in this case needs to by session. We need to reduce the data set to a single tuple, or row, per session with some derived attributes
  • Within-partition Session_ID
  • Number of events in a session
  • Total duration 
To do this we can use the MATCH_RECOGNIZE clause to determine how many events are captured within each session. There are actually two ways to do this: 1) we can compare the current record to the previous record, i.e. peek backwards or 2)  we can compare the current record to the next record, i.e. peek forwards.
Here is code based on using the PREV() function to check the current record against the previous record:
select count(*) 
from ( select /* mr_sessionize_prev */ *
 from
 ( select o_pbykey, session_id, start_time, no_of_events, duration
 from orders_v MATCH_RECOGNIZE
 (
  PARTITION BY o_pbykey
  ORDER BY O_custkey, O_Orderdate
  MEASURES match_number() session_id, count(*) as no_of_events, 
           first(o_orderdate) start_time,
           last(o_orderdate) - first(o_orderdate) duration
  PATTERN (b s*)
  DEFINE s as (s.O_Orderdate - prev(O_Orderdate) <= 100)
 )
 )
 where No_Of_Events >= 20
); 
Here is code based on using the NEXT() function to check the current record against the next record:
select count(*) 
from ( select /* mr_sessionize_prev */ *
 from
 ( select o_pbykey, session_id, start_time, no_of_events, duration
 from orders_v MATCH_RECOGNIZE
 (
PARTITION BY o_pbykey
ORDER BY O_custkey, O_Orderdate
MEASURES match_number() session_id, count(*) as no_of_events, 
           first(o_orderdate) start_time,
           last(o_orderdate) - first(o_orderdate) duration
PATTERN (s* e)
DEFINE s as (next(s.O_Orderdate) - s.O_Orderdate <= 100)
 )
 )
 where No_Of_Events >= 20
Finally we can compare the 12c MATCH_RECOGNIZE code to the typical 11g code using window functions (which in my opinion is a lot more complex):
select count(*)
from (
 select /* wf */ *
 from (select O_pbykey, Session_ID, min(O_Orderdate) Start_Time, 
               count(*) No_Of_Events,
              (max(O_Orderdate) - Min(O_Orderdate)) Duration
        from ( select O_Orderdate, O_Custkey, o_pbykey, 
                      sum(Session_Increment) 
                      over(partition by o_pbykey order by O_custkey, O_Orderdate) Session_ID
               from ( select O_Custkey, O_Orderdate, o_pbykey,
                             case when (O_Orderdate –
                             Lag(O_Orderdate) 
                                 over(partition by o_pbykey 
                                      order by O_custkey, O_Orderdate)) <= 100 -- Threshold
                             then 0 else 1 end Session_Increment
                      from orders_v
                    )
            )
       group by o_pbykey, Session_ID
    )
 where No_Of_Events >= 20

The output would look something like this:
NewImage

The performance results for these three approaches are shown below: column 3 shows the SQL using window functions and columns 4 & 5 show the performance figures for MATCH_RECOGNIZE based on using either PREV() or NEXT(). As you can see from the results, it is worth investigating how you create your comparison test in terms of looking forward or backwards to do the comparison test because there can be differences in performance. In general the 12c MATCH_RECOGNIZE code is between 1.5x and 1.9x faster compared to using window functions.

NewImage



Wednesday, 5 February 2014

OTN Virtual Developer Day content now available on-demand

Thank you to everyone who attended the SQL pattern matching session during yesterday's OTN Virtual Developer Day event. We had a great crowd of people join our live workshop session. I hope everyone enjoyed using the amazing platform which the OTN team put together to host the event.  

The great news is that all the content from the event is now available for download and you can watch the all on-demand videos from the four tracks (Big Data DBA, Big Data Developer, Database DBA and Database Developer). 

The link to fantastic OTN VDD platform is here: https://oracle.6connex.com/portal/database2014/login?langR=en_US&mcc=aceinvite and this is what the landing pad page looks like:

OTNVDD Me

 

This page will give you access to the keynote session by Tom Kyte and Jonathan Lewis which covered the landscape of Oracle DB technology evolution and adoption.  The content looks at what's next for Oracle Database 12c looking at the high value technologies and techniques that are driving greater database efficiencies and innovation.

You will be able to access the videos, slides from each presentation and a huge range of technical hands-on labs covering big data and database technologies, including my SQL Pattern Matching workshop. If you want to download the the Virtualbox image for the Database tracks it is available here: http://www.oracle.com/technetwork/database/enterprise-edition/databaseappdev-vm-161299.html (this contains everything you need to run my SQL Pattern Matching workshop).

While you doing the workshop, if you have any questions then please feel free to email me - keith.laker@oracle.com.

Enjoy.

 

Technorati Tags: , , ,