
Wednesday, 9 November 2016

SQL Pattern Matching Deep Dive - Part 5, SKIP TO where exactly?


Image courtesy of flickr.com

 

So far in this series we have looked at how to ensure query consistency, how to correctly use predicates, how to manage sorting, how the built-in measures can help you optimise your code, and the impact of the different types of quantifiers.

In this post I am going to review what MATCH_RECOGNIZE does after a match has been found, i.e. where the search begins for the next match. It might seem obvious - you simply start at the next record - but MATCH_RECOGNIZE provides a lot of flexibility in this specific area (as you would expect).

Basic Syntax

We use the AFTER MATCH SKIP clause to determine the precise point to resume row pattern matching after a non-empty match is found. If you don’t supply an AFTER MATCH SKIP clause then the default is AFTER MATCH SKIP PAST LAST ROW.

Of course there are quite a few options available:

  • AFTER MATCH SKIP TO NEXT ROW Resume pattern matching at the row after the first row of the current match.
  • AFTER MATCH SKIP PAST LAST ROW Resume pattern matching at the next row after the last row of the current match.
  • AFTER MATCH SKIP TO FIRST pattern_variable Resume pattern matching at the first row that is mapped to the pattern variable.
  • AFTER MATCH SKIP TO LAST pattern_variable Resume pattern matching at the last row that is mapped to the pattern variable.
  • AFTER MATCH SKIP TO pattern_variable The same as AFTER MATCH SKIP TO LAST pattern_variable.
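
To see where the clause sits within the overall MATCH_RECOGNIZE structure, here is a minimal example - using the ticker data set that features throughout this series and a simple price-fall-then-rise pattern - with the default skip behaviour stated explicitly:

SELECT *
FROM Ticker MATCH_RECOGNIZE (
 PARTITION BY symbol
 ORDER BY tstamp
 MEASURES MATCH_NUMBER() AS match_num
 ONE ROW PER MATCH
 AFTER MATCH SKIP PAST LAST ROW
 PATTERN (DOWN+ UP+)
 DEFINE
       DOWN AS DOWN.price < PREV(DOWN.price),
       UP AS UP.price > PREV(UP.price)
) MR
ORDER BY MR.symbol, MR.match_num;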

 

Using Pattern Variables and ORA-62514

Note that you can link the restart point to a specific pattern variable, which allows you to work with overlapping patterns - i.e. where you are searching for “shapes” within your data set, such as “W” shaped patterns within our ticker data stream. But what happens if the pattern variable within the SKIP TO clause is not matched? Let’s look at the following example:

SELECT *
FROM Ticker MATCH_RECOGNIZE (
 PARTITION BY symbol
 ORDER BY tstamp
 MEASURES STRT.tstamp AS start_tstamp,
          LAST(UP.tstamp) AS end_tstamp,
          MATCH_NUMBER() AS match_num,
          CLASSIFIER() AS var_match
 ALL ROWS PER MATCH
 AFTER MATCH SKIP TO DOWN
 PATTERN (STRT DOWN* UP)
 DEFINE
       DOWN AS DOWN.price < PREV(DOWN.price),
       UP AS UP.price > PREV(UP.price)
) MR
WHERE symbol='ACME'
ORDER BY MR.symbol, MR.tstamp;

Here we are stating that we need zero or more matches of the variable DOWN to occur and that, once a match has been found, we will resume the search for the next pattern at the DOWN event. With this pattern it is possible that DOWN will never be matched, so the AFTER MATCH SKIP TO DOWN cannot happen even though a complete match for the pattern is found. Therefore, the compiler throws an error to let you know that this code will not work:

ORA-62514: AFTER MATCH SKIP TO variable is not bounded in the match found.
62514. 00000 - "AFTER MATCH SKIP TO variable is not bounded in the match found."
*Cause: AFTER MATCH SKIP TO variable was not bound in the match found due
to pattern operators such as |, *, ?, and so on.
*Action: Modify the query and retry the operation

Therefore, you need to change the pattern to search for one or more instances of DOWN rather than zero or more. This allows the DOWN event to be matched at least once, so it will always be available for AFTER MATCH SKIP TO processing.
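
For example, swapping the * quantifier for + in the statement above removes the error because DOWN is now guaranteed to be bound in every match:

SELECT *
FROM Ticker MATCH_RECOGNIZE (
 PARTITION BY symbol
 ORDER BY tstamp
 MEASURES STRT.tstamp AS start_tstamp,
          LAST(UP.tstamp) AS end_tstamp,
          MATCH_NUMBER() AS match_num,
          CLASSIFIER() AS var_match
 ALL ROWS PER MATCH
 AFTER MATCH SKIP TO DOWN
 PATTERN (STRT DOWN+ UP)
 DEFINE
       DOWN AS DOWN.price < PREV(DOWN.price),
       UP AS UP.price > PREV(UP.price)
) MR
WHERE symbol='ACME'
ORDER BY MR.symbol, MR.tstamp;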

Skipping PAST LAST ROW [DEFAULT]

This is the default behaviour and in many circumstances it is the most obvious choice. When searching for the next pattern it makes sense to resume at the row after the last match, since going back over previous rows would only result in more rows than necessary being processed. For example, let’s look at the sessionization example: http://oracle-big-data.blogspot.co.uk/2014/02/sessionization-with-12c-sql-pattern.html - and if you want to try the code see the tutorial on the LiveSQL site.

Looking at the source data for the sessionization example it’s clear that, as we walk through the entries in the log file to check whether an entry is part of the current session or not, there is no point in stepping backwards to begin searching again once a match has been found.

Source data for sessionization example

 

You can run the code for this sessionization example on LiveSQL.
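
For reference, a sessionization query along those lines looks something like the sketch below. Note this is only an illustrative sketch: it assumes a hypothetical clickstream table with a user_id column and a TIMESTAMP click_time column, and uses an arbitrary 10-second gap to mark a session boundary.

SELECT user_id, click_time, session_id
FROM clickstream
MATCH_RECOGNIZE (
 PARTITION BY user_id
 ORDER BY click_time
 MEASURES MATCH_NUMBER() AS session_id
 ALL ROWS PER MATCH
 AFTER MATCH SKIP PAST LAST ROW
 PATTERN (strt same_session*)
 DEFINE
   same_session AS (click_time - PREV(click_time)) <= INTERVAL '10' SECOND
);

Each match becomes one session: a new match (and therefore a new session number) starts whenever the gap between consecutive clicks exceeds the threshold, and skipping past the last row simply carries on from the next log entry.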

Looking for shapes and controlling skipping

As I previously stated, you might think the obvious position to start searching for the next occurrence of a pattern is the next record after the last row of the current match. But what if there are overlapping patterns, where the middle of an earlier match overlaps with the start of the next match? For example, if we are looking for a w-shaped pattern within our ticker data set then it is quite possible to have overlapping w-shapes, where the next “W” starts within the second down phase of the previous ”W”.

Fortunately MATCH_RECOGNIZE  provides great flexibility in terms of being able to specify the restart point. If we look at the source data for the ACME symbol within our ticker data set then we can see that there are overlapping W-shapes (assuming we allow for the flat-top in the middle of the 2nd w-shape by using the <= and >= tests for each pattern variable!).

 

 

 

Twin W-shapes within source data

 

Let’s use this example to explore the various AFTER MATCH SKIP TO options…starting with the default behaviour:

 

SELECT *
FROM Ticker MATCH_RECOGNIZE (
 PARTITION BY symbol ORDER BY tstamp
 MEASURES STRT.tstamp AS start_w,
          LAST(z.tstamp) AS end_w
 ONE ROW PER MATCH
 AFTER MATCH SKIP PAST LAST ROW
 PATTERN (STRT x+ y+ w+ z+)
DEFINE
  x AS x.price <= PREV(x.price),
  y AS y.price >= PREV(y.price),
  w AS w.price <= PREV(w.price),
  z AS z.price >= PREV(z.price)
) MR
WHERE symbol='ACME'
ORDER BY symbol, MR.start_w;

 

returns only one match within the ACME data set:

Only one w-shape found in ACME data set

 

and if we expand the output, using ALL ROWS PER MATCH, we can see how the pattern was matched: it starts on 05-Apr-11 with pattern variable STRT and ends on 14-Apr-11 with pattern variable Z.

All rows report for single W shape

 

 

Now let’s change the above code sample so that, after the first pattern has been found, we resume the search at the last row mapped to the Y variable - i.e. row 6, 10-Apr-11.

SELECT *
FROM Ticker MATCH_RECOGNIZE (
 PARTITION BY symbol ORDER BY tstamp
 MEASURES STRT.tstamp AS start_w,
          LAST(z.tstamp) AS end_w
 ONE ROW PER MATCH
 AFTER MATCH SKIP TO LAST Y
 PATTERN (STRT x+ y+ w+ z+)
 DEFINE
     x AS x.price <= PREV(x.price),
     y AS y.price >= PREV(y.price),
     w AS w.price <= PREV(w.price),
     z AS z.price >= PREV(z.price)
) MR
WHERE symbol='ACME'
ORDER BY symbol,start_w;

which now finds two w-shapes with the second W starting on 10-Apr-11 and ending on 18-Apr-11:

Two w shapes within ticker data stream

but what is going on under-the-covers?

SELECT *
FROM Ticker MATCH_RECOGNIZE (
 PARTITION BY symbol ORDER BY tstamp
 MEASURES STRT.tstamp AS start_w,
          LAST(z.tstamp) AS end_w,
          classifier() AS pv,
          match_number() AS mn,
          count(*) as row_count
 ALL ROWS PER MATCH
 AFTER MATCH SKIP TO LAST Y
 PATTERN (STRT x+ y+ w+ z+)
 DEFINE
      x AS x.price <= PREV(x.price),
      y AS y.price >= PREV(y.price),
      w AS w.price <= PREV(w.price), 
      z AS z.price >= PREV(z.price)
) MR
WHERE symbol='ACME'
ORDER BY symbol, mn, tstamp;

 

now shows us that the records for 10-Apr-11 to 14-Apr-11 were actually processed twice:

 

Detailed report for 2 w-shapes

Skip to next row?

What about using the SKIP TO NEXT ROW syntax? How does that affect our results? It is important to remember that this will force MATCH_RECOGNIZE to resume pattern matching at the row after the first row of the current match. Using our ticker data we can see that this would actually increase the number of W-shapes to three!

 

3 W shapes in our ticker data stream

 

In match 2 we have two occurrences of pattern variable x; therefore, once the second W-shape has been matched, the search process restarts on row 12 - the row after the first row of the current match (row 11, which is mapped to STRT).

SELECT *
FROM Ticker MATCH_RECOGNIZE (
 PARTITION BY symbol ORDER BY tstamp
 MEASURES STRT.tstamp AS start_w,
          LAST(z.tstamp) AS end_w
 ONE ROW PER MATCH
 AFTER MATCH SKIP TO NEXT ROW
 PATTERN (STRT x+ y+ w+ z+)
 DEFINE
      x AS x.price <= PREV(x.price),
      y AS y.price >= PREV(y.price),
      w AS w.price <= PREV(w.price),
      z AS z.price >= PREV(z.price)
 ) MR
WHERE symbol='ACME'
ORDER BY symbol, mr.start_w;

creates the following output:

Three w shapes within ticker stream

 

and if we change our code to return the more detailed report we can see how the pattern is being matched:

SELECT *
FROM Ticker MATCH_RECOGNIZE (
 PARTITION BY symbol ORDER BY tstamp
 MEASURES STRT.tstamp AS start_w,
          LAST(z.tstamp) AS end_w,
          classifier() AS pv,
          match_number() AS mn,
          count(*) as row_count
 ALL ROWS PER MATCH
 AFTER MATCH SKIP TO NEXT ROW
 PATTERN (STRT x+ y+ w+ z+)
 DEFINE
      x AS x.price <= PREV(x.price),
      y AS y.price >= PREV(y.price),
      w AS w.price <= PREV(w.price),
      z AS z.price >= PREV(z.price)
) MR
WHERE symbol='ACME'
ORDER BY symbol, mn, tstamp;

 

which produces the following output:

Detailed report showing mapping of 3 w-shapes

 

Note that match two, the 2nd W-shape, starts on line 11 but we began the search for this second match on row 2, i.e. the next row after the first STRT variable. Similarly, the search for the third W-shape began on row 12, the next row after the second STRT variable. Given that our original data set for ACME only contained 20 rows, you can see from this example how it is possible to do a lot more processing when you start to fully exploit the power of the AFTER MATCH SKIP syntax.

Just accept the default?

The AFTER MATCH SKIP clause determines the point at which we resume searching for the next match after a non-empty match has been found. The default for the clause is AFTER MATCH SKIP PAST LAST ROW: resume pattern matching at the next row after the last row of the current match. In most examples of using MATCH_RECOGNIZE you will notice that the AFTER MATCH clause is not present and the developer blindly assumes that the AFTER MATCH SKIP PAST LAST ROW clause is applied. This obviously does not help the next developer who has to amend the code to fit new business requirements.

Therefore, my recommendation is that you should always clearly state where you want the matching process to start searching for the next match. Never assume the default behaviour will be good enough!

Summary

We are getting near the end of this series of deep dive posts. Hopefully this post has explained the ways in which you can use the AFTER MATCH SKIP… clause to ensure that you capture all of the required patterns/shapes within your data set. It’s always a good idea to explicitly include this clause because it is very important - if you don’t want to allow for overlapping matches then clearly state this in your code by using the AFTER MATCH SKIP PAST LAST ROW clause. Don’t assume the default will kick in and that the next developer will have time to read all your detailed documentation when making the next round of changes to the code.

Don’t forget to try our pattern matching tutorials and scripts on LiveSQL - all the above code examples are available via the “Skip to where?” tutorial on livesql.oracle.com.

 

What’s next?

In the next post in this series I am going to review the keywords that control the output from MATCH_RECOGNIZE: ALL ROWS vs. ONE ROW. Feel free to contact me if you have an interesting use case for SQL pattern matching or if you just want some more information. Always happy to help. My email address is keith.laker@oracle.com

 

Looking for more Information

Use the tag search to see more information about pattern matching or SQL Analytics or Database 12c.

 


 

   

 

Monday, 31 October 2016

Data Warehousing in the Cloud - Part 3

In my last post I looked at Oracle’s Cloud Services for data warehousing and described how they are based around engineered systems running the industry’s #1 database for data warehousing, fully optimised for data warehousing workloads and providing 100% compatibility with existing workloads. Most importantly, Oracle customers can run their data warehouse services on-premise, in the Cloud or using hybrid Cloud using the same management and business tools.

I also looked at how Oracle’s Cloud Services for data warehousing are designed to simplify the process of integrating a data warehouse with cutting edge business processes around big data. Oracle Cloud offers a complete range of big data services to speed up the monetisation of data sets: Oracle Big Data Cloud Service, Big Data Preparation Cloud, Big Data Discovery, IoT Cloud.

In this post, the last in this series, I am going to discuss Oracle’s cloud architecture for supporting data warehousing projects. 

Complete Architecture of Oracle’s Cloud for Data Warehousing

Oracle’s Cloud Services for data warehousing extend beyond the data management scenarios outlined in the previous sections. They cover a wide range of cloud services that together form a broad and complete set of enterprise-grade solutions for data warehousing in the cloud.

 

 

Data Warehouse Cloud Architecture

 

Oracle Storage Cloud Service

This provides an environment for managing staging files coming out of enterprise operational systems or ETL files that have been processed and need loading into the data warehouse. All files within the cloud are replicated across multiple storage nodes, which guarantees protection against hardware failure and data corruption. Enterprise-grade data protection and privacy policies are enforced to ensure that staging files remain secure at all times.

Oracle Database Backup Cloud Service

Oracle Database Backup Cloud Service is a secure, scalable, on-demand storage solution for backing up Oracle data warehouses, both on-premise and cloud-based, to the public cloud. The cloud infrastructure provides enterprise-grade performance, redundancy, and security to make sure that the data warehouse does not lose availability.

Oracle Data Preparation and Integration Services

This service provides a highly intuitive and interactive way for analysts and data scientists to prepare unstructured, semi-structured and structured data for downstream processing and analysis within the data warehouse. It aims to reduce the challenges of data processing and preparation of new data sets such as those linked to IoT and social media by providing a large set of data repair, transformation, and enrichment options that require zero coding or scripting.

It significantly reduces the burden of repairing, classifying, and publishing new data sets into the data warehouse, which can be on-premise or in the Oracle Cloud.

Oracle Business Intelligence Cloud Service

A best-in-class business intelligence cloud offering that provides the full array of intuitive BI tools. It includes interactive interfaces with built-in guidance, easy search, dynamic navigation, contextual analytics and tutorials to increase productivity. It offers tight integration with the sophisticated analytical capabilities of the Oracle Database and using Big Data SQL is able to integrate and join structured and unstructured data sets.

Summary

When moving to the cloud it is important to remember that all the technical requirements that have driven data warehousing over the last 20+ years still apply. A data warehouse is expected to deliver extreme query performance, scalable and robust data management, handle concurrent workloads etc. All these requirements still apply in the cloud. Oracle provides the only true enterprise-grade cloud based around a fully optimized infrastructure for data warehousing.

Oracle’s Cloud Services for data warehousing are based around engineered systems running the industry’s #1 database for data warehousing, fully optimized for data warehousing workloads and providing 100% compatibility with existing workloads. The unique aspect of Oracle’s Cloud service is the “same experience” guarantee. Customers running data warehouse services on-premise, in the Cloud or using hybrid Cloud will use the same management and business tools.

Many companies recognize that their most important use cases for moving to cloud rely on integration with big data and access to sophisticated analytics: data mining, statistics, spatial and graph, pattern matching etc. These analytical features are key to consolidation projects (moving current disconnected on-premise systems to the cloud) and delivering new projects that aim to monetize new data streams by treating them as profit centers.

Oracle’s Cloud Services for data warehousing are designed to simplify the process of integrating a data warehouse with cutting edge business processes around big data. A complete range of big data services are available to speed up the monetization of data sets: Oracle Big Data Cloud Service, Big Data Preparation Cloud, Big Data Discovery, IoT Cloud. Overall this provides a unique complete cloud offering: an end-to-end solution for data warehouse covering data integration, big data, database, analytics and business intelligence. Of course all of Oracle’s cloud services can be delivered as services on-premise and in the Oracle Public Cloud. The choice is yours. For more information about Oracle’s Cloud Services visit cloud.oracle.com.

Feel free to contact me (keith.laker@oracle.com) if you have any questions about Oracle’s Cloud Services for data warehousing.


 

Friday, 28 October 2016

Data Warehousing in the Cloud - Part 2

In the last blog post (Data Warehousing in the Cloud - Part 1) I examined why you need to start thinking about and planning your move to the cloud: looking forward, data warehousing in the cloud is seen as having the greatest potential for driving significant business impact through increased agility, better cost control and faster data integration via co-location. In the last section I outlined the top 3 key benefits of moving your data warehouse to the Oracle cloud: it provides an opportunity to consolidate and rationalise your data warehouse environment, it opens up new opportunities to monetise the content within your warehouse, and new data security requirements require IT teams to start implementing robust data security systems alongside comprehensive audit reporting.

In this post I am going to review Oracle’s cloud solutions for data warehousing, how Oracle’s key technologies enable Data Warehousing in the cloud and why Oracle’s Cloud runs Oracle better than any other cloud environment.

Oracle Database enabling technologies supporting the cloud

Many of the leading analysts have recognized that more and more organizations are moving to the cloud as a fast and efficient way of deploying data warehouse environments. However, they all point out that, although clouds in general are very appealing in terms of flexibility and agility of deployment and pricing, clouds must deliver support for hybrid on-premises-and-cloud solutions. Oracle’s leadership in data warehousing and vision for the cloud is unique in being able to support this must-have hybrid model. Oracle’s dominant position in the data warehouse market, through its delivery of engineered hardware to support data warehousing and end-to-end data integration services, is recognized by leading analysts as giving it a significant competitive advantage. Each release of the Oracle Database continues to add innovative new solutions for data warehousing. Oracle has taken these industry leading data warehouse innovations and made them available in the Cloud. Key technology areas include:

Multitenant - New architecture for DW consolidation and moving to the cloud. Oracle’s multitenant feature is the cornerstone for both consolidation and the transition to the cloud. Multitenant makes it quick and easy to consolidate multiple data marts into a single system using its pluggable database capabilities. This unique feature also enables seamless movement of a database, data mart and data warehouse from an on-premise environment to the cloud and, if needed, even back again to on-premise.

In-Memory - Immediate answers to any question with real-time analytics. The Oracle In-Memory option stores data in a highly optimised columnar format that is able to support the types of analytical queries that characterize data warehouse workloads. Oracle’s Cloud Services offer configurations that maximize the analytical potential of this feature, making it easier for business teams to gain access to significantly faster analytical processing.

Analytical SQL - New SQL innovations for data warehousing and big data. As a foundation for data warehousing, Oracle has always innovated in the area of SQL, both as an analytical language and in ways to broaden the scope of SQL to manage more types of data, such as JSON documents. These innovations allow SQL to perform more types of operations, such as pattern matching and delivering approximate results, that directly support the new types of projects being launched in the cloud.

 

Why Oracle Runs Oracle Better in the Cloud

There are several key things that Oracle is doing to ensure that the Oracle Database runs better in the Oracle Cloud.

Firstly, Oracle is providing integrated and optimized hardware, from disk-to-flash-to-memory, as part of its cloud infrastructure. Customers can now get the same extreme performance capabilities along with fully scalable storage and compute resources of the Exadata platform combined with the ease of use, scaling up and down in a few clicks, and cost effectiveness of Oracle’s cloud infrastructure. No other cloud vendor offers this level of optimization and integration between hardware and software.

Secondly, running the Oracle Database in the Oracle Cloud is exactly the same experience as running on-premise such that existing on-premise workloads are completely compatible with Oracle Cloud. Oracle provides easy to use tools and services to move data into the Cloud and out of the Cloud back to on-premise systems.

Lastly, no cloud solution should, or even can, act as a data silo - enterprise systems most definitely cannot function as data silos. Every organization has different applications. From an operational perspective this covers ERP, CRM, OLTP systems. From a data warehouse perspective it includes data preparation, data integration and business intelligence. Oracle provides all these solutions within its Cloud Services.

This means it is possible to put the data warehouse in the cloud alongside the source systems that push data into the warehouse and tools that analyze and visualize that data.

Oracle Cloud Solutions

This section provides an overview of the various Oracle Cloud Services that support data warehousing and the use cases for each service. In general terms, Oracle Cloud provides a broad spectrum of data management offerings that can be offered both on-premise and in the Oracle Public Cloud:

LiveSQL - Free Service

The free cloud service is a great place to quickly, easily and safely road-test new database features. Entry into this service is via Oracle Live SQL, which provides a simple way to test and share SQL and PL/SQL application development concepts. LiveSQL offers:

  • Browser based SQL worksheet access to an Oracle database schema
  • Ability to save and share SQL scripts
  • Schema browser to view and extend database objects
  • Interactive educational tutorials
  • Customized data access examples for PL/SQL, Java, PHP, C

All that is needed to access this service is an Oracle Technology Network account – which itself is a free online service.

Exadata Express Cloud Service

Most organizations focus 80% of their valuable resources on creating on-premise development and test environments. This is because setting up both software and hardware even for these types of systems is time consuming. There are all sorts of procedures that need to be followed to buy licenses, configure servers, connect up networks, configure DB tools etc. However, it’s very rare that these systems match the eventual production environment of the data warehouse and this creates significant challenges around testing and QA processes.

In general, the majority of IT teams want to develop and test in a cloud environment where scaling up for testing and then scaling back once the tests and QA procedures are complete is simply a few clicks.

Use Cases for Exadata Express Cloud Service

Exadata Express Cloud Service is ideally suited to supporting development, test, small-scale marts and data discovery sandboxes up to 50GB in size. From a development and test perspective it provides the option, once a project is production-ready, to move the data warehouse schema back on-premise if regulatory requirements mandate the relocation of data within a specific geographic boundary. Using Oracle Multitenant it is a simple process to unplug from the Cloud and then plug-in and run on-premise.
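
A hedged sketch of that unplug/plug sequence is shown below; the PDB name and file paths are purely illustrative, and the XML manifest plus data files still need to be copied between the two environments:

-- On the source container database: close and unplug the pluggable database
ALTER PLUGGABLE DATABASE dev_mart CLOSE IMMEDIATE;
ALTER PLUGGABLE DATABASE dev_mart UNPLUG INTO '/u01/app/oracle/dev_mart.xml';

-- On the target container database: plug it back in using the manifest, then open it
CREATE PLUGGABLE DATABASE dev_mart
  USING '/u01/app/oracle/dev_mart.xml'
  COPY FILE_NAME_CONVERT = ('/stage/dev_mart/', '/u02/oradata/cdb1/dev_mart/');
ALTER PLUGGABLE DATABASE dev_mart OPEN;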

Database-as-a-Service

Delivers all the same functionality and power of Oracle Database 12c running on-premise. Oracle’s Database-as-a-Service configurations provide access to large numbers of both CPUs and memory to match workload requirements and take full advantage of advanced database features such as Oracle In-Memory and in-database advanced analytical capabilities (advanced data mining, spatial and graph, and multidimensional analytics). All the typical cloud attributes are part of this service: the ability to quickly and easily create new databases, manage databases from a single cloud console, and tools to migrate existing data from on-premise into Database 12c running in the cloud.

Use Cases for Database-as-a-Service

Oracle Database Cloud Service is designed to be a production-ready cloud environment that supports medium sized deployments beyond the size limits of the Exadata Express Cloud Service. This makes it ideal for managing larger scale enterprise level development systems as well as departmental marts and warehouses.

It supports situations where data scientists need large-scale sandboxes to support data mining projects that require access to large amounts of historical data integrated with raw data streams from IoT devices and other big data related subject areas. The plug-and-play features of multitenant combined with Big Data SQL make it possible to minimize data movement when deploying these types of sandboxes. The scale-up and scale-down capabilities of the cloud make it very easy to deploy production-realistic quality assurance environments for final testing and end user acceptance operations.

Exadata Cloud Service

This is Oracle’s flagship solution for data warehousing. It offers the highest levels of performance and scalability and is optimized for data warehouse workloads. It delivers fully integrated and preconfigured software and infrastructure providing extreme performance for all types of data warehouses workloads.

The Exadata Cloud Service bundles all the comprehensive and rich set of data management and analytics features of Oracle Database 12c as standard, such as:

  • Partitioning, Multitenant, advanced compression features, advanced security and the complete range of Enterprise Manager packs
  • In-memory, Advanced Analytics (Data Mining and R Enterprise), Spatial and Graph and OLAP

Use Cases for Exadata Cloud Service

Oracle’s comprehensive Exadata Cloud Service is designed to support enterprise-level, multi-petabyte data warehouse deployments where these environments (typically based around warehouses linked to multiple marts and large-scale plug-and-play sandboxes) typically have high levels of concurrency along with a wide variety of workloads.

The extreme performance characteristics of Exadata Cloud Service make it the perfect consolidation environment for running multiple departmental marts, warehouses and data discovery sandboxes all within a single robust cloud environment.

The Exadata Cloud Service is the foundation for a complete data warehouse solution due its tight integration with Oracle’s other cloud services such as Compute Cloud Service (for managing 3rd party tools), Big Data Preparation Services, Big Data Cloud Service and Backup and Recovery Cloud Service.

Big Data Cloud Service

Just like the Exadata Cloud Service, the Big Data Cloud Service, is based on Oracle’s engineered system for Big Data, which delivers fully integrated and preconfigured software and infrastructure providing extreme performance and scalability.

Today’s enterprise data warehouse extends beyond the management of just structured, relational, data to encompass new data streams from areas such as Internet of Things and concepts such as data reservoirs based on unstructured and semi-structured data sets. In many cases the data reservoir is hosted on a Hadoop or on a Big Data platform. Oracle’s Big Data Cloud Service is the ideal service to support these complimentary services that are being built around the enterprise data warehouse.

The co-location of big data reservoirs in the cloud alongside the Exadata Cloud Service opens up the ability to run SQL over both structured and unstructured data by using Oracle Big Data SQL, minimizing data movement and reducing the time taken to monetize new data streams.

Use Cases for Big Data Cloud Service

Oracle Big Data Cloud Service is the platform of choice for data-reservoir projects and IoT projects because the service leverages Cloudera’s industry-leading distribution of Apache Hadoop. Many customers are using their data reservoirs as part of a wider information lifecycle management framework where historical, “cold”, data is pushed from the enterprise data warehouse to the data reservoir, which then sits alongside other operational information stored using NoSQL technologies, such as Oracle NoSQL Database.

Using Big Data SQL, Data scientists and business users can easily incorporate data from all these different data management engines into their data models and analysis. This opens up the opportunity to extend existing analysis by incorporating broader data sets and explore new areas of opportunity using newly acquired data sets.

For data scientists the Big Data Cloud Service combined with Exadata Cloud Service offers fast deployment and teardown of sandbox environments. These data discovery sandboxes provide access to sophisticated analytical tools such as Oracle R Enterprise and Oracle Big Data Spatial and Graph analytics. These tools include extensive libraries of built-in functions that speed up the process of discovering relationships and making recommendations using big data.

Compute Cloud Service

On-premise data warehouse systems always rely on supporting systems to deliver data into the warehouse, visualize data via executive dashboards or analyze data using specialized processing engines. An enterprise cloud environment needs to also incorporate these solutions as well.

Oracle supports this key requirement using its Compute Cloud Service, which allows customers to install and manage any non-Oracle software tools and components. As with Oracle’s various data warehouse cloud services, this means that 3rd party products can benefit from the advantages of co-location, such as lower latency, by running alongside the data warehouse in the same data center.

Summary

Oracle’s Cloud Services for data warehousing are based around engineered systems running the industry’s #1 database for data warehousing, fully optimized for data warehousing workloads and providing 100% compatibility with existing workloads. The unique aspect of Oracle’s Cloud service is the “same experience” guarantee. Customers running data warehouse services on-premise, in the Cloud or using hybrid Cloud will use the same management and business tools.

Oracle’s Cloud Services for data warehousing are designed to simplify the process of integrating a data warehouse with cutting edge business processes around big data. A complete range of big data services are available to speed up the monetization of data sets: Oracle Big Data Cloud Service, Big Data Preparation Cloud, Big Data Discovery, IoT Cloud. 

In the next post I will discuss Oracle’s cloud architecture for supporting data warehousing projects.

Feel free to contact me (keith.laker@oracle.com) if you have any questions about Oracle’s Cloud Services for data warehousing.


 

Monday, 24 October 2016

Data Warehousing in the Cloud - Part 1

Why is cloud so important?

Data warehouses are currently going through two very significant transformations that have the potential to drive significant levels of business innovation:

The first area of transformation is the drive to increase overall agility. The vast majority of IT teams are experiencing a rapid increase in demand for data. Business teams want access to more and more historical data whilst at the same time, data scientists and business analysts are exploring ways to introduce new data streams into the warehouse to enrich existing analysis as well as drive new areas of analysis. This rapid expansion in data volumes and sources means that IT teams need to invest more time and effort ensuring that query performance remains consistent and they need to provision more and more environments (data sandboxes) for individual teams so that they can validate the business value of new data sets.

The second area of transformation is around the need to improve the control of costs. There is a growing need to do more with fewer and fewer resources whilst ensuring that all sensitive and strategic data is fully secured, throughout the whole lifecycle, in the most cost efficient manner. Cloud is proving to be the key enabler. It allows organizations to actively meet the challenges presented by the two key transformations of expanding data volumes and increased focus on cost control.

In this series of blog posts I hope to explain why and how moving your data warehouse to the cloud can support and drive these two key transformations as well as explaining the benefits that it brings for DBAs, data scientists and business users. Hopefully, this information will be useful for enterprise architects, project managers, consultants and DBAs. Over the coming weeks I will cover the following topics:

  1. Why is cloud so important for data warehousing
  2. Top 3 use cases for moving your data warehouse to the cloud
  3. Oracle’s cloud solutions for data warehousing
  4. Why Oracle Cloud?
  5. Why Oracle’s Cloud runs Oracle better
  6. A review of Oracle’s complete architecture for supporting data warehousing in the cloud

In this first post I will cover points 1 and 2 and subsequent posts then cover the other topics. So here we go with part 1…

Why move to the cloud now?

A recent report by KPMG (Insights & Predictions On Disruptive Tech From KPMG’s 2015 Global Innovation Survey: http://softwarestrategiesblog.com/tag/idc-saas-forecasts/) looked at all the various technologies that are likely to have the greatest impact on business transformation over the coming years. KPMG talked to over 800 C-level business leaders around the world from a very broad range of businesses including tech industry startups, mid to large-scale organizations, angel investors and venture capital firms.

One of the key objectives of the survey was to identify disruptive technologies and innovation opportunities. Looking forward, the top 3 technologies that will have the biggest impact on business transformation are: cloud, data and analytics, and the Internet of Things. All three of these technologies are key parts of today’s data warehouse ecosystem. Therefore, it is possible to draw the conclusion that technology leaders view data warehousing in the cloud as having the greatest potential for driving significant business impact.

The importance of cloud for data warehousing to Oracle customers is directly linked to three key drivers:

  • Increased agility
  • Better cost control
  • Co-location

Improving agility

Many data warehouses are now embarking on a refresh phase. With much of the ground work for working with big data now in place, businesses are looking to leverage new data streams and new, richer types of analytics to support and drive new project areas such as: Customer-360, predictive analytics, fraud detection, IoT analytics and establishing data as a profit center. Many of these projects require provisioning of new hardware environments and deployment of software. It is faster, easier and more efficient to kick-start these new data centric projects using Oracle’s comprehensive Cloud Services.

Delivering better cost control

Many IT teams are looking for ways to consolidate existing Oracle marts, each running on dedicated hardware, and legacy non-Oracle marts, running on proprietary hardware, into a single integrated environment. The delivery of Oracle’s enterprise-grade cloud services provides the perfect opportunity to start these types of projects and Oracle has cloud-enabled migration tools to support these projects. Compliance cannot be seen as an optional extra when planning a move to the cloud. Data assets need to be secured across their whole lifecycle. Oracle’s enterprise-grade cloud services make compliance easier to manage and more cost efficient because all security features can be enabled by default and transparently upgraded and enhanced.

Co-Location for faster loading

The vast majority of Oracle E-Business customers have already begun the process of moving their applications to Oracle’s Cloud Services. Most data warehouses source data directly from these key applications such as order entry, sales, finance and manufacturing etc. Therefore, it makes perfect sense to co-locate the data warehouse alongside source systems that are already running in the cloud. Co-location offers faster data loading which means that business users get more timely access to their data.

 

Key benefits of moving to the Oracle Cloud

There are typically three main benefits for moving the data warehouse to Oracle Cloud and these are directly linked to the three key drivers listed in the previous section: 

  1. Easier CONSOLIDATION and RATIONALIZATION
  2. Faster MONETIZATION of data in the Cloud
  3. Cloud offers better PROTECTION
Let’s explore each of these use cases in turn:
 

1) Easier consolidation and rationalization 

“All enterprise data will be stored virtually in the cloud. More data is in the cloud now than in traditional storage systems” - Oracle CEO Mark Hurd

For a whole variety of reasons many data warehouse environments are made up of a number of databases that cover corporate data sets, departmental data sets, data discovery sandboxes, spread-marts etc. Each one of these usually runs on dedicated hardware that requires on-going investment and dedicated DBA services. This means that a significant proportion of IT costs is allocated to just keeping the lights on rather than helping drive new levels of innovation.

Oracle customers are seeing the opportunities provided by the Oracle Multitenant feature of Database 12c and the availability of Oracle’s Cloud Services for data warehousing as an ideal opportunity to consolidate systems into a single cloud environment. At the same time these systems benefit from co-location: being next to their key source systems, which are already running in the cloud.  By moving to the Oracle Cloud it is possible to both improve ETL performance and reduce IT costs.

With growing interest in big data there is an on-going need to rapidly provision new sandboxes for data discovery projects. The provisioning process is much simpler and faster using Oracle’s Cloud Services, allowing business teams to start their discovery work a lot quicker.

As new data warehouse projects are formulated the deployment of development, test and training environments within the cloud provides the ability to free up costly on-premise resources and make them available to other projects.

The cloud provides an excellent opportunity to convert and transform outdated legacy marts by leveraging the sophisticated analytical features of Oracle’s industry leading database. Oracle has a complete set of cloud-ready migration tools to convert and move schemas and data from all major database vendors to the Oracle Cloud.

2) Faster Monetization

IoT - next wave of data explosion. By 2020 there will be 32 billion connected devices, generating 4.4ZB  (Cars, Infrastructure, Appliances, Wearable Technology)

There is a growing need to monetize data, i.e. treat it as a valuable asset and convert it into a profit center. Moving to the cloud opens up a wide range of opportunities by providing an agile and simple way of implementing new style hybrid transactional/analytical requirements. It is a quick, efficient way to onboard new data streams such as IoT, external data sets from social media sources, 3rd party data sources etc. to enrich existing data sets making it possible to explore new business opportunities.

The fast deployment and tear-down capabilities of the cloud offers an effective way to keep both deployment and development costs down to support the new style of “fail-fast” data discovery projects.

The success of these data monetization projects largely depends on being able to integrate both existing traditional data sources, such as those from transactional operations, and the newer big data/IoT data streams. This integration is not only about matching specific data keys but also the ability to use a single, common industry standards driven query language such as SQL over all the data. Oracle’s Data Warehouse Cloud Services uniquely offers tight integration between relational and big data via features such as Big Data SQL.

3) Better Protection

“Oracle’s enterprise cloud will be the most secure IT environment. We are fully patched, fully secured, fully encrypted—that’s our cloud. . .“ - Oracle CEO Mark Hurd

Treating data as a profit center naturally requires IT teams to consider the concept of data protection and data availability. Outside of the cloud, security profiles have to be replicated, and operating system patch levels need to be kept in sync along with database patchsets. Trying to enforce common security and compliance rules across multiple standalone systems that share data is a time-consuming and costly process.

The processes of consolidating data sets by leveraging the multitenant features of Oracle Database 12c and moving schemas to Oracle Cloud Services gives DBAs a simple and efficient way to secure data across the whole lifecycle. Within Oracle’s Cloud Services all data is encrypted by default: in transit, in the Cloud and at rest. Backups with encryption are enforced which ensures that cloud data remains secure at all times.

Overall security is managed using Oracle Key Vault, secure SSH access, federated identity and an isolated cloud management network. Compliance and reporting is managed through the use of comprehensive audit trails.

Below the database level Oracle Cloud Service is fully managed by Oracle, which means it is always fully patched and therefore fully secured. This allows DBAs and business teams to focus on business innovation without having to invest considerable time and resources securing, encrypting and patching environments to ensure they are in line with compliance and regulatory requirements. 

 

Summary

In this blog post I have examined why you need to start thinking about and planning your move to the cloud: looking forward, data warehousing in the cloud is seen as having the greatest potential for driving significant business impact through increased agility, better cost control and faster data integration via co-location. I have outlined the top 3 key benefits of moving your data warehouse to the Oracle cloud: it provides an opportunity to consolidate and rationalise your data warehouse environment, it opens up new opportunities to monetise the content within your warehouse, and new data security requirements require IT teams to start implementing robust data security systems alongside comprehensive audit reporting.

In the next post I will discuss Oracle’s cloud solutions for data warehousing, how Oracle’s key technologies enable Data Warehousing in the cloud and why Oracle’s Cloud runs Oracle better than any other cloud environment.

Feel free to contact me (keith.laker@oracle.com) if you have any questions about Oracle’s Cloud Services for data warehousing.

 


Tuesday, 11 October 2016

SQL Pattern Matching Deep Dive - Part 4, Empty matches and unmatched rows?

 

Image courtesy of flickr: https://c1.staticflickr.com/1/55/185807556_21c547c02e_b.jpg

I have been asked a number of times during and after presenting on this topic (SQL Pattern Matching Deep Dive) what is the difference between the various matching options such as EMPTY MATCHES and UNMATCHED ROWS. This is the area that I am going to cover in this particular blog post, which is No 4 in this deep dive series.

When determining the type of output you want MATCH_RECOGNIZE to return most developers will opt for one of the following:

  • ONE ROW PER MATCH - each match produces one summary row. This is the default.
  • ALL ROWS PER MATCH - a match spanning multiple rows will produce one output row for each row in the match.

The default behaviour for MATCH_RECOGNIZE is to return one summary row for each match. In the majority of use cases this is probably the ideal solution. However, there are also many use cases that require more detailed information to be returned. If you are debugging your MATCH_RECOGNIZE statement then a little more information can help show how the pattern is being matched to your data set. In some cases you may find it useful, or even necessary, to use the extended syntax of the ALL ROWS PER MATCH keywords. There are three sub options:

  • ALL ROWS PER MATCH SHOW EMPTY MATCHES <- note that this is the default
  • ALL ROWS PER MATCH OMIT EMPTY MATCHES
  • ALL ROWS PER MATCH WITH UNMATCHED ROWS

Let’s look at these sub options in more detail but first a quick point of reference: all the examples shown below use the default AFTER MATCH SKIP PAST LAST ROW syntax. More on this later… 

TICKER DATA

Here is part of the ticker data set that we are going to use in this example - if you want to take a look at the full data set then see the example on the LiveSQL site:

 

Sample ticker data set

 

Empty matches

An obvious first question is: what’s the difference between an “empty match” and an “unmatched row”? This is largely determined by the type of quantifier used as part of the pattern definition, and by changing the quantifier it is possible to produce a similar result using both sets of keywords. To help explore the subtleties of these keywords I have simplified the pattern to just look for price decreases; note that we are using the * quantifier to indicate that we are looking for zero or more matches of the DOWN pattern. Therefore, if we run the following code:

SELECT
  symbol,
  tstamp,
  price,
  start_tstamp,
  end_tstamp,
  match_num,
  classifier
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol ORDER BY tstamp
  MEASURES FIRST(down.tstamp) AS start_tstamp,
           LAST(down.tstamp) AS end_tstamp,
           match_number() AS match_num,
           classifier() AS classifier
  ALL ROWS PER MATCH SHOW EMPTY MATCHES
  PATTERN (DOWN*)
  DEFINE
    DOWN AS (price <= PREV(price))
)
WHERE symbol = 'GLOBEX';

We get the following output:

 

Query showing results of empty matches

 

You can see that the result set contains all 20 rows that make up the data for my symbol “GLOBEX”. Rows 1-3, 9, and 13-15 are identified as empty matches - the classifier returns null. These rows appear because we have defined the search requirements for pattern DOWN as being zero or more occurrences.

Based on this we can state that an empty match is a row that does not map explicitly to a pattern variable (in this case DOWN). However, it is worth noting that an empty match does in fact have a starting row and it is assigned a sequential match number, based on the ordinal position of its starting row. The above situation is largely the result of the specific quantifier that we are using: * (asterisk). Given that the DOWN variable can be matched zero or more times there is the opportunity for an empty match to occur. As the complexity of the PATTERN increases, adding more variables and using different combinations of quantifiers, the probability of getting empty matches decreases but it is something that you need to consider. Why? Because the MATCH_NUMBER() function counts the empty matches and assigns a number to them - as you can see above. Therefore, if we omit the empty matches from the results the MATCH_NUMBER() column no longer contains a contiguous set of numbers:

So if we run the following code, where we have specified “OMIT EMPTY MATCHES”:

SELECT 
  symbol, 
  tstamp,
  price,
  start_tstamp,
  end_tstamp,
  match_num,
  classifier
FROM ticker 
MATCH_RECOGNIZE ( 
  PARTITION BY symbol ORDER BY tstamp 
  MEASURES FIRST(down.tstamp) AS start_tstamp,
           LAST(down.tstamp) AS end_tstamp,
           match_number() AS match_num,
           classifier() AS classifier
  ALL ROWS PER MATCH OMIT EMPTY MATCHES
  PATTERN (DOWN*) 
  DEFINE 
    DOWN AS (price <= PREV(price))
)
WHERE symbol = 'GLOBEX';

We get the following output:

 

Query results based on omitting empty matches

 

as you can see the MATCH_NUMBER() column starts with match number 4 followed by match 6 followed by match 10. Therefore, you need to be very careful if you decide to test for a specific match number within the MATCH_RECOGNIZE section and/or the result set because you might get caught out if you are expecting a contiguous series of numbers.  

Summary of EMPTY MATCHES

Some patterns permit empty matches, such as those using the asterisk quantifier as shown above. Three main points to remember when your pattern permits this type of matching:

  1. The value of MATCH_NUMBER() is the sequential match number of the empty match.
  2. Any COUNT is 0.
  3. Any other aggregate, row pattern navigation operation, or ordinary row pattern column reference is null.

The default is always to return empty matches, therefore, it is always a good idea to determine from the start if your pattern is capable of returning an empty match and how you want to manage those rows: include them (SHOW EMPTY MATCHES) or exclude them (OMIT EMPTY MATCHES). Be careful if you are using MATCH_NUMBER() within the DEFINE section as part of a formula because empty matches increment the MATCH_NUMBER() counter.

Reporting unmatched rows?

It is always useful to view the complete result set - at least when you are running your code against test data sets. Getting all the input rows into your output is relatively easy because you just need to include the phrase ALL ROWS PER MATCH WITH UNMATCHED ROWS. Other than for testing purposes I can’t think of a good use case for using this in production, so make sure you check your code before you submit your production-ready code to your DBA.
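
As an illustrative variation on the earlier query against the same ticker data, switching the quantifier to + (so that an empty match is no longer possible) and asking for unmatched rows returns every input row; the rows that are not part of any match come back with a NULL match number and a NULL classifier:

SELECT symbol, tstamp, price, match_num, classifier
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol ORDER BY tstamp
  MEASURES match_number() AS match_num,
           classifier() AS classifier
  ALL ROWS PER MATCH WITH UNMATCHED ROWS
  PATTERN (DOWN+)
  DEFINE
    DOWN AS (price <= PREV(price))
)
WHERE symbol = 'GLOBEX';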

What about skipping?

Note that if ALL ROWS PER MATCH WITH UNMATCHED ROWS is used with the default skipping behaviour (AFTER MATCH SKIP PAST LAST ROW), then there is exactly one row in the output for every row in the input. This statement leads us nicely into the next topic in this deep dive series, where I will explore SKIPPING. Taking a quick peek into this next topic… obviously there are many different types of skipping behaviour that are permitted when using WITH UNMATCHED ROWS. It does, in fact, become possible for a row to be mapped by more than one match and appear in the row pattern output table multiple times. Unmatched rows will appear in the output only once.

Can a query contain all three types of match?

Now the big question: Can I have a query where it is possible to have both UNMATCHED ROWS and EMPTY MATCHES? Short answer: Yes.

When the PATTERN clause allows empty matches, nothing in the DEFINE clause can stop the empty matches from happening. However, there are special PATTERN symbols called anchors. Anchors work in terms of positions rather than rows: they match a position either at the start or end of a partition or, if used together, across the whole partition.

  • ^ matches the position before the first row in the partition
  • $ matches the position after the last row in the partition

Therefore, using these symbols it is possible to create a PATTERN where the keywords SHOW EMPTY MATCHES, OMIT EMPTY MATCHES, and WITH UNMATCHED ROWS all produce different results from the same result set. For example, let’s start with the following:

SELECT symbol, tstamp, price, mnm, nmr, cls
FROM ticker MATCH_RECOGNIZE(
 PARTITION BY symbol
 ORDER BY tstamp
 MEASURES match_number() AS mnm,
          count(*) AS nmr,
          classifier() AS cls
 ALL ROWS PER MATCH SHOW EMPTY MATCHES
 PATTERN ((^A*)|A+)
 DEFINE A AS price > 11)
WHERE symbol = 'GLOBEX'
ORDER BY 1, 2;

 

returns the following 5 rows:

 

Query output showing 5 rows, including the empty match at row 1

this shows row 1 as an empty match for the pattern A* because we are matching from the start of the partition. This sets the MATCH_NUMBER() counter to 1. After the empty match the state moves to the pattern A+ for the remainder of the rows. The first match for this pattern starts at row 2 and completes at row 4. The final match in our data set is found at the row containing 15-APR-11. Therefore, if we omit the empty match at row 1 we only get 4 rows returned as shown here:

SELECT symbol, tstamp, price, mnm, nmr, cls
FROM ticker MATCH_RECOGNIZE(
 PARTITION BY symbol
 ORDER BY tstamp
 MEASURES match_number() AS mnm, 
          count(*) AS nmr, 
          classifier() AS cls
 ALL ROWS PER MATCH OMIT EMPTY MATCHES
 PATTERN ((^A*)|A+)
 DEFINE A AS price > 11)
WHERE symbol = 'GLOBEX'
ORDER BY 1, 2;

returns the following 4 rows:

Query output showing the 4 matched rows once the empty match is omitted

 

Now if we use the last iteration of this example the MATCH_RECOGNIZE statement returns all the rows from the input data. The actual “unmatched rows” are identified as having a NULL match number and NULL classifier. The “empty matches” are identified as having a NULL classifier and in this example the COUNT(*) function returns zero.

SELECT symbol, tstamp, price, mnm, nmr, cls
FROM ticker MATCH_RECOGNIZE(
 PARTITION BY symbol
 ORDER BY tstamp
 MEASURES match_number() AS mnm, 
          count(*) AS nmr, 
          classifier() AS cls
 ALL ROWS PER MATCH WITH UNMATCHED ROWS
 PATTERN ((^A*)|A+)
 DEFINE A AS price > 11)
WHERE symbol = 'GLOBEX'
ORDER BY 1, 2;

returns all 20 rows from our data set:

 

Query output showing all 20 rows, including unmatched rows and empty matches

  

LiveSQL

I have taken all the code and the associated explanations and created a tutorial on LiveSQL so you can try out the code for yourself: https://livesql.oracle.com/apex/livesql/file/tutorial_DZO3CVNYA7IYFU1V8H0PWHPYN.html.

Summary

I hope this helps to explain how the various output keywords that are part of the ALL ROWS PER MATCH syntax can affect the results you get back. You should now understand why your result set contains match_number values that are not contiguous and why classifier, along with specific aggregate functions, can return a NULL value. I expect the hardest concept to understand is the idea of empty matches. As I stated earlier it is always a good idea to determine from the start if your pattern is capable of returning an empty match: are you using an asterisk * within the PATTERN clause? Then you can determine how you want to manage those rows: include the empty matches (SHOW EMPTY MATCHES) or exclude them (OMIT EMPTY MATCHES). Be careful if you are using MATCH_NUMBER() within the DEFINE section as part of a formula because empty matches increment the MATCH_NUMBER() counter.

What should be immediately obvious is that in all the examples I have used the default skip behaviour: AFTER MATCH SKIP PAST LAST ROW. In the next post I will explore the various skip keywords and how they can impact the results returned by your MATCH_RECOGNIZE statement.

 

What’s next?

In the next post in this series I am going to review the keywords that control where we restart searching once a pattern has been found: the SKIP TO keywords. Feel free to contact me if you have an interesting use case for SQL pattern matching or if you just want some more information. Always happy to help. My email address is keith.laker@oracle.com

 

Looking for more Information

Use the tag search to see more information about pattern matching or SQL Analytics or Database 12c.


Tuesday, 4 October 2016

The complete review of data warehousing and big data content from Oracle OpenWorld 2016


Your COMPLETE REVIEW of data warehousing and big data from #OOW16

The COMPLETE REVIEW of OpenWorld covers all the most important sessions and related content from this year's conference, including Oracle's key data warehouse and big data technologies: Oracle Database 12c Release 2, Oracle Cloud, engineered systems, partitioning, parallel execution, Oracle Optimizer, analytic SQL, analytic views, in-memory, spatial, graph, data mining, multitenant, Big Data SQL, NoSQL Database and industry data models.

The COMPLETE review covers the following areas:
  • On-demand videos of the most important keynotes
  • Overviews of key data warehouse and big data sessions and links to download each presentation
  • List of data warehouse and big data presenters who were at #oow16
  • Overview of Oracle Cloud services for data warehousing and big data
  • Details of OpenWorld 2017 and how to justify your trip to San Francisco
  • Links to the data warehouse and big data product web pages, blogs, social media sites
This review is available in Apple iBooks for people who are living in the 21st century, and for those of you stuck in the early 1900s there is the fall-back option of a PDF version. Of course, the iBook version offers a complete, exciting and immersive multi-media experience, whilst the PDF version is simply and quite literally just a PDF.
Hope this review is useful. Let me know if you have any questions and/or comments. Enjoy!

Thursday, 22 September 2016

Thursday's Top Picks at OpenWorld for Data Warehousing and Big Data

Here are my top must-attend sessions and hands-on labs for Thursday. You can add them to your personal agenda using the “My Schedule” feature in the OpenWorld content catalog. Enjoy #oow16; today is the last day and it's going to be awesome.

Data Warehousing | Analytics | Unstructured Data | Big Data

8:00AM - 9:00AM
LAB: Use Oracle Big Data SQL to Analyze Data Across Oracle Database, Hadoop, and NoSQL
Hotel Nikko—Bay View (25th Floor)

Martin Gubar, Director of Product Management, Oracle
Keith Laker, Senior Principal Product Manager, Oracle
Organizations are expanding their data management platform including data stores based on Hadoop technologies and NoSQL databases. These data sources may contain new types of data that were not previously captured in the data warehouse, provide historical data that must be retained for compliance, and more. Oracle Big Data SQL enables existing applications that query Oracle Database to analyze data from all these sources, utilizing Oracle's rich SQL support and security policies. This hands-on lab introduces you to how this is achieved: providing access to big data sources, implementing unified security, and delivering analyses that combine and correlate all data.




LAB: Using Oracle Database 12c as a NoSQL JavaScript Object Notation Document Store
Mark Drake, Product Manager, Oracle

Hotel Nikko—Golden Gate (25th Floor)
The schemaless nature of JavaScript Object Notation (JSON) has enabled it to become an extremely popular method for persisting application data. Oracle Database 12c introduces new features that enable the database to be used as a JSON document store and provide full support for schemaless development. This latest version can store, index, and query JSON content in a highly performant manner. It also enables SQL to be used to query JSON content and join JSON content with relational data, spatial data, and XML. This hands-on lab provides an introduction to the JSON-related features of Oracle Database 12c as well as new RESTful services that make it possible to work with JSON content without requiring detailed knowledge of SQL.


9:30AM - 10:15AM
A RESTful Microservice for JSON in Oracle Database 12c Release 2
Kuassi Mensah, Director, Product Management, Oracle
Paul Lo, Senior Director, Oracle

Park Central—Olympic
The monolithic web application is dead; this is the era of service-based architecture, microservices, and functional and reactive programming. A modern web application is an assembly of services or microservices, each with its own implementation. In this technical session learn the steps for implementing a JavaScript microservice for data processing directly in Oracle Database. This microservice processes JavaScript Object Notation documents in place, and returns only the result sets. The session also shows how to turn such a microservice into a RESTful data service for cloud deployment.




Multitenancy at Wells Fargo Bank
Gerald Bowers, Database Administrator, Wells Fargo
Michael Anderson, Database Administrator, Wells Fargo

Park Central—Franciscan I
Wells Fargo Bank shares its experience with implementing Oracle Multitenant as part of a large-scale Oracle Exadata deployment. The Oracle Multitenant option is a vital component that will enable Wells Fargo to reach desired consolidation densities. The session covers data-gathering and planning for consolidation, consolidation selection criteria, impacts to existing processes, resource management, integration with other Oracle products, and database provisioning, and provides tips and future directions.




SAS to Oracle Advanced Analytics Migration at Zagrebacka Banka, UniCredit
Charlie Berger, Oracle
Sinisa Behin, Head of IT CRM function, Zagrebacka banka - UniCredit Group

Moscone South—104
With 25 percent market share, Zagrebačka banka (ZABA), part of UniCredit Group, is the leading Croatian financial institution. ZABA migrated from SAS to Oracle and increased bank performance using Oracle Advanced Analytics. SAS required data movement for predictive modeling and took days to complete. ZABA reduced this to seconds, minutes, and hours by leveraging the security, reliability, performance, and scalability of Oracle Database and Oracle Advanced Analytics and running data preparation, model building, and scoring inside Oracle. ZABA saved 1,000 person-days/year in IT and increased cash loans by 15 percent in 18 months due to improved hit ratios. Join this session to hear what is next in big data analytics at ZABA, and learn how to do more for less with Oracle Advanced Analytics.


9:45AM - 10:45AM
LAB: Getting Started with Oracle REST Data Services
Jeff Smith, Senior Principal Product Manager, Oracle
Kris Rice, Senior Director, Oracle
Colm Divilly, Consulting Member of Technical Staff, Oracle
Elizabeth Saunders, Principal Software Developer, Oracle
Gordon Smith, Director, ORDS Product Management, Oracle

Hotel Nikko—Bay View (25th Floor)
In this session learn how to get up to speed with Oracle REST Data Services by using Oracle SQL Developer to quickly install Oracle REST Data Services and AutoREST-enable your database objects. This hands-on lab walks attendees through the full REST development cycle: creating and testing a RESTful service, using parameters and feeds, inserting and updating data, and viewing results in JavaScript Object Notation in the web browser. In addition, the lab demonstrates how to configure security in Oracle REST Data Services using OAuth2 authentication.




LAB: Oracle Database In-Memory Boot Camp: Everything You Need to Know
Andy Rivenes, Senior Principal Product Manager, Oracle
Vineet Marwah, Sr. Director, Oracle

Hotel Nikko—Golden Gate (25th Floor)
Oracle Database In-Memory introduces a new in-memory-only columnar format and a new set of SQL execution optimizations such as SIMD processing, column elimination, storage indexes, and in-memory aggregation, all of which are designed specifically for the new columnar format. This hands-on lab provides a step-by-step guide on how to get started with Oracle Database In-Memory and how to identify which of the optimizations are being used and how your SQL statements benefit from them. Experience firsthand just how easy it is to start taking advantage of this technology and the incredible performance improvements it has to offer.


10:45AM - 11:30AM
360-Degree Customer Predictive Analytics in the Cloud with Clear Visibility
Charlie Berger, Oracle
Ray Owens, President, DX Marketing
Michelle Plecha, Senior Data Scientist, DX Marketing

Park Central—Franciscan I
DX Marketing, a data-driven marketing firm, uses Oracle Database Cloud Service, Oracle Advanced Analytics, and Oracle Marketing Cloud Service to build 360-degree predictive analytics models for its clients, faster and with greater ease than ever before. Faced with the challenge of dealing with hundreds of millions of records and separate analytical servers, DX Marketing moved everything to the Oracle Cloud. In this session learn how it is realizing greater performance, security, and huge productivity improvements to reach its goal of delivering best-in-class predictive models for clients. Come hear from DX Marketing and Oracle how easy it is to fly through volumes of data in the cloud extracting actionable analytics.




Build Applications on Spark, Hadoop, and NoSQL with Oracle Big Data Spatial
Siva Ravada, Oracle
Alan Wu, Oracle
James Steiner, Oracle

Park Central—Olympic
In this session learn about graph and spatial technologies for Spark, Hadoop, and NoSQL, and how to build analytic applications using languages and tools such as R, Python, Scala, and Java. Oracle Big Data Spatial and Graph offers a set of analytic algorithms, services, and data models that support big data workloads. Data scientists and developers can be more productive by using dozens of prebuilt, parallel, in-memory graph analytics functions and spatial functions and services. Analysts can discover relationships and connections among customers, organizations, and assets using graphs. Users can harmonize data and group results based on spatial relationships, in addition to applying spatial services to cleanse, filter, normalize, and process geospatial data sets.




IoT and Big Data Application Development for Analytics: Industry Use Cases
Balaji Ramachandran, CEO, SoftClouds LLC
Carla Romano, Director - Development, Oracle
Chris Fox, Director, Enterprise Architecture, Oracle

Moscone South—104
In this session learn how to develop analytic applications that integrate all kinds of data, including streaming data, to gain valuable real-time situational and context-sensitive intelligence. In addition, see how the new ecosystem can generate new (and potentially disruptive) products and services and generate requirements and priorities for standards organizations supporting the industrial internet.




Thinking Clearly About Database Application Architecture
Gerald Venzl, Principal Product Manager, Oracle
Bryn Llewellyn, Distinguished Product Manager, Database Division, Oracle
Connor Mcdonald, Developer Advocate for SQL, Oracle
Toon Koppelaars, Consulting Member of Technical Staff, Real World Performance Group, Oracle
Cary Millsap, Infrastructure Principal Director, Method R Corporation

Moscone South—103
The panelists strive to deserve their “Database Scientist” informal job titles by drawing their understanding of the correctness, security, and performance properties of applications that use Oracle Database from the data that proactive, empirical investigation yields. Like all scientists, they appreciate that future experiments might prompt the modification of current understanding. With this ordinary caveat, they confidently promote the understanding born of the current corpus of evidence: that the optimal database architecture starts with a carefully designed data model, manipulated by humanly composed SQL, encapsulated in PL/SQL. Come to hear them present their evidence. Challenge them with your own. They love a feisty debate.


11:30AM - 12:30PM
LAB: Oracle Database Cloud: Let’s Get Started
Manoj Moteka, Principal Member of Technical Staff, Oracle
Brian Spendolini, Oracle
Kris Bhanushali, Sr. Principal Product Manager, Oracle
Arindam Bose, Product Manager, Oracle

Hotel Nikko—Mendocino I/II (2nd Floor)
In this session get your first taste of Oracle Database Cloud. Create a cloud database from scratch with a few clicks of the mouse, and change access rules with the compute console, enabling web, monitoring, and command-line entry. Then see how Oracle Database Cloud can be scaled up in real time by adding memory, CPU, and storage. Experience the various application development, database monitoring, and administration consoles available via web access.




LAB: Upgrade and Migrate to Oracle Database 12c Release 2
Mike Dietrich, Oracle

Hotel Nikko—Bay View (25th Floor)
In this hands-on lab learn about new ways to upgrade, migrate, and consolidate databases in Oracle Database 12c Release 2. Learn to upgrade a database to Oracle Database 12c Release 2 using the new command-line parallel upgrade script; plug that database into an Oracle Database 12c Release 2 container database as a pluggable database (PDB); and migrate a second database to a PDB using the new Oracle Database 12c full transportable export/import feature. The end result is a CDB with two PDBs, which is the first step toward database as a service (DBaaS) or toward building a private cloud.


12:00PM - 12:45PM
Analytics at Energy Australia - Building an IM Platform in the Age of Big Data
Gaurav Singh, Solution Architect, Energy Australia
Gurinder S Sidhu, Information and Analytics Leader, Energy Australia

Moscone South—302
Energy generation and retailing in Australia is in a state of flux, with centralized power generation in decline, new competitive pressures, and more customer choice. Energy Australia recognized that to be competitive and support its business strategy moving forward, it would need to revitalize its IM architecture. Implementing the Oracle Big Data solution (including Exadata, Oracle Big Data Appliance, Oracle Data Integrator, and Oracle Data Quality) has provided real benefits. This session examines Energy Australia's journey from developing the business case and data strategy, to selecting key technologies, to implementation and quick wins. Also covered are future plans and lessons learned so far.




Best Practices for Developing Geospatial Big Data Apps for the Cloud
David Tatar, Principal Software Architect, Neustar
Nick Salem, Distinguished Engineer, Neustar Inc.

Park Central—Franciscan I
Learn best practices and techniques from Neustar's experience to build robust, scalable cloud-based applications using Oracle Database 12c, Oracle Big Data Spatial and Graph, partitioning, Oracle Real Application Clusters, Oracle Advanced Security, and Oracle WebLogic 12c. Neustar's analytics platform delivers marketing insights to its customers quickly and efficiently, and it's backed by a 3TB database with 5B geospatially enabled rows of information. Learn how to organize multiterabyte spatial data for maximum performance, how to deploy key Oracle security and high availability capabilities essential for a cloud system, how to use Oracle Spatial functions for rich geospatial analytics, and how to leverage the latest Oracle Database 12.2 features. A live demo is also included.




Prevent Bad SQL in the Cloud
Arup Nanda, Principal Database Architect, Starwood Hotels

Moscone South—103
Managing a database in Oracle Public Cloud makes most DBA-related tasks simpler. But applications are a different animal. Bad SQL could ruin a perfectly architected database in the cloud, and most DBAs fail to prevent that simply because that’s not their core skillset or they don’t have enough time for it. But 90 percent of the performance issues due to SQL can be handled automatically by tools built into Oracle Database. In this session learn how to harness those features to tune errant SQL without rewriting code, knowing anything about SQL constructs, or spending a large amount of time on them.


1:15PM - 2:00PM
Do You Have Text in Your Database? Use a Text Index
Roger Ford, Product Manager, Oracle

Moscone South—303
Most databases have textual data in them, from a few words to long documents. However, most applications don't take full advantage of that valuable content. In this session learn how Oracle Text allows you to create full-text indexes and run powerful and flexible queries on the content, all from within SQL. Oracle Text is included in all versions of Oracle Database 12c Release 2, no extra licenses needed. If you're not using it, you should, and this session shows you how.




Optimizing SQL for Performance and Maintainability
Christian Antognini, Senior Principal Consultant, Trivadis AG
John Clarke, Software Development Director, Real World Performance, Oracle
Connor Mcdonald, Developer Advocate for SQL, Oracle
Timothy Hall, DBA, Developer, Author, Trainer, Haygrays Limited
Chris Saxon, Developer Advocate for SQL, Oracle
Nigel Bayliss, Senior Principal Product Manager, Oracle
Keith Laker, Senior Principal Product Manager, Oracle

Moscone South—103
SQL is at the heart of every enterprise running on Oracle Database, so it is critical that our application’s SQL statements are optimized both in terms of performance and maintainability. Surprised to see the mention of maintainability? You shouldn’t be. Too much attention is paid on getting things working NOW. We need to think about the future, too. Join this session’s speaker and other SQL experts for a discussion (and lots of Q&A) on how to optimize your SQL statements.




Oracle’s Internal Use of Data Mining and Predictive Analytics: Case Study
Charlie Berger, Oracle
Frank Heiland, Senior Specialist Business Operations, Oracle

Palace—Grand Ballroom
Oracle uses predictive analytics in sales and customer support, HR, and elsewhere. Oracle targets customers most likely to buy products and anticipates customers most likely to discontinue service agreements. Customer service can monitor hosted database environments and anticipate unhealthy runtime behavior. Oracle Data Mining even contributed to the 2014 America’s Cup win. Oracle “moves the algorithms to the data” and extends the database to become an analytical platform; hence, Oracle has developed “predictive applications” for improved, analytically based decision-making and is adopting them internally. In this session, these internal “predictive analytics” use cases and an overview of Oracle Advanced Analytics are presented.


1:15PM - 2:15PM
LAB: Oracle Multitenant: Hands-on Lab
Can Tuzla, Product Manager, Oracle
Giridhar Ravipati, Principal Member of Technical Staff, Oracle
Sanket Jain, Principal Member of Technical Staff, Oracle
Prashanth Shanthaveerappa, Senior Member of Technical Staff, Oracle

Hotel Nikko—Golden Gate (25th Floor)
Oracle Multitenant is the first database designed for the cloud. With this architecture you can reduce both capital and operating expenses and increase agility, and it’s easy to adopt and use. Attend this hands-on lab to experience the exciting capabilities of Oracle Multitenant for yourself.


2:30PM - 3:15PM
Design Flexible Data Models Using Oracle's JavaScript Object Notation Capabilities
Aris Prassinos, Chief Engineer, Morpho

Park Central—Franciscan I
Oracle's JavaScript Object Notation (JSON) capabilities allow organizations to build schemaless database models while at the same time maintaining the full power of SQL. This session provides a quick introduction to these capabilities, and then examines the lessons learned from an actual application deployed to multiple customers with diverse schema requirements that also evolved over time. The examples presented cover design of JSON documents, storing and querying, advanced indexing techniques, and best practices. Finally, the functionality and performance of the JSON design approach will be contrasted to a traditional relational design and also to the XML capabilities available in prior versions of Oracle Database.