Monday, 31 October 2016

Data Warehousing in the Cloud - Part 3

In my last post I looked at Oracle’s Cloud Services for data warehousing and described how they are based around engineered systems running the industry’s #1 database for data warehousing, fully optimised for data warehousing workloads and providing 100% compatibility with existing workloads. Most importantly, Oracle customers can run their data warehouse services on-premise, in the Cloud or in a hybrid Cloud using the same management and business tools.

I also looked at how Oracle’s Cloud Services for data warehousing are designed to simplify the process of integrating a data warehouse with cutting edge business processes around big data. Oracle Cloud offers a complete range of big data services to speed up the monetisation of data sets: Oracle Big Data Cloud Service, Big Data Preparation Cloud, Big Data Discovery, IoT Cloud.
In this post, the last in this series, I am going to discuss Oracle’s cloud architecture for supporting data warehousing projects.

Complete Architecture of Oracle’s Cloud for Data Warehousing

Oracle’s Cloud Services for data warehousing extend beyond the data management scenarios outlined in the previous sections. They cover a wide range of cloud services that together form a broad and complete set of enterprise-grade solutions for data warehousing in the cloud.


Data Warehouse Cloud Architecture

Oracle Storage Cloud Service

This provides an environment for managing staging files coming out of enterprise operational systems or ETL files that have been processed and need loading into the data warehouse. All files within the cloud are replicated across multiple storage nodes, which guarantees protection against hardware failure and data corruption. Enterprise-grade data protection and privacy policies are enforced to ensure that staging files remain secure at all times.

Oracle Database Backup Cloud Service

Oracle Database Backup Cloud Service is a secure, scalable, on-demand storage solution for backing up Oracle data warehouses, both on-premise and cloud-based, to the public cloud. The cloud infrastructure provides enterprise-grade performance, redundancy, and security to make sure that the data warehouse does not lose availability.

Oracle Data Preparation and Integration Services

This service provides a highly intuitive and interactive way for analysts and data scientists to prepare unstructured, semi-structured and structured data for downstream processing and analysis within the data warehouse. It aims to reduce the challenges of data processing and preparation of new data sets such as those linked to IoT and social media by providing a large set of data repair, transformation, and enrichment options that require zero coding or scripting.

It significantly reduces the burden of repairing, classifying, and publishing new data sets into the data warehouse, which can be on-premise or in the Oracle Cloud.

Oracle Business Intelligence Cloud Service

A best-in-class business intelligence cloud offering that provides the full array of intuitive BI tools. It includes interactive interfaces with built-in guidance, easy search, dynamic navigation, contextual analytics and tutorials to increase productivity. It offers tight integration with the sophisticated analytical capabilities of the Oracle Database and using Big Data SQL is able to integrate and join structured and unstructured data sets.

Summary

When moving to the cloud it is important to remember that all the technical requirements that have driven data warehousing over the last 20+ years still apply: a data warehouse is expected to deliver extreme query performance, provide scalable and robust data management and handle highly concurrent workloads. All these requirements still apply in the cloud. Oracle provides the only true enterprise-grade cloud based around a fully optimized infrastructure for data warehousing.

Oracle’s Cloud Services for data warehousing are based around engineered systems running the industry’s #1 database for data warehousing, fully optimized for data warehousing workloads and providing 100% compatibility with existing workloads. The unique aspect of Oracle’s Cloud service is the “same experience” guarantee. Customers running data warehouse services on-premise, in the Cloud or using hybrid Cloud will use the same management and business tools.

Many companies recognize that their most important use cases for moving to cloud rely on integration with big data and access to sophisticated analytics: data mining, statistics, spatial and graph, pattern matching etc. These analytical features are key to consolidation projects (moving current disconnected on-premise systems to the cloud) and to delivering new projects that aim to monetize new data streams by treating them as profit centers.

Oracle’s Cloud Services for data warehousing are designed to simplify the process of integrating a data warehouse with cutting edge business processes around big data. A complete range of big data services is available to speed up the monetization of data sets: Oracle Big Data Cloud Service, Big Data Preparation Cloud, Big Data Discovery, IoT Cloud.

Overall this provides a unique and complete cloud offering: an end-to-end solution for data warehousing covering data integration, big data, database, analytics and business intelligence. Of course, all of Oracle’s cloud services can be delivered as services on-premise and in the Oracle Public Cloud. The choice is yours. For more information about Oracle’s Cloud Services visit cloud.oracle.com.

Part 1 of this series is available here: Data Warehousing in the Cloud - Part 1
Part 2 of this series is available here: Data Warehousing in the Cloud - Part 2

Feel free to contact me (keith.laker@oracle.com) if you have any questions about Oracle’s Cloud Services for data warehousing.

Friday, 28 October 2016

Data Warehousing in the Cloud - Part 2

In the last blog post (Data Warehousing in the Cloud - Part 1) I examined why you need to start thinking about and planning your move to the cloud: looking forward, data warehousing in the cloud is seen as having the greatest potential for driving significant business impact through increased agility, better cost control and faster data integration via co-location.

In the last section I outlined the top 3 key benefits of moving your data warehouse to the Oracle cloud: it provides an opportunity to consolidate and rationalise your data warehouse environment, it opens up new opportunities to monetise the content within your warehouse, and new data security requirements require IT teams to implement robust data security systems alongside comprehensive audit reporting.

In this post I am going to review Oracle’s cloud solutions for data warehousing, how Oracle’s key technologies enable Data Warehousing in the cloud and why Oracle’s Cloud runs Oracle better than any other cloud environment.

Oracle Database enabling technologies supporting the cloud

Many of the leading analysts have recognized that more and more organizations are moving to the cloud as a fast and efficient way of deploying data warehouse environments. However, they all point out that, although clouds in general are very appealing in terms of flexibility and agility of deployment and pricing, clouds must deliver support for hybrid on-premises-and-cloud solutions. Oracle’s leadership in data warehousing and vision for the cloud is unique in being able to support this must-have hybrid model.

Oracle’s dominant position in the data warehouse market, through its delivery of engineered hardware to support data warehousing, and end-to-end data integration services, is recognized by leading analysts as giving it a significant competitive advantage. Each release of the Oracle Database continues to add innovative new solutions for data warehousing. Oracle has taken these industry leading data warehouse innovations and made them available in the Cloud. Key technology areas include:
Multitenant - New architecture for DW consolidation and moving to cloud. Oracle’s multitenant feature is the cornerstone for both consolidation and the transition to the cloud. Multitenant makes it quick and easy to consolidate multiple data marts into a single system using its pluggable database capabilities. This unique feature also enables seamless movement of a database, data mart or data warehouse from an on-premise environment to the cloud and, if needed, even back again to on-premise.
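
To make the mechanics concrete, here is a minimal sketch of that unplug/plug operation; the pluggable database name and file paths are purely hypothetical:

-- Unplug a data mart packaged as a pluggable database (hypothetical names and paths)
ALTER PLUGGABLE DATABASE sales_mart CLOSE IMMEDIATE;
ALTER PLUGGABLE DATABASE sales_mart UNPLUG INTO '/u01/xml/sales_mart.xml';

-- Plug it into a target container database, for example one running in the Oracle Cloud
CREATE PLUGGABLE DATABASE sales_mart
  USING '/u01/xml/sales_mart.xml'
  COPY FILE_NAME_CONVERT = ('/u01/oradata/', '/u02/oradata/');
ALTER PLUGGABLE DATABASE sales_mart OPEN;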

In-Memory - Immediate answers to any question with real-time analytics. The Oracle Database In-Memory option stores data in a highly optimised columnar format that is able to support the types of analytical queries that characterize data warehouse workloads. Oracle’s Cloud Services offer configurations that maximize the analytical potential of this feature, making it easier for business teams to gain access to significantly faster analytical processing.
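
As a brief, hedged illustration (the SALES fact table is hypothetical and this assumes the in-memory column store has already been sized via the INMEMORY_SIZE parameter), enabling the feature for a table is a single statement:

-- Populate a fact table into the In-Memory column store
ALTER TABLE sales INMEMORY PRIORITY HIGH;

-- Subsequent analytic queries can scan the optimised columnar in-memory copy
SELECT prod_id, SUM(amount_sold) AS total_sold
FROM   sales
GROUP  BY prod_id;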

Analytical SQL - New SQL innovations for data warehousing and big data. As a foundation for data warehousing, Oracle has always innovated in the area of SQL, both as an analytical language and in ways to broaden the scope of SQL to manage more types of data, such as JSON documents. These innovations allow SQL to perform more types of operations, such as pattern matching and delivering approximate results, that directly support the new types of projects being launched in the cloud.
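
Two small, hedged examples of these innovations, using hypothetical table and column names: approximate query processing and SQL access to JSON documents.

-- Approximate results: quickly estimate the number of distinct customers
SELECT APPROX_COUNT_DISTINCT(cust_id) AS approx_customers
FROM   sales;

-- JSON support: extract an attribute from a JSON document stored in a column
SELECT JSON_VALUE(doc, '$.customer.country') AS country
FROM   orders_json;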

Why Oracle Runs Oracle Better in the Cloud

There are several key things that Oracle is doing to ensure that the Oracle Database runs better in the Oracle Cloud.

Firstly, Oracle is providing integrated and optimized hardware, from disk-to-flash-to-memory, as part of its cloud infrastructure. Customers can now get the same extreme performance capabilities along with fully scalable storage and compute resources of the Exadata platform combined with the ease of use, scaling up and down in a few clicks, and cost effectiveness of Oracle’s cloud infrastructure. No other cloud vendor offers this level of optimization and integration between hardware and software.
Secondly, running the Oracle Database in the Oracle Cloud is exactly the same experience as running on-premise such that existing on-premise workloads are completely compatible with Oracle Cloud. Oracle provides easy to use tools and services to move data into the Cloud and out of the Cloud back to on-premise systems.

Lastly, no cloud solution should, or even can, act as a data silo - enterprise systems most definitely cannot function as data silos. Every organization has different applications. From an operational perspective this covers ERP, CRM, OLTP systems. From a data warehouse perspective it includes data preparation, data integration and business intelligence. Oracle provides all these solutions within its Cloud Services.

This means it is possible to put the data warehouse in the cloud alongside the source systems that push data into the warehouse and tools that analyze and visualize that data.

Oracle Cloud Solutions

This section provides an overview of the various Oracle Cloud Services that support data warehousing and the use cases for each service. In general terms, Oracle Cloud provides a broad spectrum of data management offerings that can be offered both on-premise and in the Oracle Public Cloud:

LiveSQL - Free Service

The free cloud service is a great place to quickly, easily and safely road-test new database features. Entry into this service is via Oracle Live SQL, which provides a simple way to test and share SQL and PL/SQL application development concepts. LiveSQL offers:
  • Browser based SQL worksheet access to an Oracle database schema
  • Ability to save and share SQL scripts
  • Schema browser to view and extend database objects
  • Interactive educational tutorials
  • Customized data access examples for PL/SQL, Java, PHP, C
All that is needed to access this service is an Oracle Technology Network account – which itself is a free online service.

Exadata Express Cloud Service

Most organizations focus 80% of their valuable resources on creating on-premise development and test environments. This is because setting up both software and hardware, even for these types of systems, is time-consuming. There are all sorts of procedures that need to be followed to buy licenses, configure servers, connect up networks, configure DB tools and so on. However, it’s very rare that these systems match the eventual production environment of the data warehouse, and this creates significant challenges around testing and QA processes.

In general, the majority of IT teams want to develop and test in a cloud environment where scaling up for testing and then scaling back once the tests and QA procedures are complete is simply a few clicks.

Use Cases for Exadata Express Cloud Service

Exadata Express Cloud Service is ideally suited to supporting development, test, small-scale marts and data discovery sandboxes up to 50GB in size. From a development and test perspective it provides the option, once a project is production-ready, to move the data warehouse schema back on-premise if regulatory requirements mandate the relocation of data within a specific geographic boundary. Using Oracle Multitenant it is a simple process to unplug from the Cloud and then plug in and run on-premise.

Database-as-a-Service

Delivers all the same functionality and power of Oracle Database 12c running on-premise. Oracle’s Database-as-a-Service configurations provide access to large numbers of CPUs and large amounts of memory to match workload requirements and take full advantage of advanced database features such as Oracle In-Memory and in-database advanced analytical capabilities (advanced data mining, spatial and graph, and multidimensional analytics). All the typical cloud attributes are part of this service: the ability to quickly and easily create new databases, manage databases from a single cloud console, and tools to migrate existing data from on-premise into Oracle Database 12c running in the cloud.
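
To give a flavour of the in-database analytics this unlocks, here is a hedged sketch that scores rows against a previously built data mining model using the SQL PREDICTION functions; the churn_model and customers objects are hypothetical:

-- Score customers with an existing in-database classification model
SELECT cust_id,
       PREDICTION(churn_model USING *)             AS predicted_churn,
       PREDICTION_PROBABILITY(churn_model USING *) AS churn_probability
FROM   customers
ORDER  BY churn_probability DESC;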

Use Cases for Database-as-a-Service

Oracle Database Cloud Service is designed to be a production-ready cloud environment that supports medium-sized deployments beyond the size limits of the Exadata Express Cloud Service. This makes it ideal for managing larger scale enterprise level development systems as well as departmental marts and warehouses.

It supports situations where data scientists need large-scale sandboxes to support data mining projects that require access to large amounts of historical data integrated with raw data streams from IoT devices and other big data related subject areas. The plug-and-play features of multitenant combined with Big Data SQL make it possible to minimize data movement when deploying these types of sandboxes. The scale-up and scale-down capabilities of the cloud make it very easy to deploy production-realistic quality assurance environments for final testing and end user acceptance operations.

Exadata Cloud Service

This is Oracle’s flagship solution for data warehousing. It offers the highest levels of performance and scalability and is optimized for data warehouse workloads. It delivers fully integrated and preconfigured software and infrastructure providing extreme performance for all types of data warehouse workloads.

The Exadata Cloud Service bundles all the comprehensive and rich set of data management and analytics features of Oracle Database 12c as standard, such as:
  • Partitioning, Multitenant, advanced compression features, advanced security and the complete range of Enterprise Manager packs
  • In-memory, Advanced Analytics (Data Mining and R Enterprise), Spatial and Graph and OLAP

Use Cases for Exadata Cloud Service

Oracle’s comprehensive Exadata Cloud Service is designed to support enterprise-level, multi-petabyte data warehouse deployments where these environments (typically based around warehouses linked to multiple marts and large-scale plug-and-play sandboxes) have high levels of concurrency along with a wide variety of workloads.
The extreme performance characteristics of Exadata Cloud Service make it the perfect consolidation environment for running multiple departmental marts, warehouses and data discovery sandboxes all within a single robust cloud environment.

The Exadata Cloud Service is the foundation for a complete data warehouse solution due to its tight integration with Oracle’s other cloud services such as Compute Cloud Service (for managing 3rd party tools), Big Data Preparation Services, Big Data Cloud Service and Backup and Recovery Cloud Service.

Big Data Cloud Service

Just like the Exadata Cloud Service, the Big Data Cloud Service is based on Oracle’s engineered system for Big Data, which delivers fully integrated and preconfigured software and infrastructure providing extreme performance and scalability.

Today’s enterprise data warehouse extends beyond the management of just structured, relational data to encompass new data streams from areas such as the Internet of Things and concepts such as data reservoirs based on unstructured and semi-structured data sets. In many cases the data reservoir is hosted on a Hadoop or on a Big Data platform. Oracle’s Big Data Cloud Service is the ideal service to support these complementary services that are being built around the enterprise data warehouse.

The co-location of big data reservoirs in the cloud alongside the Exadata Cloud Service opens up the ability to run SQL over both structured and unstructured data by using Oracle Big Data SQL, minimizing data movement and reducing the time taken to monetize new data streams.
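
As a hedged sketch of what this looks like in practice, a Hive table in the data reservoir can be exposed as an Oracle external table via the ORACLE_HIVE access driver and then joined directly to warehouse tables in one query. All object names below are hypothetical, and the example assumes a Hive table of the same name in the default Hive database:

-- Expose a Hive table from the data reservoir as an Oracle external table
CREATE TABLE web_clicks (
  cust_id     NUMBER,
  click_time  TIMESTAMP,
  page_url    VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HIVE
  DEFAULT DIRECTORY default_dir
);

-- Join reservoir data with relational warehouse data in a single SQL query
SELECT c.cust_name, COUNT(*) AS clicks
FROM   customers c
JOIN   web_clicks w ON (c.cust_id = w.cust_id)
GROUP  BY c.cust_name;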

Use Cases for Big Data Cloud Service

Oracle Big Data Cloud Service is the platform of choice for data-reservoir projects and IoT projects because the service leverages Cloudera’s industry-leading distribution of Apache Hadoop. Many customers are using their data reservoirs as part of a wider information lifecycle management framework where historical, “cold”, data is pushed from the enterprise data warehouse to the data reservoir, which then sits alongside other operational information stored using NoSQL technologies, such as Oracle NoSQL Database.

Using Big Data SQL, data scientists and business users can easily incorporate data from all these different data management engines into their data models and analysis. This opens up the opportunity to extend existing analysis by incorporating broader data sets and to explore new areas of opportunity using newly acquired data sets.

For data scientists the Big Data Cloud Service combined with Exadata Cloud Service offers fast deployment and teardown of sandbox environments. These data discovery sandboxes provide access to sophisticated analytical tools such as Oracle R Enterprise and Oracle Big Data Spatial and Graph analytics. These tools include extensive libraries of built-in functions that speed up the process of discovering relationships and making recommendations using big data.

Compute Cloud Service

On-premise data warehouse systems always rely on supporting systems to deliver data into the warehouse, visualize data via executive dashboards or analyze data using specialized processing engines. An enterprise cloud environment needs to incorporate these solutions as well.

Oracle supports this key requirement using its Compute Cloud Service, which allows customers to install and manage any non-Oracle software tools and components. As with Oracle’s various data warehouse cloud services, this means that 3rd party products can benefit from the advantages of co-location, such as lower latency, by running alongside the data warehouse in the same data center.

Summary

Oracle’s Cloud Services for data warehousing are based around engineered systems running the industry’s #1 database for data warehousing, fully optimized for data warehousing workloads and providing 100% compatibility with existing workloads. The unique aspect of Oracle’s Cloud service is the “same experience” guarantee. Customers running data warehouse services on-premise, in the Cloud or using hybrid Cloud will use the same management and business tools.

Oracle’s Cloud Services for data warehousing are designed to simplify the process of integrating a data warehouse with cutting edge business processes around big data. A complete range of big data services is available to speed up the monetization of data sets: Oracle Big Data Cloud Service, Big Data Preparation Cloud, Big Data Discovery, IoT Cloud.

In the next post I will discuss Oracle’s cloud architecture for supporting data warehousing projects.
Feel free to contact me (keith.laker@oracle.com) if you have any questions about Oracle’s Cloud Services for data warehousing.

Part 1 of this series is available here: Data Warehousing in the Cloud - Part 1


Monday, 24 October 2016

Data Warehousing in the Cloud - Part 1

Why is cloud so important?

Data warehouses are currently going through two very significant transformations that have the potential to drive significant levels of business innovation:

The first area of transformation is the drive to increase overall agility. The vast majority of IT teams are experiencing a rapid increase in demand for data. Business teams want access to more and more historical data whilst, at the same time, data scientists and business analysts are exploring ways to introduce new data streams into the warehouse to enrich existing analysis as well as drive new areas of analysis. This rapid expansion in data volumes and sources means that IT teams need to invest more time and effort ensuring that query performance remains consistent, and they need to provision more and more environments (data sandboxes) for individual teams so that they can validate the business value of new data sets.

The second area of transformation is around the need to improve the control of costs. There is a growing need to do more with fewer and fewer resources whilst ensuring that all sensitive and strategic data is fully secured, throughout the whole lifecycle, in the most cost-efficient manner. Cloud is proving to be the key enabler: it allows organizations to actively meet the challenges presented by these two key transformations of expanding data volumes and an increased focus on cost control.

In this series of blog posts I hope to explain why and how moving your data warehouse to the cloud can support and drive these two key transformations, as well as explaining the benefits that it brings for DBAs, data scientists and business users. Hopefully, this information will be useful for enterprise architects, project managers, consultants and DBAs. Over the coming weeks I will cover the following topics:
  1. Why is cloud so important for data warehousing
  2. Top 3 use cases for moving your data warehouse to the cloud
  3. Oracle’s cloud solutions for data warehousing
  4. Why Oracle Cloud?
  5. Why Oracle’s Cloud runs Oracle better
  6. A review of Oracle’s complete architecture for supporting data warehousing in the cloud
In this first post I will cover points 1 and 2 and subsequent posts then cover the other topics. So here we go with part 1…

Why move to the cloud now?

A recent report by KPMG (Insights & Predictions On Disruptive Tech From KPMG’s 2015 Global Innovation Survey: http://softwarestrategiesblog.com/tag/idc-saas-forecasts/) looked at the various technologies that are likely to have the greatest impact on business transformation over the coming years. KPMG talked to over 800 C-level business leaders around the world from a very broad range of businesses including tech industry startups, mid to large-scale organizations, angel investors and venture capital firms.

One of the key objectives of the survey was to identify disruptive technologies and innovation opportunities. Looking forward, the top 3 technologies that will have the biggest impact on business transformation are: cloud, data and analytics, and the Internet of Things. All three of these technologies are key parts of today’s data warehouse ecosystem. Therefore, it is possible to draw the conclusion that technology leaders view data warehousing in the cloud as having the greatest potential for driving significant business impact.

The importance of cloud for data warehousing to Oracle customers is directly linked to three key drivers:
  • Increased agility
  • Better cost control
  • Co-location

Improving agility

Many data warehouses are now embarking on a refresh phase. With much of the ground work for working with big data now in place, businesses are looking to leverage new data streams and new, richer types of analytics to support and drive new project areas such as: Customer-360, predictive analytics, fraud detection, IoT analytics and establishing data as a profit center. Many of these projects require provisioning of new hardware environments and deployment of software. It is faster, easier and more efficient to kick-start these new data-centric projects using Oracle’s comprehensive Cloud Services.

Delivering better cost control

Many IT teams are looking for ways to consolidate existing Oracle marts, each running on dedicated hardware, and legacy non-Oracle marts, running on proprietary hardware, into a single integrated environment. The delivery of Oracle’s enterprise-grade cloud services provides the perfect opportunity to start these types of projects and Oracle has cloud-enabled migration tools to support these projects. Compliance cannot be seen as an optional extra when planning a move to the cloud. Data assets need to be secured across their whole lifecycle. Oracle’s enterprise-grade cloud services make compliance easier to manage and more cost efficient because all security features can be enabled by default and transparently upgraded and enhanced.

Co-Location for faster loading

The vast majority of Oracle E-Business customers have already begun the process of moving their applications to Oracle’s Cloud Services. Most data warehouses source data directly from these key applications such as order entry, sales, finance and manufacturing. Therefore, it makes perfect sense to co-locate the data warehouse alongside source systems that are already running in the cloud. Co-location offers faster data loading, which means that business users get more timely access to their data.

Key benefits of moving to the Oracle Cloud

There are typically three main benefits for moving the data warehouse to Oracle Cloud and these are directly linked to the three key drivers listed in the previous section: 
  1. Easier CONSOLIDATION and RATIONALIZATION
  2. Faster MONETIZATION of data in the Cloud
  3. Cloud offers better PROTECTION
Let’s explore each of these use cases in turn:

1) Easier consolidation and rationalization 

“All enterprise data will be stored virtually in the cloud. More data is in the cloud now than in traditional storage systems” - Oracle CEO Mark Hurd
For a whole variety of reasons many data warehouse environments are made up of a number of databases that cover corporate data sets, departmental data sets, data discovery sandboxes, spread-marts etc. Each one of these usually runs on dedicated hardware that requires on-going investment and dedicated DBA services. This means that a significant proportion of IT costs is allocated to just keeping the lights on rather than helping drive new levels of innovation.

Oracle customers are seeing the opportunities provided by the Oracle Multitenant feature of Database 12c and the availability of Oracle’s Cloud Services for data warehousing as an ideal opportunity to consolidate systems into a single cloud environment. At the same time these systems benefit from co-location: being next to their key source systems, which are already running in the cloud.  By moving to the Oracle Cloud it is possible to both improve ETL performance and reduce IT costs.
With growing interest in big data there is an on-going need to rapidly provision new sandboxes for data discovery projects. The provisioning process is much simpler and faster using Oracle’s Cloud Services, allowing business teams to start their discovery work a lot quicker.

As new data warehouse projects are formulated the deployment of development, test and training environments within the cloud provides the ability to free up costly on-premise resources and make them available to other projects.
The cloud provides an excellent opportunity to convert and transform outdated legacy marts by leveraging the sophisticated analytical features of Oracle’s industry leading database. Oracle has a complete set of cloud-ready migration tools to convert and move schemas and data from all major database vendors to the Oracle Cloud.

2) Faster Monetization

IoT is the next wave of data explosion: by 2020 there will be an estimated 32 billion connected devices generating 4.4ZB of data (cars, infrastructure, appliances, wearable technology).
There is a growing need to monetize data, i.e. treat it as a valuable asset and convert it into a profit center. Moving to the cloud opens up a wide range of opportunities by providing an agile and simple way of implementing new style hybrid transactional/analytical requirements. It is a quick, efficient way to onboard new data streams such as IoT, external data sets from social media sources, 3rd party data sources etc. to enrich existing data sets making it possible to explore new business opportunities.
The fast deployment and tear-down capabilities of the cloud offers an effective way to keep both deployment and development costs down to support the new style of “fail-fast” data discovery projects.

The success of these data monetization projects largely depends on being able to integrate both existing traditional data sources, such as those from transactional operations, and the newer big data/IoT data streams. This integration is not only about matching specific data keys but also about the ability to use a single, common, industry-standard query language such as SQL over all the data. Oracle’s Data Warehouse Cloud Services uniquely offer tight integration between relational and big data via features such as Big Data SQL.

3) Better Protection

“Oracle’s enterprise cloud will be the most secure IT environment. We are fully patched, fully secured, fully encrypted—that’s our cloud. . .“ - Oracle CEO Mark Hurd
Treating data as a profit center naturally requires IT teams to consider the concept of data protection and data availability. Outside of the cloud, security profiles have to be replicated, and operating system patch levels need to be kept in sync along with database patchsets. Trying to enforce common security and compliance rules across multiple standalone systems that share data is a time-consuming and costly process.

The processes of consolidating data sets by leveraging the multitenant features of Oracle Database 12c and moving schemas to Oracle Cloud Services gives DBAs a simple and efficient way to secure data across the whole lifecycle. Within Oracle’s Cloud Services all data is encrypted by default: in transit, in the Cloud and at rest. Backups with encryption are enforced which ensures that cloud data remains secure at all times.
Overall security is managed using Oracle Key Vault, secure SSH access, federated identity and an isolated cloud management network. Compliance and reporting is managed through the use of comprehensive audit trails.

Below the database level Oracle Cloud Service is fully managed by Oracle, which means it is always fully patched and therefore fully secured. This allows DBAs and business teams to focus on business innovation without having to invest considerable time and resources securing, encrypting and patching environments to ensure they are in line with compliance and regulatory requirements. 

Summary

In this blog post I have examined why you need to start thinking about and planning your move to the cloud: looking forward, data warehousing in the cloud is seen as having the greatest potential for driving significant business impact through increased agility, better cost control and faster data integration via co-location. I have outlined the top 3 key benefits of moving your data warehouse to the Oracle cloud: it provides an opportunity to consolidate and rationalise your data warehouse environment, it opens up new opportunities to monetise the content within your warehouse, and new data security requirements require IT teams to implement robust data security systems alongside comprehensive audit reporting.

In the next post I will discuss Oracle’s cloud solutions for data warehousing, how Oracle’s key technologies enable Data Warehousing in the cloud and why Oracle’s Cloud runs Oracle better than any other cloud environment.

Feel free to contact me (keith.laker@oracle.com) if you have any questions about Oracle’s Cloud Services for data warehousing.


Tuesday, 11 October 2016

SQL Pattern Matching Deep Dive - Part 4, Empty matches and unmatched rows?

 

image courtesy of Flickr: https://c1.staticflickr.com/1/55/185807556_21c547c02e_b.jpg

I have been asked a number of times, during and after presenting on this topic (SQL Pattern Matching Deep Dive), what the difference is between the various matching options such as EMPTY MATCHES and UNMATCHED ROWS. This is the area that I am going to cover in this particular blog post, which is No 4 in this deep dive series.

When determining the type of output you want MATCH_RECOGNIZE to return most developers will opt for one of the following:

  • ONE ROW PER MATCH - each match produces one summary row. This is the default.
  • ALL ROWS PER MATCH - a match spanning multiple rows will produce one output row for each row in the match.

The default behaviour for MATCH_RECOGNIZE is to return one summary row for each match. In the majority of use cases this is probably the ideal solution. However, there are also many use cases that require more detailed information to be returned. If you are debugging your MATCH_RECOGNIZE statement then a little more information can help show how the pattern is being matched to your data set. In some cases you may find it useful, or even necessary, to use the extended syntax of the ALL ROWS PER MATCH keywords. There are three sub options:

  • ALL ROWS PER MATCH SHOW EMPTY MATCHES <- note that this is the default
  • ALL ROWS PER MATCH OMIT EMPTY MATCHES
  • ALL ROWS PER MATCH WITH UNMATCHED ROWS

Let’s look at these sub options in more detail but first a quick point of reference: all the examples shown below use the default AFTER MATCH SKIP PAST LAST ROW syntax. More on this later… 

TICKER DATA

Here is part of the ticker data set that we are going to use in this example - if you want to take a look at the full data set then see the example on the LiveSQL site:

 

Sample ticker data set

 

Empty matches

An obvious first question is: what’s the difference between an “empty match” and an “unmatched row”? This is largely determined by the type of quantifier used as part of the pattern definition. By changing the quantifier it is possible to produce a similar result using both sets of keywords. To help explore the subtleties of these keywords I have simplified the pattern to just look for price decreases and you should note that we are using the * quantifier to indicate that we are looking for zero or more matches of the DOWN pattern. Therefore, if we run the following code:

SELECT
  symbol,
  tstamp,
  price,
  start_tstamp,
  end_tstamp,
  match_num,
  classifier
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol ORDER BY tstamp
  MEASURES FIRST(down.tstamp) AS start_tstamp,
           LAST(down.tstamp) AS end_tstamp,
           match_number() AS match_num,
           classifier() AS classifier
  ALL ROWS PER MATCH SHOW EMPTY MATCHES
  PATTERN (DOWN*)
  DEFINE
    DOWN AS (price <= PREV(price))
)
WHERE symbol = 'GLOBEX';

We get the following output:

 

Query showing results of empty matches

 

You can see that the result set contains all 20 rows that make up the data for my symbol "GLOBEX". Rows 1-3, 9, and 13-15 are identified as empty matches - the classifier returns null. These rows appear because we have defined the search requirements for pattern DOWN as being zero or more occurrences.

Based on this we can state that an empty match is a row that does not map explicitly to a pattern variable (in this case DOWN). However, it is worth noting that an empty match does in fact have a starting row and it is assigned a sequential match number, based on the ordinal position of its starting row. The above situation is largely the result of the specific quantifier that we are using: * (asterisk). Given that the DOWN variable can be matched zero or more times there is the opportunity for an empty match to occur. As the complexity of the PATTERN increases, adding more variables and using different combinations of quantifiers, the probability of getting empty matches decreases but it is something that you need to consider. Why? Because the MATCH_NUMBER() function counts the empty matches and assigns a number to them - as you can see above. Therefore, if we omit the empty matches from the results the MATCH_NUMBER() column no longer contains a contiguous set of numbers:

So if we run the following code, where we have specified “OMIT EMPTY MATCHES”:

SELECT 
  symbol, 
  tstamp,
  price,
  start_tstamp,
  end_tstamp,
  match_num,
  classifier
FROM ticker 
MATCH_RECOGNIZE ( 
  PARTITION BY symbol ORDER BY tstamp 
  MEASURES FIRST(down.tstamp) AS start_tstamp,
           LAST(down.tstamp) AS end_tstamp,
           match_number() AS match_num,
           classifier() AS classifier
  ALL ROWS PER MATCH OMIT EMPTY MATCHES
  PATTERN (DOWN*) 
  DEFINE 
    DOWN AS (price <= PREV(price))
)
WHERE symbol = 'GLOBEX';

We get the following output:

 

Query results based on omitting empty matches

 

As you can see, the MATCH_NUMBER() column starts with match number 4, followed by match 6, followed by match 10. Therefore, you need to be very careful if you decide to test for a specific match number within the MATCH_RECOGNIZE section and/or the result set because you might get caught out if you are expecting a contiguous series of numbers.

Summary of EMPTY MATCHES

Some patterns permit empty matches, such as those using the asterisk quantifier, as shown above. Three main points to remember when your pattern permits this type of matching:

  1. The value of MATCH_NUMBER() is the sequential match number of the empty match.
  2. Any COUNT is 0.
  3. Any other aggregate, row pattern navigation operation, or ordinary row pattern column reference is null.

The default is always to return empty matches, therefore, it is always a good idea to determine from the start if your pattern is capable of returning an empty match and how you want to manage those rows: include them (SHOW EMPTY MATCHES) or exclude them (OMIT EMPTY MATCHES). Be careful if you are using MATCH_NUMBER() within the DEFINE section as part of a formula because empty matches increment the MATCH_NUMBER() counter.

Reporting unmatched rows?

It is always useful to view the complete result set - at least when you are running your code against test data sets. Getting all the input rows into your output is relatively easy because you just need to include the phrase ALL ROWS PER MATCH WITH UNMATCHED ROWS. Other than for testing purposes I can’t think of a good use case for using this in production, so make sure you check your code before you submit your production-ready code to your DBA.

What about skipping?

Note that if ALL ROWS PER MATCH WITH UNMATCHED ROWS is used with the default skipping behaviour (AFTER MATCH SKIP PAST LAST ROW), then there is exactly one row in the output for every row in the input. This statement will lead us nicely into the next topic in this deep dive series where I will explore SKIPPING. Taking a quick peek into this next topic… obviously there are many different types of skipping behaviours that are permitted when using WITH UNMATCHED ROWS. It does, in fact, become possible for a row to be mapped by more than one match and appear in the row pattern output table multiple times. Unmatched rows will appear in the output only once.

Can a query contain all three types of match?

Now the big question: Can I have a query where it is possible to have both UNMATCHED ROWS and EMPTY MATCHES? Short answer: Yes.

When the PATTERN clause allows empty matches, nothing in the DEFINE clause can stop the empty matches from happening. However, there are special PATTERN symbols called anchors. Anchors work in terms of positions rather than rows. They match a position either at the start or end of a partition, or, if used together, across the whole partition.

  • ^ matches the position before the first row in the partition
  • $ matches the position after the last row in the partition

Therefore, using these symbols it is possible to create a PATTERN where the keywords SHOW EMPTY MATCHES, OMIT EMPTY MATCHES, and WITH UNMATCHED ROWS all produce different results from the same data set. For example, let’s start with the following:

SELECT symbol, tstamp, price, mnm, nmr, cls
FROM ticker MATCH_RECOGNIZE(
 PARTITION BY symbol
 ORDER BY tstamp
 MEASURES match_number() AS mnm,
          count(*) AS nmr,
          classifier() AS cls
 ALL ROWS PER MATCH SHOW EMPTY MATCHES
 PATTERN ((^A*)|A+)
 DEFINE A AS price > 11)
WHERE symbol = 'GLOBEX'
ORDER BY 1, 2;

 

returns the following 5 rows:

 

Query results showing the initial empty match (5 rows)

This shows row 1 as an empty match for the pattern A* because we are matching from the start of the partition. This sets the MATCH_NUMBER() counter to 1. After the empty match, the state moves to the pattern A+ for the remainder of the rows. The first match for this pattern starts at row 2 and completes at row 4. The final match in our data set is found at the row containing 15-APR-11. Therefore, if we omit the empty match at row 1 we only get 4 rows returned, as shown here:

SELECT symbol, tstamp, price, mnm, nmr, cls
FROM ticker MATCH_RECOGNIZE(
 PARTITION BY symbol
 ORDER BY tstamp
 MEASURES match_number() AS mnm, 
          count(*) AS nmr, 
          classifier() AS cls
 ALL ROWS PER MATCH OMIT EMPTY MATCHES
 PATTERN ((^A*)|A+)
 DEFINE A AS price > 11)
WHERE symbol = 'GLOBEX'
ORDER BY 1, 2;

returns the following 4 rows:

Query results with empty matches omitted (4 rows)

 

Now if we use the last iteration of this example the MATCH_RECOGNIZE statement returns all the rows from the input data. The actual “unmatched rows” are identified as having a NULL match number and NULL classifier. The “empty matches” are identified as having a NULL classifier and in this example the COUNT(*) function returns zero.

SELECT symbol, tstamp, price, mnm, nmr, cls
FROM ticker MATCH_RECOGNIZE(
 PARTITION BY symbol
 ORDER BY tstamp
 MEASURES match_number() AS mnm, 
          count(*) AS nmr, 
          classifier() AS cls
 ALL ROWS PER MATCH WITH UNMATCHED ROWS
 PATTERN ((^A*)|A+)
 DEFINE A AS price > 11)
WHERE symbol = 'GLOBEX'
ORDER BY 1, 2;

returns all 20 rows from our data set:

 

Query results including unmatched rows (all 20 rows)

  

LiveSQL

I have taken all the code and the associated explanations and created a tutorial on LiveSQL so you can try out the code for yourself: https://livesql.oracle.com/apex/livesql/file/tutorial_DZO3CVNYA7IYFU1V8H0PWHPYN.html.

Summary

I hope this helps to explain how the various output keywords that are part of the ALL ROWS PER MATCH syntax can affect the results you get back. You should now understand why your results contain match_number values that are not contiguous and why classifier can return a NULL value along with specific aggregate functions. I expect the hardest concept to understand is the idea of empty matches. As I stated earlier, it is always a good idea to determine from the start whether your pattern is capable of returning an empty match: are you using an asterisk * within the PATTERN clause? Then you can determine how you want to manage those rows: include the empty matches (SHOW EMPTY MATCHES) or exclude them (OMIT EMPTY MATCHES). Be careful if you are using MATCH_NUMBER() within the DEFINE section as part of a formula because empty matches increment the MATCH_NUMBER() counter.

What should be immediately obvious is that in all the examples I have used the default skip behaviour: AFTER MATCH SKIP PAST LAST ROW. In the next post I will explore the various skip keywords and how they can impact the results returned by your MATCH_RECOGNIZE statement.

 

What’s next?

In the next post in this series I am going to review the keywords that control where we restart searching once a pattern has been found: the SKIP TO keywords. Feel free to contact me if you have any interesting use cases for SQL pattern matching or if you just want some more information. Always happy to help. My email address is keith.laker@oracle.com

 

Looking for more Information

Use the tag search to see more information about pattern matching or SQL Analytics or Database 12c.


Tuesday, 4 October 2016

The complete review of data warehousing and big data content from Oracle OpenWorld 2016


Your COMPLETE REVIEW of data warehousing and big data from #OOW16

The COMPLETE REVIEW of OpenWorld covers all the most important sessions and related content from this year's conference, including Oracle's key data warehouse and big data technologies: Oracle Database 12c Release 2, Oracle Cloud, engineered systems, partitioning, parallel execution, Oracle Optimizer, analytic SQL, analytic views, in-memory, spatial, graph, data mining, multitenant, Big Data SQL, NoSQL Database and industry data models.

The COMPLETE review covers the following areas:
  • On-demand videos of the most important keynotes
  • Overviews of key data warehouse and big data sessions and links to download each presentation
  • List of data warehouse and big data presenters who were at #oow16
  • Overview of Oracle Cloud services for data warehousing and big data
  • Details of OpenWorld 2017 and how to justify your trip to San Francisco
  • Links to the data warehouse and big data product web pages, blogs, social media sites
This review is available in Apple iBooks for people who are living in the 21st Century and, for those of you stuck in the early 1900's, there is the fall-back option of a PDF version. Of course the iBook version offers a complete, exciting and immersive multi-media experience whilst the PDF version is, quite literally, just a PDF.
Hope this review is useful. Let me know if you have any questions and/or comments. Enjoy!