Using Zeppelin Notebooks with your Oracle Data Warehouse

Over the past couple of weeks I have been looking at one of the Apache open source projects called Zeppelin. It is a new style of application, known as a "notebook", which typically runs within your browser. The idea behind notebook-style applications like Zeppelin is to deliver an ad hoc data-discovery tool - at least, that is how I see it being used. Like most notebook-style applications, Zeppelin provides a number of useful data-discovery features such as:
- a simple way to ingest data
- access to languages that help with data discovery and data analytics
- some basic data visualization tools
- a set of collaboration services for sharing notebooks (collections of reports)

Zeppelin is essentially a scripting environment for running ordinary SQL statements along with a lot of other languages such as Spark, Python, Hive, R, etc. These are controlled by a feature called "interpreters", and there is a list of the latest interpreters available here.
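To give a feel for how this works in practice, here is a minimal sketch of a Zeppelin paragraph running ordinary SQL against a warehouse. It assumes a JDBC interpreter has been bound to the notebook under the name %jdbc, and the SALES and TIMES tables are illustrative (in the style of Oracle's SH sample schema), not from the post itself:

```sql
%jdbc
-- Ad hoc discovery query typed straight into a Zeppelin paragraph;
-- SALES and TIMES are assumed sample-schema tables.
SELECT t.calendar_year,
       SUM(s.amount_sold) AS total_sales
FROM   sales s
       JOIN times t ON s.time_id = t.time_id
GROUP  BY t.calendar_year
ORDER  BY t.calendar_year;
```

Running the paragraph returns a result grid that Zeppelin can then render with its built-in bar, line, and pie chart visualizations.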

A good example of a notebook-type of application is R Stud…

My query just got faster - brief introduction to 12.2 in-memory cursor duration temp tables

This post covers one of the new SQL performance enhancements that we incorporated into Database 12c Release 2. All of these enhancements are completely automatic - that is, transparent to the application code and queries that use them. These features are enabled by default because who doesn't want their queries running faster with zero code changes?

So in this post I am going to focus on the new In-Memory “cursor duration” temporary table feature. Let’s start by looking at cursor duration temp tables…
What is a cursor duration temp table? This is a feature that has been around for quite a long time. Cursor duration temporary tables (CDTs) are used to materialize the intermediate results of a query, either to improve performance or to support complex multi-step query execution. The following types of queries commonly use cursor duration temp tables:
- WITH clause and parallel recursive WITH
- Grouping Sets
- Star Transformation
- Frequent Item Set Counting
- XLATE

What happens during the …
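As a sketch of where a CDT shows up, the well-known (though undocumented) MATERIALIZE hint forces a WITH clause subquery to be materialized into a cursor duration temp table, which surfaces in the execution plan as a SYS_TEMP_... table. The SALES table here is assumed (in the style of the SH sample schema):

```sql
-- Force the WITH clause result set into a cursor duration temp
-- table; SALES is an assumed sample table and MATERIALIZE is a
-- widely known but undocumented hint.
WITH yearly_sales AS (
  SELECT /*+ MATERIALIZE */
         EXTRACT(YEAR FROM time_id) AS sales_year,
         SUM(amount_sold)           AS total_sold
  FROM   sales
  GROUP  BY EXTRACT(YEAR FROM time_id)
)
SELECT *
FROM   yearly_sales
WHERE  total_sold > 1000000;
```

In 12.2, when the new in-memory optimization applies, the plan (via DBMS_XPLAN.DISPLAY_CURSOR) shows the load step annotated along the lines of LOAD AS SELECT (CURSOR DURATION MEMORY) against the SYS_TEMP table, rather than a write to temp segments on disk.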

Sneak preview of demo for Oracle Code events

I will be presenting at a number of the Oracle Code events over the coming months on the subject of… (drum roll please) SQL pattern matching. Oracle Code is a great series of conferences dedicated to developers who want to get the absolute maximum benefit from using today's cutting edge technologies. If you want to register for any of the dates listed below then follow this link to the registration page.

North and Latin America
San Francisco, March 1, 2017
Austin, March 8, 2017
New York City, March 21, 2017
Washington DC, March 27, 2017
Toronto, April 18, 2017
Atlanta, June 22, 2017
Sao Paulo, June 27, 2017
Mexico City, June 29, 2017

Europe and Middle East
London, April 20, 2017
Berlin, April 24, 2017
Prague, April 28, 2017
Moscow, May 22, 2017
Brussels, June 6, 2017
Tel Aviv, July 11, 2017

Asia
New Delhi, May 10, 2017
Tokyo, May 18, 2017
Beijing, July 14, 2017
Sydney, July 18, 2017
Bangalore, August 4, 2017
Seoul, August 30, 2017
Back to my session...the ac…

MATCH_RECOGNIZE: Can I use MATCH_NUMBER() as a filter?

Recently I spotted a post on OTN that asked the question: Can MATCH_RECOGNIZE skip out of partition? This requires a bit more detail because it raises all sorts of additional questions. Fortunately the post included more information which went something like this:
after a match is found I would like match_recognize to stop searching - I want at most one match per partition. I don’t want to filter by MATCH_NUMBER() in an outer query - that is too wasteful (or, in some cases, I may know in advance that there is at most one match per partition, and I don’t want match_recognize to waste time searching for more matches which I know don't exist). Can MATCH_RECOGNIZE do this? Short answer is: NO.
Long answer is: Still NO.

Going back to the original question… you could interpret it as asking “is it possible to only return the first match”? The answer to this question is YES, it is possible.

There are a couple of different ways of doing it. Let's use our good old "TICKER" data set. The po…
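For reference, here is a sketch of the straightforward (if potentially wasteful) approach the poster wanted to avoid: expose MATCH_NUMBER() as a measure and filter in an outer query. It uses the classic TICKER(symbol, tstamp, price) table and the familiar V-shape pattern from the Oracle documentation examples:

```sql
-- Keep only the first match per partition by filtering on the
-- MATCH_NUMBER() measure in an outer query. TICKER is the standard
-- sample table used throughout the MATCH_RECOGNIZE documentation.
SELECT *
FROM (
  SELECT *
  FROM ticker
  MATCH_RECOGNIZE (
    PARTITION BY symbol
    ORDER BY tstamp
    MEASURES MATCH_NUMBER()    AS match_no,
             FIRST(down.tstamp) AS start_of_dip,
             LAST(up.tstamp)    AS end_of_rise
    ONE ROW PER MATCH
    PATTERN (strt down+ up+)    -- a simple V-shape
    DEFINE
      down AS down.price < PREV(down.price),
      up   AS up.price   > PREV(up.price)
  )
)
WHERE match_no = 1;
```

The cost the poster objected to is visible here: MATCH_RECOGNIZE still searches the whole partition for further matches, and the outer predicate simply throws the extra rows away.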

It's out now - Database 12c Release 2 available for download

Yes, it's the moment the world has been waiting for: the latest generation of the world's most popular database, Oracle Database 12c Release 2 (12.2), is now available everywhere - in the Cloud and on-premises. You can download this latest version from the database home page on OTN - click on the Downloads tab.

So What's New in 12.2 for Data Warehousing?

This latest release provides some incredible new features for data warehousing and big data. If you attended last year's OpenWorld event in San Francisco then you probably already know all about the new features that we have added to 12.2 - check out my blog post from last year for a comprehensive review of #oow16:

Blog: The complete review of data warehousing and big data content from Oracle OpenWorld 2016

If you missed OpenWorld and you are a data warehouse architect, developer or DBA, then here are the main feature highlights of 12.2, with links to additional content from OpenWorld and my data w…

How to intelligently aggregate approximations

The growth of low-cost storage platforms has allowed many companies to actively seek out new external data sets and combine them with internal historical data going back over a very long time frame. As both the type and the volume of data continue to grow, the challenge for many businesses is how to process this ever-expanding pool of data and, at the same time, make timely decisions based on all the available data.
In previous posts I have discussed whether an approximate answer is just plain wrong and whether approximate answers really are the best way to analyze big data. As with the vast majority of data analysis, at some point there is going to be a need to aggregate a data set to get a higher-level view across various dimensions. When working with results from approximate queries, dealing with aggregations can get a little complicated because it is not possible to "reuse" an aggregated approximate result as…
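To make the reuse problem concrete, 12.2 provides a family of functions that store the internal "sketch" rather than the final approximate number, so aggregations can be rolled up later. The following is a sketch using APPROX_COUNT_DISTINCT_DETAIL and its companion functions; SALES is an assumed sample table (in the style of the SH schema):

```sql
-- Build a reusable "detail" (internal sketch) per month rather
-- than a final number; SALES is an assumed sample table.
CREATE MATERIALIZED VIEW monthly_cust_mv AS
SELECT TRUNC(time_id, 'MM')                  AS sales_month,
       APPROX_COUNT_DISTINCT_DETAIL(cust_id) AS cust_detail
FROM   sales
GROUP  BY TRUNC(time_id, 'MM');

-- Roll the monthly sketches up to years without rescanning the
-- base table: merge the details with APPROX_COUNT_DISTINCT_AGG,
-- then convert to a number with TO_APPROX_COUNT_DISTINCT.
SELECT EXTRACT(YEAR FROM sales_month) AS sales_year,
       TO_APPROX_COUNT_DISTINCT(
         APPROX_COUNT_DISTINCT_AGG(cust_detail)) AS approx_customers
FROM   monthly_cust_mv
GROUP  BY EXTRACT(YEAR FROM sales_month);
```

Because the sketches merge correctly, the yearly counts stay within the same error bounds as if the approximation had been computed directly against the detail rows.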

Dealing with very very long string lists using Database 12.2

Oracle RDBMS 11gR2 introduced the LISTAGG function for working with string values. It can be used to aggregate values from groups of rows and return a concatenated string where the values are typically separated by a comma or semicolon - you can determine this yourself within the code by supplying your own separator symbol.

Based on the number of posts across various forums and blogs, it is widely used by developers. However, there is one key issue that has been highlighted by many people: when using LISTAGG on data sets that contain very large strings it is possible to create a list that is too long. This causes the following overflow error to be generated:
ORA-01489: result of string concatenation is too long

Rather annoyingly for developers and DBAs, it is very difficult to determine ahead of time whether the concatenation of the values within the specified LISTAGG measure_expr will cause an ORA-01489 error. Many people have posted workarounds to resolve this problem - including mysel…
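The 12.2 approach to this problem is the new ON OVERFLOW clause for LISTAGG. Here is a minimal sketch; EMPLOYEES(department_id, last_name) is an assumed sample table (in the style of the HR schema):

```sql
-- 12.2 overflow handling: instead of raising ORA-01489, truncate
-- the list, append a terminator, and report how many values were
-- omitted. EMPLOYEES is an assumed sample table.
SELECT department_id,
       LISTAGG(last_name, ',' ON OVERFLOW TRUNCATE '...' WITH COUNT)
         WITHIN GROUP (ORDER BY last_name) AS emp_list
FROM   employees
GROUP  BY department_id;
```

When the concatenated result would exceed the maximum string length, the output ends with the truncation indicator and a count of the dropped values, e.g. ...(24), rather than failing the whole query.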