Friday, 20 December 2013

BAE makes an inspired decision to use Big Data Appliance....

Here's another great story about using Oracle data warehousing and big data technologies to solve real-world problems with diverse sets of data. BAE Systems is taking unstructured, semi-structured, operational and social media data and using it to tackle complex problems such as financial crime, cyber security and digital transformation. The volumes of data that BAE deals with are very large, and this creates its own set of challenges in terms of optimising hardware and software to work efficiently and effectively together. Although BAE had their own in-house Hadoop experts, they chose Oracle Big Data Appliance for their Hadoop cluster because it's easier, cheaper and faster to operate.

BAE is working with many telco customers to explore the new areas that are being opened up by the use of big data to manage browsing data and call record data. These data sources are being transformed to provide additional insight for the network operations teams, analysis of customer quality and to drive marketing campaigns.

 

BAE

 

Click on the image to watch the video, or click here: http://medianetwork.oracle.com/video/player/2940549413001


Tuesday, 17 December 2013

SQL Analytics Part 2 - Key Concepts

This post follows on from my first post on analytical SQL, "Introduction to SQL for reporting and analysis", which looked at the reasons why it makes sense to use analytical SQL in your data warehouse and operational projects. In this post we are going to examine the key processing concepts behind analytical SQL.

One of the main advantages of Oracle's SQL analytics is that the key concepts are shared across all functions - in effect we have created a unified SQL framework for delivering analytics. These concepts build on existing SQL features to provide developers and business users with a framework that is both flexible and powerful in terms of its ability to support sophisticated calculations. There are four key concepts that you need to understand when implementing features and functions relating to SQL analytics:

  1. Processing order
  2. Result-set Partitions
  3. Windows
  4. Current Row

Let's look at each of these topics in more detail.

1) Processing order.

The execution workflow for SQL statements containing analytical SQL is relatively simple: first, all the joins and the WHERE, GROUP BY and HAVING clauses are processed. The output from this step is then passed to the analytical functions so all the calculations can be applied. This typically involves the use of window functions, which are applied based on the partitions that have been defined, with analytic functions applied to each row in each partition. Finally, the ORDER BY clause is processed to provide control over the final output. It is useful to keep this workflow in mind when you are building your analytical SQL because it will help you understand the inputs flowing into your analytical functions and the resulting output.
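As a quick sketch of this workflow (the table and column names below are purely illustrative), the RANK function is only evaluated after the GROUP BY has produced one row per channel, and the final ORDER BY is applied last:

SELECT channel_desc
, SUM(amount_sold) AS channel_sales
-- the analytic RANK() is applied to the rows produced by the GROUP BY
, RANK() OVER (ORDER BY SUM(amount_sold) DESC) AS channel_rank
FROM sales
GROUP BY channel_desc
-- the final ORDER BY controls the order of the output
ORDER BY channel_rank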

2) Result-set partitions

Oracle's analytic functions allow the input data set to be divided into groups of rows which are referred to as "partitions". It is important to note that in this context the term "partition" is completely unrelated to the table partition feature.

These analytical partitions are created after the groups defined with the GROUP BY clause and can be used by any analytical aggregate functions, such as sums and averages. The partitions can be based on any column that is part of the input data set, and individual partitions can be any size. It is quite possible to create a single partition containing all the rows from the initial query result set, a small number of very large partitions, or a large number of very small partitions where each partition contains just a few rows.

3) Windows

For each row in a partition it is possible to define a window over the data which determines the range of rows used to perform the calculations for the current row (the next section will explain the concept of the "current row"). The size of a window can be based on either a physical number of rows or a logical interval, which is typically time-based. The window has a starting row and an ending row and, depending on how the window is defined, it may move at only one end or, in some cases, both ends.

Physical windows

For example a cumulative sum function would have its starting row fixed at the first row in the partition and the ending row would then slide from the starting row all the way to the last row of the partition to create a running total over the rows in the partition. 

SELECT Qtrs
, Months
, Channels
, Revenue
, SUM(Revenue) OVER (PARTITION BY Qtrs) AS Qtr_Sales
, SUM(Revenue) OVER () AS Total_Sales
FROM sales_table


Window Fixed 1
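To make the cumulative-sum case described at the start of this section concrete, here is a minimal sketch (re-using the sales_table example above; it is an illustration rather than the query behind the original images): the window start stays fixed at the first row of each quarter while the end slides down to the current row.

SELECT Qtrs
, Months
, Revenue
-- running total: the start of the window is fixed at the first row of the quarter,
-- the end of the window moves with the current row
, SUM(Revenue) OVER (PARTITION BY Qtrs ORDER BY Months
                     ROWS UNBOUNDED PRECEDING) AS Cum_Revenue
FROM sales_table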

 

Logical windows

If the data set contains a date column then it is possible to use logical windows by taking advantage of Oracle’s built-in time awareness. A good example of a window where the start row changes is the calculation of a moving average. In this case both the starting and ending points slide so that a constant physical or logical range is maintained during the processing. The example below creates a four-period moving average and the images show the current row, which is identified by the arrow, and the moving window, which is marked as the pink area:

 
Window 1 Window 2
Window 3 Window 4
Window 5 Window 6
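Since the example itself is only shown as images, here is a sketch of what such a four-period moving average might look like (again using the illustrative sales_table): three preceding rows plus the current row make up the four periods.

SELECT Months
, Revenue
-- the window holds the current row plus the three preceding rows,
-- so both ends of the four-period window slide as the current row moves
, AVG(Revenue) OVER (ORDER BY Months
                     ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS Moving_Avg_4
FROM sales_table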

The concept of a "window" is very powerful and provides a lot of flexibility in terms of being able to interact with the data. A window can be set as large as all the rows in a partition. At the other extreme it could be just a single row. Users may specify a window containing a constant number of rows, or a window containing all rows where a column value is in a specified numeric range. Windows may also be defined to hold all rows where a date value falls within a certain time period, such as the prior month.
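For the last case mentioned above, a window holding all rows whose date falls within a recent time period, a sketch (assuming a hypothetical daily_sales table with a sale_date column, and approximating the prior month as 30 days) could use a logical RANGE window:

SELECT sale_date
, amount
-- logical window: every row whose sale_date falls within the 30 days
-- up to and including the current row's date
, SUM(amount) OVER (ORDER BY sale_date
                    RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW) AS rolling_30d_sales
FROM daily_sales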

When using window functions, the current row is included in the calculations, so if you want a window covering n rows you only need to specify (n-1) additional rows - see the next section for more information.

4) Current Row

Each calculation performed with an analytic function is based on a current row within a partition. The current row serves as the reference point: during processing it begins at the starting row and moves through the following rows until the end of the partition is reached. For instance, a centered moving average calculation could be defined with a window that holds the current row, the six preceding rows, and the following six rows. In the example below the running total for each row is the value of the current row plus the values from all the preceding rows in its partition, and the running total is reset when a new partition starts. The example shown below creates running totals within a result set showing the total sales for each channel within a product category within year:

SELECT calendar_year
, prod_category_desc
, channel_desc
, country_name
, sales
, units
, SUM(sales) OVER (PARTITION BY calendar_year, prod_category_desc, channel_desc ORDER BY country_name) sales_tot_cat_by_channel
FROM . . .

 

SQL A Current Row

 

 

Summary

This post has outlined the four main processing concepts behind analytical SQL. The next series of posts will provide an overview of the key analytical features and functions that use these concepts. In the next blog post we will review the analytical SQL features and techniques that are linked to enhanced reporting which includes: windowing, lag-lead, reporting aggregate functions, pivoting operations and data densification for reporting and time series calculations. Although these topics will be presented in terms of data warehousing, they are actually applicable to any activity needing analysis and reporting. 

If you have any questions or comments about analytical SQL then feel free to contact me via this blog.

 

 

 


Thursday, 12 December 2013

Oracle releases Exadata X4 with optimizations for data warehousing

 


Support Quote

“Oracle Exadata Database Machine is the best platform on which to run the Oracle Database and the X4 release extends that value proposition,” said Oracle President Mark Hurd. “As private database clouds grow in popularity, the strengths of Oracle Exadata around performance, availability and quality of service set it apart from all alternatives.”

We have just announced the release of the fifth-generation of our flagship database machine: Oracle Exadata Database Machine X4. This latest release introduces new hardware and software to accelerate performance, increase capacity, and improve efficiency and quality-of-service for enterprise data warehouse deployments.

Performance of all data warehousing workloads is accelerated by new flash caching algorithms that focus on table and partition scan workloads that are common in Data Warehouses. Tables that are larger than flash are now automatically partially cached in flash and read concurrently from both flash and disk to speed throughput.

Other key highlights are:

1) Improved workload management
Exadata X4-2 includes new workload management features that will improve the management of data warehouse workloads. Exadata now has the unique ability to transparently prioritize requests as they flow from database servers, through network adapters and network switches, to storage, and back.

We are using a new generation of InfiniBand network protocols to ensure that network-intensive workloads such as reporting, batch and backups do not delay response-time sensitive interactive workloads, which is great news for IT teams that have to define and manage service level agreements.

2) Bigger flash cache for even faster performance 
We have increased the amount of physical flash to 44 TB per full rack and, thanks to the flash cache compression described below, the capacity of the logical flash cache has increased by 100% to 88 TB per full rack.

3) Hardware driven compression/decompression

A feature that is unique to Exadata is the Flash Cache Compression. This transparently compresses database data into flash using hardware acceleration to compress and decompress data with zero performance overhead.

4) In-memory processing
For in-memory workloads we have increased the maximum memory capacity by 100% to 4 TB per full rack (using memory expansion kits), which means more workloads will be able to run in-memory with extremely fast response times.

5) Increased support for big data
To support big data projects we increased the capacity of the high performance disks to over 200 TB per full rack and for high capacity disks the storage capacity is now 672 TB per full rack. Once you factor in Oracle Exadata's compression technologies then a full rack is capable of storing petabytes of user data. 

The full press release is here: http://www.oracle.com/us/corporate/press/2079925

 


Location Intelligence and Oracle Spatial Summit May 19-21, 2014

If you are interested in spatial analytics or just want to understand what Oracle can offer in terms of bringing graph and spatial analytics to your projects then this is the conference for you: Location Intelligence and Oracle Spatial Summit. It is taking place next year between May 19-21 in Washington, DC and, as well as great presentations, there will also be technical training and workshops. The conference is divided into three areas:

May 19 - Technical Workshops

May 20 - Location Intelligence, HERE Summit and LocationTech Summit

May 21 - Oracle Spatial Summit, Location Intelligence, HERE Summit, and LocationTech Summit + Workshops

This is, most definitely, the premier educational event for Oracle’s spatial and enterprise information technologies. The Oracle Spatial and Graph Summit will take place on May 19 and May 21 and it will offer workshops, technical deep dives and use case sessions led by the Oracle product management teams. Sessions will cover spatial, location and mapping technologies, in the context of big data, BI, location-enabled cloud solutions, and more.

If you are already using the spatial and/or graph option in your projects and have a great story to tell then please think about presenting at the conference - it is a great way to raise your profile! The "Call for Speakers" is now open. To view submission guidelines and submit an abstract, go to the proposal page (http://www.locationintelligence.net/dc/proposal-form). The deadline is January 31, 2014 and early submissions are encouraged.

You can register for the full conference or just specific sessions on the event website. For more information, including agenda and speaker information, visit the conference website: http://www.locationintelligence.net/dc/.

 


Wednesday, 11 December 2013

dunnhumby increases customer loyalty with Oracle Big Data

dunnhumby presented at this year's OpenWorld, where they outlined the how and why of data warehousing on Exadata. Our engineered system delivered a performance improvement of more than 24x. dunnhumby pushes its data warehouse platform really hard: with more than 280 billion fact rows and 250 million dimension rows for one large retailer client alone, its massive data requires the best performance the industry has to offer.

In Oracle Exadata, dunnhumby has found that solution. Using Oracle Exadata’s advanced Smart Scan technology and robust Oracle Database features, this new environment has empowered its analysts to perform complex ad hoc queries across billions of fact rows and hundreds of millions of dimension rows in minutes or seconds, compared to hours or even days on other platforms.

You can download the presentation by Philip Moore - Exadata Datawarehouse Architect, Dunnhumby USA LLC -  from the OpenWorld site, see here: https://oracleus.activeevents.com/2013/connect/sessionDetail.ww?SESSION_ID=3412.

If you missed Philip's session at OpenWorld then we have just released a new video interview with Chris Wones, Director of Data Solutions at dunnhumby. During the interview Chris outlines some of the challenges his team faced when trying to do joined-up analytics across disparate and disconnected data sets, and how Exadata allowed them to bring everything together so that they could run advanced analytical queries that were simply not possible before, which in turn meant being able to bid on completely new types of contracts. The combination of Exadata and Oracle Advanced Analytics is delivering real business benefit to dunnhumby and its customers.

For more information about Oracle's Advanced Analytics option checkout Charlie Berger's advanced analytics blog: http://blogs.oracle.com/datamining and Charlie's twitter feed: https://twitter.com/CharlieDataMine

To watch the video click on the image: 

Dunnhumby

If the video does not start follow this link: http://medianetwork.oracle.com/video/player/2889835899001


Tuesday, 10 December 2013

Introduction to SQL for reporting and analysis

I have decided to start a series of blog posts on Oracle's in-database SQL analytics. The aim of this series of posts is to provide some historical background to show how SQL analytics has evolved and then start to explore some of the core concepts behind our SQL analytics followed by an examination of some of the key features and functions. The reason for creating this series of posts is that in talking to customers and partners it is clear to me that many project teams are just not aware of all the SQL features inside the Oracle Database that support enhanced reporting, data sampling, advanced aggregations, statistical analysis, pattern matching and spreadsheet-like data modelling.

Today, data warehouse and operational development teams need to quickly deliver new and innovative applications that provide significant competitive advantage and drive additional revenue streams. The challenge facing many teams is to find the right platform and language to securely and efficiently manage the data and analytical requirements while at the same time supporting the broadest range of tools and applications to maximize the investment in existing skills.

While many teams opt to use external analytical engines and/or bespoke application code, Oracle believes that the most efficient way to process data is: a) inside the database (taking the analytics to the data) and b) using SQL as the analytical language. The following post provides some historical background to the evolution of our in-database analytics and explains why using SQL as the default development language for analytics delivers so many important benefits.

Flexible and Comprehensive Development Framework

Oracle’s in-database analytical SQL offers an agile analytical development platform that enhances both query performance and developer productivity. It supports all types of processing, ranging from real-time analysis to interactive decision support to operational batch reporting. Using Oracle Database both corporate developers and independent software vendors can take advantage of a rich set of SQL based analytical features within their projects and applications.

History

First, some historical background. Let’s look at how Oracle's in-database analytical SQL has evolved over time:

SQL A 1

In versions 5, 6 and 7 we laid down the basic foundations for SQL analytics with functions for aggregations and advanced data processing. The development of what we today call SQL Analytics really started with the release of Oracle 8i. In this release we extended the aggregation capabilities by creating the multi-dimensional-like CUBE and ROLLUP features along with analytic window functions (more on this in a future blog post).
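As a reminder of what those aggregation extensions look like in practice (the table and column names here are illustrative), ROLLUP asks a single GROUP BY to produce the detail rows, the subtotals and the grand total in one pass:

-- ROLLUP adds per-year subtotal rows and a grand-total row in the same pass;
-- swapping ROLLUP for CUBE would also add per-channel subtotals
SELECT calendar_year
, channel_desc
, SUM(amount_sold) AS sales
FROM sales
GROUP BY ROLLUP (calendar_year, channel_desc)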

As part of Oracle 9i we introduced the concept of grouping sets, hypothetical ranking (the rank or percentile that a row would hypothetically have if it were inserted into a specified data set, which is useful for what-if analysis), first/last functions and the ability to create inverse distributions.

With 10g we added the SQL MODEL clause, which brings a spreadsheet-like approach to data processing (more on this in a future post), and we added advanced analytical processing in the form of data mining features and functions.

In 11g we added the pivot/unpivot features that allow you to create spreadsheet-like crosstab reports directly from a table using simple SQL. We extended the capabilities of the window functions and we delivered the recursive WITH syntax, which provides more flexibility for recursive processing compared to the CONNECT BY clause. And we added more data mining features and functions.
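Here is a sketch of that 11g PIVOT syntax (with illustrative table and channel names) turning channel rows into crosstab columns:

SELECT *
FROM (SELECT calendar_year, channel_desc, amount_sold FROM sales)
-- each channel listed in the IN clause becomes a column of the crosstab
PIVOT (SUM(amount_sold) FOR channel_desc IN ('Direct Sales' AS direct_sales,
                                             'Internet' AS internet_sales))
ORDER BY calendar_year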

With 12c we have introduced generic top-n filtering along with big data features such as pattern matching. Obviously this is just a high-level summary of the core areas of functionality that we have delivered over the last 15-20 years. Here is a more detailed view of what is available in Oracle Database 12c - remember, all these features and functions are delivered free of charge, ready to use straight out of the box:

  • Ranking functions
    • rank, dense_rank, cume_dist, percent_rank, ntile
  • Window Aggregate functions (moving and cumulative) 
    • Avg, sum, min, max, count, variance, stddev, first_value, last_value, Top-N, pivot, unpivot
  • LAG/LEAD functions
    • Direct inter-row reference using offsets
  • Reporting Aggregate functions
    • Sum, avg, min, max, variance, stddev, count, ratio_to_report, cube, rollup, grouping, grouping sets
  • Pattern Matching
    • match_number, classifier
  • SQL Model clause
  • Statistical Aggregates
    • Correlation, linear regression family, covariance
  • Linear regression
    • Fitting of an ordinary-least-squares regression 
    • Frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions
  • Descriptive Statistics
    • DBMS_STAT_FUNCS: summarizes numerical columns of a table and returns count, min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- n sigma values, top/bottom 5 values
  • Correlations
    • Pearson’s correlation coefficients, Spearman's and Kendall's (both nonparametric). 
  • Cross Tabs
    • Enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa
  • Hypothesis Testing
    • Student t-test, F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann-Whitney test, Kolmogorov-Smirnov test, One-way ANOVA
  • Distribution Fitting
    • Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal, Uniform, Weibull, Exponential


As you can see there is a very large library of free SQL analytical functions available for DBAs and developers to incorporate into their workflows and applications.
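To pick one example from the list, the generic top-n filtering introduced in 12c means a "top ten" query can be written directly with the new row-limiting clause instead of wrapping an ordered query in an in-line view (a sketch with illustrative names):

SELECT person, region, sales
FROM sales_table
ORDER BY sales DESC
-- 12c row-limiting clause: return only the first ten rows of the ordered result
FETCH FIRST 10 ROWS ONLY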

What are the advantages of using SQL as the foundation for delivering sophisticated analytics? 

The key benefits provided by Oracle’s in-database analytical SQL are:

  • Enhanced developer productivity
  • Improved manageability
  • Minimized learning effort
  • Investment protection (through ANSI SQL compliance) 
  • Increased performance

 

 

1) Enhanced developer productivity

Using the latest built-in analytical SQL capabilities, developers can simplify their application code by replacing complex code (written in SQL and/or other languages) with analytical SQL that is much clearer and more concise. Tasks that in the past required the use of procedural languages or multiple SQL statements can now be expressed using single, simple SQL statements. This simplified SQL (analytical SQL) is quicker to formulate, maintain and deploy compared to older approaches, resulting in greater developer productivity. The productivity benefits also extend to SQL-literate business users who are now able to write their own reports and workflows and manage their own enhancement requests.
 
In the example shown below, the left box shows the SQL used to calculate the average wait time between two specific events: 1001 and 1002. It requires the use of multiple select statements to extract the data for each event and create two result sets with the data correctly ordered so the main SQL statement can compute the average wait. The box on the right replaces the multiple select statements with a simple LAG function. This makes the code much cleaner, quicker to develop, simpler to test and it is much easier to understand.
 
 
SQL A 2a
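Because the comparison above is only shown as an image, here is a sketch of the right-hand approach (the events table and its event_id and event_time columns are hypothetical): LAG reads the previous row directly, so the extra SELECT statements disappear.

SELECT AVG(event_time - prev_time) AS avg_wait
FROM (SELECT event_id
      , event_time
      -- LAG picks up the time of the previous event in the ordered stream
      , LAG(event_time) OVER (ORDER BY event_time) AS prev_time
      FROM events
      WHERE event_id IN (1001, 1002))
WHERE event_id = 1002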
 

 

 

2) Improved Manageability

When computations are centralized close to the data, the inconsistency, lack of timeliness and poor security that come from calculations scattered across multiple specialized processing platforms simply disappear. The ability to access a consolidated view of all your data is simplified when applications share a common relational environment rather than a mix of calculation engines with incompatible data structures and languages.
 
Oracle’s approach allows developers to efficiently layer their analysis using SQL because it can support a broad range of business requirements. For example, a requirement to select all the regions contributing more than 50% of total sales and where the salesperson is contributing more than 18% of regional sales can all be expressed in a single, simple SQL statement using analytical SQL:
 
SELECT person, region, sales FROM
 (SELECT person, region, sales,
         SUM (sales) OVER (PARTITION BY region) r_sales,
         SUM (sales) OVER () t_sales
  FROM sales_table)
WHERE r_sales > 0.5 * t_sales AND sales > 0.18 * r_sales;
 
 
 
Some of the key analytical SQL features are shown above in bold and these will be explored in more detail in the following series of blog posts that will roll out over the coming months.
 

3) Minimized Learning Effort

The amount of effort required to understand analytical SQL is minimized through the use of careful syntax design. Built-in analytical SQL constructs typically leverage existing SQL constructs, such as the aggregate functions SUM and AVG, to extend these well-understood keywords. 
 
Most developers and business users have a reasonable level of proficiency with SQL and can quickly adopt and integrate analytical features, such as Pareto distributions, pattern matching, and cube and rollup aggregations, into their applications and reports. The amount of time required for enhancements, maintenance and upgrades is minimized: more people will be able to review and enhance the existing SQL code rather than having to rely on a few key people with specialized programming skills.
 

4) ANSI SQL compliance

Most of Oracle’s analytical SQL is part of the ANSI SQL standard. This ensures broad support for these features and rapid adoption of newly introduced functionality across applications and tools – both from Oracle’s partner network and other independent software vendors. Oracle is continuously working with its many partners to assist them in exploiting the expanding library of analytic functions. Already many independent software vendors have integrated support for the new Database 12c in-database analytic functions into their products.
 

5) Improved performance

Oracle’s in-database analytical SQL enables significantly better query performance. Not only does it remove the need for specialized data-processing silos but also the internal processing of these purpose-built functions is fully optimized. Using SQL unlocks the full potential of the Oracle database - such as parallel execution - to provide enterprise level scalability unmatched by external specialized processing engines.
 
In many cases the use of analytical SQL can improve performance by removing the need to create self-joins that result in large fact tables being accessed twice. By using analytical SQL the optimiser can create a much simpler query plan with zero joins, as shown below:
 

SQL A 4
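To illustrate the self-join point with a sketch (re-using the illustrative sales_table from the earlier example), the first form reads the table twice while the analytic form computes the total during the same scan:

-- self-join style: sales_table is accessed twice
SELECT s.person, s.sales, s.sales / t.total_sales AS pct_of_total
FROM sales_table s
, (SELECT SUM(sales) AS total_sales FROM sales_table) t;

-- analytic style: a single scan and no join
SELECT person, sales, RATIO_TO_REPORT(sales) OVER () AS pct_of_total
FROM sales_table;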
 
 

Summary

This section has outlined how Oracle’s in-database analytical SQL provides IT and application development teams with a robust and agile analytical development platform that enhances both query performance and developer productivity while providing investment protection by building on existing standards based skills. 
The following sections will explore some of the key concepts behind Oracle’s analytical SQL and provide an overview of the key analytical features and functions that use these concepts.
 

Using SQL to solve business problems

These functions can help you answer typical business questions such as:
 
  • Who are the top ten sales-reps in each region?
    • We can use the RANK function (see the sketch after this list)
  • What is the 90-day moving average of stock price?
    • We can create a moving average by using the Window feature
  • What is the percentage growth of Jan-2010 sales over Jan-2009?
    • We can use period-over-period comparisons
  • What are January’s sales as a percentage of the entire year’s?
    • Compare aggregates on different levels
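As a sketch of the first question in the list (assuming a hypothetical sales_reps table with region, rep_name and sales columns):

SELECT region, rep_name, sales
FROM (SELECT region
      , rep_name
      , sales
      -- rank the reps within each region by their sales
      , RANK() OVER (PARTITION BY region ORDER BY sales DESC) AS sales_rank
      FROM sales_reps)
WHERE sales_rank <= 10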
 
 

What is next...

Over the next few months I will be posting a series of articles on the processing concepts behind analytical SQL followed by reviews of the main types of in-database functions and features that are part of Oracle Database 12c:
  • Enhanced reporting
  • Data Sampling
  • Advanced aggregations
  • Statistical analysis
  • Pattern matching
  • Spreadsheet-like data modelling
 
 
 
 


Friday, 6 December 2013

Read-All-About-It: now on your tablet and smartphone

To follow up on last week's post, where I announced the new weekly data warehouse newsletter, this week I have set up a Flipboard magazine for tablet/smartphone users. I will update the magazine each month with the very best articles from the various Oracle data warehouse blogs, press releases, YouTube videos, Facebook posts etc. In this first edition there is:
  • a video interview with George Lumpkin, Vice President of Product Management for Data Warehousing and Big Data
  • a customer video of Chinese e-commerce company Yihaodian
  • a look at SQL mashups using pattern matching and spatial analytics
  • a link to the latest big data whitepaper from the Economist and Oracle
  • a review of the multi-tenant feature that is a key part of Database 12c
  • a review of simple and advanced time series with Oracle R Enterprise by the R PM team
…. and so much more.
You can subscribe to this new monthly magazine by following this link: http://flip.it/nw8FU. I hope you enjoy this month's edition and I will be refreshing the content at the end of December. Flipboard is available on iOS, Android, Windows and BlackBerry platforms and you can download the relevant app from each vendor's app store. There is more information at www.flipboard.com.
Please let me know if you find this new magazine useful. There are so many ways to make our content available it is useful to know what works and what does not work.
Enjoy our new monthly magazine.