Wednesday, 10 December 2014

Analytical SQL scripts now on Github

After this year’s OpenWorld I ran a 1-day workshop on analytical SQL for our Data Warehouse and Big Data Global Leaders customers. This was part of the Global Leaders ‘Days with Development' programme. We had a pack room at the Oracle Conference Center and I was very lucky to have Stew Ashton, Technical Architect, BNP Paribas, Dr. Holger Friedrich, CTO at sumIT AG and Joerg Otto, Head of DB Engineering, IDS GmbH co-present with me and have them explain how they were using analytical SQL in their projects.
 The workshop covered the following topics:
  • Analytic SQL concepts and foundations
  • Analytic functions for reporting
  • Analytic functions for aggregation
  • More advanced and new 12c features: Pattern Matching
  • SQL Model clause
For the workshop I created a comprehensive slide deck (I will post the presentation shortly on our OTN home page) which included code samples and explain plans to highlight the key benefits of using our analytical SQL features and functions. The great news is that I now have a repository for analytical SQL code samples on the Github repository. To kick things off on this new repository I have posted all the SQL scripts that I created for this workshop so you can now download and work through a series of use cases that explain how to use window functions, intelligently aggregate data, manipulate rows and columns of data, find patterns and create what-if scenarios. Below is my repository home page where you can download the code:
Github Repository for Analytical SQL

So what is Github?

At a simple level, it is an online version control system (and a lot more!) that stores and manages modifications to code script files within in a central repository. Its key benefit is that it makes it very easy for developers to work together on a common project. This environment makes it easy to download a new version of code scripts, make changes, and upload the revisions to those files. Everyone can then see these new changes, download them, and contribute. This system is very popular with developers so we have decided to join this community and make our SQL scripts available via this site. It is the ability to “collaborate” which is most important for me.
To help you get started there is a great section on the website called “Set Up Git”. If like me you are using a Mac then GitHub has a Mac client! You can use it without ever touching the command line interface (which can be a little frustrating at times!).

You can contribute too!

It would be great if you could contribute your own scripts to this repository so I can build up a library of scripts for analytical SQL. All you need to do is create an account on Github, search for the analytical SQL repository and then either download the repository as a zip file or use the “Clone in Desktop” option. What I want to do is build up a series of well documented use cases and when we have enough content then I will create industry specific folders to help organize the content.
So my new repository is now live, please have a look and feel free to upload your own scripts that show how you have used analytical SQL to solve specific business problems within your projects. Looking forward to lots of files arriving into this great new repository.

Technorati Tags: , , ,

Friday, 5 December 2014

Part 5: X-Charging for Sandboxes

This is the next part in my on-going series of posts on the topic of how to successfully manage sandboxes within an Oracle data warehouse environment. In Part 1 I provided an overview of sandboxing (key characteristics, deployment models) and introduced the concept of a lifecycle called BOX’D (Build, Observe, X-Charge and Drop). In Part 2 I briefly explored the key differences between data marts and sandboxes. Part 3 explored the Build-phase of our lifecycle. Part 4 explored the Observer-phase of our lifecycle so we have now arrived at the X-Charge part of our model.
To manage the chargeback process for our sandbox environment we are going to use the new Enterprise Manager 12c Cloud Management pack, for more information visit the EM home page on OTN.
Why charge for your providing sandbox services? The simple answer is that placing a price or cost on a service ensures that the resources are used wisely. If a project team incurred zero costs for their database environment then there is no incentive to evaluate the effectiveness of the data set and the cost-benefit calculation for the project is skewed by the lack of real-world cost data. This type of approach is the main reason why sandbox projects evolve over time into “production” data marts. Even if the project is not really delivering on its expected goals there is absolutely no incentive to kill the project and free up resources. Therefore, by not knowing the cost, it is impossible to establish the value.
The benefits of metering and x-charging are that it enables project teams to focus on the real value of their analysis. If all analysis is free then it is almost impossible to quantify the benefits or costs of a particular analysis. Project teams can also use x-charging as a way to adjust their consumption of resources and control their IT costs. It benefits the IT team as it enables them to achieve higher utilisation rates across their servers. Most importantly the cost-element attached to running a sandbox acts as a string incentive to finalize and shutdown sandboxes ensuring that they do not morph into uncontrolled marts.
There is a fantastic whitepaper on this topic, which explores the much wider topic of metering and chargeback within a cloud environment which is available on the Enterprise Manager webpage, click here to view the whitepaper.


Enterprise Manager 12c uses the rich monitoring and configuration data that is collected for Enterprise Manager targets as the basis for a metering and chargeback solution. Enterprise Manager Chargeback provides the administrator with:
  • Assignment of rates to metered resources
  • Management of a cost center hierarchy
  • Assignment of resources to cost centers
  • Usage and charge-back reports
This set of features can be used to implement a chargeback regime for analytical sandboxes. There is a rich set of API’s that allow you to extract metering and charge data so that it can be incorporated into enterprise billing solutions such as Oracle Billing and Revenue Management application.
Setting up a x-charging framework for our analytical sandboxes involves three key stages:
  • Creating chargeback plans for resources and database options
  • Defining users and cost centers to “take” charges
  • Reporting on usage and charges
Let’s look at each of this stages in more details:

Step 1: Creating charge plans

A Charge Plan is created by the DBA and it defines the metered resources along with the associated rates. Enterprise Manager Chargeback offers two types of Charge Plan – Universal Charge Plan and Extended Charge Plans.
The Universal Charge Plan is the simplest way to enable chargeback for sandboxes and is probably adequate for the vast majority of projects. It contains just 3 metrics:
  • CPU Usage
  • Memory Allocation
  • Storage Allocation
and the DBA can set the rates for each metric as shown here:
Charge Plans

Even with this basic profile you can implement quite sophisticated charging models. It is possible to vary the rates used in charge calculations by month/period. Each “period" is known as a “Reporting Cycle”. If rates are modified, the updated rates will be used to re-calculate the charges for all days from the first of the current period onwards.
Some projects may need access to analytical features that are costed database options. For example, if a project needs to build data mining models then they will require the Oracle Advanced Analytics option. Alternatively, to support semantic analysis or social network analysis requires the use of the spatial and graph option. Extended Charge Plans allow the DBA to factor in charging for database options alongside the standard charging metrics of the Universal Charge Plan. For database options it makes sense to make use of the ability to create fixed cost charges to effectively “rent-out" each option for each sandbox environment. Of course if a project suddenly decides it needs access to a specific type of analytical option, such as in-memory, it simply a case of adding the relevant cross-charge item to the profile for the specific sandbox and the project team can start using that feature right away (assuming the database instance has the correct options pre-installed).
Charge Plans Extended

Step 2 Setting up users and costs centres

When administering a self-service analytic sandbox, it is necessary to meter resource consumption for each self-service user. These costs then need to rolled up into an aggregate level such as cost centers to generate a total charge for each department/project-team accessing the sandbox. For ease of administration and chargeback the self-service users can be represented within a Cost Center structure. Each cost center contains list of “consumers” who have access to the sandbox and of course its associated resources. The cost centers can be organized in a hierarchical fashion to support aggregation and drill down with the cost analysis or billing reports. A typical hierarchical cost centers within a project might look something like this:
Cost center hierarchy

Step 3: Chargeback Reports

Any chargeback solution will involve reporting so that users can understand how their use of sandbox (storing data, running reports etc) translates to charges. Enterprise Manager provides reports that show both resource usage and charging information. This is broken down into two categories of reports: summary and trending reports.
Summary Reports show information related to charge or resource utilisation broken down by cost center, target type and resource. These reports allow both sandbox owners and business users to drill down and quickly assess analyse charges in terms of type of target (database instance, host operating environment, virtual machine etc) or cost centers as shown below.
EM summary report
Trending Reports These reports show metric or charge trends over time and are useful for project teams who want to see how their charges change over time. At an aggregate level the I.T. team can use this information to help them with capacity planning. A report of CPU usage is shown below.
EM trend report

What’s missing?

While this latest version of enterprise manager has some great features for managing analytical sandboxes it would be really useful if the project team could enter a total budget for their sandbox. This budget could then shown on graphs such as the trending report. It would be useful to know how much of the budget has been spent, how many days-periods of budget remain based on current spending patterns etc. Of course once the budget has been used up it would be useful if the sandbox could be locked - this would focus the minds of the project team and ensure that a sandbox does not evolve into a “live” data mart. Which brings us nicely to the next blog post which will be on the final part of our lifecycle model: ensuring that sandboxes have a “Drop” phase.
If you want more information about how to setup the chargeback plans then there is a great video on the Oracle Learning Library: Oracle Enterprise Manager 12c: Setup and Use Chargeback.