Introduction

In this post and in the following two, we are going to discuss about the Oracle Analytical Functions.

Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group. The group of rows is called a window and is defined by the analytic_clause. For each row, a sliding window of rows is defined.

The window determines the range of rows used to perform the calculations for the current row. Window sizes can be based on either a physical number of rows or a logical interval such as time.

Classification

In Oracle Database there is a large set of analytical functions. Broadly, we can classify these functions in the following main categories:

  • Rankings and percentiles
  • Moving window calculations
  • Lag/lead analysis
  • First/last analysis
  • Linear regression statistics

If we look at them from the perspective of applicability, the analytical functions can be framed in the following types:

Type Used for
Ranking Calculating ranks, percentiles, and n-tiles of the values in a result set.
 Windowing Calculating cumulative and moving aggregates. Works with these functions: SUM, AVG, MIN, MAX, COUNT, VARIANCE, STDDEV, FIRST_VALUE, LAST_VALUE, and new statistical functions. Note that the DISTINCT keyword is not supported in windowing functions except for MAX and MIN.
 Reporting Calculating shares, for example, market share. Works with these functions: SUM, AVG, MIN, MAX, COUNT (with/without DISTINCT), VARIANCE, STDDEV, RATIO_TO_REPORT, and new statistical functions. Note that the DISTINCT keyword may be used in those reporting functions that support DISTINCT in aggregate mode.
 LAG/LEAD  

Finding a value in a row a specified number of rows from a current row.

 FIRST/LAST  

First or last value in an ordered group.

 

Linear Regression

 

Calculating linear regression and other statistics (slope, intercept, and so on).

 

Inverse Percentile

 

The value in a data set that corresponds to a specified percentile.

 

Hypothetical Rank and Distribution

 

The rank or percentile that a row would have if inserted into a specified data set.

 

For a better understanding and a correct application of analytical functions you must:

  1. Have o visual representation of the analytical functions
  2. Understand some general concepts behind all them.

1. Anatomy of analytical functions

So, the analytic functions have three basic components:

  • partition-by-clause
  • order-by-clause
  • the windowing-clause

The basic syntax is as follows:

analytical_function (argument1, argument2,…,argumentN)
over ([partition-by-clause] [order-by-clause] [windowing-clause])

where

analytical_function is the analytic function that we wish to call (this function can accepts zero or more arguments)

partition-by-clause groups the rows by partitioning column values.

order-by-clause sorts the rows in a data partition by a column or expression

windowing-clause specifies the subset of rows on which the analytic function operates (it is not supported by all analytical functions)

2. Concepts for analytical functions

a. Processing order

Query processing using analytic functions takes place in three steps:

  • First, all joins, WHERE, GROUP BY and HAVING clauses are performed
  • Second, the result set is made available to the analytic functions, and all their calculations take place
  • Third, if the query has an ORDER BY clause at its end, the ORDER BY is processed to allow for precise output ordering

b. Result set partition

Note. For analytical functions the ‘partition’ is a concept that is unrelated with table partition feature.

The analytic functions allow users to divide query result sets into groups of rows called partitions.

c. For each row in a partition, you can define a sliding window of data.

By window you can determine the range of rows used to perform the calculations for the current row. Windows sizes is either a physical number of rows or a logical interval such as time. The window has a starting row and an ending row and may move at one or both ends.

d. Current row

In an analytical function there is a current row within a partition. The current row serves as the reference point determining the start and end of the window.

To be continued…

 

Bibliography:

Database Data Warehousing Guide/SQL for Analysis and Reporting

Pro Oracle SQL by Morton, K., Osborne, K., Sands, R., Shamsudeen, R., Still, J.

 

Leave a Reply

Your email address will not be published. Required fields are marked *

Post comment