Introduction
In this post and in the following two, we are going to discuss about the Oracle Analytical Functions.
Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group. The group of rows is called a window and is defined by the analytic_clause. For each row, a sliding window of rows is defined.
The window determines the range of rows used to perform the calculations for the current row. Window sizes can be based on either a physical number of rows or a logical interval such as time.
Classification
In Oracle Database there is a large set of analytical functions. Broadly, we can classify these functions in the following main categories:
- Rankings and percentiles
- Moving window calculations
- Lag/lead analysis
- First/last analysis
- Linear regression statistics
If we look at them from the perspective of applicability, the analytical functions can be framed in the following types:
| Type | Used for |
| Ranking | Calculating ranks, percentiles, and n-tiles of the values in a result set. |
| Windowing | Calculating cumulative and moving aggregates. Works with these functions: SUM, AVG, MIN, MAX, COUNT, VARIANCE, STDDEV, FIRST_VALUE, LAST_VALUE, and new statistical functions. Note that the DISTINCT keyword is not supported in windowing functions except for MAX and MIN. |
| Reporting | Calculating shares, for example, market share. Works with these functions: SUM, AVG, MIN, MAX, COUNT (with/without DISTINCT), VARIANCE, STDDEV, RATIO_TO_REPORT, and new statistical functions. Note that the DISTINCT keyword may be used in those reporting functions that support DISTINCT in aggregate mode. |
| LAG/LEAD |
Finding a value in a row a specified number of rows from a current row. |
| FIRST/LAST |
First or last value in an ordered group. |
|
Linear Regression |
Calculating linear regression and other statistics (slope, intercept, and so on). |
|
Inverse Percentile |
The value in a data set that corresponds to a specified percentile. |
|
Hypothetical Rank and Distribution |
The rank or percentile that a row would have if inserted into a specified data set. |
For a better understanding and a correct application of analytical functions you must:
- Have o visual representation of the analytical functions
- Understand some general concepts behind all them.
1. Anatomy of analytical functions
So, the analytic functions have three basic components:
- partition-by-clause
- order-by-clause
- the windowing-clause
The basic syntax is as follows:
analytical_function (argument1, argument2,…,argumentN)
over ([partition-by-clause] [order-by-clause] [windowing-clause])
where
analytical_function is the analytic function that we wish to call (this function can accepts zero or more arguments)
partition-by-clause groups the rows by partitioning column values.
order-by-clause sorts the rows in a data partition by a column or expression
windowing-clause specifies the subset of rows on which the analytic function operates (it is not supported by all analytical functions)
2. Concepts for analytical functions
a. Processing order
Query processing using analytic functions takes place in three steps:
- First, all joins, WHERE, GROUP BY and HAVING clauses are performed
- Second, the result set is made available to the analytic functions, and all their calculations take place
- Third, if the query has an ORDER BY clause at its end, the ORDER BY is processed to allow for precise output ordering
b. Result set partition
Note. For analytical functions the ‘partition’ is a concept that is unrelated with table partition feature.
The analytic functions allow users to divide query result sets into groups of rows called partitions.
c. For each row in a partition, you can define a sliding window of data.
By window you can determine the range of rows used to perform the calculations for the current row. Windows sizes is either a physical number of rows or a logical interval such as time. The window has a starting row and an ending row and may move at one or both ends.
d. Current row
In an analytical function there is a current row within a partition. The current row serves as the reference point determining the start and end of the window.
To be continued…
Bibliography:
Database Data Warehousing Guide/SQL for Analysis and Reporting
Pro Oracle SQL by Morton, K., Osborne, K., Sands, R., Shamsudeen, R., Still, J.