Oracle Analytical Functions (3)

Other Oracle Analytical Functions

Continued from previous post Oracle Analytical Functions (2)

Rank Function

The rank function returns the position of a row, as a number, in an ordered set of rows.

If the rows are sorted by columns, then the position of a row in a window reflects the rank of the value in that window of rows.

In the case of a tie, rows with equal value have the same rank and the ranks are skipped, leaving gaps in the rank values. This means that two rows can have the same rank, and the ranks are not necessarily consecutive.

Syntax is as follow:

rank() over (partition-clause order-by-clause)

There is an example using the rank function below:

SELECT first_name,
last_name,
salary,
hire_date,
d.department_id,
department_name,
RANK() OVER (ORDER BY salary DESC) rnk,
FROM employees e
INNER JOIN departments d
ON e.department_id = d.department_id

Dense Rank function

dense_rank is a variant of the rank function. The difference between the rank and dense_rank functions is that the dense_rank function does not skip the ranks in the case of tie. The dense_rank function is useful in finding top, bottom, or inner n rows in a result set. In the following example the dense_rank function is used instead of the rank function.

SELECT first_name,
last_name,
salary,
hire_date,
d.department_id,
department_name,
DENSE_RANK() OVER (PARTITION BY e.department_id ORDER BY salary, hire_date DESC) drnk
FROM employees e
INNER JOIN departments d
ON e.department_id = d.department_id;

Row_number function

The row_number function assigns a unique number for each row in the ordered result set. If the partitioning clause is specified, then each row is assigned a unique number within a data partition, based on its position in the sort order in that partition.

If the partitioning clause is not specified, then each row in the result set is assigned a unique number.

The row_number function is also useful to fetch top, bottom, or inner n queries, similar to the rank and dense_rank functions.

Syntax for the row_number function is as follows:

row_number() over (partition-clause order-by-clause)

NOTE. The row_number function is a nondeterministic function. The value of the row_number function is undetermined if two rows have the same value in a data partition. On the contrary, rank and dense_rank functions are deterministic and always return consistent values if a query is reexecuted.

Below is an example of row_number function compared to rank and dense_rank functions:

SELECT first_name,
last_name,
salary,
hire_date,
d.department_id,
department_name,
ROW_NUMBER() OVER (ORDER BY salary DESC) rn,
RANK() OVER (ORDER BY salary DESC) rnk,
DENSE_RANK() OVER (ORDER BY salary DESC) drnk
FROM employees e
INNER JOIN departments d
ON e.department_id = d.department_id;

ratio_to_report function

The analytic function ratio_to_report calculates the ratio of a value to the sum of values in the data partition. If the partitioning clause is not specified, this function calculates the ratio of a value to the sum values in the whole result set. This analytic function is very useful in calculating ratios at various levels without a need for self-joins.

ratio_to_report is useful in computing the percentage of a value compared with the total value in a report.

SELECT first_name,
last_name,
salary,
hire_date,
d.department_id,
department_name,
RATIO_TO_REPORT(salary) OVER (PARTITION BY e.department_id) ratio_rep,
RATIO_TO_REPORT(salary) OVER () ratio_rep_all
FROM employees e
INNER JOIN departments d
ON e.department_id = d.department_id;

percent_rank function

The percent_rank function returns the rank of a value in a data partition, expressed as a fraction between 0 and 1.

percent_rank is calculated as (rank – 1)/(n – 1), where n is the number of elements in the data partition if the partitioning clause is specified, or the total number of rows in the result set if the partitioning clause is not specified.

And an example for percent_rank function:

SELECT * FROM (
SELECT first_name,
last_name,
salary,
hire_date,
d.department_id,
department_name,
100*PERCENT_RANK() OVER (PARTITION BY e.department_id ORDER BY salary, hire_date DESC) prc_rnk
FROM employees e
INNER JOIN departments d
ON e.department_id = d.department_id
)
WHERE prc_rnk < 50;

percentile_cont function

The percentile_cont function is useful to compute the interpolated values, such as the median household income per region or city. The percentile_cont function takes a probability value between 0 and 1 and returns an interpolated percentile value that equals the percent_rank value with respect to the sort specification.

Syntax for the percentile_cont function is as follows:

percentile_cont(expr) within group (sort-clause) over (partition-clause order-by-clause)

We observe that the syntax for the percentile_cont function is slightly different from the previous analytic functions. There is a new clause within group (order by sale desc) that replaces the order by clause, and it is functionally the same as specifying an order by clause.

SELECT first_name,
last_name,
salary,
hire_date,
d.department_id,
department_name,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary DESC) OVER (PARTITION BY e.department_id) prc_cnt,
PERCENT_RANK() OVER (PARTITION BY e.department_id ORDER BY salary, hire_date DESC) prc_rnk
FROM employees e
INNER JOIN departments d
ON e.department_id = d.department_id;

percentile_disc function

The percentile_disc function is functionally similar to percentile_cont except that the percentile_cont function uses a continuous distribution model and the percentile_disc function assumes a discrete distribution model. When there is no value matching exactly with the specified percent_rank, then percentile_cont (0.5) computes an average of the two nearest values.

SELECT first_name,
last_name,
salary,
hire_date,
d.department_id,
department_name,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary DESC) OVER (PARTITION BY e.department_id) prc_disc_desc,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary) OVER (PARTITION BY e.department_id) prc_disc_asc
FROM employees e
INNER JOIN departments d
ON e.department_id = d.department_id;

NTILE function

The NTILE function divides an ordered set of rows in a data partition, groups them into buckets, and assigns a unique group number to each group. This function is useful in statistical analysis. For example, if you want to remove the outliers, you can group them in the top or bottom buckets and eliminate those values from the statistical analysis.

SELECT first_name,
last_name,
salary,
hire_date,
d.department_id,
department_name,
NTILE(10) OVER (PARTITION BY e.department_id ORDER BY salary) group#
FROM employees e
INNER JOIN departments d
ON e.department_id = d.department_id;

stddev function

The stddev function can be used to calculate standard deviation among a set of rows in a data partition, or in the result set if no partitioning clause is specified. This function calculates the standard deviation, defined as the square root of variance, for a data partition specified using a partitioning clause.

SELECT first_name,
last_name,
salary,
hire_date,
d.department_id,
department_name,
STDDEV(salary) OVER (PARTITION BY e.department_id ORDER BY salary, hire_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) stddv
FROM employees e
INNER JOIN departments d
ON e.department_id = d.department_id;

listagg function

Beginning with Oracle Database version 11gR2 was introduced another analytic function, the listagg function, which is very useful in string manipulation. This analytic function provides the ability to convert column values from multiple rows into groups in a list format based on one or more expressions in the partition-by-clause.

Syntax for this function is of the following format:

listagg (string, separator ) within group (order-by-clause) over (partition-by-clause)

Below is an example of listagg function:

SELECT LISTAGG (department_name, ',')
WITHIN GROUP (ORDER BY department_name DESC)
FROM (
SELECT DISTINCT department_name FROM departments
ORDER BY department_name
);

NOTE. One restriction of the listagg function is that the results of listagg are constrained to the maximum size of a VARCHAR2 datatype.

Beginning in 12c, the maximum size of a VARCHAR2 datatype was increased from 4000 to 32,767 bytes. However, it appears that this increase did not affect the maximum size of the result string for listagg, as shown in the following example. For the first script the maximum size is 4000, but for the second script we get Oracle ORA-01489 error.

SELECT LENGTH (acol)
FROM (SELECT LISTAGG (object_name) WITHIN GROUP (ORDER BY NULL) acol
FROM all_objects
WHERE ROWNUM < 308);

SELECT LENGTH (acol)
FROM (SELECT LISTAGG (object_name) WITHIN GROUP (ORDER BY NULL) acol
FROM all_objects
WHERE ROWNUM < 309);

ERROR at line 3:

ORA-01489: result of string concatenation is too long

These was the most usual Oracle analytical functions.

Bibliography:

Database Data Warehousing Guide/SQL for Analysis and Reporting

Pro Oracle SQL by Morton, K., Osborne, K., Sands, R., Shamsudeen, R., Still, J.

Leave a Reply Cancel reply