Share via


max_by aggregate function

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime

Returns the value of an expression associated with the largest value of a second expression in a group. With the optional third argument, returns an array of up to limit values corresponding to the largest values of the ordering expression.

Syntax

max_by(expr, ordExpr) [FILTER ( WHERE cond ) ]
max_by(expr, ordExpr, limit) [FILTER ( WHERE cond ) ]

This function can also be invoked as a window function using the OVER clause.

Arguments

  • expr: The expression whose value is to be returned. Can be any data type.
  • ordExpr: The expression used to determine the maximum value (the sorting key). Must be an orderable type.
  • limit: (Optional) An INT value representing the maximum number of values to return. Must be greater than 0 and less than or equal to 100,000. When provided, the function returns an array of up to limit values.
  • cond: An optional boolean expression filtering the rows used for aggregation.

Returns

Without limit: the result type matches the type of expr. If multiple rows have the same maximum value of ordExpr, the result is non-deterministic.

With limit: an ARRAY of values whose element type matches the type of expr. The array contains up to limit elements. Values corresponding to NULL orderings are not included. If all ordering values are NULL, the result is NULL. If multiple rows share the same largest values, the function is non-deterministic.

Note

For certain STRING collations, such as UTF8_LCASE, the result may be non-deterministic as well.

Examples

> SELECT max_by(x, y, 2) FROM VALUES (('a', 10)), (('b', 50)), (('c', 20)) AS tab(x, y);
 [b, c]

-- Return top 2 values by ordering expression
> SELECT year, max_by(course, earnings, 2) FROM data GROUP BY year;
 year: 2012, max_by(course, earnings, 2): [Java, c]

> SELECT max_by(x, y COLLATE UTF8_LCASE) FROM VALUES (('a', 'X')), (('b', 'x')), (('c', 'v')) AS tab(x, y);
 a (or b)