Share via


min_by aggregate function

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime

Returns the value of an expression associated with the smallest value of a second expression in a group. With the optional third argument, returns an array of up to limit values corresponding to the smallest values of the ordering expression. Semantics are the same as max_by aggregate function with opposite ordering.

Syntax

min_by(expr, ordExpr) [FILTER ( WHERE cond ) ]
min_by(expr, ordExpr, limit) [FILTER ( WHERE cond ) ]

This function can also be invoked as a window function using the OVER clause.

Arguments

  • expr: The expression whose value is to be returned. Can be any data type.
  • ordExpr: The expression used to determine the minimum value (the sorting key). Must be an orderable type.
  • limit: (Optional) An INT value representing the maximum number of values to return. Must be greater than 0 and less than or equal to 100,000. When provided, the function returns an array of up to limit values.
  • cond: An optional boolean expression filtering the rows used for aggregation.

Returns

Without limit: the result type matches the type of expr. If multiple rows have the same minimum value of ordExpr, the result is non-deterministic.

With limit: an ARRAY of values whose element type matches the type of expr. The array contains up to limit elements. Values corresponding to NULL orderings are not included. If all ordering values are NULL, the result is NULL. If multiple rows share the same smallest values, the function is non-deterministic.

Note

For certain STRING collations, such as UTF8_LCASE, the result may be non-deterministic as well.

Examples

> SELECT min_by(x, y) FROM VALUES (('a', 10)), (('b', 50)), (('c', 20)) AS tab(x, y);
 a

> SELECT min_by(x, y, 2) FROM VALUES (('a', 10)), (('b', 50)), (('c', 20)) AS tab(x, y);
 [a, c]

> SELECT min_by(x, y COLLATE UTF8_LCASE) FROM VALUES (('a', 'X')), (('b', 'x')), (('c', 'Z')) AS tab(x, y);
 a (or b)