Splits str around matches of the given pattern.
For the corresponding Databricks SQL function, see split function.
Syntax
from pyspark.databricks.sql import functions as dbf
dbf.split(str=<str>, pattern=<pattern>, limit=<limit>)
Parameters
| Parameter | Type | Description |
|---|---|---|
| str | pyspark.sql.Column or str | A string expression to split. |
| pattern | pyspark.sql.Column or literal string | A string representing a regular expression. The regex string should be a Java regular expression. A plain string is always interpreted as a regular expression, never as a column name, for backwards compatibility. |
| limit | pyspark.sql.Column or str or int | An integer that controls the number of times pattern is applied. In addition to an int, limit accepts a column or a column name. If limit > 0, the resulting array's length will not be more than limit, and the array's last entry will contain all input beyond the last matched pattern. If limit <= 0, pattern is applied as many times as possible, and the resulting array can be of any size. |
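
The effect of limit can be illustrated without a Spark session. The following is a minimal sketch in plain Python that approximates the semantics described above using the standard `re` module. The helper name `split_with_limit` is hypothetical and not part of any Spark or Databricks API. Python's regex dialect also differs from Java's in places (for example, `re.split` returns captured groups as well), so treat this as an illustration for simple patterns only.

```python
import re


def split_with_limit(s, pattern, limit=-1):
    """Hypothetical helper: approximate the documented limit semantics.

    This is an assumption-laden sketch built on Python's re module,
    not the Spark implementation.
    """
    if limit == 1:
        # At most one element is allowed, so the pattern is never applied.
        return [s]
    if limit > 1:
        # re.split counts splits, so `limit` elements need `limit - 1` splits.
        return re.split(pattern, s, maxsplit=limit - 1)
    # limit <= 0: apply the pattern as many times as possible.
    return re.split(pattern, s)


print(split_with_limit("oneAtwoBthreeC", "[ABC]"))      # ['one', 'two', 'three', '']
print(split_with_limit("oneAtwoBthreeC", "[ABC]", 2))   # ['one', 'twoBthreeC']
print(split_with_limit("oneAtwoBthreeC", "[ABC]", -2))  # ['one', 'two', 'three', '']
```

With limit set to 2, the second entry keeps everything after the first match, including the later separators.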
Returns
pyspark.sql.Column: array of separated strings.
Examples
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([('oneAtwoBthreeC',)], ['s',])
df.select('*', dbf.split(df.s, '[ABC]')).show()
df.select('*', dbf.split(df.s, '[ABC]', 2)).show()
df.select('*', dbf.split('s', '[ABC]', -2)).show()
df = spark.createDataFrame([
('oneAtwoBthreeC', '[ABC]', 2),
('1A2B3C', '[1-9]+', 1),
('aa2bb3cc4', '[1-9]+', -1)], ['s', 'p', 'l'])
df.select('*', dbf.split(df.s, df.p)).show()
df.select(dbf.split('s', df.p, 'l')).show()
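
When pattern and limit are passed as columns, each row is split with its own pattern and limit. The sketch below mimics that row-wise behaviour in plain Python for the three rows of the last example. `split_row` is a hypothetical helper based on `re.split`, not the Spark implementation, so the results are only an approximation of what the DataFrame call is expected to produce.

```python
import re

# The same (s, p, l) rows as in the last example above.
rows = [
    ("oneAtwoBthreeC", "[ABC]", 2),
    ("1A2B3C", "[1-9]+", 1),
    ("aa2bb3cc4", "[1-9]+", -1),
]


def split_row(s, p, l):
    """Hypothetical helper: split s by regex p, honouring a per-row limit l."""
    if l == 1:
        # A limit of 1 means the pattern is not applied at all.
        return [s]
    # For re.split, maxsplit=0 means "no limit", which covers l <= 0.
    return re.split(p, s, maxsplit=l - 1 if l > 0 else 0)


for s, p, l in rows:
    print(split_row(s, p, l))
# ['one', 'twoBthreeC']
# ['1A2B3C']
# ['aa', 'bb', 'cc', '']
```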