Categories of Functions
There are approximately 300 functions under pyspark.sql.functions (the exact count varies by Spark version). At a high level, they can be grouped into a few categories.
String Manipulation Functions
Case Conversion - lower, upper
Getting Length - length
Extracting substrings - substring, split
Trimming - trim, ltrim, rtrim
Padding - lpad, rpad
Concatenating strings - concat, concat_ws
Date Manipulation Functions
Getting current date and time - current_date, current_timestamp
Date Arithmetic - date_add, date_sub, datediff, months_between, add_months, next_day
Beginning and Ending Date or Time - last_day, trunc, date_trunc
Formatting Date - date_format
Extracting Information - dayofyear, dayofmonth, dayofweek, year, month
Aggregate Functions
count, countDistinct
sum, avg
min, max
Other Functions - We will explore these depending on the use cases.
CASE and WHEN
CAST for type casting
Functions to manage special types such as ARRAY, MAP, STRUCT type columns
Many others