Categories of Functions
There are approximately 300 functions under pyspark.sql.functions (the exact count varies by Spark version). At a high level, they can be grouped into a few categories.
String Manipulation Functions
Case Conversion - lower, upper
Getting Length - length
Extracting substrings - substring, split
Trimming - trim, ltrim, rtrim
Padding - lpad, rpad
Concatenating strings - concat, concat_ws
Date Manipulation Functions
Getting current date and time - current_date, current_timestamp
Date Arithmetic - date_add, date_sub, datediff, months_between, add_months, next_day
Beginning and Ending Date or Time - last_day, trunc, date_trunc
Formatting Date - date_format
Extracting Information - dayofyear, dayofmonth, dayofweek, year, month
Aggregate Functions
count, countDistinct
sum, avg
min, max
Other Functions - We will explore these as the use cases require.
CASE and WHEN
CAST for type casting
Functions to manage special types such as ARRAY, MAP, STRUCT type columns
Many others