Categories of Functions

There are approximately 300 functions under pyspark.sql.functions. At a high level, they can be grouped into a few categories.

  • String Manipulation Functions

    • Case Conversion - lower, upper

    • Getting Length - length

    • Extracting substrings - substring, split

    • Trimming - trim, ltrim, rtrim

    • Padding - lpad, rpad

    • Concatenating strings - concat, concat_ws

  • Date Manipulation Functions

    • Getting current date and time - current_date, current_timestamp

    • Date Arithmetic - date_add, date_sub, datediff, months_between, add_months, next_day

    • Beginning and Ending Date or Time - last_day, trunc, date_trunc

    • Formatting Date - date_format

    • Extracting Information - dayofyear, dayofmonth, dayofweek, year, month

  • Aggregate Functions

    • count, countDistinct

    • sum, avg

    • min, max

  • Other Functions - We will explore these depending on the use cases.

    • CASE and WHEN - when, with Column.otherwise for the default branch

    • CAST for type casting

    • Functions to manage special types such as ARRAY, MAP, STRUCT type columns

    • Many others