Date and Time Manipulation Functions

Let us get started with date and time manipulation functions. As part of this topic we will focus on the date and timestamp formats.

  • We can use current_date to get today’s server date.

    • The date is returned in yyyy-MM-dd format.

  • We can use current_timestamp to get the current server timestamp.

    • The timestamp is returned in yyyy-MM-dd HH:mm:ss.SSS format.

    • Hours are in 24-hour format by default.
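To make the two default layouts concrete, here is a plain-Python sketch (standard library `datetime` only, not Spark) that renders a fixed datetime the way the bullets above describe: yyyy-MM-dd for dates and yyyy-MM-dd HH:mm:ss.SSS on a 24-hour clock for timestamps. The helper names `spark_style_date` and `spark_style_timestamp` are hypothetical. Python's `strftime` has no millisecond directive, so the `.SSS` part is derived from microseconds. This is a simplification: it always prints three fraction digits, whereas Spark's own display drops an all-zero fraction (compare the `17:25:00` output at the end of this section).

```python
from datetime import datetime

# Hypothetical helpers that mimic Spark's default display layouts in plain Python.
def spark_style_date(dt: datetime) -> str:
    # Spark pattern yyyy-MM-dd corresponds to Python's %Y-%m-%d.
    return dt.strftime("%Y-%m-%d")

def spark_style_timestamp(dt: datetime) -> str:
    # Spark pattern yyyy-MM-dd HH:mm:ss.SSS: %H is the 24-hour clock (Spark's HH),
    # and the 3-digit milliseconds are computed from microseconds.
    millis = dt.microsecond // 1000
    return f"{dt.strftime('%Y-%m-%d %H:%M:%S')}.{millis:03d}"

sample = datetime(2021, 2, 28, 18, 34, 8, 548000)
print(spark_style_date(sample))       # 2021-02-28
print(spark_style_timestamp(sample))  # 2021-02-28 18:34:08.548
```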

from pyspark.sql import SparkSession

import getpass
username = getpass.getuser()

spark = SparkSession. \
    builder. \
    config('spark.ui.port', '0'). \
    config("spark.sql.warehouse.dir", f"/user/{username}/warehouse"). \
    enableHiveSupport(). \
    appName(f'{username} | Python - Processing Column Data'). \
    master('yarn'). \
    getOrCreate()

If you are going to use the CLIs, you can launch Spark using one of these 3 approaches.

Using Spark SQL

spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse

Using Scala

spark2-shell \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse

Using Pyspark

pyspark2 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
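
l = [("X", )]
df = spark.createDataFrame(l).toDF("dummy")
df.show()
+-----+
|dummy|
+-----+
|    X|
+-----+

from pyspark.sql.functions import current_date, current_timestamp
df.select(current_date()).show() #yyyy-MM-dd
+--------------+
|current_date()|
+--------------+
|    2021-02-28|
+--------------+

df.select(current_timestamp()).show(truncate=False) #yyyy-MM-dd HH:mm:ss.SSS
+-----------------------+
|current_timestamp()    |
+-----------------------+
|2021-02-28 18:34:08.548|
+-----------------------+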
  • We can convert a string that contains a date or timestamp in a non-standard format to a standard date or timestamp using the to_date or to_timestamp function, respectively.

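from pyspark.sql.functions import lit, to_date, to_timestamp
df.select(to_date(lit('20210228'), 'yyyyMMdd').alias('to_date')).show()
+----------+
|   to_date|
+----------+
|2021-02-28|
+----------+

df.select(to_timestamp(lit('20210228 1725'), 'yyyyMMdd HHmm').alias('to_timestamp')).show()
+-------------------+
|       to_timestamp|
+-------------------+
|2021-02-28 17:25:00|
+-------------------+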