Launching Spark CLI

Let us understand how to launch the Pyspark CLI. We will cover both local environments and our labs.

  • Once Pyspark is installed, you can run the pyspark command to launch the Pyspark CLI.

  • In our labs, Spark is integrated with Hadoop and Hive, so you can interact with Hive databases as well.

  • Run the following commands in a terminal to launch Pyspark on the labs' YARN cluster.

export PYSPARK_PYTHON=python3
export SPARK_MAJOR_VERSION=2
pyspark --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
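The intro mentions local environments as well; a minimal local launch might look like the following. This is a sketch for a standalone install (no YARN or Hive integration), where local[*] tells Spark to use all cores on the current machine:

export PYSPARK_PYTHON=python3    # use Python 3 for the driver and workers

# Launch the Pyspark CLI in local mode instead of on a cluster
pyspark --master "local[*]"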
  • Here is what happens when you launch the Pyspark CLI.

    • It launches a Python CLI (REPL).

    • All Spark-related libraries are loaded.

    • SparkSession and SparkContext objects are created and exposed as spark and sc respectively.

    • This lets us explore the Spark APIs interactively.