Launching Spark CLI
Let us understand how to launch the Pyspark CLI. We will cover both a local installation and our labs.
Once pyspark is installed, you can run
pyspark
to launch the Pyspark CLI.
In our labs, we have integrated Spark with Hadoop and Hive, so you can interact with Hive databases as well.
Run the following commands from a terminal to launch Pyspark in the labs.
export PYSPARK_PYTHON=python3
export SPARK_MAJOR_VERSION=2
pyspark --master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
Alternatively, you can run the following command to launch the Pyspark CLI.
pyspark --master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
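To see how the pieces of the lab command fit together, here is a minimal Python sketch that assembles the same arguments and environment variables. The helper names build_pyspark_command and build_pyspark_env are hypothetical, and "alice" is a placeholder user name. The sketch only builds and prints the command; it does not launch anything.

```python
import os
import shlex


def build_pyspark_command(user, master="yarn"):
    """Return the lab launch command as an argument list (hypothetical helper)."""
    return [
        "pyspark",
        "--master", master,
        # Port 0 is commonly used to let Spark pick a free port for its web UI
        "--conf", "spark.ui.port=0",
        # Per-user warehouse directory, mirroring /user/${USER}/warehouse
        "--conf", f"spark.sql.warehouse.dir=/user/{user}/warehouse",
    ]


def build_pyspark_env(base_env=None):
    """Copy an environment and add the two variables exported in the lab."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_PYTHON"] = "python3"
    env["SPARK_MAJOR_VERSION"] = "2"
    return env


# "alice" is a placeholder; in the lab the shell expands ${USER} for you
command = build_pyspark_command(user="alice")
print(shlex.join(command))
```

If you wanted to launch the shell from Python, the list and environment could be passed to subprocess.run(command, env=build_pyspark_env()), assuming pyspark is on the PATH.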
Here is what happens when you launch the Pyspark CLI.
It launches the Python CLI.
It loads all Spark-related libraries.
It creates the SparkSession and SparkContext objects (available as spark and sc).
It lets us explore Spark APIs interactively.
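Let us sketch roughly what the CLI sets up for us, written as a standalone script. This is an approximation under stated assumptions, not the CLI's actual startup code: the function name start_shell_like_session is hypothetical, the master defaults to local[*] so it can run outside the labs, and the application name "PySparkShell" is assumed to match the shell's default.

```python
def start_shell_like_session(master="local[*]", app_name="PySparkShell"):
    """Create `spark` and `sc` roughly the way the pyspark shell does."""
    # Imported inside the function so this file loads even without pyspark
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master(master)      # assumption: local mode instead of yarn
        .appName(app_name)   # assumption: same name the shell uses
        .getOrCreate()
    )
    sc = spark.sparkContext  # the SparkContext behind the session
    return spark, sc
```

Inside the Pyspark CLI none of this is needed, because spark and sc already exist. You can type expressions such as spark.version, sc.master, or spark.range(5).show() directly to start exploring.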