Setup Spark Locally - UbuntuΒΆ
Let us understand how to set up Spark locally on Ubuntu.
Here are the prerequisites to set up Spark locally on Ubuntu.
At least 8 GB of RAM is highly desired.
Make sure JDK 1.8 is set up.
Make sure you have Python 3. If you do not have it, you can install it using apt or snap.
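Before proceeding, you can confirm the prerequisites from Python itself. This is a minimal sketch, assuming only that java (if installed) and the Python interpreter are on the PATH; the exact version strings will vary by installation.

```python
# Quick prerequisite check before installing PySpark.
import shutil
import subprocess
import sys

# Python 3 check: PySpark requires Python 3 here
print("Python:", sys.version.split()[0])
assert sys.version_info.major == 3, "Python 3 is required"

# JDK check: Spark 2.4.x expects Java 8 (JDK 1.8)
java = shutil.which("java")
if java is None:
    print("Java not found - install JDK 1.8 first")
else:
    # Note: 'java -version' writes its output to stderr, not stdout
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    print(result.stderr.strip().splitlines()[0])
```

If the script reports that Java is missing, install JDK 1.8 before continuing, since pip will install PySpark without checking for it.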
Here are the steps to set up PySpark and validate the installation.
Create a Python virtual environment:
python3 -m venv spark-venv
Activate the virtual environment:
source spark-venv/bin/activate
Run the following to install Spark 2.4.6:
pip install pyspark==2.4.6
Run pyspark to launch the Spark CLI with Python as the programming language.
Here are some of the limitations of running Spark locally.
You will run Spark in local mode by default, but you will not get a feel for Big Data volumes.
Actual production implementations run on multi-node clusters managed by YARN, Spark Standalone, or Mesos.
You can understand the development process, but you will not be able to explore the best practices for building effective large-scale data engineering solutions.