Setup Spark Locally - Mac

Let us understand how to set up Spark locally on a Mac.

  • Here are the prerequisites to set up Spark locally on a Mac.

    • At least 8 GB of RAM is strongly recommended.

    • Make sure JDK 1.8 is set up.

    • Make sure you have Python 3. If you do not have it, you can install it using Homebrew.

  • Here are the steps to set up PySpark and validate the installation.

    • Create a Python virtual environment - python3 -m venv spark-venv.

    • Activate the virtual environment - source spark-venv/bin/activate.

    • Run pip install pyspark==2.4.6 to install Spark 2.4.6.

    • Run pyspark to launch the Spark CLI using Python as the programming language.

  • Here are some of the limitations related to running Spark locally.

    • By default, Spark runs in local mode, so you will not get a real feel for Big Data.

    • Actual production implementations run on multinode clusters, managed by YARN, Spark Standalone, or Mesos.

    • You can understand the development process, but you will not be able to explore best practices for building effective large-scale data engineering solutions.
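
The prerequisite checks and the validation step above can also be scripted. The following is only a minimal sketch, meant to be run from inside the activated spark-venv. The helper names (parse_java_major, check_prerequisites, run_smoke_test) and the application name local-smoke-test are made up for illustration and are not part of Spark. The sketch assumes that java -version reports a version string such as "1.8.0_281", and it explicitly requests the local[*] master to show what local mode looks like in code.

```python
import re
import shutil
import subprocess
import sys


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Legacy scheme: "1.8.0_281" means Java 8. Newer scheme: "11.0.2" means Java 11.
    if first == 1 and second is not None:
        return int(second)
    return first


def check_prerequisites():
    """Collect basic facts about the local Python and Java setup (hypothetical helper)."""
    java_major = None
    if shutil.which("java") is not None:
        try:
            # `java -version` normally writes its report to stderr, so read both streams.
            proc = subprocess.run(
                ["java", "-version"],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                universal_newlines=True,
            )
            java_major = parse_java_major(proc.stderr + proc.stdout)
        except OSError:
            java_major = None
    return {"python3": sys.version_info[0] == 3, "java_major": java_major}


def run_smoke_test(rows=100):
    """Start Spark in local mode and count a small range.

    Returns the row count, or None if PySpark cannot be imported.
    """
    try:
        from pyspark.sql import SparkSession
    except ImportError:
        return None
    spark = (
        SparkSession.builder
        .master("local[*]")             # local mode, using all available cores
        .appName("local-smoke-test")    # arbitrary name chosen for this sketch
        .getOrCreate()
    )
    try:
        return spark.range(rows).count()
    finally:
        spark.stop()


if __name__ == "__main__":
    info = check_prerequisites()
    print("Python 3:", info["python3"])
    print("Java major version:", info["java_major"])
    if info["java_major"] is not None:
        # Only try to start Spark when a Java runtime was detected.
        print("Smoke test row count:", run_smoke_test())
```

If PySpark and the JDK are both in place, the smoke test should report a row count of 100. A Java major version other than 8, or a count of None, suggests that one of the prerequisites or the pip install step needs another look.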