Overview of File Systems
Let us get an overview of the file systems you can work with while learning Spark.
Local file system, when you run Spark in local mode.
Hadoop Distributed File System (HDFS).
AWS S3.
Azure Blob Storage.
Google Cloud Storage (GCS).
and other supported file systems.
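Spark addresses each of these file systems through a URI scheme prefix, and the same read/write APIs work across all of them once the right connector is on the classpath. The sketch below shows commonly used scheme prefixes; the host, bucket, and container names are placeholders, not real endpoints.

```python
# Example URI schemes Spark uses to address each file system.
# All host, bucket, and container names are illustrative placeholders.
uris = {
    "local": "file:///data/retail/orders",
    "hdfs": "hdfs://namenode:8020/user/training/retail/orders",
    "s3": "s3a://my-bucket/retail/orders",
    "azure_blob": "wasbs://my-container@myaccount.blob.core.windows.net/retail/orders",
    "gcs": "gs://my-bucket/retail/orders",
}

# Spark resolves the scheme to the matching file system connector,
# so the same API call works against any of them, e.g.:
#   spark.read.csv(uris["s3"])
for name, uri in uris.items():
    scheme = uri.split("://")[0]
    print(f"{name} -> {scheme}")
```

Because only the scheme and path change, code written against one file system usually ports to another with a configuration change rather than a rewrite.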
It is quite straightforward to learn an underlying file system. You just need to focus on the following:
Copy files into the file system from different sources.
Validate files in the file system.
Preview the data using Spark APIs or the file system's own tools.
Delete files from the file system.
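The four tasks above can be sketched against the local file system using only the Python standard library; the file names and sample content below are purely illustrative. The same workflow applies to HDFS or cloud storage with that system's CLI or API in place of `shutil` and `os`.

```python
import os
import shutil
import tempfile

# Work inside a throwaway directory so the example is self-contained.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "source.csv")
target_dir = os.path.join(workdir, "landing")
os.makedirs(target_dir)

# Create a small sample file to act as the source.
with open(source, "w") as f:
    f.write("id,name\n1,alice\n2,bob\n")

# 1. Copy files into the file system from a source location.
target = shutil.copy(source, target_dir)

# 2. Validate files in the file system (existence and size).
assert os.path.exists(target)
assert os.path.getsize(target) > 0

# 3. Preview the data (read the first couple of lines).
with open(target) as f:
    preview = [next(f) for _ in range(2)]
print(preview)

# 4. Delete files from the file system.
os.remove(target)
assert not os.path.exists(target)
```

Spark itself only needs the preview step (for example `spark.read.csv(...)` followed by `show()`); the copy, validate, and delete steps are typically done with the file system's native tooling.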
Typically we ingest data into the underlying file system using tools such as Informatica, Talend, NiFi, Kafka, or custom applications.