Overview of File Systems

Let us get an overview of the file systems you can work with while learning Spark.

  • Here are the file systems that can be used to learn Spark (see the sketch after this list).

    • Local file system when you run in local mode.

    • Hadoop Distributed File System.

    • AWS S3.

    • Azure Blob Storage.

    • GCP Cloud Storage.

    • and other supported file systems.
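
Here is a minimal sketch of how the same Spark reader API works across these file systems; only the URI scheme and connector configuration change. The bucket, container, account, and path names below are hypothetical, and the relevant connectors (hadoop-aws, hadoop-azure, the GCS connector) are assumed to be on the classpath with credentials configured.

```python
from pyspark.sql import SparkSession

# Minimal sketch; bucket, container, account, and path names are hypothetical.
spark = SparkSession.builder.appName("File Systems Overview").getOrCreate()

# Local file system (when running in local mode)
orders_local = spark.read.csv("file:///data/retail_db/orders", header=True)

# HDFS
orders_hdfs = spark.read.csv("hdfs:///user/training/retail_db/orders", header=True)

# AWS S3 (assumes hadoop-aws and AWS credentials are configured)
orders_s3 = spark.read.csv("s3a://my-bucket/retail_db/orders", header=True)

# Azure Blob Storage (assumes hadoop-azure and the storage account key are configured)
orders_blob = spark.read.csv(
    "wasbs://my-container@myaccount.blob.core.windows.net/retail_db/orders",
    header=True
)

# GCP Cloud Storage (assumes the GCS connector is configured)
orders_gcs = spark.read.csv("gs://my-bucket/retail_db/orders", header=True)

orders_local.show(5)
```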

  • It is quite straightforward to learn the underlying file system. You just need to focus on the following (see the sketch after this list):

    • Copy files into the file system from different sources.

    • Validate files in the file system.

    • Preview the data using Spark APIs or file system tools directly.

    • Delete files from the file system.
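
As an illustration of these operations, here is a sketch that lists, previews, and (optionally) deletes files using the Hadoop FileSystem API through Spark's JVM gateway together with the DataFrame reader. The path is hypothetical, and the underscore-prefixed attributes used to reach the JVM are internal to PySpark, so treat this as one common pattern rather than a formal API.

```python
from pyspark.sql import SparkSession

# Minimal sketch; the path is hypothetical and assumes the files already exist.
spark = SparkSession.builder.appName("File System Basics").getOrCreate()

base_dir = "hdfs:///user/training/retail_db/orders"

# Validate: list files under the directory using the Hadoop FileSystem API
# (reached via PySpark's internal JVM gateway).
jvm = spark.sparkContext._jvm
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
path = jvm.org.apache.hadoop.fs.Path(base_dir)
fs = path.getFileSystem(hadoop_conf)
for status in fs.listStatus(path):
    print(status.getPath().getName(), status.getLen())

# Preview: read a few records using the Spark DataFrame reader.
spark.read.csv(base_dir, header=True).show(10)

# Delete: remove the directory recursively (uncomment to actually delete;
# the second argument enables recursive deletion).
# fs.delete(path, True)
```

Copying files into the file system is typically done outside Spark, for example with `hdfs dfs -put` for HDFS or the respective cloud provider's CLI for object stores.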

  • Typically, we ingest data into the underlying file system using tools such as Informatica, Talend, NiFi, Kafka, custom applications, etc.