Overview of File Systems
Let us get an overview of the file systems you can work with while learning Spark.
Local file system, when you run Spark in local mode.
Hadoop Distributed File System (HDFS).
AWS S3.
Azure Blob Storage.
Google Cloud Storage (GCS).
and other supported file systems.
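Spark addresses each of these file systems through a URI scheme prefix, and the same read/write APIs work across all of them once the right connector is on the classpath. The sketch below shows commonly used scheme prefixes; the host, bucket, and container names are placeholders, not real endpoints.

```python
# Example URI schemes Spark uses to address each file system.
# All host, bucket, and container names are illustrative placeholders.
uris = {
    "local": "file:///data/retail/orders",
    "hdfs": "hdfs://namenode:8020/user/training/retail/orders",
    "s3": "s3a://my-bucket/retail/orders",
    "azure_blob": "wasbs://my-container@myaccount.blob.core.windows.net/retail/orders",
    "gcs": "gs://my-bucket/retail/orders",
}

# Spark resolves the scheme to the matching file system connector,
# so the same API call works against any of them, e.g.:
#   spark.read.csv(uris["s3"])
for name, uri in uris.items():
    scheme = uri.split("://")[0]
    print(f"{name} -> {scheme}")
```

Because only the scheme and path change, code written against one file system usually ports to another with a configuration change rather than a rewrite.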
It is quite straightforward to learn an underlying file system. You just need to focus on the following:
Copy files into the file system from different sources.
Validate files in the file system.
Preview the data using Spark APIs or the file system's own tools.
Delete files from the file system.
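The four tasks above can be sketched against the local file system using only the Python standard library; the file names and sample content below are purely illustrative. The same workflow applies to HDFS or cloud storage with that system's CLI or API in place of `shutil` and `os`.

```python
import os
import shutil
import tempfile

# Work inside a throwaway directory so the example is self-contained.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "source.csv")
target_dir = os.path.join(workdir, "landing")
os.makedirs(target_dir)

# Create a small sample file to act as the source.
with open(source, "w") as f:
    f.write("id,name\n1,alice\n2,bob\n")

# 1. Copy files into the file system from a source location.
target = shutil.copy(source, target_dir)

# 2. Validate files in the file system (existence and size).
assert os.path.exists(target)
assert os.path.getsize(target) > 0

# 3. Preview the data (read the first couple of lines).
with open(target) as f:
    preview = [next(f) for _ in range(2)]
print(preview)

# 4. Delete files from the file system.
os.remove(target)
assert not os.path.exists(target)
```

Spark itself only needs the preview step (for example `spark.read.csv(...)` followed by `show()`); the copy, validate, and delete steps are typically done with the file system's native tooling.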
Typically we ingest data into the underlying file system using tools such as Informatica, Talend, NiFi, Kafka, or custom applications.