Full-fledged Apache Spark applications
Analyzing actions on GitHub using the SparkApplicationTemplate and wconf.
Reading CSV files using Spark Core methods and writing Parquet datasets with different compression formats to different targets (local file system, S3, and Azure Blob Storage).
The application loads a structured text file and applies some business rules using the Spark Core module. The result of the processing is then written to the local file system as a text file with the same structure.
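The general shape of such a pipeline can be sketched in plain Python (the actual application uses Spark Core RDD transformations, and its business rules are not spelled out here, so the tab-separated layout and the "keep orders worth at least 10" rule below are purely illustrative):

```python
import tempfile

# Hypothetical rule: keep records whose total (qty * price) is >= 10.
# The real business rules are application-specific.
def apply_rules(line):
    item, qty, price = line.rstrip("\n").split("\t")
    if int(qty) * float(price) >= 10.0:
        return line.rstrip("\n")
    return None

# Create a small structured input file for the demonstration.
src = tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False)
src.write("apple\t2\t3.50\npen\t1\t1.20\nbook\t1\t15.00\n")
src.close()

# Read -> transform -> write, mirroring an RDD map/filter stage
# followed by saveAsTextFile; the output keeps the input's structure.
out_path = src.name + ".out"
with open(src.name) as fin, open(out_path, "w") as fout:
    for line in fin:
        kept = apply_rules(line)
        if kept is not None:
            fout.write(kept + "\n")

with open(out_path) as f:
    result = f.read()
print(result)
```

In the Spark version the read/transform/write steps become `textFile`, `map`/`filter`, and `saveAsTextFile`, but the record-level logic is the same.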
Illustrates how to count the words in a file downloaded from the Internet using the Spark Core module. In contrast to 03 — Processing a structured Purchase Log with Spark Core, the sorting is performed on a materialized Map instead of on an RDD.
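The distinction can be illustrated with a plain-Python sketch (the real example runs on Spark; the sample text below is made up). The counts are first materialized into an in-memory map, and the sort then happens locally on that map's entries, rather than on a distributed RDD via something like `sortBy`:

```python
import re
from collections import Counter

# Stand-in for the downloaded file's contents.
text = "the quick brown fox jumps over the lazy dog the fox"

# Materialize word counts into an in-memory Map (a Counter here),
# analogous to collecting RDD counts to the driver first.
counts = Counter(re.findall(r"\w+", text.lower()))

# Sort locally on the materialized map: by count descending,
# then alphabetically to break ties.
top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(top[:3])  # → [('the', 3), ('fox', 2), ('brown', 1)]
```

The trade-off is the usual one: sorting a materialized map is simple and fast for small vocabularies, but it requires all counts to fit on a single machine, whereas sorting on the RDD keeps the work distributed.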