
99 — Spark Apps

A collection of full-fledged Apache Spark applications.

Analyzing actions on GitHub using the SparkApplicationTemplate and wconf.
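The SparkApplicationTemplate and wconf are helpers specific to this repository, so the sketch below uses a plain SparkSession instead; the input path and the assumption that events arrive as one JSON object per line (as in GH Archive dumps) are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

object GitHubEventCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("github-event-counts")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical input: one JSON-encoded GitHub event per line.
    val events = spark.read.json("data/github-events.json")

    // Count occurrences of each event type, most frequent first.
    events.groupBy("type").count()
      .orderBy(desc("count"))
      .show()

    spark.stop()
  }
}
```

In the actual application, configuration values such as the input path would come from wconf rather than being hard-coded.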

Reading CSV files with Spark Core methods and writing Parquet datasets with different compression codecs to different targets (the local file system, S3, and Azure Blob Storage).
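A minimal sketch of that flow, assuming a hypothetical two-column CSV layout; the Spark Core `textFile` call does the reading, and the DataFrame writer handles Parquet and compression. The cloud paths in the comments require the matching Hadoop connectors and credentials on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object CsvToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-to-parquet")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Read the CSV with Spark Core (RDD) methods; assumed layout: id,amount
    val rows = spark.sparkContext.textFile("data/input.csv")
      .map(_.split(","))
      .map(fields => (fields(0), fields(1).toDouble))

    val df = rows.toDF("id", "amount")

    // Same dataset, different compression codecs, local file system targets.
    df.write.option("compression", "snappy").parquet("out/snappy")
    df.write.option("compression", "gzip").parquet("out/gzip")

    // Switching the URI scheme retargets the writer to cloud storage, e.g.:
    // df.write.parquet("s3a://my-bucket/dataset")
    // df.write.parquet("wasbs://container@account.blob.core.windows.net/dataset")

    spark.stop()
  }
}
```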

The application loads a structured text file and applies some business rules using the Spark Core module. The result of the processing is then written to the local file system as a text file with the same structure.
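The shape of such a pipeline can be sketched as below; the tab-separated three-field layout and the "drop non-positive amounts" rule are illustrative assumptions, not the application's actual schema or rules.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PurchaseLogProcessor {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("purchase-log").setMaster("local[*]"))

    // Hypothetical layout: userId \t productId \t price
    val processed = sc.textFile("data/purchases.tsv")
      .map(_.split("\t"))
      .filter(_.length == 3)                      // skip malformed records
      .filter(fields => fields(2).toDouble > 0.0) // example business rule
      .map(fields => fields.mkString("\t"))       // keep the original structure

    // Written back as text with the same record layout.
    processed.saveAsTextFile("out/purchases")
    sc.stop()
  }
}
```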

Illustrates how to count the words in a file downloaded from the Internet using the Spark Core module. In contrast to 03 — Processing a structured Purchase Log with Spark Core, the sorting is performed on a materialized Map instead of on an RDD.
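The distinction can be sketched as follows: the counts are computed distributed, then collected into a driver-side Map, and only that materialized Map is sorted (rather than calling `sortBy` on the RDD). The input path is a placeholder.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountMapSort {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("word-count").setMaster("local[*]"))

    val counts = sc.textFile("data/book.txt")     // placeholder input path
      .flatMap(_.toLowerCase.split("\\W+"))
      .filter(_.nonEmpty)
      .map((_, 1))
      .reduceByKey(_ + _)
      .collectAsMap()                             // materialize on the driver

    // Sorting happens here, on the in-memory Map, not via RDD.sortBy.
    counts.toSeq.sortBy(-_._2).take(20).foreach {
      case (word, n) => println(s"$word\t$n")
    }

    sc.stop()
  }
}
```

Materializing the counts is only safe when the vocabulary fits in driver memory; the RDD-side sort in application 03 avoids that constraint.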