One Trillion Row Challenge

Inspired by Gunnar Morling's One Billion Row Challenge (1BRC), we thought we'd take things one step further and start the One Trillion Row Challenge (1TRC).

We describe the 1TRC, the dataset, and how to run the challenge with Dask on Coiled in this blog post.

The Challenge

Your task is to use any tool(s) you’d like to calculate the min, mean, and max temperature per weather station, sorted alphabetically by station name. The data is stored as Parquet files on S3 in the requester-pays bucket s3://coiled-datasets-rp/1trc, in AWS region us-east-1. There are 100,000 files of 10 million rows each, for one trillion rows in total. For an extra challenge, you could also generate the data yourself.
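As a starting point, here is a minimal sketch of the aggregation in Python. It is one possible approach, not a reference solution. The column names `station` and `measure` are assumptions about the Parquet schema, and the helper names `summarize` and `summarize_1trc` are made up for illustration. `summarize` works on any in-memory pandas DataFrame, while `summarize_1trc` applies the same groupby to the full dataset with Dask and needs `dask`, `s3fs`, and AWS credentials, since the requester pays for data transfer.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Min, mean, and max of `measure` per `station`, sorted by station name.

    The column names `station` and `measure` are assumed, not confirmed.
    """
    return (
        df.groupby("station")["measure"]
        .agg(["min", "mean", "max"])
        .sort_index()
    )


def summarize_1trc() -> pd.DataFrame:
    """Same aggregation over the hosted dataset, using Dask.

    Not called here: it reads from a requester-pays bucket, so it
    needs AWS credentials and incurs transfer costs.
    """
    import dask.dataframe as dd  # imported lazily so `summarize` works without Dask

    df = dd.read_parquet(
        "s3://coiled-datasets-rp/1trc/*.parquet",
        storage_options={"requester_pays": True},
    )
    result = df.groupby("station")["measure"].agg(["min", "mean", "max"])
    # The result has one row per station, so it fits in memory.
    return result.compute().sort_index()


if __name__ == "__main__":
    sample = pd.DataFrame(
        {
            "station": ["Oslo", "Cairo", "Oslo", "Cairo"],
            "measure": [1.0, 20.0, 3.0, 30.0],
        }
    )
    print(summarize(sample))
```

Running the full version against one trillion rows on a single machine is unlikely to be practical. Trying it on a handful of files first (for example, by narrowing the glob) is a cheap way to check the schema before scaling out to a cluster.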

How To Participate

Open an issue in this repository with your submission and enough detail for someone else to run your implementation. This includes things like:

Hardware
Runtime
Reproducible code snippet

There is no prize and everyone is a winner. Really, the idea is to solicit ideas and generate discussion.

Data Generation

You can generate the dataset yourself using the data generation script, adapted from Jacob Tomlinson's 1BRC data generation script. We also host the dataset in the requester-pays S3 bucket s3://coiled-datasets-rp/1trc in us-east-1.

The script draws a random sample of weather stations and, for each sampled row, a normally distributed temperature centered on that station's mean, which it looks up in lookup.csv.
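The sampling step can be sketched roughly as follows. This is an illustration of the idea, not the contents of generate_data.py. The function name `generate_chunk`, the lookup column names `station` and `mean_temp`, the output column name `measure`, and the default standard deviation of 10.0 are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def generate_chunk(
    lookup: pd.DataFrame, n: int, std: float = 10.0, seed: int = 0
) -> pd.DataFrame:
    """Generate `n` synthetic rows of (station, measure).

    `lookup` is assumed to have one row per station with columns
    `station` and `mean_temp` (hypothetical names).
    """
    rng = np.random.default_rng(seed)
    # Pick a station uniformly at random for every row.
    idx = rng.integers(0, len(lookup), size=n)
    means = lookup["mean_temp"].to_numpy()[idx]
    return pd.DataFrame(
        {
            "station": lookup["station"].to_numpy()[idx],
            # One normal draw per row, centered on the chosen station's mean.
            "measure": rng.normal(loc=means, scale=std),
        }
    )


if __name__ == "__main__":
    lookup = pd.DataFrame({"station": ["A", "B"], "mean_temp": [-5.0, 25.0]})
    print(generate_chunk(lookup, n=5))
```

To approximate the layout of the hosted dataset, each chunk of 10 million rows would be written to its own Parquet file (for example with `DataFrame.to_parquet`), using a different seed per file so the files are not identical.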
