Jacob Hayes - Project Lead #425
JacobHayes started this conversation in Intros
Hey, I'm Jacob! I've been dreaming about Artigraph for a bit over 3 years as a new framework for building data pipelines: one that is actually aware of your data (not just the tasks) and can finally take over much of the boilerplate and frequently re-engineered work that goes into building every data product - checkpointing, quality tracking, partitioning, schema validation, versioning, etc.
Existing data pipelining tools describe the tasks within a pipeline, but are missing a formal concept for the actual data the pipeline produces - the thing you (and your customers) actually care about. Artigraph formalizes this missing concept as "Artifacts", which is where the project gets its name - Artifact + Graph = Artigraph. Artifacts describe the data's schema, storage, and other metadata. Tasks in Artigraph are called "Producers", as they take specific input Artifacts and produce specific output Artifacts. Artifacts and Producers come together in a Graph, which defines (and validates) the relationships between each step and provides a handle to the generated data. With this metadata, the framework itself can take on a lot more responsibility for building, verifying, and publishing the data.
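To make the Artifact / Producer / Graph relationship a bit more concrete, here is a rough plain-Python sketch. To be clear, this is not Artigraph's actual API: the class names mirror the concepts above, but every field, method, and example value (`schema`, `storage`, `validate`, the trip datasets, the bucket path) is a hypothetical stand-in chosen for illustration. It only models the metadata and the validation of relationships, not the actual building of data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Artifact:
    """Hypothetical stand-in: describes the data itself (schema, storage)."""
    name: str
    schema: dict[str, type]
    storage: str  # illustrative location string, not a real Artigraph setting


@dataclass(frozen=True)
class Producer:
    """Hypothetical stand-in: a task with declared input and output Artifacts."""
    name: str
    inputs: tuple[Artifact, ...]
    outputs: tuple[Artifact, ...]


@dataclass
class Graph:
    """Hypothetical stand-in: ties Artifacts and Producers together."""
    sources: list[Artifact]
    producers: list[Producer] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a valid build order of Producer names.

        Raises ValueError if some Producer needs an Artifact that no source
        or other Producer can provide (this also catches cycles).
        """
        available = {a.name for a in self.sources}
        pending = list(self.producers)
        order: list[str] = []
        while pending:
            ready = [p for p in pending if all(i.name in available for i in p.inputs)]
            if not ready:
                needed = {i.name for p in pending for i in p.inputs}
                raise ValueError(f"unsatisfied inputs: {sorted(needed - available)}")
            for p in ready:
                order.append(p.name)
                available.update(o.name for o in p.outputs)
                pending.remove(p)
        return order


# Made-up example data: three Artifacts linked by two Producers.
raw = Artifact("raw_trips", {"trip_id": int, "distance_km": float}, "s3://example-bucket/raw/")
clean = Artifact("clean_trips", {"trip_id": int, "distance_km": float}, "s3://example-bucket/clean/")
summary = Artifact("daily_summary", {"day": str, "total_km": float}, "s3://example-bucket/summary/")

graph = Graph(
    sources=[raw],
    producers=[
        # Declared out of order on purpose; the graph works out the dependencies.
        Producer("summarize", inputs=(clean,), outputs=(summary,)),
        Producer("clean", inputs=(raw,), outputs=(clean,)),
    ],
)
build_order = graph.validate()
print(build_order)  # ['clean', 'summarize']
```

Because each Producer declares which Artifacts it consumes and produces, the graph in this sketch can order the steps and reject a pipeline whose inputs can never be satisfied before anything runs. That is the kind of responsibility the metadata makes possible.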
Artigraph is inspired by my work over the years building data pipelines across biology, finance, and urban mobility. In particular, my time with some great folks at Replica and Dyno Therapeutics was highly influential.
I'll expand on this over time, but for now feel free to share any thoughts or questions you have!