"Pachyderm is a data lake that offers complete version control for data and leverages the container ecosystem to provide reproducible data processing.
PFS is Git for Data. We offer the same version control primitives you're
used to using for code, but for massive data sets. You can also think
of it like Time Machine for your data -- clear historical snapshots of
how you data looked at different points in time.
A Pachyderm Data Repository (PDR) is the core organization primitive for your data, just like Git repos for code.
What Git does for Collaboration and Reproducibility
for code, Pachyderm does for your data. Using a consistent data state
is imperative to understanding your data. Data streams from different
sources should generally go into different repositories.
http://pachyderm.io/
No comments:
Post a Comment