Thursday, February 9, 2017

Pachyderm

"Pachyderm is a data lake that offers complete version control for data and leverages the container ecosystem to provide reproducible data processing.

PFS is Git for Data. We offer the same version control primitives you're used to using for code, but for massive data sets. You can also think of it like Time Machine for your data -- clear historical snapshots of how you data looked at different points in time.

A Pachyderm Data Repository (PDR) is the core organization primitive for your data, just like Git repos for code.

What Git does for Collaboration and Reproducibility for code, Pachyderm does for your data. Using a consistent data state is imperative to understanding your data. Data streams from different sources should generally go into different repositories.

http://pachyderm.io/

No comments:

Post a Comment