The development of DALiuGE is driven largely by the requirements, functions, and overall architecture of the SKA Science Data Processor (SDP). Although designed specifically for the SDP and the SKA, DALiuGE adopts a generic, data-driven framework that is potentially applicable to many other data-intensive applications.
DALiuGE stands on the shoulders of many previous studies on dataflow, data management, distributed systems (databases), graph theory, and HPC scheduling. It has also borrowed useful ideas from existing dataflow-related open-source projects (mostly Python!) such as Luigi, TensorFlow, Airflow, and Snakemake. Nevertheless, we believe DALiuGE has some unique features that are well suited to data-intensive applications:
- Completely data-driven: a data DROP is a graph “node” (no longer just an edge) with persistent state and events
- Integration of data-lifecycle management within the data processing framework
- Separation between logical graphs and physical graphs
- Docker-based pipeline component interface
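
To make the "data as a node with state and events" idea from the list above concrete, here is a minimal sketch in plain Python. It is not DALiuGE's API: the names `DataNode`, `AppNode`, `State`, and `run_example` are hypothetical and chosen only for illustration. It assumes that a data node fires a completion event to its consumers, and that an application runs only once all of its input data nodes are complete.

```python
from enum import Enum


class State(Enum):
    INITIALIZED = "initialized"
    COMPLETED = "completed"


class DataNode:
    """Hypothetical data node: holds a payload, a state, and its consumers."""

    def __init__(self, name):
        self.name = name
        self.state = State.INITIALIZED
        self.payload = None
        self.consumers = []  # application nodes to notify on completion

    def write(self, payload):
        # Completing the data node fires an event to every consumer.
        self.payload = payload
        self.state = State.COMPLETED
        for app in self.consumers:
            app.on_input_completed(self)


class AppNode:
    """Hypothetical application node: runs once all of its inputs are completed."""

    def __init__(self, name, func, inputs, output):
        self.name = name
        self.func = func
        self.inputs = inputs
        self.output = output
        for node in inputs:
            node.consumers.append(self)

    def on_input_completed(self, _node):
        # Execution is triggered by data events, not by an external scheduler loop.
        if all(n.state is State.COMPLETED for n in self.inputs):
            result = self.func(*(n.payload for n in self.inputs))
            self.output.write(result)


def run_example():
    a, b = DataNode("a"), DataNode("b")
    total, doubled = DataNode("total"), DataNode("doubled")
    AppNode("add", lambda x, y: x + y, [a, b], total)
    AppNode("double", lambda x: 2 * x, [total], doubled)
    a.write(3)  # "add" does not run yet: b is still pending
    pending = total.state
    b.write(4)  # both inputs are now complete, so the chain executes
    return pending, total.payload, doubled.payload
```

Calling `run_example()` writes the two inputs one after the other. Nothing runs after the first write, and the second write cascades through both applications without any central loop asking what to run next. A real framework would add error states, persistence, and distribution across nodes, none of which is modelled here.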
Documentation: https://dfms.readthedocs.io/en/latest/
Source code: https://github.com/SKA-ScienceDataProcessor/dfms
Paper: "DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge", https://arxiv.org/abs/1702.07617