Sunday, April 2, 2017

DALiuGE

"The Data Activated 流 (Liu) Graph Engine (DALiuGE) prototype represents the execution framework of the Science Data Processor (SDP) element of the Square Kilometer Array (SKA) observatory. DALiuGE aims to provide a distributed data management platform and a scalable pipeline execution environment to support continuous, soft real-time, data-intensive processing for producing SKA science ready products.

The development of DALiuGE is largely based on SDP requirements, functions and the overall architecture. Although specifically designed for SDP and SKA, DALiuGE has adopted a generic, data-driven framework potentially applicable to many other data-intensive applications.

DALiuGE stands on the shoulders of many previous studies on dataflow, data management, distributed systems (databases), graph theory, and HPC scheduling. DALiuGE has also borrowed useful ideas from existing dataflow-related open-source projects (mostly Python!) such as Luigi, TensorFlow, Airflow, Snakemake, etc. Nevertheless, we believe DALiuGE has some unique features well suited for data-intensive applications:
  • Completely data-driven, and data DROP is the graph “node” (no longer just the edge) that has persistent states and events
  • Integration of data-lifecycle management within the data processing framework
  • Separation between logical graphs and physical graphs
  • Docker-based pipeline component interface
In Overview we give a glimpse of the main concepts present in DALiuGE. Later sections of the documentation describe in more detail how DALiuGE works."
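
To make the "data is a graph node with state and events" idea a little more concrete, here is a minimal sketch in plain Python. It is not DALiuGE code and does not use its API: the names DataNode, AppNode, State, and run_demo are hypothetical, invented only for this illustration. It shows data nodes that carry a state and notify their consumers when they are completed, so that execution is driven by data becoming available rather than by a central scheduler.

```python
from enum import Enum


class State(Enum):
    # Hypothetical states; the real framework may define others.
    INITIALIZED = "initialized"
    COMPLETED = "completed"


class DataNode:
    """Hypothetical data node: holds a value, a state, and its consumers."""

    def __init__(self, name):
        self.name = name
        self.state = State.INITIALIZED
        self.value = None
        self.consumers = []

    def complete(self, value):
        # Store the value, change state, then fire a "completed" event.
        self.value = value
        self.state = State.COMPLETED
        for consumer in self.consumers:
            consumer.on_input_completed(self)


class AppNode:
    """Hypothetical application node: runs once all of its inputs are completed."""

    def __init__(self, name, func, inputs, output):
        self.name = name
        self.func = func
        self.inputs = inputs
        self.output = output
        for node in inputs:
            node.consumers.append(self)

    def on_input_completed(self, _node):
        # Execution is triggered by data events, not by an external scheduler.
        if all(n.state is State.COMPLETED for n in self.inputs):
            result = self.func(*[n.value for n in self.inputs])
            self.output.complete(result)


def run_demo():
    a, b, total, doubled = (DataNode(n) for n in ("a", "b", "total", "doubled"))
    AppNode("add", lambda x, y: x + y, [a, b], total)
    AppNode("double", lambda x: 2 * x, [total], doubled)
    # Completing the two source data nodes drives the rest of the graph.
    a.complete(3)
    b.complete(4)
    return doubled
```

Calling run_demo() completes the two source nodes, and the events cascade through both application nodes until the final data node is completed. The sketch leaves out everything that makes the real system interesting (distribution, data-lifecycle management, the logical/physical graph split, Docker components).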

https://dfms.readthedocs.io/en/latest/

https://github.com/SKA-ScienceDataProcessor/dfms

DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge - https://arxiv.org/abs/1702.07617
