We have
developed a software package, called Arkouda, which allows a user to
interactively issue massively parallel computations on distributed
data using functions and syntax that mimic NumPy, the underlying
computational library used in the vast majority of Python data science
workflows. The computational heart of Arkouda is a Chapel interpreter
that
accepts a pre-defined set of commands from a client (currently
implemented in Python) and
uses Chapel's built-in machinery for multi-locale and multithreaded
execution. Arkouda has benefited greatly from Chapel's distinctive
features and has also helped guide the development of the language.
In early applications, users of Arkouda have tended to iterate rapidly
between multi-node execution with Arkouda and single-node analysis in
Python, relying on Arkouda to filter a large dataset down to a smaller
collection suitable for analysis in Python, and then feeding the results
back into Arkouda computations on the full dataset. This paradigm has
already proved very fruitful for EDA. Our goal is to enable users to
progress seamlessly from EDA to specialized algorithms by making Arkouda
an integration point for HPC implementations of expensive kernels like
FFTs, sparse linear algebra, and graph traversal. With Arkouda serving
the role of a shell, a data scientist could explore, prepare, and call
optimized HPC libraries on massive datasets, all within the same
interactive session.
Arkouda is not trying to replace Pandas but to allow for some Pandas-style
operation at a much larger scale. In our experience Pandas can handle dataframes
up to about 500 million rows before performance becomes a real issue, this is
provided that you run on a sufficently capable compute server. Arkouda breaks
the shared memory paradigm and scales its operations to dataframes with over
200 billion rows, maybe even a trillion. In practice we have run Arkouda server
operations on columns of one trillion elements running on 512 compute nodes.
This yielded a >20TB dataframe in Arkouda.
https://github.com/Bears-R-Us/arkouda
https://arkouda.readthedocs.io/en/latest/
https://www.youtube.com/watch?v=hzLbJF-fvjQ&t=3s
https://www.youtube.com/watch?v=g-G_Z_3pgUE