Fugue provides an easier interface to using distributed compute effectively and accelerates big data projects. It does this by minimizing the amount of code you need to write, in addition to taking care of tricks and optimizations that lead to more efficient execution.
Fugue is an abstraction framework that lets users write code in native
Python or Pandas, and then port it over to Spark and Dask. The transform
function can take in a Python or pandas function and scale it out in
Spark or Dask without having to modify the function. This provides a
very simple interface to parallelize Python and pandas code on
distributed compute engines, such as Spark and Dask. Fugue is a framework that is designed to unify the interface
between pandas, Spark, and Dask, allowing one codebase to be used across
all three engines.
https://fugue-tutorials.readthedocs.io/en/latest/index.html
https://github.com/fugue-project/fugue
https://www.kdnuggets.com/2021/10/query-pandas-dataframes-sql.html
No comments:
Post a Comment