Monday, November 8, 2021

Fugue

Fugue provides an easier interface to using distributed compute effectively and accelerates big data projects. It does this by minimizing the amount of code you need to write, in addition to taking care of tricks and optimizations that lead to more efficient execution.

Fugue is an abstraction framework that lets users write code in native Python or Pandas, and then port it over to Spark and Dask. The transform function can take in a Python or pandas function and scale it out in Spark or Dask without having to modify the function. This provides a very simple interface to parallelize Python and pandas code on distributed compute engines, such as Spark and Dask.  Fugue is a framework that is designed to unify the interface between pandas, Spark, and Dask, allowing one codebase to be used across all three engines.

https://fugue-tutorials.readthedocs.io/en/latest/index.html 

https://github.com/fugue-project/fugue

https://www.kdnuggets.com/2021/10/query-pandas-dataframes-sql.html 

No comments:

Post a Comment