Tuesday, August 16, 2016

Distributed Dask Arrays

In this post we analyze weather data across a cluster using NumPy in parallel with dask.array. We focus on the following:

1. How to set up the distributed scheduler with a job scheduler like Sun Grid Engine
2. How to load NetCDF data from a network file system (NFS) into distributed RAM
3. How to manipulate data with dask.arrays
4. How to interact with distributed data using IPython widgets

We wanted to emulate a typical academic cluster setup: a job scheduler such as Sun Grid Engine (similar to SLURM, Torque, PBS, and related technologies), a shared network file system, and arrays stored in binary NetCDF files (a format similar to HDF5).
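To give a flavor of what manipulating data with dask.array looks like, here is a minimal sketch that runs on a single machine. The array `temps` is a hypothetical, randomly generated stand-in for weather data, and its shape, chunk sizes, and variable names are assumptions chosen for illustration; in the full post the data comes from NetCDF files rather than from memory.

```python
import numpy as np
import dask.array as da

# Hypothetical stand-in for weather data: temperatures indexed by
# (time, latitude, longitude). Real data would be read from NetCDF files.
rng = np.random.default_rng(0)
temps = rng.normal(loc=15.0, scale=5.0, size=(120, 40, 80))

# Wrap the NumPy array as a dask array, split along the time axis
# into 10 chunks of 12 time steps each.
x = da.from_array(temps, chunks=(12, 40, 80))

# These expressions are lazy: they build a task graph, not a result.
mean_map = x.mean(axis=0)            # time-averaged temperature per grid cell
anomaly_max = (x - mean_map).max()   # largest deviation above the mean

# compute() executes the task graph and returns concrete NumPy values.
result_mean = mean_map.compute()
result_max = anomaly_max.compute()

print(result_mean.shape)
print(float(result_max))
```

The expressions read like ordinary NumPy code, and the chunking determines how the work is divided into tasks. On a cluster, the same expressions would be handed to the distributed scheduler instead of the local one.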

http://matthewrocklin.com/blog/work/2016/02/26/dask-distributed-part-3
