In this post we analyze weather data across a cluster, using dask.array to
run NumPy computations in parallel. We focus on the following:
1. How to set up the distributed scheduler with a job scheduler like Sun
Grid Engine
2. How to load NetCDF data from a network file system (NFS) into distributed
RAM
3. How to manipulate data with dask.arrays
4. How to interact with distributed data using IPython widgets
We wanted to emulate the typical academic cluster setup: a job scheduler
like Sun Grid Engine (similar to SLURM, Torque, PBS, and other
technologies), a shared network file system, and arrays stored in a typical
binary format, NetCDF (similar to HDF5).
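
As a rough illustration of item 3, here is a minimal sketch of what
manipulating data with dask.array can look like. It is not the code from the
full post: the temperature data is synthetic (a small NumPy array standing in
for a NetCDF variable), the shape and chunk sizes are arbitrary assumptions,
and everything runs on the local machine rather than on a cluster.

```python
import numpy as np
import dask.array as da

# Synthetic stand-in for a NetCDF variable with dimensions
# (time, latitude, longitude). In the real workflow this data would be
# read from files on the shared file system; the values here are made up.
rng = np.random.default_rng(0)
temperature = rng.normal(loc=15.0, scale=5.0, size=(24, 18, 36))

# Wrap the array as a dask.array, split into blocks of 6 time steps each.
x = da.from_array(temperature, chunks=(6, 18, 36))

# NumPy-style expressions build a lazy task graph; nothing runs yet.
time_mean = x.mean(axis=0)   # average over the time axis
anomaly = x - time_mean      # deviation of each time step from that average

# .compute() executes the graph and returns an ordinary NumPy array.
result = time_mean.compute()
```

The same expressions would apply unchanged to a dask.array backed by files
and a distributed scheduler; only the way `x` is constructed would differ.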
http://matthewrocklin.com/blog/work/2016/02/26/dask-distributed-part-3