It has two main components:
- Representation, lazy indexing, and conversion to persistent files and NumPy arrays.
- Lazy calculation.
Representation
At the core of Biggus is the Array class, which provides a
simple, consistent, NumPy-esque interface to n-dimensional data which
avoids reading data until explicitly requested by user code. Commonly
these Array objects are created by wrapping "concrete" data sources such
as HDF5 variables, netCDF4 variables, or even just NumPy arrays.
Once created, Array objects can be concatenated and stacked to form new Array objects, which can themselves be concatenated and stacked as required. In this way it is possible to construct virtual arrays of arbitrary size, spanning multiple data sources.
In addition, all Array objects can be indexed to extract subsets. As with the concatenation and stacking operations, this does not cause any data to be read.
User code may request any Array object be saved to a "concrete" data form (e.g. HDF5, etc. as above). The size of this operation is not limited by system memory. Alternatively, user code may explicitly request any Array object to provide the corresponding NumPy array in memory. It is the responsibility of the user code to determine if this is an appropriate action.
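The ideas in this section (wrapping sources, concatenating, indexing, and reading only on request) can be illustrated with a minimal sketch. This is not Biggus's API or implementation: LazyArray, CountingSource, wrap, concatenate, and realise are hypothetical names invented for this example, and it only handles one-dimensional data with unit-step slices.

```python
class CountingSource:
    """Hypothetical stand-in for a file-backed variable; counts its reads."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def __len__(self):
        return len(self._values)

    def __getitem__(self, key):
        self.reads += 1  # every access to the underlying data is recorded
        return self._values[key]


class LazyArray:
    """Hypothetical 1-D lazy array: records what to read, reads nothing yet."""

    def __init__(self, parts):
        # Each part is a (source, start, stop) triple describing a span
        # of a source that supports len() and slice indexing.
        self._parts = parts

    @classmethod
    def wrap(cls, source):
        return cls([(source, 0, len(source))])

    def __len__(self):
        return sum(stop - start for _, start, stop in self._parts)

    def concatenate(self, other):
        # Joining two lazy arrays only joins their bookkeeping.
        return LazyArray(self._parts + other._parts)

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("this sketch only supports slices")
        start, stop, step = key.indices(len(self))
        if step != 1:
            raise NotImplementedError("this sketch only supports unit steps")
        selected, offset = [], 0
        for source, p_start, p_stop in self._parts:
            size = p_stop - p_start
            lo = max(start - offset, 0)
            hi = min(stop - offset, size)
            if lo < hi:
                selected.append((source, p_start + lo, p_start + hi))
            offset += size
        return LazyArray(selected)

    def realise(self):
        # The only place where the sources are actually read.
        out = []
        for source, start, stop in self._parts:
            out.extend(source[start:stop])
        return out


a = CountingSource(range(0, 5))
b = CountingSource(range(5, 10))
virtual = LazyArray.wrap(a).concatenate(LazyArray.wrap(b))
subset = virtual[3:7]                 # spans both sources
reads_before = a.reads + b.reads      # still 0: nothing has been read
values = subset.realise()             # [3, 4, 5, 6]
```

Indexing returns another LazyArray rather than data, so subsets of subsets stay lazy as well; only realise touches the sources, and then only the requested spans.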
Calculation
Currently (Aug 2013), this is still at the proof-of-concept stage. But early results have indicated that it is quite possible for simple Python code to perform large, out-of-core calculations at a rate that meets or even exceeds other tools in common usage."
https://github.com/SciTools/biggus
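To make "out-of-core calculation" concrete, here is a small sketch of a mean computed one chunk at a time, so that only a single chunk is ever held in memory. It is an illustration of the general technique, not how Biggus implements it: chunked_mean and its chunk_size parameter are hypothetical, and a plain range stands in for a large on-disk source.

```python
def chunked_mean(source, chunk_size):
    """Mean of a sliceable 1-D source, reading chunk_size items at a time."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    total = 0
    count = 0
    for start in range(0, len(source), chunk_size):
        chunk = source[start:start + chunk_size]  # only this span is read
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("cannot take the mean of an empty source")
    return total / count


data = range(1_000_000)  # stand-in for a source too large to load at once
result = chunked_mean(data, chunk_size=10_000)
```

Peak memory depends on chunk_size rather than on the size of the source, which is what allows the calculation to exceed system memory.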