"bcolz provides columnar, chunked data containers that can be
compressed either in-memory or on-disk. Column storage allows for
efficiently querying tables, as well as for cheap column addition and
removal. It is based on NumPy, and uses it
as the standard data container to communicate with bcolz objects, but
it also comes with support for import/export facilities to/from
HDF5/PyTables tables and pandas
dataframes.

bcolz objects are compressed by default not only for reducing
memory/disk storage, but also to improve I/O speed. The compression
process is carried out internally by Blosc, a
high-performance, multithreaded meta-compressor that is optimized for
binary data (although it works with text data just fine too).

bcolz can also use numexpr
internally (it does that by default if it detects numexpr installed)
or dask so as to accelerate many
vector and query operations (although it can use pure NumPy for doing
so too). numexpr/dask can optimize the memory usage and use
multithreading for doing the computations, so it is blazing fast.
This, in combination with carray/ctable disk-based, compressed
containers, can be used for performing out-of-core computations
efficiently, but most importantly transparently."
https://github.com/Blosc/bcolz
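
To make the idea in the quote concrete, here is a minimal sketch of a columnar, chunked, compressed container in pure Python. It is not bcolz's API or implementation: the names ChunkedColumn, ColumnTable, CHUNK_LEN and sum_where are hypothetical, and the standard library's zlib stands in for Blosc. It only illustrates why column storage makes adding and dropping columns cheap, and how a query can run chunk by chunk without ever holding a fully decompressed column in memory.

```python
import zlib
from array import array

CHUNK_LEN = 1000  # items per chunk; an arbitrary value chosen for this sketch


class ChunkedColumn:
    """One column stored as a list of independently compressed chunks.

    Hypothetical illustration only; zlib stands in for Blosc here.
    """

    def __init__(self, values, typecode="d"):
        self.typecode = typecode
        self.chunks = []
        values = list(values)
        self.length = len(values)
        for start in range(0, len(values), CHUNK_LEN):
            raw = array(typecode, values[start:start + CHUNK_LEN]).tobytes()
            self.chunks.append(zlib.compress(raw))

    def iter_chunks(self):
        """Decompress and yield one chunk at a time."""
        for blob in self.chunks:
            chunk = array(self.typecode)
            chunk.frombytes(zlib.decompress(blob))
            yield chunk

    def compressed_size(self):
        return sum(len(blob) for blob in self.chunks)


class ColumnTable:
    """A table kept as a dict of columns, so columns are independent."""

    def __init__(self):
        self.columns = {}

    def add_column(self, name, values, typecode="d"):
        # Only the new column is written; existing columns are untouched.
        self.columns[name] = ChunkedColumn(values, typecode)

    def drop_column(self, name):
        del self.columns[name]

    def sum_where(self, target, filter_col, threshold):
        """Sum `target` over rows where `filter_col` > threshold.

        Only the two columns involved are read, one chunk pair at a time.
        """
        total = 0
        pairs = zip(self.columns[target].iter_chunks(),
                    self.columns[filter_col].iter_chunks())
        for target_chunk, filter_chunk in pairs:
            total += sum(t for t, f in zip(target_chunk, filter_chunk)
                         if f > threshold)
        return total


table = ColumnTable()
table.add_column("id", range(10_000), typecode="q")
table.add_column("value", (i * 0.5 for i in range(10_000)))

# Rows with id 9998 and 9999 match, so this sums 4999.0 + 4999.5.
result = table.sum_where("value", "id", 9_997)
print(result)
print(table.columns["id"].compressed_size(), "bytes compressed vs", 10_000 * 8, "raw")
```

The real library presumably does far more (typed NumPy chunks, on-disk storage, numexpr or dask for the expression evaluation), but the shape is the same: each column lives on its own, in compressed pieces, and computations stream over those pieces.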