Deck Chairs and Fiddles: November 2021

Wednesday, November 10, 2021

anbox

Anbox is a container-based approach to boot a full Android system on a regular GNU/Linux system like Ubuntu. In other words: Anbox will let you run Android on your Linux system without the slowness of virtualization.

Anbox uses Linux namespaces (user, pid, uts, net, mount, ipc) to run a full Android system in a container and provide Android applications on any GNU/Linux-based platform.

The Android inside the container has no direct access to any hardware. All hardware access is going through the anbox daemon on the host. We're reusing what Android implemented within the QEMU-based emulator for OpenGL ES accelerated rendering. The Android system inside the container uses different pipes to communicate with the host system and sends all hardware access commands through these.

https://github.com/anbox/anbox

DisCo

Extracting actionable insight from complex unlabeled scientific data is an open challenge and key to unlocking data-driven discovery in science. Complementary and alternative to supervised machine learning approaches, unsupervised physics-based methods based on behavior-driven theories hold great promise. Due to computational limitations, practical application on real-world domain science problems has lagged far behind theoretical development. We present our first step towards bridging this divide - DisCo - a high-performance distributed workflow for the behavior-driven local causal state theory. DisCo provides a scalable unsupervised physics-based representation learning method that decomposes spatiotemporal systems into their structurally relevant components, which are captured by the latent local causal state variables. Complex spatiotemporal systems are generally highly structured and organize around a lower-dimensional skeleton of coherent structures, and in several firsts we demonstrate the efficacy of DisCo in capturing such structures from observational and simulated scientific data. To the best of our knowledge, DisCo is also the first application software developed entirely in Python to scale to over 1000 machine nodes, providing good performance along with ensuring domain scientists' productivity. We developed scalable, performant methods optimized for Intel many-core processors that will be upstreamed to open-source Python library packages. Our capstone experiment, using newly developed DisCo workflow and libraries, performs unsupervised spacetime segmentation analysis of CAM5.1 climate simulation data, processing an unprecedented 89.5 TB in 6.6 minutes end-to-end using 1024 Intel Haswell nodes on the Cori supercomputer obtaining 91% weak-scaling and 64% strong-scaling efficiency.

https://arxiv.org/abs/1909.11822

https://github.com/adamrupe/DisCo

Tuesday, November 9, 2021

Polars

Polars is a blazingly fast DataFrames library implemented in Rust using Apache Arrow(2) as memory model.

The goal of Polars is being a lightning fast DataFrame library that utilizes all available cores on your machine.

Polars is semi-lazy. It allows you to do most of your work eagerly, similar to pandas, but it does provide you with a powerful expression syntax that will be optimized executed on polars' query engine.

Polars also supports full lazy query execution that allows for more query optimization.

Polars keeps track of your query in a logical plan. This plan is optimized and reordered before running it. When a result is requested Polars distributes the available work to different executors that use the algorithms available in the eager API to come up with the result. Because the whole query context is known to the optimizer and executors of the logical plan, processes dependent on separate data sources can be parallelized on the fly.

Below a concise list of the features allowing Polars to meet its goals:

Copy-on-write (COW) semantics
- "Free" clones
- Cheap appends
Appending without clones
Column oriented data storage
- No block manager (-i.e.- predictable performance)
Missing values indicated with bitmask
- NaN are different from missing
- Bitmask optimizations
Efficient algorithms
Query optimizations
- Predicate pushdown
  - Filtering at scan level
- Projection pushdown
  - Projection at scan level
- Simplify expressions
- Parallel execution of physical plan
SIMD vectorization
NumPy universal functions

https://pola-rs.github.io/polars-book/user-guide/index.html

https://github.com/pola-rs/polars

https://www.kdnuggets.com/2021/05/pandas-faster-pypolars.html

Hub

Hub is a dataset format with a simple API for creating, storing, and collaborating on AI datasets of any size. The hub data layout enables rapid tranformations and streaming of data while training models at scale. Hub is used by Google, Waymo, Red Cross, Oxford University, and Omdena.

Hub includes the following features:

Storage agnostic API: Use the same API to upload, download, and stream datasets to/from AWS S3/S3-compatible storage, GCP, Activeloop cloud, local storage, as well as in-memory.
Compressed storage: Store images and audios in their native compression, decompressing them only when needed, for e.g., when training a model.
Lazy NumPy-like slicing: Treat your S3 or GCP datasets as if they are a collection of NumPy arrays in your system's memory. Slice them, index them, or iterate through them. Only the bytes you ask for will be downloaded!
Dataset version control: Commits, branches, checkout - Concepts you are already familiar with in your code repositories can now be applied to your datasets as well.
Third-party integrations: Hub comes with built-in integrations for Pytorch and Tensorflow. Train your model with a few lines of code - we even take care of dataset shuffling. :)
Distributed transforms: Rapidly apply transformations on your datasets using multi-threading, multi-processing, or our built-in Ray integration.

https://github.com/activeloopai/Hub

https://www.kdnuggets.com/2021/11/after-hdf5-data-storage-format-deep-learning.html

PetscSF

PetscSF, the communication component of the Portable, Extensible Toolkit for Scientific Computation (PETSc), is designed to provide PETSc's communication infrastructure suitable for exascale computers that utilize GPUs and other accelerators. PetscSF provides a simple application programming interface (API) for managing common communication patterns in scientific computations by using a star-forest graph representation. PetscSF supports several implementations based on MPI and NVSHMEM, whose selection is based on the characteristics of the application or the target architecture. An efficient and portable model for network and intra-node communication is essential for implementing large-scale applications. The Message Passing Interface, which has been the de facto standard for distributed memory systems, has developed into a large complex API that does not yet provide high performance on the emerging heterogeneous CPU-GPU-based exascale systems. In this paper, we discuss the design of PetscSF, how it can overcome some difficulties of working directly with MPI on GPUs, and we demonstrate its performance, scalability, and novel features.

https://arxiv.org/abs/2102.13018

https://www.nextplatform.com/2021/03/01/rethinking-mpi-for-gpu-accelerated-supercomputers/

Arkouda

We have developed a software package, called Arkouda, which allows a user to interactively issue massively parallel computations on distributed data using functions and syntax that mimic NumPy, the underlying computational library used in the vast majority of Python data science workflows. The computational heart of Arkouda is a Chapel interpreter that accepts a pre-defined set of commands from a client (currently implemented in Python) and uses Chapel's built-in machinery for multi-locale and multithreaded execution. Arkouda has benefited greatly from Chapel's distinctive features and has also helped guide the development of the language.

In early applications, users of Arkouda have tended to iterate rapidly between multi-node execution with Arkouda and single-node analysis in Python, relying on Arkouda to filter a large dataset down to a smaller collection suitable for analysis in Python, and then feeding the results back into Arkouda computations on the full dataset. This paradigm has already proved very fruitful for EDA. Our goal is to enable users to progress seamlessly from EDA to specialized algorithms by making Arkouda an integration point for HPC implementations of expensive kernels like FFTs, sparse linear algebra, and graph traversal. With Arkouda serving the role of a shell, a data scientist could explore, prepare, and call optimized HPC libraries on massive datasets, all within the same interactive session.

Arkouda is not trying to replace Pandas but to allow for some Pandas-style operation at a much larger scale. In our experience Pandas can handle dataframes up to about 500 million rows before performance becomes a real issue, this is provided that you run on a sufficently capable compute server. Arkouda breaks the shared memory paradigm and scales its operations to dataframes with over 200 billion rows, maybe even a trillion. In practice we have run Arkouda server operations on columns of one trillion elements running on 512 compute nodes. This yielded a >20TB dataframe in Arkouda.

https://github.com/Bears-R-Us/arkouda

https://arkouda.readthedocs.io/en/latest/

https://www.youtube.com/watch?v=hzLbJF-fvjQ&t=3s

https://www.youtube.com/watch?v=g-G_Z_3pgUE

Chapel

Chapel is a modern programming language designed for productive parallel computing at scale. Chapel's design and implementation have been undertaken with portability in mind, permitting Chapel to run on multicore desktops and laptops, commodity clusters, and the cloud, in addition to the high-end supercomputers for which it was originally undertaken.

Why Chapel? Because it simplifies parallel programming through elegant support for:

distributed arrays that can leverage thousands of nodes' memories and cores
a global namespace supporting direct access to local or remote variables
data parallelism to trivially use the cores of a laptop, cluster, or supercomputer
task parallelism to create concurrency within a node or across the system

Chapel Characteristics

productive: code tends to be similarly readable/writable as Python
scalable: runs on laptops, clusters, the cloud, and HPC systems
fast: performance competes with or beats C/C++ & MPI & OpenMP
portable: compiles and runs in virtually any *nix environment
open-source: hosted on GitHub, permissively licensed

https://github.com/chapel-lang/chapel

https://github.com/Bears-R-Us/arkouda

https://github.com/pnnl/chgl

https://github.com/marcoscleison/awesome-chapel

https://www.youtube.com/channel/UCHmm27bYjhknK5mU7ZzPGsQ

https://news.ycombinator.com/item?id=22708041

Monday, November 8, 2021

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

DeepSpeed delivers extreme-scale model training for everyone, from data scientists training on massive supercomputers to those training on low-end clusters or even on a single GPU:

Extreme scale: Using current generation of GPU clusters with hundreds of devices, 3D parallelism of DeepSpeed can efficiently train deep learning models with trillions of parameters.
Extremely memory efficient: With just a single GPU, ZeRO-Offload of DeepSpeed can train models with over 10B parameters, 10x bigger than the state of arts, democratizing multi-billion-parameter model training such that many deep learning scientists can explore bigger and better models.
Extremely long sequence length: Sparse attention of DeepSpeed powers an order-of-magnitude longer input sequence and obtains up to 6x faster execution comparing with dense transformers.
Extremely communication efficient: 3D parallelism improves communication efficiency allows users to train multi-billion-parameter models 2–7x faster on clusters with limited network bandwidth. 1-bit Adam/1-bit LAMB reduce communication volume by up to 5x while achieving similar convergence efficiency to Adam/LAMB, allowing for scaling to different types of GPU clusters and networks.

Below we provide a brief feature list, see our detailed feature overview for descriptions and usage.

Distributed Training with Mixed Precision
- 16-bit mixed precision
- Single-GPU/Multi-GPU/Multi-Node
Model Parallelism
- Support for Custom Model Parallelism
- Integration with Megatron-LM
Pipeline Parallelism
- 3D Parallelism
The Zero Redundancy Optimizer (ZeRO)
- Optimizer State and Gradient Partitioning
- Activation Partitioning
- Constant Buffer Optimization
- Contiguous Memory Optimization
ZeRO-Offload
- Leverage both CPU/GPU memory for model training
- Support 10B model training on a single GPU
Ultra-fast dense transformer kernels
Sparse attention
- Memory- and compute-efficient sparse kernels
- Support 10x longer sequences than dense
- Flexible support to different sparse structures
1-bit Adam and 1-bit LAMB
- Custom communication collective
- Up to 5x communication volume saving
Additional Memory and Bandwidth Optimizations
- Smart Gradient Accumulation
- Communication/Computation Overlap
Training Features
- Simplified training API
- Gradient Clipping
- Automatic loss scaling with mixed precision
Training Optimizers
- Fused Adam optimizer and arbitrary torch.optim.Optimizer
- Memory bandwidth optimized FP16 Optimizer
- Large Batch Training with LAMB Optimizer
- Memory efficient Training with ZeRO Optimizer
- CPU-Adam
Training Agnostic Checkpointing
Advanced Parameter Search
- Learning Rate Range Test
- 1Cycle Learning Rate Schedule
Simplified Data Loader
Curriculum Learning
- A curriculum learning-based data pipeline that presents easier or simpler examples earlier during training
- Stable and 3.3x faster GPT-2 pre-training with 8x/4x larger batch size/learning rate while maintaining token-wise convergence speed
- Complementary to many other DeepSpeed features
Performance Analysis and Debugging
Mixture of Experts (MoE)

https://github.com/microsoft/DeepSpeed

https://www.deepspeed.ai/

cppyy

cppyy is an automatic, run-time, Python-C++ bindings generator, for calling C++ from Python and Python from C++. Run-time generation enables detailed specialization for higher performance, lazy loading for reduced memory use in large scale projects, Python-side cross-inheritance and callbacks for working with C++ frameworks, run-time template instantiation, automatic object downcasting, exception mapping, and interactive exploration of C++ libraries. cppyy delivers this without any language extensions, intermediate languages, or the need for boiler-plate hand-written code. For design and performance, see this PyHPC paper, albeit that the CPython/cppyy performance has been vastly improved since.

cppyy is based on Cling, the C++ interpreter, to match Python’s dynamism, interactivity, and run-time behavior.

cppyy is available for both CPython (v2 and v3) and PyPy, reaching C++-like performance with the latter. It makes judicious use of precompiled headers, dynamic loading, and lazy instantiation, to support C++ programs consisting of millions of lines of code and many thousands of classes. cppyy minimizes dependencies to allow its use in distributed, heterogeneous, development environments.

https://cppyy.readthedocs.io/en/latest/index.html

https://github.com/wlav/cppyy/

Glow

Glow is a machine learning compiler and execution engine for hardware accelerators. It is designed to be used as a backend for high-level machine learning frameworks. The compiler is designed to allow state of the art compiler optimizations and code generation of neural network graphs. This library is in active development. The project plan is described in the Github issues section and in the Roadmap wiki page.

Glow lowers a traditional neural network dataflow graph into a two-phase strongly-typed intermediate representation (IR). The high-level IR allows the optimizer to perform domain-specific optimizations. The lower-level instruction-based address-only IR allows the compiler to perform memory-related optimizations, such as instruction scheduling, static memory allocation and copy elimination. At the lowest level, the optimizer performs machine-specific code generation to take advantage of specialized hardware features. Glow features a lowering phase which enables the compiler to support a high number of input operators as well as a large number of hardware targets by eliminating the need to implement all operators on all targets. The lowering phase is designed to reduce the input space and allow new hardware backends to focus on a small number of linear algebra primitives. The design philosophy is described in an arXiv paper.

https://arxiv.org/abs/1805.00907

https://github.com/pytorch/glow

Neanderthal

Neanderthal is a Clojure library for fast matrix and linear algebra computations based on the highly optimized native libraries of BLAS and LAPACK computation routines for both CPU and GPU.

https://neanderthal.uncomplicate.org/

https://neanderthal.uncomplicate.org/articles/guides.html

https://dragan.rocks/articles/20/Clojure-Numpy-Cupy-CPU-GPU

https://dragan.rocks/articles/20/Clojure-Numpy-Cupy-CPU-GPU-2

https://github.com/uncomplicate/neanderthal

Taichi

Taichi (太极) is a parallel programming language for high-performance numerical computations. It is embedded in Python, and its just-in-time compiler offloads compute-intensive tasks to multi-core CPUs and massively parallel GPUs.

Advanced features of Taichi include spatially sparse computing, differentiable programming [examples], and quantized computation.

https://github.com/taichi-dev/taichi

https://docs.taichi.graphics/

cfgrib

Python interface to map GRIB files to the Unidata's Common Data Model v4 following the CF Conventions. The high level API is designed to support a GRIB engine for xarray and it is inspired by netCDF4-python and h5netcdf. Low level access and decoding is performed via the ECMWF ecCodes library and the eccodes python package.

Features with development status Beta:

enables the engine='cfgrib' option to read GRIB files with xarray,
reads most GRIB 1 and 2 files including heterogeneous ones with cfgrib.open_datasets,
supports all modern versions of Python 3.9, 3.8, 3.7 and PyPy3,
the 0.9.6.x series with support for Python 2 will stay active and receive critical bugfixes,
works wherever eccodes-python does: Linux, MacOS and Windows
conda-forge package on all supported platforms,
reads the data lazily and efficiently in terms of both memory usage and disk access,
allows larger-than-memory and distributed processing via xarray and dask,
supports translating coordinates to different data models and naming conventions,
supports writing the index of a GRIB file to disk, to save a full-file scan on open.

https://github.com/ecmwf/cfgrib

xerus

The xerus library is a general purpose library for numerical calculations with higher order tensors, Tensor-Train Decompositions / Matrix Product States and general Tensor Networks. The focus of development was the simple usability and adaptibility to any setting that requires higher order tensors or decompositions thereof.

The key features include:

Modern code and concepts incorporating many features of the C++11 standard.
Full python bindings with very similar syntax for easy transitions from and to c++.
Calculation with tensors of arbitrary orders using an intuitive Einstein-like notation A(i,j) = B(i,k,l) * C(k,j,l);.
Full implementation of the Tensor-Train decompositions (MPS) with all neccessary capabilities (including Algorithms like ALS, ADF and CG).
Lazy evaluation of (multiple) tensor contractions featuring heuristics to automatically find efficient contraction orders.
Direct integration of the blas and lapack, as high performance linear algebra backends.
Fast sparse tensor calculation by usage of the suiteSparse sparse matrix capabilities.
Capabilites to handle arbitrary Tensor Networks.

https://libxerus.org/

https://git.hemio.de/xerus/xerus

ODL

Operator Discretization Library (ODL) is a Python library that enables research in inverse problems on realistic or real data. The framework allows to encapsulate a physical model into an Operator that can be used like a mathematical object in, e.g., optimization methods. Furthermore, ODL makes it easy to experiment with reconstruction methods and optimization algorithms for variational regularization, all without sacrificing performance.

For more details and an introduction into the inner workings of ODL, please refer to the documentation. The features include:

A versatile and efficient library of optimization routines for smooth and non-smooth problems, such as CGLS, BFGS, PDHG and Douglas-Rachford splitting.
Support for tomographic imaging with a unified geometry representation and bindings to external libraries for efficient computation of projections and back-projections.
And much more, including support for deep learning libraries, figures of merits, phantom generation, data handling, etc.

https://github.com/odlgroup/odl

https://odlgroup.github.io/odl/

NeuralPDE.jl

NeuralPDE.jl is a solver package which consists of neural network solvers for partial differential equations using scientific machine learning (SciML) techniques such as physics-informed neural networks (PINNs) and deep BSDE solvers. This package utilizes deep neural networks and neural stochastic differential equations to solve high-dimensional PDEs at a greatly reduced cost and greatly increased generality compared with classical methods.

https://github.com/SciML/NeuralPDE.jl

https://neuralpde.sciml.ai/stable/

https://github.com/xiaoyuxie-vico/Awesome-ML-PDE

DeepXDE

DeepXDE is a library for scientific machine learning. Use DeepXDE if you need a deep learning library that

solves forward and inverse partial differential equations (PDEs) via physics-informed neural network (PINN),
solves forward and inverse integro-differential equations (IDEs) via PINN,
solves forward and inverse fractional partial differential equations (fPDEs) via fractional PINN (fPINN),
approximates nonlinear operators via deep operator network (DeepONet),
approximates functions from multi-fidelity data via multi-fidelity NN (MFNN),
approximates functions from a dataset with/without constraints.

DeepXDE supports three tensor libraries as backends: TensorFlow 1.x (tensorflow.compat.v1 in TensorFlow 2.x), TensorFlow 2.x, and PyTorch.

DeepXDE has implemented many algorithms as shown above and supports many features:

complex domain geometries without tyranny mesh generation. The primitive geometries are interval, triangle, rectangle, polygon, disk, cuboid, and sphere. Other geometries can be constructed as constructive solid geometry (CSG) using three boolean operations: union, difference, and intersection.
multi-physics, i.e., (time-dependent) coupled PDEs.
5 types of boundary conditions (BCs): Dirichlet, Neumann, Robin, periodic, and a general BC, which can be defined on an arbitrary domain or on a point set.
different neural networks, such as (stacked/unstacked) fully connected neural network, residual neural network, and (spatio-temporal) multi-scale fourier feature networks.
6 sampling methods: uniform, pseudorandom, Latin hypercube sampling, Halton sequence, Hammersley sequence, and Sobol sequence. The training points can keep the same during training or be resampled every certain iterations.
conveniently save the model during training, and load a trained model.
uncertainty quantification using dropout.
many different (weighted) losses, optimizers, learning rate schedules, metrics, etc.
callbacks to monitor the internal states and statistics of the model during training, such as early stopping.
enables the user code to be compact, resembling closely the mathematical formulation.

All the components of DeepXDE are loosely coupled, and thus DeepXDE is well-structured and highly configurable. It is easy to customize DeepXDE to meet new demands.

https://github.com/lululxvi/deepxde

https://epubs.siam.org/doi/10.1137/19M1274067

https://deepxde.readthedocs.io/en/latest/index.html

https://github.com/xiaoyuxie-vico/Awesome-ML-PDE

nbdev

A library that allows you to develop a python library in Jupyter Notebooks, putting all your code, tests and documentation in one place. It makes debugging and refactoring your code much easier relative to traditional programming environments. Furthermore, using nbdev promotes software engineering best practices because tests and documentation are first class citizens.

nbdev provides the following tools for developers:

Automatically generate docs from Jupyter notebooks. These docs are searchable and automatically hyperlinked to appropriate documentation pages by introspecting keywords you surround in backticks.
Utilities to automate the publishing of pypi and conda packages including version number management.
A robust, two-way sync between notebooks and source code, which allow you to use your IDE for code navigation or quick edits if desired.
Fine-grained control on hiding/showing cells: you can choose to hide entire cells, just the output, or just the input. Furthermore, you can embed cells in collapsible elements that are open or closed by default.
Ability to write tests directly in notebooks without having to learn special APIs. These tests get executed in parallel with a single CLI command. You can even define certain groups of tests such that you don't have to always run long-running tests.
Tools for merge/conflict resolution with notebooks in a human readable format.
Continuous integration (CI) comes setup for you with GitHub Actions out of the box, that will run tests automatically for you. Even if you are not familiar with CI or GitHub Actions, this starts working right away for you without any manual intervention.
Integration With GitHub Pages for docs hosting: nbdev allows you to easily host your documentation for free, using GitHub pages.
Create Python modules, following best practices such as automatically defining __all__ (more details) with your exported functions, classes, and variables.
Math equation support with LaTeX.
... and much more! See the Getting Started section below for more information.

https://github.com/fastai/nbdev

https://github.blog/2020-11-20-nbdev-a-literate-programming-environment-that-democratizes-software-engineering-best-practices/

Juttle

Juttle is an analytics system and language for developers built upon a stream-processing core and targeted for presentation-layer scale. Juttle gives you an agile way to query, analyze, and visualize live and historical data from many different big data backends or other web services. Using the Juttle dataflow language, you can specify your presentation analytics in a single place with a syntax modeled after the classic unix shell pipeline. There is no need to program against data query and visualization libraries. Juttle scripts, or juttles, tie everything together and abstract away the details.

While the Juttle syntax embraces the simplicity of the unix pipeline design pattern, it also includes a number of more powerful language concepts including functions, dataflow subgraph notation, native expressions, modules, scoping, and special aggregation functions called reducers. The details of the language and the dataflow model are desribed in the Juttle Language Reference.

In Juttle, you read data from a backend service, analyze it using dataflow processors, and send derived data or synthesized events to some output, e.g., streaming results to a browser view, writing data to a storage backend, posting http events to slack, hipchat, pagerduty, etc.

Juttle presently includes a number of adapters for various big-data backends and we are continually adding more. This means that you can interoperate with your existing infrastructure, whether it is a cassandra cluster, elastic search, a SQL database, something from the hadoop ecosystem, and so forth. If a particular backend is not yet supported, an adapter for it can be added in a relatively straightforward manner using Juttle's backend adapter API.

Under the hood, the Juttle compiler generates JavaScript output that implements the Juttle dataflow computation by executing alongside the Juttle runtime, either in node or the browser. The Juttle optimizer figures out the pattern of queries needed to run on the various big-data backends to perform the analytics specified in your Juttle programs.

https://juttle.github.io/juttle/

https://github.com/juttle/juttle

http://juttle.github.io/

Fugue

Fugue provides an easier interface to using distributed compute effectively and accelerates big data projects. It does this by minimizing the amount of code you need to write, in addition to taking care of tricks and optimizations that lead to more efficient execution.

Fugue is an abstraction framework that lets users write code in native Python or Pandas, and then port it over to Spark and Dask. The transform function can take in a Python or pandas function and scale it out in Spark or Dask without having to modify the function. This provides a very simple interface to parallelize Python and pandas code on distributed compute engines, such as Spark and Dask. Fugue is a framework that is designed to unify the interface between pandas, Spark, and Dask, allowing one codebase to be used across all three engines.

https://fugue-tutorials.readthedocs.io/en/latest/index.html

https://github.com/fugue-project/fugue

https://www.kdnuggets.com/2021/10/query-pandas-dataframes-sql.html

XGBoost

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides a parallel tree boosting (also known as GBDT, GBM) that solve many data science problems in a fast and accurate way. The same code runs on major distributed environment (Kubernetes, Hadoop, SGE, MPI, Dask) and can solve problems beyond billions of examples.

https://github.com/dmlc/xgboost

https://xgboost.readthedocs.io/en/latest/

https://xgboost.ai/

https://towardsdatascience.com/https-medium-com-vishalmorde-xgboost-algorithm-long-she-may-rein-edd9f99be63d

statsmodels

statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration. An extensive list of result statistics are available for each estimator. The results are tested against existing statistical packages to ensure that they are correct.

The features include:

Linear regression models:
- Ordinary least squares
- Generalized least squares
- Weighted least squares
- Least squares with autoregressive errors
- Quantile regression
- Recursive least squares
Mixed Linear Model with mixed effects and variance components
GLM: Generalized linear models with support for all of the one-parameter exponential family distributions
Bayesian Mixed GLM for Binomial and Poisson
GEE: Generalized Estimating Equations for one-way clustered or longitudinal data
Discrete models:
- Logit and Probit
- Multinomial logit (MNLogit)
- Poisson and Generalized Poisson regression
- Negative Binomial regression
- Zero-Inflated Count models
RLM: Robust linear models with support for several M-estimators.
Time Series Analysis: models for time series analysis
- Complete StateSpace modeling framework
  - Seasonal ARIMA and ARIMAX models
  - VARMA and VARMAX models
  - Dynamic Factor models
  - Unobserved Component models
- Markov switching models (MSAR), also known as Hidden Markov Models (HMM)
- Univariate time series analysis: AR, ARIMA
- Vector autoregressive models, VAR and structural VAR
- Vector error correction model, VECM
- exponential smoothing, Holt-Winters
- Hypothesis tests for time series: unit root, cointegration and others
- Descriptive statistics and process models for time series analysis
Survival analysis:
- Proportional hazards regression (Cox models)
- Survivor function estimation (Kaplan-Meier)
- Cumulative incidence function estimation
Multivariate:
- Principal Component Analysis with missing data
- Factor Analysis with rotation
- MANOVA
- Canonical Correlation
Nonparametric statistics: Univariate and multivariate kernel density estimators
Datasets: Datasets used for examples and in testing
Statistics: a wide range of statistical tests
- diagnostics and specification tests
- goodness-of-fit and normality tests
- functions for multiple testing
- various additional statistical tests
Imputation with MICE, regression on order statistic and Gaussian imputation
Mediation analysis
Graphics includes plot functions for visual analysis of data and model results
I/O
- Tools for reading Stata .dta files, but pandas has a more recent version
- Table output to ascii, latex, and html
Miscellaneous models

https://github.com/statsmodels/statsmodels

https://www.statsmodels.org/stable/index.html

OpenCV

OpenCV (Open Source Computer Vision Library: http://opencv.org) is an open-source BSD-licensed library that includes several hundreds of computer vision algorithms. The document describes the so-called OpenCV 2.x API, which is essentially a C++ API, as opposed to the C-based OpenCV 1.x API (C API is deprecated and not tested with "C" compiler since OpenCV 2.4 releases)

OpenCV has a modular structure, which means that the package includes several shared or static libraries. The following modules are available:

Core functionality (core) - a compact module defining basic data structures, including the dense multi-dimensional array Mat and basic functions used by all other modules.
Image Processing (imgproc) - an image processing module that includes linear and non-linear image filtering, geometrical image transformations (resize, affine and perspective warping, generic table-based remapping), color space conversion, histograms, and so on.
Video Analysis (video) - a video analysis module that includes motion estimation, background subtraction, and object tracking algorithms.
Camera Calibration and 3D Reconstruction (calib3d) - basic multiple-view geometry algorithms, single and stereo camera calibration, object pose estimation, stereo correspondence algorithms, and elements of 3D reconstruction.
2D Features Framework (features2d) - salient feature detectors, descriptors, and descriptor matchers.
Object Detection (objdetect) - detection of objects and instances of the predefined classes (for example, faces, eyes, mugs, people, cars, and so on).
High-level GUI (highgui) - an easy-to-use interface to simple UI capabilities.
Video I/O (videoio) - an easy-to-use interface to video capturing and video codecs.
... some other helper modules, such as FLANN and Google test wrappers, Python bindings, and others.

https://docs.opencv.org/3.4/d1/dfb/intro.html

https://opencv.org/

FairScale

FairScale is a PyTorch extension library for high performance and large scale training. This library extends basic PyTorch capabilities while adding new SOTA scaling techniques. FairScale makes available the latest distributed training techniques in the form of composable modules and easy to use APIs. These APIs are a fundamental part of a researcher’s toolbox as they attempt to scale models with limited resources.

ML training at scale traditionally means data parallelism which allows us to use multiple devices at the same time to train a large batch size per step thereby achieving the goal accuracy in a shorter period of time as compared to training on a single device. With recent advances in ML research, the size of ML models has only increased over the years and data parallelism no longer serves all “scaling” purposes.

There are multiple axes across which you can scale training and FairScale provides the following broad categories of solutions:

Parallelism → These techniques allow scaling of models by layer parallelism and tensor parallelism.
Sharding Methods → Memory and computation are usually trade-offs and in this category we attempt to achieve both low memory utilization and efficient computation by sharding model layers or parameters, optimizer state and gradients.
Optimization → This bucket deals with optimizing memory usage irrespective of the scale of the model, training without hyperparameter tuning and all other techniques that attempt to optimize training performance in some way.

https://fairscale.readthedocs.io/en/latest/
https://github.com/facebookresearch/fairscale

VISSL

VISSL is a computer VIsion library for state-of-the-art Self-Supervised Learning research with PyTorch. VISSL aims to accelerate research cycle in self-supervised learning: from designing a new self-supervised task to evaluating the learned representations. Key features include:

Reproducible implementation of SOTA in Self-Supervision: All existing SOTA in Self-Supervision are implemented - SwAV, SimCLR, MoCo(v2), PIRL, NPID, NPID++, DeepClusterV2, ClusterFit, RotNet, Jigsaw. Also supports supervised trainings.
Benchmark suite: Variety of benchmarks tasks including linear image classification (places205, imagenet1k, voc07, food, CLEVR, dsprites, UCF101, stanford cars and many more), full finetuning, semi-supervised benchmark, nearest neighbor benchmark, object detection (Pascal VOC and COCO).
Ease of Usability: easy to use using yaml configuration system based on Hydra.
Modular: Easy to design new tasks and reuse the existing components from other tasks (objective functions, model trunk and heads, data transforms, etc.). The modular components are simple drop-in replacements in yaml config files.
Scalability: Easy to train model on 1-gpu, multi-gpu and multi-node. Several components for large scale trainings provided as simple config file plugs: Activation checkpointing, ZeRO, FP16, LARC, Stateful data sampler, data class to handle invalid images, large model backbones like RegNets, etc.
Model Zoo: Over 60 pre-trained self-supervised model weights.
https://github.com/facebookresearch/vissl

Detectron2

Detectron2 is Facebook AI Research's next generation library that provides state-of-the-art detection and segmentation algorithms. It is the successor of Detectron and maskrcnn-benchmark. It supports a number of computer vision research projects and production applications in Facebook.

Detectron2 is a ground-up rewrite of Detectron that started with maskrcnn-benchmark. The platform is now implemented in PyTorch. With a new, more modular design, Detectron2 is flexible and extensible, and able to provide fast training on single or multiple GPU servers. Detectron2 includes high-quality implementations of state-of-the-art object detection algorithms, including DensePose, panoptic feature pyramid networks, and numerous variants of the pioneering Mask R-CNN model family also developed by FAIR. Its extensible design makes it easy to implement cutting-edge research projects without having to fork the entire codebase.

https://ai.facebook.com/blog/-detectron2-a-pytorch-based-modular-object-detection-library-/

https://github.com/facebookresearch/detectron2

newspaper

Newspaper is a Python 3 module for article scraping and curation.

Newspaper can extract and detect languages seamlessly. If no language is specified, Newspaper will attempt to auto detect a language.

The features include:

Multi-threaded article download framework
News url identification
Text extraction from html
Top image extraction from html
All image extraction from html
Keyword extraction from text
Summary extraction from text
Author extraction from text
Google trending terms extraction
Works in 10+ languages (English, Chinese, German, Arabic, ...)

https://github.com/codelucas/newspaper

https://www.kdnuggets.com/2021/10/simple-text-scraping-parsing-processing-python-library.html

PyCaret

PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that speeds up the experiment cycle exponentially and makes you more productive.

In comparison with the other open-source machine learning libraries, PyCaret is an alternate low-code library that can be used to replace hundreds of lines of code with few lines only. This makes experiments exponentially fast and efficient. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray, and few more.

The design and simplicity of PyCaret are inspired by the emerging role of citizen data scientists, a term first used by Gartner. Citizen Data Scientists are power users who can perform both simple and moderately sophisticated analytical tasks that would previously have required more technical expertise.

https://github.com/pycaret/pycaret

https://www.kdnuggets.com/2021/04/multiple-time-series-forecasting-pycaret.html

fastai

fastai is a deep learning library which provides practitioners with high-level components that can quickly and easily provide state-of-the-art results in standard deep learning domains, and provides researchers with low-level components that can be mixed and matched to build new approaches. It aims to do both things without substantial compromises in ease of use, flexibility, or performance. This is possible thanks to a carefully layered architecture, which expresses common underlying patterns of many deep learning and data processing techniques in terms of decoupled abstractions. These abstractions can be expressed concisely and clearly by leveraging the dynamism of the underlying Python language and the flexibility of the PyTorch library. fastai includes:

A new type dispatch system for Python along with a semantic type hierarchy for tensors
A GPU-optimized computer vision library which can be extended in pure Python
An optimizer which refactors out the common functionality of modern optimizers into two basic pieces, allowing optimization algorithms to be implemented in 4–5 lines of code
A novel 2-way callback system that can access any part of the data, model, or optimizer and change it at any point during training
A new data block API
And much more...

fastai is organized around two main design goals: to be approachable and rapidly productive, while also being deeply hackable and configurable. It is built on top of a hierarchy of lower-level APIs which provide composable building blocks. This way, a user wanting to rewrite part of the high-level API or add particular behavior to suit their needs does not have to learn how to use the lowest level.

https://github.com/fastai/fastai

https://www.fast.ai/

https://en.wikipedia.org/wiki/Fast.ai

PyTorch Lightning

The lightweight PyTorch wrapper for high-performance AI research.

Lightning forces the following structure to your code which makes it reusable and shareable:

Research code (the LightningModule).
Engineering code (you delete, and is handled by the Trainer).
Non-essential research code (logging, etc... this goes in Callbacks).
Data (use PyTorch DataLoaders or organize them into a LightningDataModule).

Once you do this, you can train on multiple-GPUs, TPUs, CPUs and even in 16-bit precision without changing your code.

Lightning has over 40+ advanced features designed for professional AI research at scale.

Advantages over unstructured PyTorch

Models become hardware agnostic
Code is clear to read because engineering code is abstracted away
Easier to reproduce
Make fewer mistakes because lightning handles the tricky engineering
Keeps all the flexibility (LightningModules are still PyTorch modules), but removes a ton of boilerplate
Lightning has dozens of integrations with popular machine learning tools.
Tested rigorously with every new PR. We test every combination of PyTorch and Python supported versions, every OS, multi GPUs and even TPUs.
Minimal running speed overhead (about 300 ms per epoch compared with pure PyTorch).

https://github.com/PyTorchLightning/pytorch-lightning

https://www.exxactcorp.com/blog/Deep-Learning/getting-started-with-pytorch-lightning

Lightning Flash

Lightning Flash offers a suite of functionality facilitating more efficient transfer learning and data handling, and a recipe book of state-of-the-art approaches to typical deep learning problems.

Like a set of Russian nesting dolls of deep learning abstraction libraries, Lightning Flash adds further abstractions and simplification on top of PyTorch Lightning. In fact we can train an image classification task in only 7 lines. We’ll use the CIFAR10 dataset and a classification model based on the ResNet18 backbone built into Lightning Flash. Then we’ll show how the model backbone can be repurposed for classifying a new dataset, CIFAR100,

While Lightning Flash is very much still under active development and has plenty of sharp edges, you can already put together certain workflows with very little code, and there’s even a “no-code” capability they call Flash Zero. For our purposes, we can put together a transfer learning workflow with less than 20 lines.

https://www.kdnuggets.com/2021/11/advanced-pytorch-lightning-torchmetrics-lightning-flash.html

https://github.com/PyTorchLightning/lightning-flash

Torchmetrics

TorchMetrics is a collection of 50+ PyTorch metrics implementations and an easy-to-use API to create custom metrics. It offers:

A standardized interface to increase reproducibility
Reduces boilerplate
Automatic accumulation over batches
Metrics optimized for distributed-training
Automatic synchronization between multiple devices

You can use TorchMetrics with any PyTorch model or with PyTorch Lightning to enjoy additional features such as:

Module metrics are automatically placed on the correct device.
Native support for logging metrics in Lightning to reduce even more boilerplate.

https://github.com/PyTorchLightning/metrics

https://torchmetrics.readthedocs.io/en/latest/

Netron

Netron is a viewer for neural network, deep learning and machine learning models.

Netron supports ONNX, TensorFlow Lite, Caffe, Keras, Darknet, PaddlePaddle, ncnn, MNN, Core ML, RKNN, MXNet, MindSpore Lite, TNN, Barracuda, Tengine, CNTK, TensorFlow.js, Caffe2 and UFF.

Netron has experimental support for PyTorch, TensorFlow, TorchScript, OpenVINO, Torch, Vitis AI, Arm NN, BigDL, Chainer, Deeplearning4j, MediaPipe, ML.NET and scikit-learn.

https://github.com/lutzroeder/netron

Streamlit

Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science. In just a few minutes you can build and deploy powerful data apps.

https://docs.streamlit.io/

https://github.com/streamlit/streamlit

https://streamlit.io/

https://towardsdatascience.com/coding-ml-tools-like-you-code-ml-models-ddba3357eace

https://www.kdnuggets.com/2021/09/create-stunning-web-apps-data-science-projects.html

Darknet

Darknet is an open source neural network framework written in C and CUDA. It is fast, easy to install, and supports CPU and GPU computation.

There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, CmBN, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results.

https://pjreddie.com/darknet/

https://github.com/pjreddie/darknet

https://github.com/AlexeyAB/darknet

https://arxiv.org/abs/2004.10934

https://jonathan-hui.medium.com/yolov4-c9901eaa8e61

https://alexeyab84.medium.com/yolov4-the-most-accurate-real-time-neural-network-on-ms-coco-dataset-73adfd3602fe

PyTorch-YOLOv4

This is PyTorch implementation of YOLOv4 which is based on ultralytics/yolov3.

https://github.com/WongKinYiu/PyTorch_YOLOv4

YOLOR

People ``understand'' the world via vision, hearing, tactile, and also the past experience. Human experience can be learned through normal learning (we call it explicit knowledge), or subconsciously (we call it implicit knowledge). These experiences learned through normal learning or subconsciously will be encoded and stored in the brain. Using these abundant experience as a huge database, human beings can effectively process data, even they were unseen beforehand. In this paper, we propose a unified network to encode implicit knowledge and explicit knowledge together, just like the human brain can learn knowledge from normal learning as well as subconsciousness learning. The unified network can generate a unified representation to simultaneously serve various tasks. We can perform kernel space alignment, prediction refinement, and multi-task learning in a convolutional neural network. The results demonstrate that when implicit knowledge is introduced into the neural network, it benefits the performance of all tasks. We further analyze the implicit representation learnt from the proposed unified network, and it shows great capability on catching the physical meaning of different tasks.

https://github.com/WongKinYiu/yolor

Sunday, November 7, 2021

OSv

OSv is an open-source versatile modular unikernel designed to run single unmodified Linux application securely as microVM on top of a hypervisor, when compared to traditional operating systems which were designed for a vast range of physical machines. Built from the ground up for effortless deployment and management of microservices and serverless apps, with superior performance.

OSv has been designed to run unmodified x86-64 and AArch64 Linux binaries as is, which effectively makes it a Linux binary compatible unikernel (for more details about Linux ABI compatibility please read this doc). In particular OSv can run many managed language runtimes including JVM, Python, Node.JS, Ruby, Erlang, and applications built on top of those runtimes. It can also run applications written in languages compiling directly to native machine code like C, C++, Golang and Rust as well as native images produced by GraalVM and WebAssembly/Wasmer.

OSv can boot as fast as ~5 ms on Firecracker using as low as 15 MB of memory. OSv can run on many hypervisors including QEMU/KVM, Firecracker, Cloud Hypervisor, Xen, VMWare, VirtualBox and Hyperkit as well as open clouds like AWS EC2, GCE and OpenStack.

https://github.com/cloudius-systems/osv

StaticFrame

A library of immutable and grow-only Pandas-like DataFrames with a more explicit and consistent interface. StaticFrame is suitable for applications in data science, data engineering, finance, scientific computing, and related fields where reducing opportunities for error by prohibiting in-place mutation is critical.

While many interfaces are similar to Pandas, StaticFrame deviates from Pandas in many ways: all data is immutable, and all indices are unique; the full range of NumPy data types is preserved, and date-time indices use discrete NumPy types; hierarchical indices are seamlessly integrated; and uniform approaches to element, row, and column iteration and function application are provided. Core StaticFrame depends only on NumPy and two C-extension packages (maintained by the StaticFrame team): Pandas is not a dependency.

A wide variety of table storage and representation formats are supported, including input from and output to CSV, TSV, JSON, MessagePack, Excel XLSX, SQLite, HDF5, NumPy, Pandas, Arrow, and Parquet; additionally, output to xarray, VisiData, HTML, RST, Markdown, and LaTeX is supported, as well as HTML representations in Jupyter notebooks.

StaticFrame features a family of multi-table containers: the Bus is a lazily-loaded container of tables, the Batch is a deferred processor of tables, the Yarn is virtual concatenation of many Buses, and the Quilt is a virtual concatenation of all tables within a single Bus or Yarn. All permit operating on large collections of tables with minimal memory overhead, as well as writing too and reading from zipped bundles of pickles, Parquet, or delimited files, as well as XLSX workbooks, SQLite, and HDF5.

https://static-frame.readthedocs.io/en/latest/

https://dev.to/flexatone/ten-reasons-to-use-staticframe-instead-of-pandas-4aad

https://github.com/InvestmentSystems/static-frame

CliMetLab

CliMetLab is a Python package which is intended to be used in Jupyter notebooks. Its main goal is to greatly reduce boilerplate code by providing high-level unified access to meteorological and climate datasets, allowing scientists to focus on their research instead of solving technical issues. Datasets are automatically downloaded, cached and transform into standard Python data structures such as NumPy, Pandas or Xarray, that can then be fed into scientific packages like SciPy and TensorFlow. CliMetLab also aims at simplifying plotting of 2D maps, by automatically selecting the most appropriate styles and projections for any given data.

The goal of CliMetLab is to simplify access to climate and meteorological datasets, by hiding the access methods and data formats.

CliMetLab also provides very high-level map plotting facilities. By default CliMetLab will automatically select the most appropriate way to plot a dataset, choosing the best projection, colours and other graphical attributes. Users can then control how maps are drawn by overriding the automatic choices with their own.

https://climetlab.readthedocs.io/en/latest/

https://github.com/ecmwf/climetlab

https://www.youtube.com/watch?v=gY-vzNHtYsg

STUMPY

STUMPY is a powerful and scalable library that efficiently computes something called the matrix profile, which can be used for a variety of time series data mining tasks such as:

pattern/motif (approximately repeated subsequences within a longer time series) discovery
anomaly/novelty (discord) discovery
shapelet discovery
semantic segmentation
streaming (on-line) data
fast approximate matrix profiles
time series chains (temporally ordered set of subsequence patterns)
snippets for summarizing long time series
pan matrix profiles for selecting the best subsequence window size(s)

https://stumpy.readthedocs.io/en/latest/index.html
https://github.com/TDAmeritrade/stumpy
https://towardsdatascience.com/stumpy-basics-21844a2d2d92

PSBLAS

The PSBLAS library, developed with the aim to facilitate the parallelization of computationally intensive scientific applications, is designed to address parallel implementation of iterative solvers for sparse linear systems through the distributed memory paradigm. It includes routines for multiplying sparse matrices by dense matrices, solving block diagonal systems with triangular diagonal entries, preprocessing sparse matrices, and contains additional routines for dense matrix operations. The current implementation of PSBLAS addresses a distributed memory execution model operating with message passing.

The PSBLAS library version 3 is implemented in the Fortran 2003 programming language, with reuse and/or adaptation of existing Fortran 77 and Fortran 95 software, plus a handful of C routines.

https://psctoolkit.github.io/products/psblas/

https://psctoolkit.github.io/

https://github.com/sfilippone/psblas3

librsb

librsb is a library for sparse matrix computations featuring the Recursive Sparse Blocks (RSB) matrix format. This format allows cache efficient and multi-threaded (that is, shared memory parallel) operations on large sparse matrices. The most common operations necessary to iterative solvers are available, e.g.: matrix-vector multiplication, triangular solution, rows/columns scaling, diagonal extraction / setting, blocks extraction, norm computation, formats conversion. The RSB format is especially well suited for symmetric and transposed multiplication variants. Most numerical kernels code is auto generated, and the supported numerical types can be chosen by the user at build time. librsb can also be built serially (without OpenMP parallelism), if required. librsb also implements the Sparse BLAS standard, as specified in the [BLAS Technical Forum] documents.

This library is dual-interfaced; it supports: a native (`RSB') interface (with identifiers prefixed by `rsb_' or `RSB_'), and a (mostly complete) Sparse BLAS interface, as a wrapper around the RSB interface. Many computationally intensive operations are implemented with thread parallelism, by using OpenMP. Thread parallelism can be turned off at configure time, if desired, or limited at execution time. Many of the computational kernels source code files (mostly internals) were automatically generated.

http://librsb.sourceforge.net/

https://www.youtube.com/watch?v=yHejtO1qNEU&list=PLYx7XA2nY5GesARqNMImG3NnX3_bWq-lT&index=27

PyRSB is a Cython-based Python interface to librsb.

https://github.com/michelemartone/pyrsb

Pythran

Pythran is an ahead of time compiler for a subset of the Python language, with a focus on scientific computing. It takes a Python module annotated with a few interface descriptions and turns it into a native Python module with the same interface, but (hopefully) faster.

It is meant to efficiently compile scientific programs, and takes advantage of multi-cores and SIMD instruction units.

Pythran is a Python-to-c++ translator that turns Python modules into native c++11 modules. Pythran is not a full Python-to-c++ converter, as is shedskin. Instead it takes a subset of the Python language and turns it into heavily templatized c++ code instantiated for your particular types.

https://pythran.readthedocs.io/en/latest/

https://www.youtube.com/watch?v=6a9D9WL6ZjQ

Requests

Requests is an elegant and simple HTTP library for Python, built for human beings.

Requests allows you to send HTTP/1.1 requests extremely easily. There’s no need to manually add query strings to your URLs, or to form-encode your POST data. Keep-alive and HTTP connection pooling are 100% automatic, thanks to urllib3.

The features include:

Keep-Alive & Connection Pooling
International Domains and URLs
Sessions with Cookie Persistence
Browser-style SSL Verification
Automatic Content Decoding
Basic/Digest Authentication
Elegant Key/Value Cookies
Automatic Decompression
Unicode Response Bodies
HTTP(S) Proxy Support
Multipart File Uploads
Streaming Downloads
Connection Timeouts
Chunked Requests
.netrc Support
https://docs.python-requests.org/en/latest/
https://github.com/psf/requests

VisiData

A terminal interface for exploring and arranging tabular data. VisiData supports tsv, csv, sqlite, json, xlsx (Excel), hdf5, and many other formats.

https://github.com/saulpw/visidata

meshio

There are various mesh formats available for representing unstructured meshes. meshio can read and write all of the following and smoothly converts between them:

Abaqus (.inp), ANSYS msh (.msh), AVS-UCD (.avs), CGNS (.cgns), DOLFIN XML (.xml), Exodus (.e, .exo), FLAC3D (.f3grid), H5M (.h5m), Kratos/MDPA (.mdpa), Medit (.mesh, .meshb), MED/Salome (.med), Nastran (bulk data, .bdf, .fem, .nas), Netgen (.vol, .vol.gz), Neuroglancer precomputed format, Gmsh (format versions 2.2, 4.0, and 4.1, .msh), OBJ (.obj), OFF (.off), PERMAS (.post, .post.gz, .dato, .dato.gz), PLY (.ply), STL (.stl), Tecplot .dat, TetGen .node/.ele, SVG (2D output only) (.svg), SU2 (.su2), UGRID (.ugrid), VTK (.vtk), VTU (.vtu), WKT (TIN) (.wkt), XDMF (.xdmf, .xmf).

https://github.com/nschloe/meshio

primitive

Reproducing images with geometric primitives.

A target image is provided as input. The algorithm tries to find the single most optimal shape that can be drawn to minimize the error between the target image and the drawn image. It repeats this process, adding one shape at a time. Around 50 to 200 shapes are needed to reach a result that is recognizable yet artistic and abstract.

Features

Hill Climbing or Simulated Annealing for optimization (hill climbing multiple random shapes is nearly as good as annealing and faster)
Scanline rasterization of shapes in pure Go (preferable for implementing the features below)
Optimal color computation based on affected pixels for each shape (color is directly computed, not optimized for)
Partial image difference for faster scoring (only pixels that change need be considered)
Anti-aliased output rendering

https://github.com/fogleman/primitive

Arrow

Arrow is a Python library that offers a sensible and human-friendly approach to creating, manipulating, formatting and converting dates, times and timestamps. It implements and updates the datetime type, plugging gaps in functionality and providing an intelligent module API that supports many common creation scenarios. Simply put, it helps you work with dates and times with fewer imports and a lot less code.

Features

Fully-implemented, drop-in replacement for datetime
Support for Python 3.6+
Timezone-aware and UTC by default
Super-simple creation options for many common input scenarios
shift method with support for relative offsets, including weeks
Format and parse strings automatically
Wide support for the ISO 8601 standard
Timezone conversion
Support for dateutil, pytz, and ZoneInfo tzinfo objects
Generates time spans, ranges, floors and ceilings for time frames ranging from microsecond to year
Humanize dates and times with a growing list of contributed locales
Extensible for your own Arrow-derived types
Full support for PEP 484-style type hints

https://github.com/arrow-py/arrow

scikit-learn

Scikit-learn (formerly scikits.learn and also known as sklearn) is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

Scikit-learn is largely written in Python, and uses NumPy extensively for high-performance linear algebra and array operations. Furthermore, some core algorithms are written in Cython to improve performance. Support vector machines are implemented by a Cython wrapper around LIBSVM; logistic regression and linear support vector machines by a similar wrapper around LIBLINEAR. In such cases, extending these methods with Python may not be possible.

https://en.wikipedia.org/wiki/Scikit-learn

https://scikit-learn.org/stable/

Extensions:

scikit-survival

scikit-survival is a Python module for survival analysis built on top of scikit-learn. It allows doing survival analysis while utilizing the power of scikit-learn, e.g., for pre-processing or doing cross-validation.

https://github.com/sebp/scikit-survival

TPOT

Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

TPOT will automate the most tedious part of machine learning by intelligently exploring thousands of possible pipelines to find the best one for your data. Once TPOT is finished searching (or you get tired of waiting), it provides you with the Python code for the best pipeline it found so you can tinker with the pipeline from there.

http://epistasislab.github.io/tpot/

https://github.com/EpistasisLab/tpot

Polars

The goal of Polars is being a lightning fast DataFrame library that utilizes all available cores on your machine.

Polars also supports full lazy query execution that allows for more query optimization.

https://pola-rs.github.io/polars-book/user-guide/index.html

https://github.com/pola-rs/polars

Lux

Lux is a Python library that facilitate fast and easy data exploration by automating the visualization and data analysis process. By simply printing out a dataframe in a Jupyter notebook, Lux recommends a set of visualizations highlighting interesting trends and patterns in the dataset. Visualizations are displayed via an interactive widget that enables users to quickly browse through large collections of visualizations and make sense of their data.

Check out our notebook gallery with examples of how Lux can be used with different datasets and analyses.

https://github.com/lux-org/lux