Tuesday, February 28, 2017

netCDF-LD

"netCDF-LD is an approach for constructing Linked Data descriptions using the metadata and structures found in netCDF files. Linked Data is a method of publishing structured data on the web so that it can be interlinked and become more useful through semantic queries. It uses the W3 Resource Description Framework (RDF) standard to express the information and relationships

netCDF-LD enhances netCDF metadata, enabling information found in netCDF files to be linked with published conventions and controlled vocabularies used to express the content."
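To make the idea concrete, here is a rough sketch (not the official netCDF-LD/bald encoding, and using illustrative file and attribute names) of how netCDF attributes can carry values that identify published conventions and vocabularies, written with the netCDF4 Python library:

from netCDF4 import Dataset

# Illustrative only: attach attributes whose values identify published
# conventions and controlled vocabularies, so the file's metadata can be
# linked to external definitions.
ds = Dataset("example.nc", "w")
ds.Conventions = "CF-1.6"
ds.references = "http://cfconventions.org/"
ds.createDimension("time", None)
var = ds.createVariable("air_temperature", "f4", ("time",))
var.standard_name = "air_temperature"   # term from the CF standard-name vocabulary
var.units = "K"
ds.close()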

https://binary-array-ld.github.io/netcdf-ld/

https://github.com/binary-array-ld/bald/

https://docs.google.com/presentation/d/1S8_WOpsIL7Sw27sa4ylGoDoMcAhPUppsPM-xyQdcj0s/edit#slide=id.g1334a8bd12_4_695

 "Linked Data is a method of publishing structured data so that it can be interlinked and become more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried."

https://en.wikipedia.org/wiki/Linked_data

UGRID/SGRID

"This document is a standard for storing unstructured grid (a.k.a. unstructured mesh, flexible mesh) model data in a Unidata Network Common Data Form (NetCDF) file."

https://github.com/ugrid-conventions/ugrid-conventions

" The CF-conventions are widely used for storing and distributing environmental / earth sciences / climate data. The CF-conventions use a data perspective: every data value points to the latitude and longitude at which that value has been defined; the combination of latitude and longitude bounds and cell methods attributes can be used to define spatially averaged rather than point values.

This is all great for the distribution of (interpolated) data for general visualization and spatial data processing, but it doesn’t capture the relationship of the variables as computed by a numerical model (such as Arakawa staggering). Many models use staggered grids (using finite differences, or finite volume approach) or use a finite element approach of which the correct meaning may not be captured easily by simple cell methods descriptors. This becomes a problem if you don’t want to just look at the big picture of the model results, but also at the details at the grid resolution:
  • What is the exact meaning of a flux on the output file in discrete terms?
  • Can we verify the mass balance?
  • Can the data be used for restarting the model?
Correctly handling the staggered data has always been a crucial element of model post-processing tools. In the UGRID conventions, we have defined the (unstructured) grid as a separate entity on the file which consists of nodes and connections of nodes defining edges, faces, and volumes. For a structured (staggered) grid we are currently lacking a consistent convention. Although one could store structured grid data using UGRID conventions, some fundamental aspects such as distinction between grid directions would be lost.

In this context we have created these lightweight SGRID conventions to define the core aspects of a structured staggered grid without trying to capture the details of finite element formulations. This is an attempt to bring conventions for structured grids on par with those for unstructured grids."
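For a feel of what the UGRID conventions look like in practice, here is a hedged sketch of a minimal 2D mesh description written with the netCDF4 Python library; the file, dimension and variable names are illustrative, and the conventions document should be consulted for the full set of required attributes:

from netCDF4 import Dataset
import numpy as np

# Minimal UGRID-style mesh: nodes, faces, and a dummy "mesh topology" variable
# whose attributes point at the coordinate and connectivity variables.
ds = Dataset("mesh.nc", "w")
ds.createDimension("nNodes", 4)
ds.createDimension("nFaces", 2)
ds.createDimension("nFaceNodes", 3)

mesh = ds.createVariable("mesh2d", "i4")
mesh.cf_role = "mesh_topology"
mesh.topology_dimension = 2
mesh.node_coordinates = "node_lon node_lat"
mesh.face_node_connectivity = "face_nodes"

lon = ds.createVariable("node_lon", "f8", ("nNodes",))
lat = ds.createVariable("node_lat", "f8", ("nNodes",))
fnc = ds.createVariable("face_nodes", "i4", ("nFaces", "nFaceNodes"))
fnc.start_index = 0

lon[:] = [0.0, 1.0, 1.0, 0.0]
lat[:] = [0.0, 0.0, 1.0, 1.0]
fnc[:] = np.array([[0, 1, 2], [0, 2, 3]])
ds.close()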

https://sgrid.github.io/sgrid/

https://github.com/sgrid/pysgrid

https://github.com/NOAA-ORR-ERD/pyugrid

https://docs.google.com/presentation/d/1E4oZMMi7bs2MonV7kDQzHyUWjwxyN29yR4GSzWjA2Kw/edit#slide=id.p4

https://docs.google.com/presentation/d/15bimkLqixfXo5E7oZPylEAgp_ZucW7eCzHXQJTCXLyk/edit#slide=id.p

 

http://sci-wms.github.io/sci-wms/ 

G.Projector

"G.Projector transforms an equirectangular map image into any of over 125 global and regional map projections. Longitude-latitude gridlines and continental outlines may be drawn on the map, and the resulting image may be saved to disk in GIF, JPEG, PDF, PNG, PS or TIFF form.

G.Projector is a cross-platform application that runs on Macintosh, Windows, Linux and other desktop computers."

https://www.giss.nasa.gov/tools/gprojector/

Rosetta

"Welcome to Rosetta, a data transformation tool. Rosetta is a web-based service that provides an easy, wizard-based interface for data collectors to transform their datalogger generated ASCII output into Climate and Forecast (CF) compliant netCDF files. These files will contain the metadata describing what data is contained in the file, the instruments used to collect the data, and other critical information that otherwise may be lost in one of many dreaded README files.

In addition, with the understanding that the observational community does appreciate the ease of use of ASCII files, methods for transforming the netCDF back into user-defined CSV or spreadsheet formats are also incorporated into Rosetta.

We hope that Rosetta will be of value to science community users who need to transform the data they have collected or stored in non-standard formats."

http://rosetta.unidata.ucar.edu/

Panoply

"Panoply plots geo-referenced and other arrays from netCDF, HDF, GRIB, and other datasets. With Panoply 4 you can:
  • Slice and plot geo-referenced latitude-longitude, latitude-vertical, longitude-vertical, time-latitude or time-vertical arrays from larger multidimensional variables.
  • Slice and plot "generic" 2D arrays from larger multidimensional variables.
  • Slice 1D arrays from larger multidimensional variables and create line plots.
  • Combine two geo-referenced arrays in one plot by differencing, summing or averaging.
  • Plot lon-lat data on a global or regional map using any of over 100 map projections or make a zonal average line plot.
  • Overlay continent outlines or masks on lon-lat map plots.
  • Use any of numerous color tables for the scale colorbar, or apply your own custom ACT, CPT, or RGB color table.
  • Save plots to disk as GIF, JPEG, PNG or TIFF bitmap images or as PDF or PostScript graphics files.
  • Export lon-lat map plots in KMZ format.
  • Export animations as AVI or MOV video or as a collection of individual frame images.
  • Explore remote THREDDS and OPeNDAP catalogs and open datasets served from them.
Panoply is a cross-platform application that runs on Macintosh, Windows, Linux and other desktop computers."

https://www.giss.nasa.gov/tools/panoply/

OpenTopography

"OpenTopography facilitates community access to high-resolution, Earth science-oriented, topography data, and related tools and resources.

Over the past decade, there has been dramatic growth in the acquisition of publicly funded high-resolution topographic and bathymetric data for scientific, environmental, engineering and planning purposes. Because of the richness of these data sets, they are often extremely valuable beyond the application that drove their acquisition and thus are of interest to a large and varied user community. However, because of the large volumes of data produced by high-resolution mapping technologies such as lidar, it is often difficult to distribute these datasets. Furthermore, the data can be technically challenging to work with, requiring software and computing resources not readily available to many users. OpenTopography aims to democratize access to high-resolution topographic data in a manner that serves users with varied expertise, application domains, and computing resources.

 OpenTopography data access levels:

Google Earth:

Google Earth provides an excellent platform to deliver lidar-derived visualizations for research, education, and outreach purposes. These files display full-resolution images derived from lidar in the Google Earth virtual globe. The virtual globe environment provides a freely available and easily navigated viewer and enables quick integration of the lidar visualizations with imagery, geographic layers, and other relevant data available in KML format.

Raster:

Pre-computed raster data include digital elevation model (DEM) layers computed from aerial lidar surveys and raster data from the Shuttle Radar Topography Mission (SRTM) global dataset. DEMs from aerial lidar surveys are available as bare earth (ground), highest hit (first or all return), or intensity (strength of laser pulse) tiles. Some datasets also have orthophotographs available. The DEMs are in common GIS formats (e.g. ESRI Arc Binary) and are compressed (zipped) to reduce their size.

Lidar point cloud data and on-demand processing:

This aspect of OpenTopography allows users to define an area of interest, as well as a subset of the data (e.g. "ground returns only"), and then to download the results of this query in ASCII or LAS binary point cloud formats. Also available is the option to generate custom derivative products such as digital elevation models (DEMs) produced with user-defined resolution and algorithm parameters, and downloaded in a number of different file formats. The system will also generate geomorphic metrics such as hillshade and slope maps, and will dynamically generate visualizations of the data products for display in the web browser or Google Earth."

http://www.opentopography.org/

http://acid.sdsc.edu/projects/opentopo

Kepler

"The Kepler Project is dedicated to furthering and supporting the capabilities, use, and awareness of the free and open source, scientific workflow application, Kepler.  Kepler is designed to help scien­tists, analysts, and computer programmers create, execute, and share models and analyses across a broad range of scientific and engineering disciplines.  Kepler can operate on data stored in a variety of formats, locally and over the internet, and is an effective environment for integrating disparate software components, such as merging "R" scripts with compiled "C" code, or facilitating remote, distributed execution of models. Using Kepler's graphical user interface, users simply select and then connect pertinent analytical components and data sources to create a "scientific workflow"—an executable representation of the steps required to generate results. The Kepler software helps users share and reuse data, workflows, and compo­nents developed by the scientific community to address common needs.

The features include:

Kepler is based on the Ptolemy II system, a mature platform supporting multiple models of computation suited to distinct types of analysis (processing sensor data, for example, or integrating differential equations).

Kepler workflows can be nested, allowing complex tasks to be composed from simpler components, and enabling workflow designers to build re-usable, modular sub-workflows that can be saved and used for many different applications.

Kepler workflows can leverage the computational power of grid technologies (e.g., Globus, SRB, Web and Soaplab Services), as well as take advantage of Kepler’s native support for parallel processing.

Kepler workflows and customized components can be saved, reused, and shared with colleagues using the Kepler archive format (KAR).

Kepler ships with a searchable library containing over 350 ready-to-use processing components ('actors') that can be easily customized, connected and then run from a desktop environment to perform an analysis, automate data management, and integrate applications efficiently.

Kepler's Component Repository provides a centralized server where components and workflows can be uploaded, downloaded, searched and shared with the community or designated users.

Currently, Kepler has support for data described by Ecological Metadata Language (EML), data accessible using the DiGIR protocol, the OPeNDAP protocol, DataTurbine, GridFTP, JDBC, SRB, and others."

https://kepler-project.org/
 

root_numpy

"Python extension module that provides an efficient interface between ROOT and NumPy. root_numpy’s internals are compiled C++ and can therefore handle large amounts of data much faster than equivalent pure Python implementations.

With your ROOT data in NumPy form, make use of NumPy’s broad library, including fancy indexing, slicing, broadcasting, random sampling, sorting, shape transformations, linear algebra operations, and more. See this tutorial to get started. NumPy is the fundamental library of the scientific Python ecosystem. Using NumPy arrays opens up many new possibilities beyond what ROOT offers. Convert your TTrees into NumPy arrays and use SciPy for numerical integration and optimization, matplotlib for plotting, pandas for data analysis, statsmodels for statistical modelling, scikit-learn for machine learning, and perform quick exploratory analysis in a Jupyter notebook.

At the core of root_numpy are powerful and flexible functions for converting ROOT TTrees into structured NumPy arrays as well as converting NumPy arrays back into ROOT TTrees. root_numpy can convert branches of strings and basic types such as bool, int, float, double, etc. as well as variable-length and fixed-length multidimensional arrays and 1D or 2D vectors of basic types and strings. root_numpy can also create columns in the output array that are expressions involving the TTree branches."
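A short, hedged sketch of that round trip (the file, tree and branch names are made up):

import numpy as np
from root_numpy import array2root, root2array

# Build a small structured array and write it out as a ROOT TTree.
arr = np.array([(1.5, 2), (3.0, 4)], dtype=[("x", np.float64), ("n", np.int32)])
array2root(arr, "example.root", treename="tree", mode="recreate")

# Read selected branches back into a structured NumPy array.
back = root2array("example.root", treename="tree", branches=["x", "n"])
print(back["x"].mean())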

http://scikit-hep.org/root_numpy/

Kotori

"Kotori is a multi-channel, multi-protocol data acquisition and graphing toolkit based on Grafana, InfluxDB, Mosquitto and Twisted. It is written in Python.

Use convenient software and hardware components for building telemetry solutions, test benches and sensor networks. Build upon a flexible data acquisition integration framework. Address all aspects of collecting and storing sensor data from a multitude of data sources and devices."
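As a hedged illustration of the data-acquisition side, a sensor reading could be published to a Kotori MQTT channel with paho-mqtt; the broker address and the realm/network/gateway/node/data.json topic layout below are assumptions based on the Kotori documentation and would need to match your own deployment:

import json
import paho.mqtt.publish as publish

# Publish one JSON measurement to a hypothetical Kotori channel.
reading = {"temperature": 21.4, "humidity": 62.0}
publish.single(
    topic="mqttkit-1/testdrive/area-42/node-1/data.json",  # assumed channel layout
    payload=json.dumps(reading),
    hostname="localhost",
)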

https://getkotori.org/docs/

"The Hiveeyes project conceives a data collection platform for bee hive monitoring voluntarily operated by the beekeeper community. Together with Mosquitto, InfluxDB, Grafana, mqttwarn and BERadio, Kotori powers the Hiveeyes system on swarm.hiveeyes.org as a data collection hub for a Berlin-based beekeeper collective."

https://getkotori.org/docs/applications/hiveeyes.html

mbed

"An open-source embedded operating system designed specifically for the "things" in the Internet of Things (IoT). It includes all the features you need to develop a connected product based on an ARM Cortex-M microcontroller. mbed OS accelerates the process of creating a connected product by providing a platform operating system that includes robust security foundations, standards based communication capabilities, built-in cloud management services, and drivers for sensors, I/O devices and connectivity.

mbed OS is built as a modular, configurable software stack so that you can readily customize it to the device you're developing for, and reduce memory requirements by excluding unnecessary software components."

https://github.com/ARMmbed/mbed-os

https://www.mbed.com/en/

"mbed CLI is the name of the ARM mbed command-line tool, packaged as mbed-cli, which enables the full mbed workflow: repositories version control, maintaining dependencies, publishing code, updating from remotely hosted repositories (GitHub, GitLab and mbed.org) and invoking ARM mbed's own build system and export functions, among other operations."

https://github.com/ARMmbed/mbed-cli

Open Hub

"The Black Duck Open Hub (formerly Ohloh.net) is an online community and public directory of free and open source software (FOSS), offering analytics and search services for discovering, evaluating, tracking, and comparing open source code and projects. Open Hub Code Search is free code search engine indexing over 21,000,000,000 lines of open source code from projects on the Black Duck Open Hub.

The Open Hub is editable by everyone, like a wiki. All are welcome to join, add new projects, and make corrections to existing project pages. This public review helps to make the Black Duck Open Hub one of the largest, most accurate, and up-to-date FOSS software directories available. We encourage contributors to join the Open Hub and claim their commits on existing projects and add projects not yet on the site. By doing so, Open Hub users can assemble a complete profile of all their FOSS code contributions.

The Open Hub is not a forge — it does not host projects and code. The Open Hub is a directory and community, offering analytics and search services and tools. By connecting to project source code repositories, analyzing both the code’s history and ongoing updates, and attributing those updates to specific contributors, the Black Duck Open Hub can provide reports about the composition and activity of project code bases and aggregate this data to track the changing demographics of the FOSS world."

http://blog.openhub.net/about/

Ptolemy

"Ptolemy II [1][6] is an open-source software framework supporting experimentation with actor-oriented design. Actors are software components that execute concurrently and communicate through messages sent via interconnected ports. A model is a hierarchical interconnection of actors. In Ptolemy II, the semantics of a model is not determined by the framework, but rather by a software component in the model called a director, which implements a model of computation. The Ptolemy Project has developed directors supporting process networks (PN), discrete-events (DE), dataflow (SDF), synchronous/reactive(SR), rendezvous-based models, 3-D visualization, and continuous-time models. Each level of the hierarchy in a model can have its own director, and distinct directors can be composed hierarchically. A major emphasis of the project has been on understanding the heterogeneous combinations of models of computation realized by these directors. Directors can be combined hierarchically with state machines to make modal models [2]. A hierarchical combination of continuous-time models with state machines yields hybrid systems [3]; a combination of synchronous/reactive with state machines yields StateCharts [4] (the Ptolemy II variant is close to SyncCharts).

Ptolemy II has been under development since 1996; it is a successor to Ptolemy Classic, which had been in development since 1990. The core of Ptolemy II is a collection of Java classes and packages, layered to provide increasingly specific capabilities. The kernel supports an abstract syntax, a hierarchical structure of entities with ports and interconnections. A graphical editor called Vergil supports visual editing of this abstract syntax. An XML concrete syntax called MoML provides a persistent file format for the models [5]. Various specialized tools have been created from this framework, including HyVisual (for hybrid systems modeling), Kepler (for scientific workflows), VisualSense (for modeling and simulation of wireless networks), Viptos (for sensor network design), and some commercial products. Key parts of the infrastructure include an actor abstract semantics, which enables the interoperability of distinct models of computation with a well-defined semantics; a model of time (specifically, super-dense time, which enables interaction of continuous dynamics and imperative logic); and a sophisticated type system supporting type checking, type inference, and polymorphism. The type system has recently been extended to support user-defined ontologies [6]. Various experiments with synthesis of implementation code and abstractions for verification are included in the project.

Current work in Ptolemy II is focusing on Accessors, which are a technology for making the Internet of Things accessible to a broader community of citizens, inventors, and service providers through open interfaces, an open community of developers, and an open repository of technology. Ptolemy II includes the Cape Code Accessor Host [7]"

http://ptolemy.eecs.berkeley.edu/ptolemyII/index.htm

Zoltan

"The Zoltan library is a collection of data management services for parallel, unstructured, adaptive, and dynamic applications. It simplifies the load-balancing, data movement, unstructured communication, and memory usage difficulties that arise in dynamic applications such as adaptive finite-element methods, particle methods, and crash simulations. Zoltan's data-structure neutral design also lets a wide range of applications use it without imposing restrictions on application data structures. Its object-based interface provides a simple and inexpensive way for application developers to use the library and researchers to make new capabilities available under a common interface."

 http://www.cs.sandia.gov/zoltan/

VisIt

"VisIt is an Open Source, interactive, scalable, visualization, animation and analysis tool. From Unix, Windows or Mac workstations, users can interactively visualize and analyze data ranging in scale from small (<101 core) desktop-sized projects to large (>105 core) leadership-class computing facility simulation campaigns. Users can quickly generate visualizations, animate them through time, manipulate them with a variety of operators and mathematical expressions, and save the resulting images and animations for presentations. VisIt contains a rich set of visualization features to enable users to view a wide variety of data including scalar and vector fields defined on two- and three-dimensional (2D and 3D) structured, adaptive and unstructured meshes. Owing to its customizeable plugin design, VisIt is capabable of visualizing data from over 120 different scientific data formats.

The basic design is a client-server model, where the server is parallelized. The client-server aspect allows for effective visualization in a remote setting, while the parallelization of the server allows for the largest data sets to be processed reasonably interactively. The tool has been used to visualize many large data sets, including a two hundred and sixteen billion data point structured grid, a one billion point particle simulation, and curvilinear, unstructured, and AMR meshes with hundreds of millions to billions of elements. The most common form of the server is as a stand alone process that reads in data from files. However, an alternate form exists where a simulation code can link in "lib-VisIt" and become itself the server, allowing for in situ visualization and analysis.

VisIt follows a data flow network paradigm where interoperable modules are connected to perform custom analysis. The modules come from VisIt's five primary user interface abstractions and there are many examples of each. There are twenty-one "plots" (ways to render data), forty-two "operators" (ways to manipulate data), eighty-five file format readers, over fifty "queries" (ways to extract quantitative information), and over one hundred "expressions" (ways to create derived quantities). Further, a plugin capability allows for dynamic incorporation of new plot, operator, and database modules. These plugins can be partially code generated, even including automatic generation of Qt and Python user interfaces.

VisIt also supports C++, Python and Java interfaces. The C++ and Java interfaces make it possible to provide alternate user interfaces for VisIt or allow existing C++ or Java applications to add visualization support. The Python scripting interface gives users the ability to batch process data using a powerful scripting language. This feature can be used to create extremely sophisticated animations or implement regression suites. It also allows simulation systems that use Python as a back-plane to easily integrate visualization capabilities into their systems."
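A hedged sketch of that Python scripting interface (run inside "visit -cli" or with the visit module importable; the dataset path and variable name are placeholders):

import visit

visit.Launch()
visit.OpenDatabase("example.silo")           # hypothetical dataset
visit.AddPlot("Pseudocolor", "temperature")  # plot type and variable name
visit.DrawPlots()
visit.SaveWindow()                           # write the current image to disk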

https://wci.llnl.gov/simulation/computer-codes/visit

http://www.visitusers.org/index.php?title=Main_Page

SUNDIALS

"SUNDIALS is implemented with the goal of providing robust time integrators and nonlinear solvers that can easily be incorporated into existing simulation codes. The primary design goals are to require minimal information from the user, allow users to easily supply their own data structures underneath the packages, and allow for easy incorporation of user-supplied linear solvers and preconditioners.

The main numerical operations performed in these codes are operations on data vectors, and the codes have been written in terms of interfaces to these vector operations. The result of this design is that users can relatively easily provide their own data structures to the solvers by telling the solver about their structures and providing the required operations on them. The codes also come with default vector structures with pre-defined operation implementations for serial, shared-memory parallel (openMP and PThreads), and distributed memory parallel (MPI) environments in case a user prefers not to supply their own structures. Wrappers for the hypre ParVector and a PETSc vector are also provided.  In addition, all parallelism is contained within specific vector operations (norms, dot products, etc.). No other operations within the solvers require knowledge of parallelism. Thus, using a solver in parallel consists of using a parallel vector implementation, either one provided with SUNDIALS or the user’s own parallel vector structure, underneath the solver. Hence, we do not make a distinction between parallel and serial versions of the codes.

SUNDIALS (SUite of Nonlinear and DIfferential/ALgebraic equation Solvers) consists of the following six solvers:

CVODE -  solves initial value problems for ordinary differential equation (ODE) systems.

CVODES - solves ODE systems and includes sensitivity analysis capabilities (forward and adjoint).

ARKode - solves initial value ODE problems with additive Runge-Kutta methods, including support for IMEX methods.

IDA - solves initial value problems for differential-algebraic equation (DAE) systems.

IDAS - solves DAE systems and includes sensitivity analysis capabilities (forward and adjoint).

KINSOL - solves nonlinear algebraic systems."

http://computation.llnl.gov/projects/sundials

HYPRE

"Livermore’s HYPRE library of linear solvers makes possible larger, more detailed simulations by solving problems faster than traditional methods at large scales. It offers a comprehensive suite of scalable solvers for large-scale scientific simulation, featuring parallel multigrid methods for both structured and unstructured grid problems. The HYPRE library is highly portable and supports a number of languages.

The HYPRE team was one of the first to develop algebraic multigrid algorithms and software for extreme-scale parallel supercomputers. The team maintains an active role in the multigrid research community and is recognized for its leadership in both algorithm and software development."

http://computation.llnl.gov/projects/hypre-scalable-linear-solvers-multigrid-methods

SAMRAI

"SAMRAI (Structured Adaptive Mesh Refinement Application Infrastructure) is an object-oriented C++ software library that enables exploration of numerical, algorithmic, parallel computing, and software issues associated with applying structured adaptive mesh refinement (SAMR) technology in large-scale parallel application development. SAMRAI provides software tools for developing SAMR applications that involve coupled physics models, sophisticated numerical solution methods, and which require high-performance parallel computing hardware. SAMRAI enables integration of SAMR technology into existing codes and simplifies the exploration of SAMR methods in new application domains. Due to judicious application of object-oriented design, SAMRAI capabilities are readily enhanced and extended to meet specific problem requirements. The SAMRAI team collaborates with application researchers at LLNL and other institutions. These interactions motivate the continued evolution of the SAMRAI library.

The SAMRAI library provides a rich set of reusable, extensible software components for SAMR application development. The capabilities provided by SAMRAI include:
  • Automatic (user-controlled) dynamic mesh refinement
  • Uniform, non-uniform, and user-defined load balancing
  • Various array data types for representing simulation quantities on a mesh with different centerings (e.g., node, face, cell, etc.), and support for data defined on irregular sets of cell indices
  • Support for user-defined data on a SAMR mesh hierarchy with full parallel data communication functionality (without recompiling the library)
  • Customizable adaptive meshing and integration algorithms (via object-oriented composition and inheritance)
  • Support for meshes with arbitrary spatial dimension
  • Multiblock AMR allowing irregular block connectivity
  • Interfaces to solver libraries, such as hypre, PETSc, and SUNDIALS
  • Flexible parallel restart (HDF5) and input parser
  • Tools for measuring performance, gathering statistics
  • Visualization support via VisIt
The SAMRAI library is partitioned into a collection of software “packages”. Each package is a set of logically-related C++ classes that constitutes a functional role in SAMR application development."

http://computation.llnl.gov/projects/samrai/software

http://computation.llnl.gov/projects/samrai

VTK

"The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, image processing and visualization. VTK consists of a C++ class library and several interpreted interface layers including Tcl/Tk, Java, and Python. Kitware, whose team created and continues to extend the toolkit, offers professional support and consulting services for VTK. VTK supports a wide variety of visualization algorithms including: scalar, vector, tensor, texture, and volumetric methods; and advanced modeling techniques such as: implicit modeling, polygon reduction, mesh smoothing, cutting, contouring, and Delaunay triangulation. VTK has an extensive information visualization framework, has a suite of 3D interaction widgets, supports parallel processing, and integrates with various databases and GUI toolkits such as Qt and Tk. VTK is cross-platform and runs on Linux, Windows, Mac and Unix platforms. VTK also includes ancillary support for 3D interaction widgets, two and three-dimensional annotation, and parallel computing. At its core VTK is implemented as a C++ toolkit, requiring users to build applications by combining various objects into an application. The system also supports automated wrapping of the C++ core into Python, Java and Tcl, so that VTK applications may also be written using these interpreted programming languages."

http://www.vtk.org/

VTK-m

"One of the biggest recent changes in high-performance computing is the increasing use of accelerators. Accelerators contain processing cores that independently are inferior to a core in a typical CPU, but these cores are replicated and grouped such that their aggregate execution provides a very high computation rate at a much lower power. Current and future CPU processors also require much more explicit parallelism. Each successive version of the hardware packs more cores into each processor, and technologies like hyperthreading and vector operations require even more parallel processing to leverage each core’s full potential

VTK-m is a toolkit of scientific visualization algorithms for emerging processor architectures. VTK-m supports the fine-grained concurrency for data analysis and visualization algorithms required to drive extreme scale computing by providing abstract models for data and execution that can be applied to a variety of algorithms across many different processor architectures."

http://m.vtk.org/index.php/Main_Page

VTK-m: Accelerating the Visualization Toolkit for Massively Threaded Architectures - http://ieeexplore.ieee.org/document/7466740/

Visualization for Exascale: Portable Performance is Critical - http://superfri.org/superfri/article/view/77

Kokkos

"Kokkos implements a programming model in C++ for writing performance portable  applications targeting all major HPC platforms. For that purpose it provides abstractions for both parallel execution of code and data management. Kokkos is designed to target complex node architectures with N-level memory hierarchies and multiple types of execution resources. It currently can use OpenMP, Pthreads and CUDA as backend programming models."

https://github.com/kokkos/kokkos

Tutorials - https://github.com/kokkos/kokkos-tutorials

"The Kokkos Clang compiler is a version of the Clang C++ compiler that has been modified to perform targeted code generation for Kokkos constructs in the goal of generating highly optimized code and to provide semantic (domain) awareness throughout the compilation toolchain of these constructs such as parallel for and parallel reduce. This approach is taken to explore the possibilities of exposing the developer’s intentions to the underlying compiler infrastructure (e.g. optimization and analysis passes within the middle stages of the compiler) instead of relying solely on the restricted capabilities of C++ template metaprogramming. To date our current activities have focused on correct GPU code generation and thus we have not yet focused on improving overall performance. The compiler is implemented by recognizing specific (syntactic) Kokkos constructs in order to bypass normal template expansion mechanisms and instead use the semantic knowledge of Kokkos to directly generate code in the compiler’s intermediate representation (IR); which is then translated into an NVIDIA-centric GPU program and supporting runtime calls. In addition, by capturing and maintaining the higher-level semantics of Kokkos directly within the lower levels of the compiler has the potential for significantly improving the ability of the compiler to communicate with the developer in the terms of their original programming model/semantics."

https://github.com/lanl/kokkos-clang

LaGriT

"LaGriT (Los Alamos Grid Toolbox) LA-CC-15-069 is a library of user callable tools that provide mesh generation, mesh optimization and dynamic mesh maintenance in two and three dimensions. LaGriT is used for a variety of geology and geophysics modeling applications including porous flow and transport model construction, finite element modeling of stress/strain in crustal fault systems, seismology, discrete fracture networks, asteroids and hydrothermal systems. The general capabilities of LaGriT can also be used outside of earth science applications and applied to nearly any system that requires a grid/mesh and initial and boundary conditions, setting of material properties and other model setup functions. It can also be use as a tool to pre- and post-process and analyze vertex and mesh based data.

Geometric regions for LaGriT are defined as combinations of bounding surfaces, where the surfaces are described analytically or as tessellated surfaces (triangles and/or quadrilaterals). A variety of techniques for distributing points within these geometric regions are provided. Mesh connectivity uses a Delaunay tetrahedralization algorithm that respects material interfaces. The data structures created to implement this algorithm are compact, powerful and expandable to include hybrid meshes (tet, hex, prism, pyramid, quadrilateral, triangle, line); however, the main algorithms are for triangle and tetrahedral Delaunay meshes.

Mesh refinement, derefinement and smoothing are available to modify the mesh to provide more resolution in areas of interest. Mesh refinement adds nodes to the mesh based on geometric criteria such as edge length or based on field variable shape. Mesh smoothing moves nodes to adapt the mesh to field variable measures, and, at the same time, maintains quality elements.

LaGriT has three modes of use: 1) command line, 2) batch driven via a control file, and 3) calls from C/Fortran programs. There is no GUI interface.

PyLaGriT is a Python interface that allows LaGriT functionality to be used interactively and in batch mode. It combines the meshing capabilities of LaGriT with the numeric and scientific functionality of Python, including the querying of mesh properties, enhanced looping functionality, and user-defined error checking. PyLaGriT has been developed to easily generate meshes by extrusion, dimensional reduction, coarsening and refinement of synthetic and realistic data. PyLaGriT enhances the workflow, enabling rapid iterations for use in simulations incorporating uncertainty in system geometry and automatic mesh generation."

 http://lagrit.lanl.gov/

https://github.com/lanl/LaGriT

Chombo

"Chombo provides a set of tools for implementing finite difference and finite volume methods for the solution of partial differential equations on block-structured adaptively refined rectangular grids. Both elliptic and time-dependent modules are included. Chombo supports calculations in complex geometries with both embedded boundaries and mapped grids, and Chombo also supports particle methods. Most parallel platforms are supported, and cross-platform self-describing file formats are included.

 The core of the software distribution is divided into six parts:
  1. BoxTools: Provides infrastructure to do any calculations over unions of rectangles. BoxTools provides tools to perform set calculus for points in a domain and data holders for unions of rectangles.
  2. AMRTools: Provides tools for data communication between refinement levels, including coarse-fine interpolation tools.
  3. AMRTimeDependent: Manages sub-cycling in time for time-dependent adaptive calculations.
  4. AMRElliptic: a multigrid-based elliptic equation solver for adaptive hierarchies.
  5. EBTools: Embedded Boundary discretizations and tools
  6. ParticleTools: Release 2.0 has taken the ParticleTools out of the public API while it is being re-engineered.
Chombo also includes test programs and some examples of how to do calculations on block-structured, adaptively refined meshes. The examples include a cell-centered Poisson solver, several variations on a node-centered elliptic solver, a Helmholtz equation solver, a couple implementations of a Godunov method for gas dynamics, a simple wave equation solver and some basic I/O code.

Finally Chombo includes a system for writing dimension-independent FORTRAN which we call "Chombo Fortran". Fortran subroutines are used for the most compute-intensive parts of Chombo applications because they produce faster results."

https://commons.lbl.gov/display/chombo/Chombo+-+Software+for+Adaptive+Solutions+of+Partial+Differential+Equations



deal.ii

"A C++ program library targeted at the computational solution of partial differential equations using adaptive finite elements. It uses state-of-the-art programming techniques to offer you a modern interface to the complex data structures and algorithms required.

The main aim of deal.II is to enable rapid development of modern finite element codes, using among other aspects adaptive meshes and a wide array of tool classes often used in finite element programs. Writing such programs is a non-trivial task, and successful programs tend to become very large and complex. We believe that this is best done using a program library that takes care of the details of grid handling and refinement, handling of degrees of freedom, input of meshes and output of results in graphics formats, and the like. Likewise, support for several space dimensions at once is included in a way such that programs can be written independent of the space dimension without unreasonable penalties on run-time and memory consumption.

 If you are active in the field of adaptive finite element methods, deal.II might be the right library for your projects. Among other features, it offers:
  • Support for one, two, and three space dimensions, using a unified interface that allows writing programs that are almost dimension independent.
  • Handling of locally refined grids, including different adaptive refinement strategies based on local error indicators and error estimators. h, p, and hp refinement are fully supported for continuous and discontinuous elements.
  • Support for a variety of finite elements: Lagrange elements of any order, continuous and discontinuous; Nedelec and Raviart-Thomas elements of any order; elements composed of other elements.
  • Parallelization on a single machine through the Threading Building Blocks and across nodes via MPI. deal.II has been shown to scale to at least 16k processors.
  • Extensive documentation: all documentation is available online in a logical tree structure to allow fast access to the information you need. If printed it comprises more than 500 pages of tutorials, several reports, and presently some 5,000 pages of programming interface documentation with explanations of all classes, functions, and variables. All documentation comes with the library and is available online locally on your computer after installation.
  • Modern software techniques that make access to the complex data structures and algorithms as transparent as possible. The use of object oriented programming allows for program structures similar to the structures in mathematical analysis.
  • A complete stand-alone linear algebra library including sparse matrices, vectors, Krylov subspace solvers, support for blocked systems, and interface to other packages such as Trilinos, PETSc and METIS.
  • Support for several output formats, including many common formats for visualization of scientific data.
  • Portable support for a variety of computer platforms and compilers."


    https://www.dealii.org/ 

FiPy

"FiPy is an object oriented, partial differential equation (PDE) solver, written in Python, based on a standard finite volume (FV) approach. The framework has been developed in the Materials Science and Engineering Division (MSED) and Center for Theoretical and Computational Materials Science (CTCMS), in the Material Measurement Laboratory (MML) at the National Institute of Standards and Technology (NIST).

The solution of coupled sets of PDEs is ubiquitous to the numerical simulation of science problems. Numerous PDE solvers exist, using a variety of languages and numerical approaches. Many are proprietary, expensive and difficult to customize. As a result, scientists spend considerable resources repeatedly developing limited tools for specific problems. Our approach, combining the FV method and Python, provides a tool that is extensible, powerful and freely available. A significant advantage to Python is the existing suite of tools for array calculations, sparse matrices and data rendering.

The FiPy framework includes terms for transient diffusion, convection and standard sources, enabling the solution of arbitrary combinations of coupled elliptic, hyperbolic and parabolic PDEs."
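A short sketch of the kind of problem FiPy handles: 1D transient diffusion with fixed-value boundaries (grid size, diffusion coefficient and time step are chosen arbitrarily here):

from fipy import CellVariable, Grid1D, TransientTerm, DiffusionTerm

# Uniform 1D mesh and a cell-centered solution variable.
nx, dx = 50, 1.0
mesh = Grid1D(nx=nx, dx=dx)
phi = CellVariable(name="solution", mesh=mesh, value=0.0)
phi.constrain(1.0, mesh.facesLeft)   # fixed value on the left boundary
phi.constrain(0.0, mesh.facesRight)  # fixed value on the right boundary

# Transient diffusion equation and an explicit-style stable time step.
D = 1.0
eq = TransientTerm() == DiffusionTerm(coeff=D)
dt = 0.9 * dx**2 / (2 * D)
for _ in range(100):
    eq.solve(var=phi, dt=dt)
print(phi.value[:5])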

http://www.ctcms.nist.gov/fipy/

MOOSE

"The Multiphysics Object-Oriented Simulation Environment (MOOSE) is a finite-element, multiphysics framework primarily developed by Idaho National Laboratory. It provides a high-level interface to some of the most sophisticated nonlinear solver technology on the planet. MOOSE presents a straightforward API that aligns well with the real-world problems scientists and engineers need to tackle. Every detail about how an engineer interacts with MOOSE has been thought through, from the installation process through running your simulation on state of the art supercomputers, the MOOSE system will accelerate your research.

Some of the capability at your fingertips:
  • Fully-coupled, fully-implicit multiphysics solver
  • Dimension independent physics
  • Automatically parallel (largest runs >100,000 CPU cores!)
  • Modular development simplifies code reuse
  • Built-in mesh adaptivity
  • Continuous and Discontinuous Galerkin (DG) (at the same time!)
  • Intuitive parallel multiscale solves (see videos below)
  • Dimension agnostic, parallel geometric search (for contact related applications)
  • Flexible, pluggable graphical user interface
  • ~30 pluggable interfaces allow specialization of every part of the solve
  • Physics modules providing general capability for solid mechanics, phase field modeling, Navier-Stokes, heat conduction and more
MOOSE is different. MOOSE is a way of developing software just as much as it is a finite-element framework. When we change something in the framework we contribute patches to you that fix your application! As MOOSE is developed we test against your tests each step of the way to ensure that we're not creating problems. MOOSE is developed directly on GitHub providing a unique workflow that ensures smooth community involvement."

http://mooseframework.org/

DAE Tools

"DAE Tools is a cross-platform equation-based and object-oriented process modelling and optimisation software. It is not a modelling language nor a collection of numerical libraries but rather a higher level structure – an architectural design of interdependent software components providing an API for:
  • Model development/specification
  • Activities on developed models, such as simulation, optimisation, and parameter estimation
  • Processing of the results, such as plotting and exporting to various file formats
  • Report generation
  • Code generation, co-simulation and model exchange

DAE Tools was initially developed to model and simulate processes in the chemical process industry (mass, heat and momentum transfers, chemical reactions, separation processes, thermodynamics). However, DAE Tools can be used to develop high-accuracy models of (in general) many different kinds of processes/phenomena, simulate/optimise them, and visualise and analyse the results.
The following approaches/paradigms are adopted in DAE Tools:
  • A hybrid approach between general-purpose programming languages (such as C++ and Python) and domain-specific modelling languages (such as Modelica, gPROMS, Ascend etc.) (more information: The Hybrid approach).
  • An object-oriented approach to process modelling (more information: The Object-Oriented approach).
  • An Equation-Oriented (acausal) approach where all model variables and equations are generated and gathered together and solved simultaneously using a suitable mathematical algorithm (more information: The Equation-Oriented approach).
  • Separation of the model definition from the activities that can be carried out on that model. The structure of the model (parameters, variables, equations, state transition networks etc.) is given in the model class while the runtime information in the simulation class. This way, based on a single model definition, one or more different simulation/optimisation scenarios can be defined.
  • Core libraries are written in standard C++; however, Python is used as the main modelling language (more information: Programming language).

All core libraries are written in standard C++. It is highly portable - it runs on all major operating systems (GNU/Linux, macOS, Windows) and all platforms with a decent C++ compiler, Boost and standard C/C++ libraries (by now it has been tested on 32/64-bit x86 and ARM architectures, making it suitable for use in embedded systems). Models can be developed in Python (pyDAE module) or C++ (cDAE module), compiled into an independent executable and deployed without the need for any run-time libraries.

DAE Tools supports a large number of solvers. Currently the Sundials IDAS solver is used to solve DAE systems and calculate sensitivities, while the BONMIN, IPOPT, and NLOPT solvers are used to solve NLP/MINLP problems. DAE Tools supports direct dense and sparse matrix linear solvers (sequential and multi-threaded versions) at the moment. In addition to the built-in Sundials linear solvers, several third party libraries are interfaced: SuperLU/SuperLU_MT, Pardiso, Intel Pardiso, Trilinos Amesos (KLU, Umfpack, SuperLU, Lapack), and Trilinos AztecOO (with built-in, Ifpack or ML preconditioners), which can take advantage of multi-core/CPU computers. Linear solvers that exploit general-purpose graphics processing units (GPGPU, such as NVidia CUDA) are also available (CUSP) but in an early development stage."

http://www.daetools.com/

Trilinos

"The Trilinos Project is an effort to develop algorithms and enabling technologies within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems. A unique design feature of Trilinos is its focus on packages.

The Trilinos Capability Area homepage organizes Trilinos capabilities into nine collections of functionality and describes which packages are relevant to each area.

  • User Experience
  • Parallel Programming Environments
  • Framework & Tools
  • Software Engineering Technologies and Integration
  • I/O Support
  • Meshes, Geometry, & Load Balancing
  • Discretizations
  • Scalable Linear Algebra
  • Linear & Eigen Solvers
  • Embedded Nonlinear Analysis Tools"

https://trilinos.org/

Build Reference - https://trilinos.org/docs/files/TrilinosBuildReference.html

Packages - https://trilinos.org/packages/

Capabilities - https://trilinos.org/about/capabilities/

PyTrilinos - https://trilinos.org/packages/pytrilinos/

Conda - https://anaconda.org/guyer/trilinos
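As a hedged taste of the PyTrilinos bindings linked above, a distributed Epetra vector can be created and inspected like this (run serially or under mpiexec; the sizes are arbitrary):

from PyTrilinos import Epetra

# Serial or MPI communicator, a map describing the distribution, and a vector.
comm = Epetra.PyComm()
vmap = Epetra.Map(100, 0, comm)        # 100 global entries, index base 0
x = Epetra.Vector(vmap)
x.PutScalar(1.0)

print("rank %d owns %d of %d entries"
      % (comm.MyPID(), vmap.NumMyElements(), vmap.NumGlobalElements()))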

http://content.iospress.com/journals/scientific-programming/20/2

http://content.iospress.com/journals/scientific-programming/20/3 


ELI - A System for Programming with Arrays

"ELI has most of the functionality of the ISO APL standard, but it also has facilities not described there such as lists for non-homogeneous data, complex numbers, symbols, temporal data, control structures, scripting files, dictionaries, tables and SQL-like statements. It comes with a compiler for flat array programs. ELI is succinct, easy to learn and versatile. Compared with MATLAB or Python, ELI encourages a dataflow style of programming where the output of one operation feeds the input of another, resulting in greater productivity and clarity of code.

ELI is freely available on Windows, Linux and Mac OS; see Download for versions and update information. An introductory paper, a tutorial on Programming with Arrays, ELI for Kids (a novel way to learn math and coding), a Primer and a Compiler User’s Guide are available in Documents. We give a sample here to illustrate the flavor of the language. People already familiar with APL can jump directly to examine the last 3 examples and the APL/ELI Symbol Comparison Table. A line of ELI executes from right to left as a chain of operations; anything to the right of // is a comment. A simple example is given to solve a coin tossing problem in one line of ELI."

http://fastarray.appspot.com/default.html

APL in R

"APL was introduced by Iverson (1962). It is an array language, with many functions to manipulate multidimensional arrays. R also has multidimensional arrays, but not as many functions to work with them."

https://bookdown.org/jandeleeuw6/apl/

The 50 Greatest Live Jazz Albums

"We’ve picked the 50 greatest live jazz albums, and while they are not in any particular order, we have featured what we think are the ten greatest examples of jazz played live, records that should be in everyone’s collection"

http://www.udiscovermusic.com/playlists/the-50-greatest-live-jazz-albums

How to create a video DVD from the command line

"Using ffmpeg you can convert any video file to an mpg file, that dvdauthor can use later:
 
ffmpeg -i video.avi -aspect 16:9 -target pal-dvd dvd.mpg

You might want to change the aspect ratio to 4:3 or the target to ntsc-dvd, depending on your
preferences and region. If you need to define the video bitrate, use the "-b bitrate" option:
 
ffmpeg -i video.avi -aspect 16:9 -target pal-dvd  -b 1800000 dvd.mpg

I'm not sure what units are used, but the above example gives a bitrate of ca. 2300 kbit/s, which is usually enough for a typical AVI. A bigger bitrate gives better quality but a larger file. Just test the output and adjust the bitrate according to your needs.

Now add the mpg file to your project using dvdauthor:
 
dvdauthor -o dvd/ -t dvd.mpg

You can convert and add any number of files this way. After you've added all of them, run:
 
export VIDEO_FORMAT=PAL 
dvdauthor -o dvd/ -T

You might want to set VIDEO_FORMAT=NTSC instead.
And then you can create an iso with mkisofs:
 
mkisofs -dvd-video -o dvd.iso dvd/

which you can burn to a DVD disc with any DVD burning software. cdrecord from the command line will do just fine"

See the article for more advanced usage.

https://docs.salixos.org/wiki/How_to_create_a_video_DVD_from_the_command_line

ODK

"Open Data Kit (ODK) is a free and open-source set of tools which help organizations author, field, and manage mobile data collection solutions. ODK provides an out-of-the-box solution for users to:
  1. Build a data collection form or survey (XLSForm is recommended for larger forms);
  2. Collect the data on a mobile device and send it to a server; and
  3. Aggregate the collected data on a server and extract it in useful formats.
In addition to socio-economic and health surveys with GPS locations and images, ODK is being used to create decision support for clinicians and for building multimedia-rich nature mapping tools. See the list of available tools, featured deployments, and implementation companies for more examples of what the ODK community is doing."

https://opendatakit.org/

Yocto

"The Yocto Project is a Linux Foundation workgroup whose goal is to produce tools and processes that will enable the creation of Linux distributions for embedded software that are independent of the underlying architecture of the embedded software itself.  The Yocto Project is an open source project whose focus is on improving the software development process for embedded Linux distributions. The Yocto Project provides interoperable tools, metadata, and processes that enable the rapid, repeatable development of Linux-based embedded systems.

The Yocto Project has the aim and objective of attempting to improve the lives of developers of customised Linux systems supporting the ARM, MIPS, PowerPC and x86/x86-64 architectures. A key part of this is an open source build system, based around the OpenEmbedded architecture, that enables developers to create their own Linux distribution specific to their environment.  There are several other sub-projects under the project umbrella which include EGLIBC, pseudo, cross-prelink, Eclipse integration, ADT/SDK, the matchbox suite of applications, and many others. One of the central goals of the project is interoperability among these tools.

The project offers different-sized targets, from "tiny" to fully featured images, which are configurable and customisable by the end user. The project encourages interaction with upstream projects and has contributed heavily to OpenEmbedded-Core and BitBake as well as to numerous upstream projects, including the Linux kernel. The resulting images are typically useful in systems where embedded Linux would be used: single-use focused systems or systems without the usual screens/input devices associated with desktop Linux systems."

https://en.wikipedia.org/wiki/Yocto_Project

https://www.yoctoproject.org/

https://www.howtoforge.com/tutorial/how-to-create-your-own-linux-distribution-with-yocto-on-ubuntu/

Monday, February 27, 2017

PETSc

"The Portable, Extensible Toolkit for Scientific Computation (PETSc, pronounced PET-see; the S is silent), is a suite of data structures and routines developed by Argonne National Laboratory for the scalable (parallel) solution of scientific applications modeled by partial differential equations. It employs the Message Passing Interface (MPI) standard for all message-passing communication. The current version of PETSc is 3.7. PETSc is the world’s most widely used parallel numerical software library for partial differential equations and sparse matrix computations.

PETSc is intended for use in large-scale application projects, and many ongoing computational science projects are built around the PETSc libraries. Its careful design allows advanced users to have detailed control over the solution process. PETSc includes a large suite of parallel linear and nonlinear equation solvers that are easily used in application codes written in C, C++, Fortran and now Python. PETSc provides many of the mechanisms needed within parallel application code, such as simple parallel matrix and vector assembly routines that allow the overlap of communication and computation. In addition, PETSc includes support for parallel distributed arrays useful for finite difference methods.

PETSc consists of a variety of components consisting of major classes and supporting infrastructure. Users typically interact with objects of the highest level classes relevant to their application, essential lower level objects such as vectors, and may customize or extend any others. All major components of PETSc have an extensible plugin architecture."
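A hedged petsc4py sketch of the workflow described above: assemble a sparse 1D Laplacian, then solve it with a Krylov method (options such as -ksp_type and -pc_type can still be overridden from the command line):

from petsc4py import PETSc

# Assemble a tridiagonal (1D Laplacian) sparse matrix in parallel-safe fashion.
n = 100
A = PETSc.Mat().createAIJ([n, n], nnz=3)
rstart, rend = A.getOwnershipRange()
for i in range(rstart, rend):
    A.setValue(i, i, 2.0)
    if i > 0:
        A.setValue(i, i - 1, -1.0)
    if i < n - 1:
        A.setValue(i, i + 1, -1.0)
A.assemble()

x, b = A.createVecs()
b.set(1.0)

# Conjugate gradients with Jacobi preconditioning.
ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType("cg")
ksp.getPC().setType("jacobi")
ksp.setFromOptions()
ksp.solve(b, x)
print("iterations:", ksp.getIterationNumber())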

Third-party packages PETSc can use for various purposes include

CUDA,

CUSP,

ViennaCL,

OpenCL,

Elemental,

METIS,

MSTK,

PTScotch,

Zoltan,

NetCDF,

HDF5,

Trilinos,

SuperLU,

MKL,

Hypre,

ScaLAPACK,

MUMPS,

Boost - http://www.boost.org/

PAPI - http://icl.cs.utk.edu/papi/index.html

NumPy

FFTW - http://www.fftw.org/

SuiteSparse - http://faculty.cse.tamu.edu/davis/suitesparse.html

Chombo - https://commons.lbl.gov/display/chombo/Chombo+-+Software+for+Adaptive+Solutions+of+Partial+Differential+Equations

SUNDIALS - http://computation.llnl.gov/projects/sundials

Chaco - http://www3.cs.stonybrook.edu/~algorith/implement/chaco/implement.shtml

FIAT - http://fenics.readthedocs.io/projects/fiat/en/latest/

SPAI - https://cccs.unibas.ch/lehre/software-packages/

https://www.mcs.anl.gov/petsc/



Sunday, February 26, 2017

pbdR

"The "Programming with Big Data in R" project (pbdR) is a set of highly scalable R packages for distributed computing and profiling in data science.

Our packages include high performance, high-level interfaces to MPI, ZeroMQ, ScaLAPACK, NetCDF4, PAPI, and more. While these libraries shine brightest on large distributed platforms, they also work rather well on small clusters and usually, surprisingly, even on a laptop with only two cores."

https://rbigdata.github.io/index.html

hdf5-json

"A specification, library, and utilities for describing HDF5 content in JSON. The utilities can be used to convert any HDF5 file to JSON or from a JSON file (using the convention described here to HDF5).

The library is useful for any Python application that needs to translate between HDF5 objects and JSON serializations. In addition to the utilities provided in this repository, the library is used by HDF Server (a RESTful web service for HDF5), and HDF Product Designer (an application for creating product designs).

This repository also includes utilities to generate code in Fortran or Python based on a JSON file."

https://github.com/HDFGroup/hdf5-json
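
The utilities in the repository handle the actual conversion. Just to illustrate the underlying idea of describing HDF5 structure as JSON, here is a small sketch that walks a file with h5py and prints a JSON summary; the file name and the output layout are my own choices and do not follow the hdf5-json specification itself.

    import json
    import h5py

    summary = []

    def describe(name, obj):
        # Record a minimal JSON-style description of one HDF5 object.
        entry = {"path": "/" + name,
                 "attributes": {k: str(v) for k, v in obj.attrs.items()}}
        if isinstance(obj, h5py.Dataset):
            entry.update({"class": "dataset", "dtype": str(obj.dtype), "shape": list(obj.shape)})
        else:
            entry["class"] = "group"
        summary.append(entry)

    with h5py.File("example.h5", "r") as f:    # hypothetical input file
        f.visititems(describe)

    print(json.dumps(summary, indent=2))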

h5serv

"A web service that can be used to send and receive HDF5 data. h5serv uses a REST interface to support CRUD (create, read, update, delete) operations on the full spectrum of HDF5 objects including: groups, links, datasets, attributes, and committed data types. As a REST-based service a variety of clients can be developed in JavaScript, Python, C, and other common languages.

https://github.com/HDFGroup/h5serv

http://h5serv.readthedocs.io/en/latest/

h5pyd - A Python client library for the HDF5 REST interface

https://github.com/HDFGroup/h5pyd
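
Because h5pyd is modelled on the familiar h5py interface, reading data served by h5serv looks roughly like the sketch below; the domain name, endpoint and dataset path are placeholders for a locally running h5serv instance rather than anything guaranteed to exist.

    import h5pyd    # h5py-like client for the HDF5 REST API

    # Open a server-side "domain", the REST analogue of an HDF5 file.
    f = h5pyd.File("tall.data.hdfgroup.org", "r", endpoint="http://127.0.0.1:5000")

    dset = f["/g1/g1.1/dset1.1.1"]     # placeholder dataset path
    print(dset.shape, dset.dtype)
    print(dset[0, :])                  # slices are fetched over HTTP, not from a local file
    f.close()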



HDF Compass

"An experimental viewer program for HDF5 and related formats, designed to complement other more complex applications like HDFView. Strong emphasis is placed on clean minimal design, and maximum extensibility through a plugin system for new formats.

HDF Compass is written in Python, but ships as a native application on Windows, OS X, and Linux, by using PyInstaller to package the app."

https://github.com/HDFGroup/hdf-compass

https://support.hdfgroup.org/projects/compass/

HDFql

"Scientists, data managers, engineers, and students using the data format HDF currently waste a lot of unecessary time managing HDF files. That is because to date, the APIs available for HDF have been highly complex. With HDF becoming increasingly common in the big data arena, a faster and simpler solution is needed.

HDFql stands for "Hierarchical Data Format query language" and is the first high-level language for HDF. Designed to be simple and similar to SQL, HDFql dramatically reduces users' learning effort and the time needed to manage HDF files. HDFql can be seen as a clean interface alternative to the C API (which contains more than 400 low-level functions that are far from easy to use!) and to existing wrappers for Java, Python and C#."

http://www.hdfql.com/
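
To give a flavour of the SQL-like syntax, the sketch below issues a few HDFql statements through its Python wrapper; the module name, cursor functions and constants are my assumptions about how the wrapper is exposed, so treat this as illustrative rather than authoritative.

    import HDFql    # Python wrapper bundled with the HDFql distribution (name assumed)

    HDFql.execute("CREATE FILE example.h5")
    HDFql.execute("USE FILE example.h5")
    HDFql.execute("CREATE DATASET readings AS INT(3)")       # a 1-D dataset of three integers
    HDFql.execute("INSERT INTO readings VALUES(7, 87, 2)")   # write values, SQL-style

    HDFql.execute("SELECT FROM readings")                    # results land in HDFql's cursor
    while HDFql.cursor_next() == HDFql.SUCCESS:              # cursor API assumed here
        print(HDFql.cursor_get_int())

    HDFql.execute("CLOSE FILE")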

OFED

"The OpenFabrics Enterprise Distribution (OFED)/OpenFabrics Software is open-source software for RDMA and kernel bypass applications. OFS is used in business, research and scientific environments that require highly efficient networks, storage connectivity and parallel computing. The software provides high performance computing sites and enterprise data centers with flexibility and investment protection as computing evolves towards applications that require extreme speeds, massive scalability and utility-class reliability.

OFS includes kernel-level drivers, channel-oriented RDMA and send/receive operations, kernel bypasses of the operating system, both kernel and user-level application programming interface (API) and services for parallel message passing (MPI), sockets data exchange (e.g., RDS, SDP), NAS and SAN storage (e.g. iSER, NFS-RDMA, SRP) and file system/database systems.

The network and fabric technologies that provide RDMA performance with OFS include: legacy 10 Gigabit Ethernet, iWARP for Ethernet, RDMA over Converged Ethernet (RoCE), and 10/20/40 Gigabit InfiniBand.

The OFED stack includes software drivers, core kernel-code, middleware, and user-level interfaces. It offers a range of standard protocols, including IPoIB (IP over InfiniBand), SDP, SRP, iSER, RDS and DAPL (the Direct Access Programming Library). It also supports many other protocols, including various MPI implementations, and it supports many file systems, including Lustre and NFS over RDMA."

https://www.openfabrics.org/index.php/openfabrics-software.html

https://en.wikipedia.org/wiki/OpenFabrics_Alliance

 https://ofiwg.github.io/libfabric/

MaTEx

"MaTEx is a collection of parallel machine learning and data mining (MLDM) algorithms, targeted for desktops, supercomputers and cloud computing systems. MaTEx provides a handful of widely used algorithms in Clustering, Classification and Association Rule Mining (ARM).

MaTEx primarily provides high performance implementations of Deep Learning algorithms. The current implementations use MPI for inter-node communication and multi-threading/CUDA (cuDNN) for intra-node execution, by using Google TensorFlow as the baseline.

MaTEx also supports K-means, Spectral Clustering algorithms for Clustering, Support Vector Machines, KNN algorithms for Classification, and FP-Growth for Association Rule Mining.

MaTEx uses state-of-the-art programming models such as Message Passing Interface (MPI), CUDA and multi-threading models for targeting massively parallel systems readily available on modern desktops, supercomputers and cloud computing systems.

The required software, such as mpich-3.1, is bundled with MaTEx. These packages are automatically built if they are not found on your system."

https://github.com/abhinavvishnu/matex/wiki

ARMCI

"The purpose of the Aggregate Remote Memory Copy (ARMCI) library is to provide a general-purpose, efficient, and widely portable remote memory access (RMA) operations (one-sided communication) optimized for contiguous and noncontiguous (strided, scatter/gather, I/O vector) data transfers. In addition, ARMCI includes a set of atomic and mutual exclusion operations. The development ARMCI is driven by the need to support the global-addres space communication model in context of distributed regular or irregular distributed data structures, communication libraries, and compilers. ARMCI is a standalone system that could be used to support user-level libraries and applications that use MPI or PVM.

ARMCI exploits native network communication interfaces and system resources (such as shared memory) to achieve the best possible performance of the remote memory access/one-sided communication. It exploits high-performance network protocols on clustered systems. Optimized implementations of ARMCI are available for the Portals, Myrinet (GM), Quadrics, Infiniband (using OPENIB and Mellanox verbs API), and Ethernet.

ARMCI is compatible with MPI. However, by design it is impartial to the selection of the message-passing library in the user program. In addition to MPI, on some platforms ARMCI has also been used with the PVM and TCGMSG message-passing libraries."

http://hpc.pnl.gov/armci/index.shtml

https://github.com/jeffhammond/armci-mpi
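
ARMCI itself is a C library, but the one-sided put/get model it implements can be illustrated with the MPI-3 remote memory access interface via mpi4py. The sketch below is that analogue, not ARMCI's own API: each rank exposes a window of memory and writes its rank id into its neighbour's window without the neighbour posting a receive.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Each rank exposes one integer of memory that remote ranks can write into directly.
    win = MPI.Win.Allocate(MPI.INT.Get_size(), comm=comm)

    src = np.array([rank], dtype='i')
    target = (rank + 1) % size

    win.Fence()                        # open an access epoch on all ranks
    win.Put([src, MPI.INT], target)    # one-sided put: the target posts no matching receive
    win.Fence()                        # close the epoch; the data is now visible at the target

    received = np.frombuffer(win.tomemory(), dtype='i')[0]
    print("rank", rank, "received", received)
    win.Free()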

Global Arrays

"Global Arrays (GA) is a Partitioned Global Address Space (PGAS) programming model. It provides primitives for one-sided communication (Get, Put, Accumulate) and Atomic Operations (read increment). It supports blocking and non-blocking primtives, and supports location consistency.

The Global Arrays toolkit consists of many useful and related pieces.
  • Communication Runtime for Extreme Scale (ComEx) provides vector and strided interfaces to optimize performance of remote memory copy operations for non-contiguous data.
  • ChemIO aka Parallel IO (pario) is a package consisting of three independent parallel I/O libraries for high-performance computers. It was designed for computational chemistry; however, the supported abstractions and features are general enough to be of interest to other applications.
  • Memory Allocator (MA) is a local memory manager/allocator with several useful features not available in Fortran or C languages.
  • Task Scheduling Library (tascel)
  • TCGMSG is an efficient but functionally limited (compared to MPI) message-passing library available on many current (and legacy) systems.
  • TCGMSG-MPI is a portability layer between TCGMSG and MPI. It is recommended as a transition library from TCGMSG to MPI for existing TCGMSG codes."
http://hpc.pnl.gov/globalarrays/index.shtml

GridFTP

"GridFTP is a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks. The GridFTP protocol is based on FTP, the highly-popular Internet file transfer protocol. We have selected a set of protocol features and extensions defined already in IETF RFCs and added a few additional features to meet requirements from current data grid projects.

 The aim of GridFTP is to provide a more reliable and high performance file transfer, for example to enable the transmission of very large files. GridFTP is used extensively within large science projects such as the Large Hadron Collider and by many supercomputer centers and other scientific facilities.

GridFTP also addresses the problem of incompatibility between storage and access systems. Previously, each data provider would make their data available in their own specific way, providing a library of access functions. This made it difficult to obtain data from multiple sources, requiring a different access method for each, and thus dividing the total available data into partitions. GridFTP provides a uniform way of accessing the data, encompassing functions from all the different modes of access, building on and extending the universally accepted FTP standard. FTP was chosen as a basis for it because of its widespread use, and because it has a well defined architecture for extensions to the protocol (which may be dynamically discovered)."

http://toolkit.globus.org/toolkit/docs/latest-stable/gridftp/

https://en.wikipedia.org/wiki/GridFTP

http://www.mcs.anl.gov/~kettimut/tutorials/SC07GridFTPTutorialSlides.pdf

https://hub.docker.com/r/wraithy/gridftp-server/~/dockerfile/

http://globustoolkit.com/alliance/publications/papers/Pipelining.pdf

https://www.mcs.anl.gov/software/data-intensive-software

BDM

Bulk Data Mover (BDM) is a scalable data transfer management tool for the GridFTP transfer protocol.

https://sdm.lbl.gov/twiki/bin/view/Software/BDM/WebHome