Wednesday, February 7, 2018

Web of Things

"Many of the new devices connecting to the Internet are insecure, do not receive software updates to fix vulnerabilities, and raise new privacy questions around the collection, storage, and use of large quantities of extremely personal data.

Additionally, most IoT devices today use proprietary vertical technology stacks which are built around a central point of control and which don’t always talk to each other. When they do talk to each other, it requires per-vendor integrations to connect those systems together. There are efforts to create standards, but the landscape is extremely complex, and there is not yet a single dominant model or market leader.

The “Web of Things” (WoT) is an effort to take the lessons learned from the World Wide Web and apply them to IoT. It’s about creating a decentralized Internet of Things by giving Things URLs on the web to make them linkable and discoverable, and defining a standard data model and APIs to make them interoperable.

The Web of Things is not just another vertical IoT technology stack to compete with existing platforms. It is intended as a unifying horizontal application layer to bridge together multiple underlying IoT protocols.

Rather than start from scratch, the Web of Things is built on existing, proven web standards like REST, HTTP, JSON, WebSockets and TLS (Transport Layer Security). The Web of Things will also require new web standards. In particular, we think there is a need for a Web Thing Description format to describe things, a REST-style Web Thing API to interact with them, and possibly a new generation of HTTP better optimised for IoT use cases and for resource-constrained devices.

The Web of Things is not just a Mozilla initiative; there is already a well-established Web of Things community and related standardization efforts at the IETF, W3C, OCF and OGC. Mozilla plans to participate in this community to help define new web standards and promote best practices around privacy, security and interoperability."

https://iot.mozilla.org/

https://hacks.mozilla.org/2017/06/building-the-web-of-things/

https://iot.mozilla.org/gateway/

https://iot.mozilla.org/wot/
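
A minimal sketch of what this could look like in practice, written in Python with the requests library. The gateway address, thing name, and the "on" property below are illustrative assumptions in the spirit of the Web Thing API draft, not the endpoints of any particular gateway:

import requests

# Hypothetical URL of a thing exposed by a Web of Things gateway.
THING = "http://gateway.local/things/lamp"

# Fetch the Thing Description: a JSON document that lists the thing's
# properties, actions and events, making it linkable and discoverable.
description = requests.get(THING).json()
print(description.get("name"), list(description.get("properties", {})))

# Read the current value of a boolean "on" property...
state = requests.get(THING + "/properties/on").json()

# ...and toggle it with an ordinary HTTP PUT carrying JSON.
requests.put(THING + "/properties/on", json={"on": not state["on"]})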

Jug

"It is a light-weight, Python only, distributed computing framework.

Jug allows you to write code that is broken up into tasks and run different tasks on different processors. You can also think of it as a lightweight map-reduce type of system, although it's a bit more flexible (and less scalable).

It has two storage backends: one uses the filesystem to communicate between processes and works correctly over NFS, so you can coordinate processes on different machines. The other uses a Redis database; all it needs is for the different processes to be able to communicate with a common Redis server.

Jug is a pure Python implementation and should work on any platform. Python 3 is also supported (at least 3.3 and greater)."

http://jug.readthedocs.io/en/latest/

https://github.com/luispedro/jug
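
A minimal sketch of a Jug script, using the TaskGenerator decorator from Jug's documentation (the file name and task bodies are illustrative):

# primer.py
from jug import TaskGenerator

@TaskGenerator
def double(x):
    return 2 * x

@TaskGenerator
def sum_all(values):
    return sum(values)

# Calling a TaskGenerator-decorated function does not run it; it
# records a task, so the graph below can be executed by many processes.
numbers = [double(i) for i in range(10)]
total = sum_all(numbers)

Running "jug execute primer.py" in two or more shells (or on several machines sharing the filesystem backend) lets each process claim pending tasks, and "jug status primer.py" reports progress; results are cached in the store, so finished tasks are not recomputed.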

Saturday, February 3, 2018

DML

"DataMover-Lite (DML) is a simple file transfer tool with graphical user interface which supports multi-protocol data movement.DML is available in both webstart and standalone version.

Currently, DML supports http, https, ftp, gridftp, lahfs and scp.

For GridFTP, DML also supports directory browsing and transferring."

https://sdm.lbl.gov/twiki/bin/view/Software/DML

https://sdm.lbl.gov/

BDM

"Bulk Data Movement is an ESG (http://www.earthsystemgrid.org) project.
Bulk Data Mover (BDM) is a scalable data transfer management tool for GridFTP transfer protocol. The goal is to manage as much as 1+ PB with millions of files transfers reliably."

https://sdm.lbl.gov/twiki/bin/view/Software/BDM/WebHome

https://wiki.ucar.edu/display/esgcet/Bulk+Data+Movement

ArrayUDF

"This package contains the initial prototype of ArrayUDF from the project SDS (Scientific Data Services) framework.

User-Defined Functions (UDF) allow application programmers to specify analysis operations on data, while leaving the data management tasks to the system. This general approach enables numerous custom analysis functions and is at the heart of modern Big Data systems. Even though the UDF mechanism can theoretically support arbitrary operations, a wide variety of common operations -- such as computing the moving average of a time series or the vorticity of a fluid flow -- are hard to express and slow to execute.

Since these operations are traditionally performed on multi-dimensional arrays, we propose structural locality as a way to extend the expressiveness of UDF operations on arrays. We further propose an in situ UDF mechanism, called ArrayUDF, to implement this structural locality. ArrayUDF allows users to define computations on adjacent array cells without the use of join operations, and executes the UDF directly on arrays stored in data files without requiring their content to be loaded into a data management system.

Additionally, we present a thorough theoretical analysis of the data access cost of exploiting structural locality, which enables ArrayUDF to automatically select the best array partitioning strategy for a given UDF operation. In a series of performance evaluations on large scientific datasets, we have observed that -- using the generic UDF interface -- ArrayUDF consistently outperforms Spark, SciDB, and RasDaMan."

https://bitbucket.org/arrayudf/arrayudf

http://crd.lbl.gov/departments/data-science-and-technology/sdm/

https://sdm.lbl.gov/

https://www.slideshare.net/Goon83/arrayudf-userdefined-scientific-data-analysis-on-arrays

https://dl.acm.org/citation.cfm?id=3078599

https://cs.lbl.gov/news-media/news/2017/berkeley-labs-arrayudf-tool-turns-large-scale-scientific-array-data-analysis-into-a-cakewalk/
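
ArrayUDF itself is a C++ system that runs UDFs directly on arrays in scientific data files, but the programming model is easy to mimic in a few lines of Python. The sketch below is only a toy, in-memory illustration of the idea -- the user-defined function sees a small window of adjacent cells (structural locality) rather than expressing the neighborhood through joins; the function and variable names are my own, not ArrayUDF's API:

import numpy as np

def apply_stencil_udf(arr, udf, halo=1):
    # Apply udf to each cell together with its `halo` neighbors on
    # either side, edge-padding at the array boundaries.
    padded = np.pad(arr, halo, mode="edge")
    out = np.empty(arr.shape, dtype=float)
    for i in range(arr.shape[0]):
        out[i] = udf(padded[i:i + 2 * halo + 1])
    return out

# Example UDF from the abstract: the moving average of a time series.
series = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
print(apply_stencil_udf(series, lambda window: window.mean()))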

CIL

"CIL (C Intermediate Language) is a high-level representation along with a set of tools that permit easy analysis and source-to-source transformation of C programs.

CIL is both lower-level than abstract-syntax trees, by clarifying ambiguous constructs and removing redundant ones, and also higher-level than typical intermediate languages designed for compilation, by maintaining types and a close relationship with the source program. The main advantage of CIL is that it compiles all valid C programs into a few core constructs with a very clean semantics. CIL also has a syntax-directed type system that makes it easy to analyze and manipulate C programs. Furthermore, the CIL front-end is able to process not only ANSI-C programs but also those using Microsoft C or GNU C extensions. If you do not use CIL and instead want to use just a C parser and analyze programs expressed as abstract-syntax trees, then your analysis will have to handle a lot of ugly corners of the language (let alone the fact that parsing C itself is not a trivial task).

In essence, CIL is a highly-structured, “clean” subset of C. CIL features a reduced number of syntactic and conceptual forms. For example, all looping constructs are reduced to a single form, all function bodies are given explicit return statements, syntactic sugar like "->" is eliminated and function arguments with array types become pointers. (For an extensive list of how CIL simplifies C programs, see Section 4.) This reduces the number of cases that must be considered when manipulating a C program.

CIL also separates type declarations from code and flattens scopes within function bodies. This structures the program in a manner more amenable to rapid analysis and transformation. CIL computes the types of all program expressions, and makes all type promotions and casts explicit. CIL supports all GCC and MSVC extensions except for nested functions and complex numbers.

Finally, CIL organizes C’s imperative features into expressions, instructions and statements based on the presence and absence of side-effects and control-flow. Every statement can be annotated with successor and predecessor information. Thus CIL provides an integrated program representation that can be used with routines that require an AST (e.g. type-based analyses and pretty-printers), as well as with routines that require a CFG (e.g., dataflow analyses). CIL also supports even lower-level representations (e.g., three-address code)."

https://people.eecs.berkeley.edu/~necula/cil/

http://perso.univ-perp.fr/guillaume.revy/index.php?page=debugging

https://extremescaleresearch.labworks.org/projects/corvette


VTK-m

"VTK-m is an open source C++ header toolkit that provides the ability to do fine-grained concurrency on scientific visualization data structures. VTK-m has few required dependencies but supports several optional components.

One of the biggest recent changes in high-performance computing is the increasing use of accelerators. Accelerators contain processing cores that are individually inferior to a typical CPU core, but these cores are replicated and grouped such that their aggregate execution provides a very high computation rate at much lower power. Current and future CPUs also require much more explicit parallelism: each successive generation packs more cores into each processor, and technologies like hyperthreading and vector operations require even more parallel processing to leverage each core’s full potential.

VTK-m is a toolkit of scientific visualization algorithms for emerging processor architectures. VTK-m supports the fine-grained concurrency for data analysis and visualization algorithms required to drive extreme scale computing by providing abstract models for data and execution that can be applied to a variety of algorithms across many different processor architectures."

http://m.vtk.org/index.php/Main_Page