"This package contains the initial prototype of ArrayUDF from the Scientific Data Services (SDS) framework project.
User-Defined Functions (UDF) allow application programmers to specify
analysis operations on data, while leaving the data management tasks to
the system. This general approach enables numerous custom analysis
functions and is at the heart of modern Big Data systems. Even
though the UDF mechanism can theoretically support arbitrary operations,
a wide variety of common operations, such as computing the moving
average of a time series or the vorticity of a fluid flow, are
hard to express and slow to execute. Since these operations are
traditionally performed on multi-dimensional arrays, we propose to
extend the expressiveness of structural locality to support UDF
operations on arrays. We further propose an in situ UDF mechanism,
called ArrayUDF, to implement the structural locality. ArrayUDF allows
users to define computations on adjacent array cells without the use of
join operations and executes the UDF directly on arrays stored in data
files without requiring their content to be loaded into a data management
system. Additionally, we present a thorough theoretical analysis of the
data access cost to exploit the structural locality, which enables
ArrayUDF to automatically select the best array partitioning strategy
for a given UDF operation. In a series of performance evaluations on
large scientific datasets, we have observed that, using the generic
UDF interface, ArrayUDF consistently outperforms Spark, SciDB, and
RasDaMan."
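To make "computations on adjacent array cells without the use of join operations" concrete, here is a minimal Python sketch of the idea. It is not ArrayUDF's actual API (ArrayUDF itself is written in C++); the names Stencil, apply_udf, apply_udf_chunked and moving_average3 are hypothetical. The UDF sees one cell and reads its neighbours through relative offsets, so a moving average needs no self-join. The chunked variant assumes a common technique, reading each partition plus a "halo" of ghost cells, to show why the partitioning strategy has to account for how far a UDF reaches; this is an illustration, not necessarily the exact strategy ArrayUDF selects.

```python
from typing import Callable, List, Optional


class Stencil:
    """Hypothetical cell view: s(0) is the current cell, s(-1) its left neighbour."""

    def __init__(self, data: List[float], center: int) -> None:
        self._data = data
        self._center = center

    def __call__(self, offset: int) -> Optional[float]:
        # Return the neighbour at a relative offset, or None past the array edge.
        i = self._center + offset
        if 0 <= i < len(self._data):
            return self._data[i]
        return None


Udf = Callable[[Stencil], float]


def apply_udf(data: List[float], udf: Udf) -> List[float]:
    """Run the UDF once per cell over the whole in-memory array."""
    return [udf(Stencil(data, i)) for i in range(len(data))]


def apply_udf_chunked(data: List[float], udf: Udf, chunk: int, halo: int) -> List[float]:
    """Run the UDF partition by partition, reading each chunk plus `halo`
    ghost cells on each side. The result matches apply_udf as long as the
    UDF never reads an offset farther than `halo` from the current cell."""
    n = len(data)
    out: List[float] = []
    for start in range(0, n, chunk):
        end = min(start + chunk, n)
        lo, hi = max(0, start - halo), min(n, end + halo)
        local = data[lo:hi]  # stands in for reading one partition from a file
        out.extend(udf(Stencil(local, i - lo)) for i in range(start, end))
    return out


def moving_average3(s: Stencil) -> float:
    # Three-point moving average; neighbours outside the array are skipped.
    window = [v for v in (s(-1), s(0), s(1)) if v is not None]
    return sum(window) / len(window)


series = [1.0, 2.0, 3.0, 4.0, 5.0]
print(apply_udf(series, moving_average3))                          # [1.5, 2.0, 3.0, 4.0, 4.5]
print(apply_udf_chunked(series, moving_average3, chunk=2, halo=1))  # [1.5, 2.0, 3.0, 4.0, 4.5]
```

Because moving_average3 only reaches one cell in each direction, a halo of 1 is enough for every chunk to produce the same answer as the whole-array run. A UDF with a wider window would need a larger halo, which is the kind of trade-off (extra cells read per partition) that a cost model for choosing partitions has to weigh.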
https://bitbucket.org/arrayudf/arrayudf
http://crd.lbl.gov/departments/data-science-and-technology/sdm/
https://sdm.lbl.gov/
https://www.slideshare.net/Goon83/arrayudf-userdefined-scientific-data-analysis-on-arrays
https://dl.acm.org/citation.cfm?id=3078599
https://cs.lbl.gov/news-media/news/2017/berkeley-labs-arrayudf-tool-turns-large-scale-scientific-array-data-analysis-into-a-cakewalk/