National Academies Press, 2013. — 190 p. — ISBN: 0309287782, 9780309287784
From Facebook to Google searches to bookmarking a webpage in our browsers, today's society has become inseparable from enormous amounts of data. Some internet-based companies such as Yahoo! are even storing exabytes (10^18 bytes) of data. Like these companies and the rest of the world, scientific communities are also generating large amounts of data, mostly terabytes and in some cases near petabytes, from experiments, observations, and numerical simulation. In fact, the scientific community, along with the defense enterprise, has been a leader in generating and using large data sets for many years. The issue that arises with data at this scale is how to handle it, which includes sharing the data, enabling data security, working with different data formats and structures, dealing with highly distributed data sources, and more. Frontiers in Massive Data Analysis presents the Committee on the Analysis of Massive Data's work to make sense of the current state of data analysis for mining of massive sets of data, to identify gaps in the current practice, and to develop methods to fill these gaps. The committee thus examines the frontiers of research that is enabling the analysis of massive data, such as data representation and methods for including humans in the data-analysis loop. The report includes the committee's recommendations, details concerning the types of data that build into massive data, and information on the seven computational giants of massive data analysis.
Massive Data in Science, Technology, Commerce, National Defense, Telecommunications, and other Endeavors
Scaling the Infrastructure for Data Management
Temporal Data and Real-Time Algorithms
Large-Scale Data Representations
Resources, Trade-Offs, and Limitations
Building Models from Massive Data
Sampling and Massive Data
Human Interaction with Data
The Seven Computational Giants of Massive Data Analysis
Acronyms
Biographical Sketches of Committee Members