Hopcroft J., Kannan R. Foundations of Data Science

pdf file
size 1.98 MB

added by Masherov 04/22/2015 23:58
info modified 04/24/2015 17:27

N.-Y.: draft edition, 2014. — 419 p.

While traditional areas of computer science are still important and highly skilled individuals are needed in these areas, the majority of researchers will be involved with using computers to understand and make usable massive data arising in applications, not just how to make computers useful on specific well-defined problems. With this in mind we have written this book to cover the theory likely to be useful in the next 40 years, just as automata theory, algorithms and related topics gave students an advantage in the last 40 years. One of the major changes is the switch from discrete mathematics to more of an emphasis on probability, statistics, and numerical methods.
Early drafts of the book have been used for both undergraduate and graduate courses.
Background material needed for an undergraduate course has been put in the appendix.
For this reason, the appendix has homework problems.
This book starts with the treatment of high dimensional geometry. Modern data in diverse fields such as Information Processing, Search, Machine Learning, etc., is often represented advantageously as vectors with a large number of components. This is so even in cases when the vector representation is not the natural first choice. Our intuition from two or three dimensional space can be surprisingly off the mark when it comes to high dimensional space. Chapter 2 works out the fundamentals needed to understand the differences. The emphasis of the chapter, as well as the book in general, is to get across the mathematical foundations rather than dwell on particular applications that are only briefly described.
The mathematical areas most relevant to dealing with high-dimensional data are matrix algebra and algorithms. We focus on singular value decomposition, a central tool in this area. Chapter 4 gives a from-first-principles description of this. Applications of singular value decomposition include principal component analysis, a widely used technique which we touch upon, as well as modern applications to statistical mixtures of probability densities, discrete optimization, etc., which are described in more detail.

High-Dimensional Space
Best-Fit Subspaces and Singular Value Decomposition (SVD)
Random Graphs
Random Walks and Markov Chains
Learning and VC-dimension
Algorithms for Massive Data Problems
Clustering
Topic Models, Hidden Markov Process, Graphical Models, and Belief Propagation
Other Topics

See also

Chen L.M., Su Z., Jiang B. Mathematical Problems in Data Science: Theoretical and Practical Methods