2nd Edition. — De Gruyter, 2022. — 422 p. — ISBN: 978-3-11-079588-2.
The book aims to help students become data scientists. Since this requires a series of courses over a considerable period, the book intends to accompany students from the beginning to an advanced understanding of the knowledge and skills that define a modern data scientist.
The book presents a comprehensive overview of the mathematical foundations of the programming language R and its applications to data science.
Relationships between mathematical subjects and data science.
Structure of the book.
Part one.
Part two.
Part three.
Our motivation for writing this book.
Examples and listings.
How to use this book.
Introduction to R
Overview of programming paradigms.
Imperative programming.
Functional programming.
Object-oriented programming.
Logic programming.
Other programming paradigms.
Compiler versus interpreter languages.
The semantics of programming languages.
Further reading.
Setting up and installing the R program.
Installing R on Linux.
Installing R on Mac OS X.
Installing R on Windows.
Using R.
Installation of R packages.
Installing packages from CRAN.
Installing packages from Bioconductor.
Installing packages from GitHub.
Installing packages manually.
Activation of a package in an R session.
Introduction to programming in R.
Basic elements of R.
Basic programming.
Data structures.
Handling character strings.
Sorting vectors.
Writing functions.
Writing and reading data.
Useful commands.
Practical usage of R.
Creating R packages.
Requirements.
R code optimization.
S3, S4, and RC object-oriented systems.
Creating an R package based on the S3 class system.
Checking the package.
Installation and usage of the package.
Loading and using a package.
Graphics in R
Basic plotting functions.
Plot.
Histograms.
Bar plots.
Pie charts.
Dot plots.
Strip and rug plots.
Density plots.
Combining a scatterplot with histograms: the layout function.
Three-dimensional plots.
Contour and image plots.
Advanced plotting functions: ggplot2.
qplot().
ggplot().
Visualization of networks.
igraph.
NetBioV.
Mathematical basics of data science
Mathematics as a language for science.
Numbers and number operations.
Sets and set operations.
Boolean logic.
Sum, product, and binomial coefficients.
Further symbols.
Importance of definitions and theorems.
Computability and complexity.
A brief history of computer science.
Turing machines.
Computability.
The complexity of algorithms.
Linear algebra.
Vectors and matrices.
Operations with matrices.
Special matrices.
Trace and determinant of a matrix.
Subspaces, dimensions, and rank of a matrix.
Eigenvalues and eigenvectors of a matrix.
Matrix norms.
Matrix factorization.
Systems of linear equations.
Exercises.
Analysis.
Limiting values.
Differentiation.
Extrema of a function.
Taylor series expansion.
Integrals.
Polynomial interpolation.
Root finding methods.
Further reading.
Exercises.
Differential equations.
Ordinary differential equations (ODE).
Partial differential equations (PDE).
Exercises.
Dynamical systems.
Population growth models.
The Lotka–Volterra or predator-prey system.
Cellular automata.
Random Boolean networks.
Case studies of dynamical system models with complex attractors.
Fractals.
Exercises.
Graph theory and network analysis.
Basic types of networks.
Quantitative network measures.
Graph algorithms.
Network models and graph classes.
Further reading.
Exercises.
Probability theory.
Events and sample space.
Set theory.
Definition of probability.
Conditional probability.
Conditional probability and independence.
Random variables and their distribution function.
Discrete and continuous distributions.
Expectation values and moments.
Bivariate distributions.
Multivariate distributions.
Important discrete distributions.
Important continuous distributions.
Bayes’ theorem.
Information theory.
Law of large numbers.
Central limit theorem.
Concentration inequalities.
Further reading.
Exercises.
Optimization.
Formulation of an optimization problem.
Unconstrained optimization problems.
Constrained optimization problems.
Some applications in statistical machine learning.
Further reading.
Exercises.
Bibliography.