Shalizi Cosma Rohilla. Advanced Data Analysis from an Elementary Point of View

Unpublished, 2017. — 860 p.
This book began as the notes for 36-402, Advanced Data Analysis, at Carnegie Mellon University. This is the methodological capstone of the core statistics sequence taken by our undergraduate majors (usually in their third year), and by undergraduate students from a range of other departments. By this point, students have taken classes in introductory statistics and data analysis, probability theory, mathematical statistics, and modern linear regression (“401”). This book does not presume that you once learned but have forgotten the material from the pre-requisites; it presumes that you know that material and can go beyond it. The book also presumes a firm grasp on linear algebra and multivariable calculus, and that you can read and write simple functions in R. If you are lacking in any of these areas, now would be an excellent time to leave.
ADA is a class in statistical methodology: its aim is to get students to understand something of the range of modern methods of data analysis, and of the considerations which go into choosing the right method for the job at hand (rather than distorting the problem to fit the methods you happen to know). Statistical theory is kept to a minimum, and largely introduced as needed. Since ADA is also a class in data analysis, there are a lot of assignments in which large, real data sets are analyzed with the new methods.
There is no way to cover every important topic for data analysis in just a semester. Much of what’s not here — sampling theory and survey methods, experimental design, advanced multivariate methods, hierarchical models, the intricacies of categorical data, graphics, data mining — gets covered by our other undergraduate classes.
Other important areas, like dependent data, inverse problems, advanced model selection or robust estimation, have to wait for graduate school.
The mathematical level of these notes is deliberately low; nothing should be beyond a competent third-year undergraduate. But every subject covered here can be profitably studied using vastly more sophisticated techniques; that’s why this is advanced data analysis from an elementary point of view. If reading these pages inspires anyone to study the same material from an advanced point of view, I will consider my troubles to have been amply repaid.
Regression and Its Generalizations
Regression Basics
The Truth about Linear Regression
Model Evaluation
Smoothing in Regression
Simulation
The Bootstrap
Weighting and Variance
Splines
Additive Models
Testing Regression Specifications
Logistic Regression
GLMs and GAMs
Trees
Distributions and Latent Structure
Density Estimation
Relative Distributions and Smooth Tests
Principal Components Analysis
Factor Models
Nonlinear Dimensionality Reduction
Mixture Models
Graphical Models
Causal Inference
Graphical Causal Models
Identifying Causal Effects
Experimental Causal Inference
Estimating Causal Effects
Discovering Causal Structure
Dependent Data
Time Series
Spatial and Network Data
Simulation-Based Inference
Data-Analysis Problem Sets
Linear Algebra Reminders
Big O and Little o Notation
Taylor Expansions
Multivariate Distributions
Algebra with Expectations and Variances
Propagation of Error
Optimization
χ² and Likelihood Ratios
Proof of the Gauss-Markov Theorem
Rudimentary Graph Theory
Information Theory
More about Hypothesis Testing
Programming
Generating Random Variables