Timbers T., Campbell T., Lee M. Data Science: A First Introduction

CRC Press, 2022. – 443 p. – (Data Science Series). – ISBN: 978-0-367-52468-5.

Data Science: A First Introduction focuses on using the R programming language in Jupyter Notebooks to perform data manipulation and cleaning, create effective visualizations, and extract insights from data using classification, regression, clustering, and inference.

The text emphasizes workflows that are clear, reproducible, and shareable, and includes coverage of the basics of version control. All source code is available online, demonstrating the use of good reproducible project workflows.

Based on educational research and active learning principles, the book uses a modern approach to R and includes accompanying auto-graded Jupyter worksheets for interactive, self-directed learning. The book will leave readers well prepared for data science projects. The use of Jupyter notebooks for exercises immediately places the student in an environment that encourages auditability and reproducibility of analyses. The integration of Git and GitHub into the course is central to teaching collaboration and community, concepts that are critical to data science.

You will spend the first four chapters learning how to use R to load, clean, wrangle (i.e., restructure the data into a usable format), and visualize data while answering descriptive and exploratory data analysis questions. In the next six chapters, you will learn how to answer predictive, exploratory, and inferential data analysis questions with common methods in data science, including classification, regression, clustering, and estimation. In the final chapters (11 to 13), you will learn how to combine R code, formatted text, and images in a single coherent document with Jupyter, use version control for collaboration, and install and configure the software needed for data science on your computer. If you are reading this book as part of a course, the instructor may have already set up all of these tools for you; in this case, you can read the chapters in order. But if you are reading independently, you may want to jump to these last three chapters first, so that your computer is set up to run the example code included throughout the book.

The book is designed for learners from all disciplines with minimal prior knowledge of mathematics and programming. The authors have honed the material through years of experience teaching thousands of undergraduates in the University of British Columbia’s DSCI100: Introduction to Data Science course.

R and the Tidyverse.
Reading in data locally and from the web.
Cleaning and wrangling data.
Effective data visualization.
Classification I: training & predicting.
Classification II: evaluation & tuning.
Regression I: K-nearest neighbors.
Regression II: linear regression.
Clustering.
Statistical inference.
Combining code and text with Jupyter.
Collaboration with version control.
Setting up your computer.