Second Edition — O’Reilly, April 2022 - 462 p. — ISBN: 978-1-098-11895-2.
Implementing End-to-End Real-Time Data Pipelines: From Ingest to Machine Learning
Learn how easy it is to apply sophisticated statistical and machine learning methods to real-world problems when you build using Google Cloud Platform (GCP). This hands-on guide shows data engineers and data scientists how to implement an end-to-end data pipeline, using statistical and machine learning methods and tools on GCP.
Through the course of this updated second edition, you'll work through a sample business decision by employing a variety of data science approaches. Follow along by implementing these statistical and machine learning solutions in your project on GCP, and discover how this platform provides a transformative and more collaborative way of doing data science.
You'll learn how to:
Employ best practices in building highly scalable data and ML pipelines on Google Cloud.
Automate and schedule data ingest using Cloud Run.
Create and populate a dashboard in Data Studio.
Build a real-time analytics pipeline using Pub/Sub, Dataflow, and BigQuery.
Conduct interactive data exploration with BigQuery.
Create a Bayesian model with Spark on Cloud Dataproc.
Forecast time series and do anomaly detection with BigQuery ML.
Aggregate within time windows with Dataflow.
Train explainable machine learning models with Vertex AI.
Operationalize ML with Vertex AI Pipelines.
Who This Book Is For
If you use computers to work with data, this book is for you. You might go by the title of data analyst, database administrator, data engineer, data scientist, or systems programmer. Although your role might be narrower today (perhaps you do only data analysis, or only model building, or only DevOps), you want to stretch your wings a bit: you want to learn how to create data science models as well as how to implement them at scale in production systems.