Vermeulen Andreas Francois. Practical Data Science

Apress, 2018. — 821 p. — ISBN: 1484230531.

Learn how to build a data science technology stack and perform good data science with repeatable methods. You will learn how to turn data lakes into business assets.

The data science technology stack demonstrated in Practical Data Science is built from components in general use in the industry. Data scientist Andreas Vermeulen demonstrates in detail how to build and provision a technology stack to yield repeatable results. He shows you how to apply practical methods to extract actionable business knowledge from data lakes consisting of data from a polyglot of data types and dimensions.

Data Science Technology Stack
Rapid Information Factory Ecosystem
Data Science Storage Tools
Data Lake
Data Vault
Data Warehouse Bus Matrix
Data Science Processing Tools
Spark
Mesos
Akka
Cassandra
Kafka
Elastic Search
R
Scala
Python
MQTT (MQ Telemetry Transport)
What’s Next?
Vermeulen-Krennwallner-Hillman-Clark
Windows
Linux
It’s Now Time to Meet Your Customer
Processing Ecosystem
Example Ecosystem
Sample Data

Layered Framework
Definition of Data Science Framework
Cross-Industry Standard Process for Data Mining (CRISP-DM)
Homogeneous Ontology for Recursive Uniform Schema
The Top Layers of a Layered Framework
Layered Framework for High-Level Data Science and Engineering

Business Layer
Business Layer
Engineering a Practical Business Layer

Utility Layer
Basic Utility Design
Engineering a Practical Utility Layer

Three Management Layers
Operational Management Layer
Audit, Balance, and Control Layer
Balance
Control
Yoke Solution
Cause-and-Effect Analysis System
Functional Layer
Data Science Process

Retrieve Superstep
Data Lakes
Data Swamps
Training the Trainer Model
Understanding the Business Dynamics of the Data Lake
Actionable Business Knowledge from Data Lakes
Engineering a Practical Retrieve Superstep
Connecting to Other Data Sources

Assess Superstep
Assess Superstep
Errors
Analysis of Data
Practical Actions
Engineering a Practical Assess Superstep

Process Superstep
Data Vault
Time-Person-Object-Location-Event Data Vault
Data Science Process
Data Science

Transform Superstep
Transform Superstep
Building a Data Warehouse
Transforming with Data Science
Hypothesis Testing
Overfitting and Underfitting
Precision-Recall
Cross-Validation Test
Univariate Analysis
Bivariate Analysis
Multivariate Analysis
Linear Regression
Logistic Regression
Clustering Techniques
ANOVA
Principal Component Analysis (PCA)
Decision Trees
Support Vector Machines, Networks, Clusters, and Grids
Data Mining
Pattern Recognition
Machine Learning
Bagging Data
Random Forests
Computer Vision (CV)
Natural Language Processing (NLP)
Neural Networks
TensorFlow

Organize and Report Supersteps
Organize Superstep
Report Superstep
Graphics
Pictures
Showing the Difference

Closing Words

See also

Cady Field. The Data Science Handbook

Cao Longbing. Data Science Thinking

Cooper S. Data Science from Scratch: The #1 Data Science Guide for Everything A Data Scientist Needs to Know: Python, Linear Algebra, Statistics, Coding, Applications, Neural Networks, and Decision Trees

Goodfellow Ian, Bengio Yoshua, Courville Aaron. Deep Learning Book

Knaflic Cole Nussbaumer. Storytelling with Data: A Data Visualization Guide for Business Professionals

Nisbet R., Elder J., Miner G. Handbook of Statistical Analysis and Data Mining Applications