Chapman and Hall/CRC, Taylor & Francis Group, 2017. — 377 p. — (Statistics in the social and behavioral sciences series). — ISBN: 1498751407, 9781498751407
Foster I., Ghani R., Jarmin R.S., Kreuter F., Lane J.
Both Traditional Students and Working Professionals Acquire the Skills to Analyze Social Problems.
Big Data and Social Science: A Practical Guide to Methods and Tools shows how to apply data science to real-world problems in both research and practice. The book provides practical guidance on combining methods and tools from computer science, statistics, and social science. This concrete approach is illustrated throughout using an important national problem, the quantitative study of innovation.
The text draws on the expertise of prominent leaders in statistics, the social sciences, data science, and computer science to teach students how to use modern social science research principles as well as the best analytical and computational tools. It uses a real-world challenge to introduce how these tools are used to identify and capture appropriate data, apply data science models and tools to that data, and recognize and respond to data errors and limitations.
Why this book?
Defining big data and its value
Social science, inference, and big data
Social science, data quality, and big data
New tools for new data
The book’s “use case”
The structure of the book
Resources
Capture and Curation
Cameron Neylon
Working with Web Data and APIs
Scraping information from the web
New data in the research enterprise
A functional view
Programming against an API
Using the ORCID API via a wrapper
Quality, scope, and management
Integrating data from multiple sources
Working with the graph of relationships
Bringing it together: Tracking pathways to impact
Resources
Acknowledgements and copyright
Joshua Tokle and Stefan Bender
Record Linkage
Motivation
Introduction to record linkage
Preprocessing data for record linkage
Indexing and blocking
Matching
Classification
Record linkage and data protection
Resources
Ian Foster and Pascal Heus
Databases
DBMS: When and why
Relational DBMSs
Linking DBMSs and other tools
NoSQL databases
Spatial databases
Which database to use?
Resources
Huy Vo and Claudio Silva
Programming with Big Data
The MapReduce programming model
Apache Hadoop MapReduce
Apache Spark
Resources
Modeling and Analysis
Rayid Ghani and Malte Schierholz
Machine Learning
What is machine learning?
The machine learning process
Problem formulation: Mapping a problem to machine learning methods
Methods
Evaluation
Practical tips
How can social scientists benefit from machine learning?
Advanced topics
Resources
Evgeny Klochikhin and Jordan Boyd-Graber
Text Analysis
Understanding what people write
How to analyze text
Approaches and applications
Evaluation
Text analysis tools
Resources
Jason Owen-Smith
Networks: The Basics
Network data
Network measures
Comparing collaboration networks
Resources
Inference and Ethics
M. Adil Yalçın and Catherine Plaisant
Information Visualization
Developing effective visualizations
A data-by-tasks taxonomy
Challenges
Resources
Paul P. Biemer
Errors and Inference
The total error paradigm
Illustrations of errors in big data
Errors in big data analytics
Some methods for mitigating, detecting, and compensating for errors
Resources
Stefan Bender, Ron Jarmin, Frauke Kreuter, and Julia Lane
Privacy and Confidentiality
Why is access important?
Providing access
The new challenges
Legal and ethical framework
Resources
Jonathan Scott Morgan, Christina Jones, and Ahmad Emad
Workbooks
Environment
Workbook details
Resources