Table of Contents
Preface xi
Part I Dark Data: Their Origins and Consequences
Chapter 1 Dark Data: What We Don't See Shapes Our World 3
The Ghost of Data 3
So You Think You Have All the Data? 12
Nothing Happened, So We Ignored It 17
The Power of Dark Data 22
All around Us 24
Chapter 2 Discovering Dark Data: What We Collect and What We Don't 28
Dark Data on All Sides 28
Data Exhaust, Selection, and Self-Selection 31
From the Few to the Many 43
Experimental Data 56
Beware Human Frailties 67
Chapter 3 Definitions and Dark Data: What Do You Want to Know? 72
Different Definitions and Measuring the Wrong Thing 72
You Can't Measure Everything 80
Screening 90
Selection on the Basis of Past Performance 94
Chapter 4 Unintentional Dark Data: Saying One Thing, Doing Another 98
The Big Picture 98
Summarizing 102
Human Error 103
Instrument Limitations 108
Linking Data Sets 111
Chapter 5 Strategic Dark Data: Gaming, Feedback, and Information Asymmetry 114
Gaming 114
Feedback 122
Information Asymmetry 128
Adverse Selection and Algorithms 130
Chapter 6 Intentional Dark Data: Fraud and Deception 140
Fraud 140
Identity Theft and Internet Fraud 144
Personal Financial Fraud 149
Financial Market Fraud and Insider Trading 153
Insurance Fraud 158
And More 163
Chapter 7 Science and Dark Data: The Nature of Discovery 167
The Nature of Science 167
If Only I'd Known That 172
Tripping over Dark Data 181
Dark Data and the Big Picture 184
Hiding the Facts 199
Retraction 215
Provenance and Trustworthiness: Who Told You That? 217
Part II Illuminating and Using Dark Data
Chapter 8 Dealing with Dark Data: Shining a Light 223
Hope! 223
Linking Observed and Missing Data 224
Identifying the Missing Data Mechanism 233
Working with the Data We Have 236
Going Beyond the Data: What If You Die First? 241
Going Beyond the Data: Imputation 245
Iteration 252
Wrong Number! 256
Chapter 9 Benefiting from Dark Data: Reframing the Question 262
Hiding Data 262
Hiding Data from Ourselves: Randomized Controlled Trials 263
What Might Have Been 265
Replicated Data 269
Imaginary Data: The Bayesian Prior 276
Privacy and Confidentiality Preservation 278
Collecting Data in the Dark 287
Chapter 10 Classifying Dark Data: A Route through the Maze 291
A Taxonomy of Dark Data 291
Illumination 298
Notes 307
Index 319