Table of Contents
Preface vii
1 What Is a Model? 1
Algorithms Versus Models: What's the Difference? 6
A Note on Terminology 7
Modeling Limitations 8
Statistics and Computation in Modeling 10
Data Training 11
Cross-Validation 12
Why Use R? 13
The Good 13
R and Machine Learning 15
The Bad 16
Summary 17
2 Supervised and Unsupervised Machine Learning 19
Supervised Models 20
Regression 20
Training and Testing of Data 22
Classification 24
Logistic Regression 24
Supervised Clustering Methods 26
Mixed Methods 31
Tree-Based Models 31
Random Forests 34
Neural Networks 35
Support Vector Machines 39
Unsupervised Learning 40
Unsupervised Clustering Methods 41
Summary 43
3 Sampling Statistics and Model Training in R 45
Bias 46
Sampling in R 51
Training and Testing 54
Roles of Training and Test Sets 55
Why Make a Test Set? 55
Training and Test Sets: Regression Modeling 55
Training and Test Sets: Classification Modeling 63
Cross-Validation 67
k-Fold Cross-Validation 67
Summary 69
4 Regression in a Nutshell 71
Linear Regression 72
Multivariate Regression 74
Regularization 78
Polynomial Regression 81
Goodness of Fit with Data: The Perils of Overfitting 87
Root-Mean-Square Error 87
Model Simplicity and Goodness of Fit 89
Logistic Regression 91
The Motivation for Classification 92
The Decision Boundary 93
The Sigmoid Function 94
Binary Classification 98
Multiclass Classification 101
Logistic Regression with caret 105
Summary 106
Linear Regression 106
Logistic Regression 107
5 Neural Networks in a Nutshell 109
Single-Layer Neural Networks 109
Building a Simple Neural Network by Using R 111
Multiple Compute Outputs 113
Hidden Compute Nodes 114
Multilayer Neural Networks 120
Neural Networks for Regression 125
Neural Networks for Classification 130
Neural Networks with caret 131
Regression 131
Classification 132
Summary 133
6 Tree-Based Methods 135
A Simple Tree Model 135
Deciding How to Split Trees 138
Tree Entropy and Information Gain 139
Pros and Cons of Decision Trees 140
Tree Overfitting 141
Pruning Trees 145
Decision Trees for Regression 151
Decision Trees for Classification 151
Conditional Inference Trees 152
Conditional Inference Tree Regression 154
Conditional Inference Tree Classification 155
Random Forests 155
Random Forest Regression 156
Random Forest Classification 157
Summary 158
7 Other Advanced Methods 159
Naive Bayes Classification 159
Bayesian Statistics in a Nutshell 159
Application of Naive Bayes 161
Principal Component Analysis 163
Linear Discriminant Analysis 169
Support Vector Machines 175
k-Nearest Neighbors 179
Regression Using kNN 181
Classification Using kNN 182
Summary 184
8 Machine Learning with the caret Package 185
The Titanic Dataset 186
Data Wrangling 187
Caret Unleashed 188
Imputation 188
Data Splitting 190
Caret Under the Hood 191
Model Training 194
Comparing Multiple caret Models 197
Summary 199
Appendix A: Encyclopedia of Machine Learning Models in caret 201
Index 209