Python for Data Science: A Hands-On Introduction

by Yuli Vasiliev

Paperback

$59.99 

Overview

A hands-on, real-world introduction to data analysis with the Python programming language, loaded with wide-ranging examples.

Python is an ideal choice for accessing, manipulating, and gaining insights from data of all kinds. Python for Data Science introduces you to the Pythonic world of data analysis with a learn-by-doing approach rooted in practical examples and hands-on activities. You’ll learn how to write Python code to obtain, transform, and analyze data, practicing state-of-the-art data processing techniques for use cases in business management, marketing, and decision support.

You will discover Python’s rich set of built-in data structures for basic operations, as well as its robust ecosystem of open-source libraries for data science, including NumPy, pandas, scikit-learn, matplotlib, and more. Examples show how to load data in various formats, how to streamline, group, and aggregate data sets, and how to create charts, maps, and other visualizations. Later chapters go in-depth with demonstrations of real-world data applications, including using location data to power a taxi service, market basket analysis to identify items commonly purchased together, and machine learning to predict stock prices.

Product Details

ISBN-13: 9781718502208
Publisher: No Starch Press
Publication date: 08/02/2022
Pages: 240
Sales rank: 622,990
Product dimensions: 7.00(w) x 9.10(h) x 0.80(d)

About the Author

Yuli Vasiliev is a programmer, freelance writer, and consultant who has been working with databases for more than two decades. He specializes in open-source development and is experienced in building data structures and models, as well as designing and implementing database backends for various applications using Oracle technologies, MySQL, and natural language processing. Vasiliev is the author of Natural Language Processing with spaCy (No Starch Press).

Table of Contents

Introduction xv

Using Python for Data Science xvi

Who Should Read This Book? xvi

What's in the Book? xvii

1 The Basics of Data 1

Categories of Data 2

Unstructured Data 2

Structured Data 2

Semistructured Data 4

Time Series Data 5

Sources of Data 6

APIs 7

Web Pages 7

Databases 8

Files 9

The Data Processing Pipeline 9

Acquisition 10

Cleansing 10

Transformation 11

Analysis 11

Storage 12

The Pythonic Way 13

Summary 13

2 Python Data Structures 15

Lists 16

Creating a List 16

Using Common List Object Methods 16

Using Slice Notation 18

Using a List as a Queue 19

Using a List as a Stack 20

Using Lists and Stacks for Natural Language Processing 21

Making Improvements with List Comprehensions 23

Tuples 27

A List of Tuples 27

Immutability 28

Dictionaries 28

A List of Dictionaries 29

Adding to a Dictionary with setdefault() 29

Loading JSON into a Dictionary 31

Sets 32

Removing Duplicates From Sequences 32

Performing Common Set Operations 33

Exercise #1 Improved Photo Tag Analysis 34

Summary 35

3 Python Data Science Libraries 37

NumPy 37

Installing NumPy 38

Creating a NumPy Array 38

Performing Element-Wise Operations 39

Using NumPy Statistical Functions 39

Exercise #2 Using NumPy Statistical Functions 40

Pandas 40

Pandas Installation 41

Pandas Series 41

Exercise #3 Combining Three Series 43

Pandas DataFrames 43

Exercise #4 Using Different Joins 50

Scikit-learn 52

Installing scikit-learn 53

Obtaining a Sample Dataset 53

Loading the Sample Dataset into a pandas DataFrame 54

Splitting the Sample Dataset into a Training Set and a Test Set 54

Transforming Text into Numerical Feature Vectors 54

Training and Evaluating the Model 55

Making Predictions on New Data 56

Summary 56

4 Accessing Data from Files and APIs 57

Importing Data Using Python's open() Function 57

Text Files 58

Tabular Data Files 59

Exercise #5 Opening JSON Files 61

Binary Files 62

Exporting Data to Files 62

Accessing Remote Files and APIs 63

How HTTP Requests Work 64

The urllib3 Library 65

The Requests Library 67

Exercise #6 Accessing an API with Requests 67

Moving Data to and from a DataFrame 68

Importing Nested JSON Structures 68

Converting a DataFrame to JSON 69

Exercise #7 Manipulating Complex JSON Structures 70

Loading Online Data into a DataFrame with pandas-datareader 71

Summary 72

5 Working with Databases 73

Relational Databases 74

Understanding SQL Statements 75

Getting Started with MySQL 75

Defining the Database Structure 76

Inserting Data into the Database 79

Querying Database Data 80

Exercise #8 Performing a One-to-Many Join 82

Using Database Analytics Tools 82

NoSQL Databases 88

Key-Value Stores 89

Document-Oriented Databases 90

Exercise #9 Inserting and Querying Multiple Documents 92

Summary 93

6 Aggregating Data 95

Data to Aggregate 96

Combining DataFrames 98

Grouping and Aggregating the Data 100

Viewing Specific Aggregations by MultiIndex 101

Slicing a Range of Aggregated Values 103

Slicing Within Aggregation Levels 103

Adding a Grand Total 104

Adding Subtotals 105

Exercise #10 Excluding Total Rows from the DataFrame 106

Selecting All Rows in a Group 106

Summary 107

7 Combining Datasets 109

Combining Built-in Data Structures 110

Combining Lists and Tuples with + 110

Combining Dictionaries with ** 111

Combining Corresponding Rows from Two Structures 112

Implementing Different Types of Joins for Lists 114

Concatenating NumPy Arrays 116

Exercise #11 Adding New Rows/Columns to a NumPy Array 117

Combining pandas Data Structures 117

Concatenating DataFrames 118

Joining Two DataFrames 122

Summary 126

8 Creating Visualizations 127

Common Visualizations 128

Line Graphs 128

Bar Graphs 129

Pie Charts 130

Histograms 130

Plotting with Matplotlib 131

Installing Matplotlib 131

Using matplotlib.pyplot 131

Working with Figure and Axes Objects 133

Exercise #12 Combining Bins into an "Other" Slice 136

Using Other Libraries with Matplotlib 137

Plotting pandas Data 137

Plotting Geospatial Data with Cartopy 139

Exercise #13 Drawing a Map with Cartopy and Matplotlib 143

Summary 143

9 Analyzing Location Data 145

Obtaining Location Data 146

Turning a Human-Readable Address into Geo Coordinates 146

Getting the Geo Coordinates of a Moving Object 147

Spatial Data Analysis with geopy and Shapely 150

Finding the Closest Object 150

Finding Objects in a Certain Area 152

Exercise #14 Defining Two or More Polygons 154

Combining Both Approaches 154

Exercise #15 Further Improving the Pick-Up Algorithm 156

Combining Spatial and Nonspatial Data 156

Deriving Nonspatial Attributes 156

Exercise #16 Filtering Data with a List Comprehension 158

Joining Spatial and Nonspatial Datasets 158

Summary 159

10 Analyzing Time Series Data 161

Regular vs. Irregular Time Series 161

Common Time Series Analysis Techniques 163

Calculating Percentage Changes 164

Rolling Window Calculations 166

Calculating the Percentage Change of a Rolling Average 167

Multivariate Time Series 167

Processing Multivariate Time Series 168

Analyzing Dependencies Between Variables 169

Exercise #17 Adding More Metrics to Analyze Dependencies 172

Summary 174

11 Gaining Insights from Data 175

Association Rules 176

Support 177

Confidence 177

Lift 178

The Apriori Algorithm 178

Creating a Transaction Dataset 179

Identifying Frequent Itemsets 180

Generating Association Rules 181

Visualizing Association Rules 182

Gaining Actionable Insights from Association Rules 186

Generating Recommendations 186

Planning Discounts Based on Association Rules 187

Exercise #18 Mining Real Transaction Data 189

Summary 192

12 Machine Learning for Data Analysis 193

Why Machine Learning? 194

Types of Machine Learning 194

Supervised Learning 194

Unsupervised Learning 195

How Machine Learning Works 196

Data to Learn From 196

A Statistical Model 197

Previously Unseen Data 197

A Sentiment Analysis Example: Classifying Product Reviews 198

Obtaining Product Reviews 198

Cleansing the Data 199

Splitting and Transforming the Data 201

Training the Model 203

Evaluating the Model 203

Exercise #19 Expanding the Example Set 206

Predicting Stock Trends 206

Getting Data 207

Deriving Features from Continuous Data 208

Generating the Output Variable 209

Training and Evaluating the Model 210

Exercise #20 Experimenting with Different Stocks and New Metrics 211

Summary 211

Index 213
