![Real-World Python: A Hacker's Guide to Solving Problems with Code](http://img.images-bn.com/static/redesign/srcs/images/grey-box.png?v11.8.5)
Paperback
Overview
You've mastered the basics. Now you're ready to explore some of Python's more powerful tools. Real-World Python will show you how.
Through a series of hands-on projects, you'll investigate and solve real-world problems using sophisticated computer vision, machine learning, data analysis, and language processing tools. You'll be introduced to important modules like OpenCV, NumPy, Pandas, NLTK, Bokeh, Beautiful Soup, Requests, HoloViews, Tkinter, turtle, matplotlib, and more. You'll create complete, working programs and think through intriguing projects that show you how to:
If you're tired of learning the bare essentials of Python programming with isolated snippets of code, you'll relish the relevant and geeky fun of Real-World Python!
Product Details
ISBN-13: | 9781718500624 |
---|---|
Publisher: | No Starch Press |
Publication date: | 11/05/2020 |
Pages: | 360 |
Sales rank: | 623,428 |
Product dimensions: | 6.90(w) x 9.20(h) x 1.00(d) |
About the Author
(No Starch Press). As a former executive-level scientist at ExxonMobil, he spent decades constructing and reviewing complex computer models, developed and tested software, and trained geoscientists and engineers.
Read an Excerpt
ATTRIBUTING AUTHORSHIP WITH STYLOMETRY
Stylometry is the quantitative study of literary style through computational text analysis. It’s based on the idea that we all have a unique, consistent, and recognizable style to our writing. This includes our vocabulary, our use of punctuation, the average length of our sentences and words, and so on.
A common application of stylometry is authorship attribution. Do you ever wonder if Shakespeare really wrote all his plays? Or if John Lennon or Paul McCartney wrote the song “In My Life”? Could Robert Galbraith, author of The Cuckoo’s Calling, really be J. K. Rowling in disguise? Stylometry can find the answer!
Stylometry has been used to overturn murder convictions and even helped identify and convict the Unabomber in 1996. Other uses include detecting plagiarism and determining the emotional tone behind words, such as in social media posts. Stylometry can even be used to detect signs of mental depression and suicidal tendencies.
In this chapter, you’ll use multiple stylometric techniques to determine whether Sir Arthur Conan Doyle or H. G. Wells wrote the novel The Lost World.
Project #2: The Hound, The War, and The Lost World
Sir Arthur Conan Doyle (1859–1930) is best known for the Sherlock Holmes stories, considered milestones in the field of crime fiction. H. G. Wells (1866–1946) is famous for several groundbreaking science fiction novels, including The War of the Worlds, The Time Machine, The Invisible Man, and The Island of Dr. Moreau.
In 1912, the Strand Magazine published The Lost World, a serialized version of a science fiction novel. It told the story of an Amazon basin expedition, led by zoology professor George Edward Challenger, that encountered living dinosaurs and a vicious tribe of ape-like creatures.
Although the author of the novel is known, for this project, let’s pretend it’s in dispute and it’s your job to solve the mystery. Experts have narrowed the field down to two authors, Doyle and Wells. Wells is slightly favored because The Lost World is a work of science fiction, which is his purview. It also includes brutish troglodytes redolent of the Morlocks in his 1895 work The Time Machine. Doyle, on the other hand, is known for detective stories and historical fiction.
THE OBJECTIVE
Write a Python program that uses stylometry to determine whether Sir Arthur Conan Doyle or H. G. Wells wrote the novel The Lost World.
THE STRATEGY
The science of natural language processing (NLP) deals with the interactions between the precise and structured language of computers and the nuanced, frequently ambiguous “natural” language used by humans. Example uses for NLP include machine translations, spam detection, comprehension of search engine questions, and predictive text recognition for cell phone users.
The most common NLP tests for authorship analyze the following features of a text:
• Word length: A frequency distribution plot of the length of words in a document
• Stop words: A frequency distribution plot of stop words (short, noncontextual function words like the, but, and if)
• Parts of speech: A frequency distribution plot of words based on their syntactic functions (such as nouns, pronouns, verbs, adverbs, adjectives, and so on)
• Most common words: A comparison of the most commonly used words in a text
• Jaccard similarity: A statistic used for gauging the similarity and diversity of a sample set
If Doyle and Wells have distinctive writing styles, these five tests should be enough to distinguish between them. We’ll talk about each test in more detail in the coding section.
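As a rough illustration of the first two tests, here is a minimal sketch that uses only the Python standard library, not the NLTK-based code the chapter goes on to develop. The function names, the tiny `STOP_WORDS` set, and the sample sentence are all hypothetical stand-ins chosen for this example; a real analysis would use a full stop word list and an entire novel.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list (an assumption for this
# sketch; a real analysis would use a much fuller list).
STOP_WORDS = {"the", "but", "if", "and", "a", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text and return its runs of letters as words.

    This crude tokenizer splits contractions such as "don't" in two,
    which is acceptable for a sketch but not for careful analysis.
    """
    return re.findall(r"[a-z]+", text.lower())


def word_length_distribution(words):
    """Map each word length to the fraction of words with that length."""
    counts = Counter(len(word) for word in words)
    total = sum(counts.values())
    return {length: count / total for length, count in sorted(counts.items())}


def stop_word_distribution(words):
    """Count how often each stop word occurs in the word list."""
    return Counter(word for word in words if word in STOP_WORDS)


# Made-up sample sentence, not a quotation from any of the novels.
sample = "The hound ran to the moor, but the fog hid it."
words = tokenize(sample)
lengths = word_length_distribution(words)
stops = stop_word_distribution(words)

print(lengths)
print(stops.most_common())
```

Using relative rather than raw frequencies for word length keeps texts of different sizes comparable, which matters when the candidate corpora and the disputed text are not the same length.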
To capture and analyze each author’s style, you’ll need a representative corpus, or a body of text. For Doyle, use the famous Sherlock Holmes novel The Hound of the Baskervilles, published in 1902. For Wells, use The War of the Worlds, published in 1898. Both these novels contain more than 50,000 words, more than enough for a sound statistical sampling. You’ll then compare each author’s sample to The Lost World to determine how closely the writing styles match.
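The comparison step can be sketched with the Jaccard similarity from the list of tests: the size of the intersection of two sets divided by the size of their union. The example below is an assumption-laden illustration, not the book's code. It compares whole vocabularies, the helper names and the three short strings are invented, and the optional `max_words` parameter is a hypothetical way to trim each text to the same length before comparing.

```python
import re


def vocabulary(text, max_words=None):
    """Return the set of distinct lowercase words in the text.

    max_words (hypothetical parameter) truncates the text first so that
    corpora of different lengths are compared on an equal footing.
    """
    words = re.findall(r"[a-z]+", text.lower())
    if max_words is not None:
        words = words[:max_words]
    return set(words)


def jaccard_similarity(set_a, set_b):
    """Return |A intersect B| / |A union B|, or 0.0 if both sets are empty."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Invented snippets standing in for the two known corpora and the
# disputed text; real input would be the full novels.
known = {
    "author_a": "the hound howled on the dark moor",
    "author_b": "the martians landed on the common",
}
unknown = "the beast howled on the dark plateau"

unknown_vocab = vocabulary(unknown)
scores = {
    name: jaccard_similarity(vocabulary(text), unknown_vocab)
    for name, text in known.items()
}
best_match = max(scores, key=scores.get)

print(scores)
print("Closest vocabulary:", best_match)
```

A score of 1.0 means the two sets are identical and 0.0 means they share nothing. On its own the statistic is weak evidence, which is why it is used alongside the other tests.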
To perform stylometry, you’ll use the Natural Language Toolkit (NLTK), a popular suite of programs and libraries for working with human language data in Python. It’s free and works on Windows, macOS, and Linux. Created in 2001 as part of a computational linguistics course at the University of Pennsylvania, NLTK has continued to develop and expand with the help of dozens of contributors.
Table of Contents
Acknowledgments xvii
Introduction xix
Who Should Read This Book? xx
Why Python? xx
What's in This Book? xx
Python Version, Platform, and IDE xxii
Installing Python xxii
Running Python xxiv
Using a Virtual Environment xxv
Onward! xxv
1 Saving Shipwrecked Sailors with Bayes' Rule 1
Bayes' Rule 2
Project #1: Search and Rescue 5
The Strategy 6
Installing the Python Libraries 6
The Bayes Code 9
Playing the Game 22
Summary 24
Further Reading 24
Challenge Project: Smarter Searches 24
Challenge Project: Finding the Best Strategy with MCS 25
Challenge Project: Calculating the Probability of Detection 25
2 Attributing Authorship with Stylometry 27
Project #2: The Hound, The War, and The Lost World 28
The Strategy 28
Installing NLTK 29
The Corpora 32
The Stylometry Code 32
Summary 47
Further Reading 48
Practice Project: Hunting the Hound with Dispersion 48
Practice Project: Punctuation Heatmap 49
Challenge Project: Fixing Frequency 50
3 Summarizing Speeches with Natural Language Processing 51
Project #3: I Have a Dream … to Summarize Speeches! 52
The Strategy 52
Web Scraping 53
The "I Have a Dream" Code 53
Project #4: Summarizing Speeches with gensim 61
Installing gensim 61
The Make Your Bed Code 61
Project #5: Summarizing Text with Word Clouds 64
The Word Cloud and PIL Modules 65
The Word Cloud Code 66
Fine-Tuning the Word Cloud 70
Summary 71
Further Reading 72
Challenge Project: Game Night 72
Challenge Project: Summarizing Summaries 73
Challenge Project: Summarizing a Novel 74
Challenge Project: It's Not Just What You Say, It's How You Say It! 75
4 Sending Super-Secret Messages with a Book Cipher 77
The One-Time Pad 78
The Rebecca Cipher 80
Project #6: The Digital Key to Rebecca 80
The Strategy 81
The Encryption Code 82
Sending Messages 90
Summary 91
Further Reading 91
Practice Project: Charting the Characters 92
Practice Project: Sending Secrets the WWII Way 93
5 Finding Pluto 95
Project #7: Replicating a Blink Comparator 96
The Strategy 97
The Data 98
The Blink Comparator Code 99
Using the Blink Comparator 110
Project #8: Detecting Astronomical Transients with Image Differencing 112
The Strategy 113
The Transient Detector Code 113
Using the Transient Detector 119
Summary 119
Further Reading 119
Practice Project: Plotting the Orbital Path 119
Practice Project: What's the Difference? 120
Challenge Project: Counting Stars 120
6 Winning the Moon Race with Apollo 8 123
Understanding the Apollo 8 Mission 124
The Free Return Trajectory 125
The Three-Body Problem 126
Project #9: To the Moon with Apollo 8! 127
Using the turtle Module 127
The Strategy 131
The Apollo 8 Free Return Code 132
Running the Simulation 144
Summary 146
Further Reading 146
Practice Project: Simulating a Search Pattern 146
Practice Project: Start Me Up! 147
Practice Project: Shut Me Down! 148
Challenge Project: True-Scale Simulation 149
Challenge Project: The Real Apollo 8 149
7 Selecting Martian Landing Sites 151
How to Land on Mars 152
The MOLA Map 153
Project #10: Selecting Martian Landing Sites 153
The Strategy 154
The Site Selector Code 155
Results 170
Summary 171
Further Reading 171
Practice Project: Confirming That Drawings Become Part of an Image 172
Practice Project: Extracting an Elevation Profile 172
Practice Project: Plotting in 3D 173
Practice Project: Mixing Maps 173
Challenge Project: Making It Three in a Row 175
Challenge Project: Wrapping Rectangles 175
8 Detecting Distant Exoplanets 177
Transit Photometry 178
Project #11: Simulating an Exoplanet Transit 179
The Strategy 180
The Transit Code 181
Experimenting with Transit Photometry 186
Project #12: Imaging Exoplanets 188
The Strategy 188
The Pixelator Code 189
Summary 194
Further Reading 194
Practice Project: Detecting Alien Megastructures 195
Practice Project: Detecting Asteroid Transits 197
Practice Project: Incorporating Limb Darkening 198
Practice Project: Detecting Starspots 200
Practice Project: Detecting an Alien Armada 200
Practice Project: Detecting a Planet with a Moon 201
Practice Project: Measuring the Length of an Exoplanet's Day 201
Challenge Project: Generating a Dynamic Light Curve 202
9 Identifying Friend or Foe 203
Detecting Faces in Photographs 204
Project #13: Programming a Robot Sentry Gun 205
The Strategy 207
The Code 207
Results 218
Detecting Faces from a Video Stream 219
Summary 221
Further Reading 222
Practice Project: Blurring Faces 222
Challenge Project: Detecting Cat Faces 223
10 Restricting Access with Face Recognition 225
Recognizing Faces with Local Binary Pattern Histograms 226
The Face Recognition Flowchart 226
Extracting Local Binary Pattern Histograms 228
Project #14: Restricting Access to the Alien Artifact 231
The Strategy 231
Supporting Modules and Files 231
The Video Capture Code 232
The Face Trainer Code 236
The Face Predictor Code 238
Results 241
Summary 242
Further Reading 242
Challenge Project: Adding a Password and Video Capture 242
Challenge Project: Look-Alikes and Twins 243
Challenge Project: Time Machine 243
11 Creating an Interactive Zombie Escape Map 245
Project #15: Visualizing Population Density with a Choropleth Map 246
The Strategy 247
The Python Data Analysis Library 248
The bokeh and holoviews Libraries 249
Installing pandas, bokeh, and holoviews 250
Accessing the County, State, Unemployment, and Population Data 250
Hacking holoviews 252
The Choropleth Code 254
262
Summary 266
Further Reading 266
Challenge Project: Mapping US Population Change 266
12 Are We Living in a Computer Simulation? 269
Project #16: Life, the Universe, and Yertle's Pond 270
The Pond Simulation Code 270
Implications of the Pond Simulation 273
Measuring the Cost of Crossing the Lattice 275
Results 277
The Strategy 278
Summary 279
Further Reading 279
Moving On 279
Challenge Project: Finding a Safe Space 279
Challenge Project: Here Comes the Sun 280
Challenge Project: Seeing Through a Dog's Eyes 281
Challenge Project: Customized Word Search 281
Challenge Project: Simplifying a Celebration Slideshow 281
Challenge Project: What a Tangled Web We Weave 281
Challenge Project: Go Tell It on the Mountain 281
Appendix Practice Project Solutions 283
Chapter 2 Attributing Authorship with Stylometry 283
Hunting the Hound with Dispersion 283
Punctuation Heatmap 284
Chapter 4 Sending Super-Secret Messages with a Book Cipher 285
Charting the Characters 285
Sending Secrets the WWII Way 286
Chapter 5 Finding Pluto 289
Plotting the Orbital Path 289
What's the Difference? 290
Chapter 6 Winning the Moon Race with Apollo 8 292
Simulating a Search Pattern 292
Start Me Up! 293
Shut Me Down! 296
Chapter 7 Selecting Martian Landing Sites 298
Confirming That Drawings Become Part of an Image 298
Extracting an Elevation Profile 298
Plotting in 3D 299
Mixing Maps 300
Chapter 8 Detecting Distant Exoplanets 304
Detecting Alien Megastructures 304
Detecting Asteroid Transits 305
Incorporating Limb Darkening 306
Detecting an Alien Armada 307
Detecting a Planet with a Moon 309
Measuring the Length of an Exoplanet's Day 311
Chapter 9 Identifying Friend or Foe 312
Blurring Faces 312
Chapter 10 Restricting Access with Face Recognition 312
Challenge Project: Adding a Password and Video Capture 312
Index 315