Event Streams in Action: Integrating and processing event streams
Paperback (1st Edition)
Overview
Event Streams in Action is a foundational book introducing the unified log processing (ULP) paradigm and presenting techniques for using it effectively in data-rich environments.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the Technology
Many high-profile applications, like LinkedIn and Netflix, deliver nimble, responsive performance by reacting to user and system events as they occur. In large-scale systems, this requires efficiently monitoring, managing, and reacting to multiple event streams. Tools like Kafka, along with innovative patterns like unified log processing, help create a coherent data processing architecture for event-based applications.
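To give a feel for the pattern, here is a minimal sketch (not taken from the book) of appending an event to a Kafka topic in Java; the broker address, topic name, key, and JSON payload are illustrative assumptions:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventProducer {
    public static void main(String[] args) {
        // Minimal producer configuration; the broker address is an assumption
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Append one event to a hypothetical "events" topic; the key controls
            // partitioning, so all events for one shopper stay in order
            producer.send(new ProducerRecord<>("events", "shopper-123",
                    "{\"event\":\"SHOPPER_VIEWED_PRODUCT\",\"timestamp\":\"2019-05-30T12:00:00Z\"}"));
        }
    }
}
```

Every consumer of the topic then sees the same ordered, append-only stream of events, which is the core idea behind the unified log.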
About the Book
Event Streams in Action teaches you techniques for aggregating, storing, and processing event streams using the unified log processing pattern. In this hands-on guide, you'll discover important application designs like the lambda architecture, stream aggregation, and event reprocessing. You'll also explore scaling, resiliency, advanced stream patterns, and much more! By the time you're finished, you'll be designing large-scale data-driven applications that are easier to build, deploy, and maintain.
What's inside
- Validating and monitoring event streams
- Event analytics
- Methods for event modeling
- Examples using Apache Kafka and Amazon Kinesis (see the Kinesis sketch after this list)
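As a counterpart for the Kinesis side, the following minimal sketch (assuming the AWS SDK for Java v1 and a hypothetical stream named `events`) writes a systems-monitoring event of the kind chapter 4 works with:

```java
import java.nio.ByteBuffer;
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordRequest;

public class KinesisEventWriter {
    public static void main(String[] args) {
        // Uses credentials and region from the default AWS provider chain
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

        PutRecordRequest request = new PutRecordRequest()
                .withStreamName("events")        // hypothetical stream name
                .withPartitionKey("server-42")   // same key -> same shard, preserving order
                .withData(ByteBuffer.wrap(
                        "{\"event\":\"CPU_USAGE_HIGH\",\"value\":0.97}".getBytes()));

        kinesis.putRecord(request);
    }
}
```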
About the Reader
For readers with experience coding in Java, Scala, or Python.
About the Authors
Alexander Dean developed Snowplow, an open source event processing and analytics platform. Valentin Crettaz is an independent IT consultant with 25 years of experience.
Table of Contents
- Introducing event streams
- The unified log
- Event stream processing with Apache Kafka
- Event stream processing with Amazon Kinesis
- Stateful stream processing
- Schemas
- Archiving events
- Railway-oriented processing
- Commands
- Analytics-on-read
- Analytics-on-write
Product Details
| ISBN-13 | 9781617292347 |
| --- | --- |
| Publisher | Manning |
| Publication date | 05/30/2019 |
| Edition description | 1st Edition |
| Pages | 344 |
| Product dimensions | 7.30(w) x 9.10(h) x 0.80(d) |
About the Author
Valentin Crettaz is an independent IT consultant who has spent the past 25 years working on challenging projects across the globe. His expertise ranges from software engineering and architecture to data science and business intelligence. His daily work centers on leveraging cutting-edge web, data, and streaming technologies to implement IT solutions that help bridge the cultural gap between IT and business stakeholders.
Table of Contents
Preface xiii
Acknowledgments xiv
About this book xvi
About the authors xix
About the cover illustration xx
Part 1 Event Streams and Unified Logs 1
1 Introducing event streams 3
1.1 Defining our terms 4
Events 5
Continuous event streams 6
1.2 Exploring familiar event streams 7
Application-level logging 7
Web analytics 8
Publish/subscribe messaging 10
1.3 Unifying continuous event streams 12
The classic era 13
The hybrid era 16
The unified era 17
1.4 Introducing use cases for the unified log 19
Customer feedback loops 19
Holistic systems monitoring 21
Hot-swapping data application versions 22
2 The unified log 24
2.1 Understanding the anatomy of a unified log 25
Unified 25
Append-only 26
Distributed 27
Ordered 28
2.2 Introducing our application 29
Identifying our key events 30
Unified log, e-commerce style 31
Modeling our first event 32
2.3 Setting up our unified log 34
Downloading and installing Apache Kafka 34
Creating our stream 35
Sending and receiving events 36
3 Event stream processing with Apache Kafka 38
3.1 Event stream processing 101 39
Why process event streams? 39
Single-event processing 41
Multiple-event processing 42
3.2 Designing our first stream-processing app 42
Using Kafka as our company's glue 43
Locking down our requirements 44
3.3 Writing a simple Kafka worker 46
Setting up our development environment 46
Configuring our application 47
Reading from Kafka 49
Writing to Kafka 50
Stitching it all together 51
Testing 52
3.4 Writing a single-event processor 54
Writing our event processor 54
Updating our main function 56
Testing, redux 57
4 Event stream processing with Amazon Kinesis 60
4.1 Writing events to Kinesis 61
Systems monitoring and the unified log 61
Terminology differences from Kafka 63
Setting up our stream 64
Modeling our events 65
Writing our agent 66
4.2 Reading from Kinesis 72
Kinesis frameworks and SDKs 72
Reading events with the AWS CLI 73
Monitoring our stream with boto 79
5 Stateful stream processing 88
5.1 Detecting abandoned shopping carts 89
What management wants 89
Defining our algorithm 90
Introducing our derived events stream 91
5.2 Modeling our new events 92
Shopper adds item to cart 92
Shopper places order 93
Shopper abandons cart 93
5.3 Stateful stream processing 94
Introducing state management 94
Stream windowing 96
Stream processing frameworks and their capabilities 97
Stream processing frameworks 97
Choosing a stream processing framework for Nile 100
5.4 Detecting abandoned carts 101
Designing our Samza job 101
Preparing our project 102
Configuring our job 103
Writing our job's Java task 104
5.5 Running our Samza job 110
Introducing YARN 110
Submitting our job 111
Testing our job 112
Improving our job 113
Part 2 Data Engineering with Streams 115
6 Schemas 117
6.1 An introduction to schemas 118
Introducing Plum 118
Event schemas as contracts 120
Capabilities of schema technologies 121
Some schema technologies 123
Choosing a schema technology for Plum 125
6.2 Modeling our event in Avro 125
Setting up a development harness 126
Writing our health check event schema 127
From Avro to Java, and back again 129
Testing 131
6.3 Associating events with their schemas 132
Some modest proposals 132
A self-describing event for Plum 135
Plum's schema registry 137
7 Archiving events 140
7.1 The archivist's manifesto 141
Resilience 142
Reprocessing 143
Refinement 144
7.2 A design for archiving 146
What to archive 146
Where to archive 147
How to archive 148
7.3 Archiving Kafka with Secor 149
Warming up Kafka 150
Creating our event archive 152
Setting up Secor 153
7.4 Batch processing our archive 155
Batch processing 101 155
Designing our batch processing job 158
Writing our job in Apache Spark 159
Running our job on Elastic MapReduce 163
8 Railway-oriented processing 171
8.1 Leaving the happy path 172
Failure and Unix programs 172
Failure and Java 175
Failure and the log-industrial complex 178
8.2 Failure and the unified log 179
A design for failure 179
Modeling failures as events 181
Composing our happy path across jobs 183
8.3 Failure composition with Scalaz 184
Planning for failure 184
Setting up our Scala project 186
From Java to Scala 187
Better failure handling through Scalaz 189
Composing failures 191
8.4 Implementing railway-oriented processing 196
Introducing railway-oriented processing 196
Building the railway 199
9 Commands 208
9.1 Commands and the unified log 209
Events and commands 209
Implicit vs. explicit commands 210
Working with commands in a unified log 212
9.2 Making decisions 213
Introducing commands at Plum 213
Modeling commands 214
Writing our alert schema 216
Defining our alert schema 218
9.3 Consuming our commands 219
The right tool for the job 219
Reading our commands 220
Parsing our commands 221
Stitching it all together 224
Testing 224
9.4 Executing our commands 226
Signing up for Mailgun 226
Completing our executor 226
Final testing 230
9.5 Scaling up commands 231
One stream of commands, or many? 231
Handling command-execution failures 231
Command hierarchies 233
Part 3 Event Analytics 235
10 Analytics-on-read 237
10.1 Analytics-on-read, analytics-on-write 238
Analytics-on-read 238
Analytics-on-write 239
Choosing an approach 240
10.2 The OOPS event stream 242
Delivery truck events and entities 242
Delivery driver events and entities 243
The OOPS event model 243
The OOPS events archive 245
10.3 Getting started with Amazon Redshift 246
Introducing Redshift 246
Setting up Redshift 248
Designing an event warehouse 251
Creating our fat events table 255
10.4 ETL, ELT 256
Loading our events 256
Dimension widening 259
A detour on data volatility 263
10.5 Finally, some analysis 264
Analysis 1: Who does the most oil changes? 264
Analysis 2: Who is our most unreliable customer? 265
11 Analytics-on-write 268
11.1 Back to OOPS 269
Kinesis setup 269
Requirements gathering 271
Our analytics-on-write algorithm 272
11.2 Building our Lambda function 276
Setting up DynamoDB 276
Introduction to AWS Lambda 277
Lambda setup and event modeling 279
Revisiting our analytics-on-write algorithm 281
Conditional writes to DynamoDB 286
Finalizing our Lambda 289
11.3 Running our Lambda function 290
Deploying our Lambda function 290
Testing our Lambda function 293
Appendix AWS primer 297
Index 309