Azure Storage, Streaming, and Batch Analytics: A guide for data engineers
The Microsoft Azure cloud is an ideal platform for data-intensive applications. Designed for productivity, Azure provides pre-built services that make collection, storage, and analysis much easier to implement and manage. Azure Storage, Streaming, and Batch Analytics teaches you how to design a reliable, performant, and cost-effective data infrastructure in Azure by progressively building a complete working analytics system.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the technology
Microsoft Azure provides dozens of services that simplify storing and processing data. These services are secure, reliable, scalable, and cost efficient.

About the book
Azure Storage, Streaming, and Batch Analytics shows you how to build state-of-the-art data solutions with tools from the Microsoft Azure platform. Read along to construct a cloud-native data warehouse, adding features like real-time data processing. Based on the Lambda architecture for big data, the design uses scalable services such as Event Hubs, Stream Analytics, and SQL databases. Along the way, you’ll cover most of the topics needed to earn an Azure data engineering certification.

What's inside

Configuring Azure services for speed and cost
Constructing data pipelines with Data Factory
Choosing the right data storage methods

About the reader
For readers familiar with database management. Examples in C# and PowerShell.

About the author
Richard Nuckolls is a senior developer building big data analytics and reporting systems in Azure.

Table of Contents

1 What is data engineering?

2 Building an analytics system in Azure

3 General storage with Azure Storage accounts

4 Azure Data Lake Storage

5 Message handling with Event Hubs

6 Real-time queries with Azure Stream Analytics

7 Batch queries with Azure Data Lake Analytics

8 U-SQL for complex analytics

9 Integrating with Azure Data Lake Analytics

10 Service integration with Azure Data Factory

11 Managed SQL with Azure SQL Database

12 Integrating Data Factory with SQL Database

13 Where to go next

by Richard L. Nuckolls

Paperback

$49.99 


Product Details

ISBN-13: 9781617296307
Publisher: Manning
Publication date: 11/03/2020
Pages: 448
Product dimensions: 7.38(w) x 9.25(h) x 0.70(d)

About the Author

Richard Nuckolls is a senior developer building a big data analytics and reporting system in Azure. During his nearly 20 years of experience, he’s done server and database administration, desktop and web development, and more recently has led teams in building a production content management system in Azure.

Table of Contents

Preface

Acknowledgements

About this book

About the author

About the cover illustration

1 What is data engineering?

1.1 What is data engineering?

1.2 What do data engineers do?

1.3 How does Microsoft define data engineering?

Data acquisition

Data storage

Data processing

Data queries

Orchestration

Data retrieval

1.4 What tools does Azure provide for data engineering?

1.5 Azure Data Engineers

1.6 Example application

2 Building an analytics system in Azure

2.1 Fundamentals of Azure architecture

Azure subscriptions

Azure regions

Azure naming conventions

Resource groups

Finding resources

2.2 Lambda architecture

2.3 Azure cloud services

Azure analytics system architecture

Event Hubs

Stream Analytics

Data Lake Storage

Data Lake Analytics

SQL Database

Data Factory

Azure PowerShell

2.4 Walk-through of processing a series of event data records

Hot path

Cold path

Choosing abstract Azure services

2.5 Calculating cloud hosting costs

Event Hubs

Stream Analytics

Data Lake Storage

Data Lake Analytics

SQL Database

Data Factory

3 General storage with Azure Storage accounts

3.1 Cloud storage services

Before you begin

3.2 Creating an Azure Storage account

Using Azure portal

Using Azure PowerShell

Azure Storage replication

3.3 Storage account services

Blob storage

Creating a Blobs service container

Blob tiering

Copy tools

Queues

Creating a queue

Azure Storage queue options

3.4 Storage account access

Blob container security

Designing Storage account access

3.5 Exercises

Exercise 1

Exercise 2

4 Azure Data Lake Storage

4.1 Create an Azure Data Lake store

Using Azure Portal

Using Azure PowerShell

4.2 Data Lake store access

Access schemes

Configuring access

Hierarchy structure in the Data Lake store

4.3 Storage folder structure and data drift

Hierarchy structure revisited

Data drift

4.4 Copy tools for Data Lake stores

Data Explorer

ADLCopy tool

Azure Storage Explorer tool

4.5 Exercises

Exercise 1

Exercise 2

5 Message handling with Event Hubs

5.1 How does an Event Hub work?

5.2 Collecting data in Azure

5.3 Create an Event Hubs namespace

Using Azure PowerShell

Throughput units

Event Hub geo-disaster recovery

Failover with geo-disaster recovery

5.4 Creating an Event Hub

Using Azure portal

Using Azure PowerShell

Shared access policy

5.5 Event Hub partitions

Multiple consumers

Why specify a partition?

Why not specify a partition?

Event Hubs message journal

Partitions and throughput units

5.6 Configuring Capture

File name formats

Secure access for Capture

Enabling Capture

The importance of time

5.7 Securing access to Event Hubs

Shared Access Signature policies

Writing to Event Hubs

5.8 Exercises

Exercise 1

Exercise 2

Exercise 3

6 Real-time queries with Azure Stream Analytics

6.1 Creating a Stream Analytics service

Elements of a Stream Analytics job

Create an ASA job using the Azure portal

Create an ASA job using Azure PowerShell

6.2 Configuring inputs and outputs

Event Hub job input

ASA job outputs

6.3 Creating a job query

Starting the ASA job

Failure to start

Output exceptions

6.4 Writing job queries

Window functions

Machine learning functions

6.5 Managing performance

Streaming units

Event ordering

6.6 Exercises

Exercise 1

Exercise 2

7 Batch queries with Azure Data Lake Analytics

7.1 U-SQL language

Extractors

Outputters

File selectors

Expressions

7.2 U-SQL jobs

Selecting the biometric data files

Schema extraction

Aggregation

Writing files

7.3 Creating a Data Lake Analytics service

Using Azure portal

Using Azure PowerShell

7.4 Submitting jobs to ADLA

Using Azure portal

Using Azure PowerShell

7.5 Efficient U-SQL job executions

Monitoring a U-SQL job

Analytics units

Vertexes

Scaling the job execution

7.6 Using Blob Storage

Constructing Blob file selectors

Adding a new data source

Filtering rowsets

7.7 Exercises

Exercise 1

Exercise 2

8 U-SQL for complex analytics

8.1 Data Lake Analytics Catalog

Simplifying U-SQL queries

Simplifying data access

Loading data for reuse

8.2 Window functions

8.3 Local C# functions

8.4 Exercises

Exercise 1

Exercise 2

9 Integrating with Azure Data Lake Analytics

9.1 Processing unstructured data

Azure Cognitive Services

Managing assemblies in the Data Lake

Image data extraction with Advanced Analytics

9.2 Reading different file types

Adding custom libraries with a Catalog

Creating a catalog database

Building the U-SQL DataFormats solution

Code folders

Using custom assemblies

9.3 Connecting to remote sources

External databases

Credentials

Data Source

Tables and views

9.4 Exercises

Exercise 1

Exercise 2

10 Service integration with Azure Data Factory

10.1 Creating an Azure Data Factory service

10.2 Secure authentication

Azure Active Directory integration

Azure Key Vault

10.3 Copying files with ADF

Creating a Files storage container

Adding secrets to AKV

Creating a Files storage linkedservice

Creating an ADLS linkedservice

Creating a pipeline and activity

Creating a scheduled trigger

10.4 Running an ADLA job

Creating an ADLA linkedservice

Creating a pipeline and activity

10.5 Exercises

Exercise 1

Exercise 2

11 Managed SQL with Azure SQL Database

11.1 Creating an Azure SQL Database

Create a SQL Server and SQLDB

11.2 Securing SQLDB

11.3 Availability and recovery

Restoring and moving SQLDB

Database safeguards

Creating alerts for SQLDB

11.4 Optimizing costs for SQLDB

Pricing structure

Scaling SQLDB

Serverless

Elastic Pools

11.5 Exercises

Exercise 1

Exercise 2

Exercise 3

Exercise 4

12 Integrating Data Factory with SQL Database

12.1 Before you begin

12.2 Importing data with external data sources

Creating a database scoped credential

Creating an external data source

Creating an external table

Importing Blob files

12.3 Importing file data with ADF

Authenticating between ADF and SQLDB

Creating SQL Database linkedservice

Creating datasets

Creating a copy activity and pipeline

12.4 Exercises

Exercise 1

Exercise 2

Exercise 3

13 Where to go next

13.1 Data catalog

Data Catalog as a service

Data locations

Data definitions

Data frequency

Business drivers

13.2 Version control and backups

Blob Storage

Data Lake Storage

Stream Analytics

Data Lake Analytics

Data Factory configuration files

SQL Database

13.3 Microsoft certifications

13.4 Signing off

Appendix A Setting up Azure services through PowerShell

Appendix B Configuring the Jonestown Sluggers analytics system

Index
