PostgreSQL Query Optimization: The Ultimate Guide to Building Efficient Queries

PostgreSQL Query Optimization: The Ultimate Guide to Building Efficient Queries

PostgreSQL Query Optimization: The Ultimate Guide to Building Efficient Queries

PostgreSQL Query Optimization: The Ultimate Guide to Building Efficient Queries

eBook2nd ed. (2nd ed.)

$37.49  $49.99 Save 25% Current price is $37.49, Original price is $49.99. You Save 25%.

Available on Compatible NOOK devices, the free NOOK App and in My Digital Library.
WANT A NOOK?  Explore Now

Related collections and offers


Overview

Write optimized queries. This book helps you write queries that perform fast and deliver results on time. You will learn that query optimization is not a dark art practiced by a small, secretive cabal of sorcerers. Any motivated professional can learn to write efficient queries from the get-go and capably optimize existing queries. You will learn to look at the process of writing a query from the database engine’s point of view, and know how to think like the database optimizer.

The book begins with a discussion of what a performant system is and progresses to measuring performance and setting performance goals. It introduces different classes of queries and optimization techniques suitable to each, such as the use of indexes and specific join algorithms. You will learn to read and understand query execution plans along with techniques for influencing those plans for better performance. The book also covers advanced topics such as the use of functions and procedures, dynamic SQL, and generated queries. All of these techniques are then used together to produce performant applications, avoiding the pitfalls of object-relational mappers.

This second edition includes new examples using Postgres 15 and the newest version of the PostgresAir database. It includes additional details and clarifications about advanced topics, and covers configuration parameters in greater depth. Finally, it makes use of advancements in NORM, using automatically generated functions.


What You Will Learn

• Identify optimization goals in OLTP and OLAP systems
• Read and understand PostgreSQL execution plans

• Distinguish between short queries and long queries
• Choose the right optimization technique for each query type
• Identify indexes that will improve query performance
• Optimize full table scans
• Avoid the pitfalls of object-relational mapping systems
• Optimize the entire application rather than just database queries


Who This Book Is For

IT professionals working in PostgreSQL who want to develop performant and scalable applications, anyone whose job title contains the words “database developer” or “database administrator" or who is a backend developer charged with programming database calls, and system architects involved in the overall design of application systems running against a PostgreSQL database

Product Details

ISBN-13: 9798868800696
Publisher: Apress
Publication date: 01/08/2024
Sold by: Barnes & Noble
Format: eBook
File size: 11 MB
Note: This product may take a few minutes to download.

About the Author

Henrietta Dombrovskaya is a database researcher and developer with over 40 years of academic and industrial experience. She holds a PhD in Computer Science from the University of Saint Petersburg, Russia. At present, she is a Database Architect at DRW Holdings, Chicago, Illinois. She is an active member of the PostgreSQL community, a frequent speaker at the PostgreSQL conference, and the local organizer of the Chicago PostgreSQL User Group. Her research interests are tightly coupled with practice and are focused on developing efficient interactions between applications and databases.


Boris Novikov's experience includes leading post-graduate research groups and advising dozens of PhD students, while partnering and collaborating with industrial clients. His research interests are in the broad area of information management and include design, development, and tuning of databases, applications, and database management systems (DBMSs). He alsohas interests in distributed scalable systems for stream processing and analytics..

Anna Bailliekova is Senior Data Engineer at UrbanFootprint, where she works on data platform tools and cloud ops for geospatial applications. Previously, she worked in data engineering at Zendesk, and held a variety of DBA and BI roles at Epic. Her writing can be found in the newsletter Better Streets MKE. She received her undergraduate degree with College Honors in Political Science and Computer Science from Knox College in Galesburg, Illinois.

Table of Contents

About the Authors xi

About the Technical Reviewer xiii

Acknowledgments xv

Introduction xvii

Chapter 1 Why Optimize? 1

What Do We Mean by Optimization? 1

Why It Is Difficult: Imperative and Declarative 2

Optimization Goals 5

Optimizing Processes 7

Optimizing OLTP and OLAP 8

Database Design and Performance 8

Application Development and Performance 10

Other Stages of the Lifecycle 10

PostgreSQL Specifics 11

Summary 12

Chapter 2 Theory: Yes, We Need It! 13

Query Processing Overview 13

Compilation 13

Optimization and Execution 14

Relational, Logical, and Physical Operations 15

Relational Operations 15

Logical Operations 19

Queries as Expressions: Thinking in Sets 20

Operations and Algorithms 20

Summary 21

Chapter 3 Even More Theory: Algorithms 23

Algorithm Cost Models 23

Data Access Algorithms 24

Storage Structures 25

Full Scan 26

Index-Based Table Access 27

Index-Only Scan 28

Comparing Data Access Algorithms 28

Index Structures 31

What Is an Index? 31

B-Tree Indexes 33

Why Are B-Trees Used So Often? 35

Bitmaps 35

Other Kinds of Indexes 36

Combining Relations 37

Nested Loops 37

Hash-Based Algorithms 39

Sort-Merge Algorithm 41

Comparing Algorithms 42

Summary 42

Chapter 4 Understanding Execution Plans 43

Putting Everything Together: How an Optimizer Builds an Execution Plan 43

Reading Execution Plans 44

Understanding Execution Plans 49

What Is Going On During Optimization? 49

Why Are There So Many Execution Plans to Choose From? 50

How Are Execution Costs Calculated? 51

How Can the Optimizer Be Led Astray? 54

Summary 55

Chapter 5 Short Queries and Indexes 57

Which Queries Are Considered Short? 57

Choosing Selection Criteria 59

Index Selectivity 60

Unique Indexes and Constraints 61

Indexes and Non-equal Conditions 65

Indexes and Column Transformations 65

Indexes and the like Operator 70

Using Multiple Indexes 72

Compound Indexes 73

How Do Compound Indexes Work? 74

Lower Selectivity 76

Using Indexes for Data Retrieval 76

Covering Indexes 77

Excessive Selection Criteria 79

Partial Indexes 83

Indexes and Join Order 85

When Are Indexes Not Used 88

Avoiding Index Usage 89

Why Does PostgreSQL Ignore My Index? 89

Let PostgreSQL Do Its Job! 92

How to Build the Right Index(es)? 97

To Build or Not to Build 97

Which Indexes Are Needed? 98

Which Indexes Are Not Needed? 98

Indexes and Short Query Scalability 99

Summary 100

Chapter 6 Long Queries and Full Scans 101

Which Queries Are Considered Long? 101

Long Queries and Full Scans 103

Long Queries and Hash Joins 104

Long Queries and the Order of Joins 105

What Is a Semi-join? 105

Semi-joins and Join Order 107

More on Join Order 109

What is an Anti-join? 112

Semi- and Anti-joins Using the JOIN Operator 113

When Is It Necessary to Specify Join Order? 116

Grouping: Filter First, Group Last 118

Grouping: Group First, Select Last 125

Using SET operations 128

Avoiding Multiple Scans 132

Conclusion 138

Chapter 7 Long Queries: Additional Techniques 139

Structuring Queries 139

Temporary Tables and CTEs 140

Temporary Tables 140

Common Table Expressions (CTEs) 142

Views: To Use or Not to Use 147

Why Use Views? 154

Materialized Views 154

Creating and Using Materialized Views 154

Refreshing Materialized Views 156

Create a Materialized View or Not? 156

Do Materialized Views Need to Be Optimized? 158

Dependencies 159

Partitioning 160

Parallelism 165

Summary 166

Chapter 8 Optimizing Data Modification 167

What Is DML? 167

Two Ways to Optimize Data Modification 167

How Does DML Work? 168

Low-Level Input/Output 168

The Impact of Concurrency Control 169

Data Modification and Indexes 172

Mass Updates and Frequent Updates 173

Referential Integrity and Triggers 174

Summary 175

Chapter 9 Design Matters 177

Design Matters 177

Why Use a Relational Model? 181

Types of Databases 182

Entity-Attribute-Value Model 182

Key-Value Model 183

Hierarchical Model 184

Combining the Best of Different Worlds 185

Flexibility vs. Efficiency and Correctness 185

Must We Normalize? 187

Use and Misuse of Surrogate Keys 189

Summary 195

Chapter 10 Application Development and Performance 197

Response Time Matters 197

World Wide Wait 198

Performance Metrics 199

Impedance Mismatch 200

The Road Paved with Good intentions 200

Application Development Patterns 201

"Shopping List Problem" 203

Interfaces 205

Welcome to the World of ORM 205

In Search of a Better Solution 207

Summary 210

Chapter 11 Functions 211

Function Creation 211

Internal Functions 212

User-Defined Functions 212

Introducing Procedural Language 213

Dollar Quoting 214

Function Parameters and Function Output: Void Functions 215

Function Overloading 216

Function Execution 218

Function Execution Internals 220

Functions and Performance 223

How Using Functions Can Worsen Performance 224

Any Chance Functions Can Improve Performance? 226

Functions and User-Defined Types 226

User-Defined Data Types 226

Functions Returning Composite Types 227

Using Composite Types with Nested Structure 231

Functions and Type Dependencies 235

Data Manipulation with Functions 236

Functions and Security 238

What About Business Logic? 239

Functions in OLAP Systems 240

Parameterizing 240

No Explicit Dependency on Tables and Views 241

Ability to Execute Dynamic SQL 241

Stored Procedures 241

Functions with No Results 241

Functions and Stored Procedures 242

Transaction Management 242

Exception Processing 243

Summary 244

Chapter 12 Dynamic SQL 245

What Is Dynamic SQL 245

Why It Works Better in Postgres 245

What About SQL Injection? 246

How to Use Dynamic SQL in OLTP Systems 246

How to Use Dynamic SQL in OLAP Systems 252

Using Dynamic SQL for Flexibility 256

Using Dynamic SQL to Aid the Optimizer 263

FDWs and Dynamic SQL 266

Summary 267

Chapter 13 Avoiding the Pitfalls of Object-Relational Mapping 269

Why Application Developers Like NORM 269

ORM vs. NORM 270

NORM Explained 272

Implementation Details 278

Complex Searches 283

Updates 286

Insert 287

Update 288

Delete 290

Why Not Store JSON?! 291

Performance Gains 291

Working Together with Application Developers 292

Summary 292

Chapter 14 More Complex Filtering and Search 293

Full Text Search 293

Multidimensional and Spatial Search 295

Generalized Index Types in PostgreSQL 295

GIST Indexes 296

Indexes for Full Text Search 296

Indexing Very Large Tables 297

Indexing JSON and JSONB 298

Summary 302

Chapter 15 Ultimate Optimization Algorithm 303

Major Steps 303

Step-by-Step Guide 304

Step 1 Short or Long? 304

Step 2 Short 304

Step 3 Long 306

Step 4 Incremental Updates 306

Step 5 Non-incremental Long Query 306

But Wait, There Is More! 307

Summary 308

Chapter 16 Conclusion 309

Index 311

From the B&N Reads Blog

Customer Reviews