Learn T-SQL Querying: A guide to developing efficient and elegant T-SQL code
By Pedro Lopes and Pam Lahoud
()
Pedro Lopes
Pedro Lopes is a Program Manager in the Database Systems group, based in Redmond, WA, USA. He has over 19 years of industry experience and has been with Microsoft for 9 years. He is currently responsible for program management of Database Engine features for in-market and vNext versions of SQL Server, with a special focus on the Relational Engine. He has extensive experience with query performance troubleshooting and is a regular speaker at numerous conferences such as SQLBits, PASS Summit, SQLIntersection, Microsoft Ignite, and Microsoft Build. He blogs about SQL on the SQL Server Team blog. He has authored several tools in the Tiger toolbox on GitHub: AdaptiveIndexDefrag maintenance solution, BPCheck, and usp_WhatsUp.
Related to Learn T-SQL Querying
Related ebooks
Ultimate SQL Server and Azure SQL for Data Management and Modernization: Full Spectrum Expert Solutions for Deploying, Securing, and Optimizing SQL Server in Linux, Cloud, and Hybrid Environments with Performance Tuning Strategies (English Edition) Rating: 0 out of 5 stars0 ratingsProfessional Microsoft SQL Server 2014 Administration Rating: 0 out of 5 stars0 ratingsSQL Server Mastery: Advanced Techniques for Database Optimization and Administration Rating: 0 out of 5 stars0 ratingsMastering SQL Server: From Basics to Expert Proficiency Rating: 0 out of 5 stars0 ratingsIntroduction to Microsoft SQL Server Rating: 0 out of 5 stars0 ratingsAdvanced SQL Performance Tuning: Optimize Your Database Workloads Rating: 0 out of 5 stars0 ratingsSQL: For Beginners: Your Guide To Easily Learn SQL Programming in 7 Days Rating: 5 out of 5 stars5/5SQL Server: Advanced Performance Tuning and Optimization Strategies Rating: 0 out of 5 stars0 ratingsComprehensive SQL Techniques: Mastering Data Analysis and Reporting Rating: 0 out of 5 stars0 ratingsMicrosoft Power BI Performance Best Practices: Learn practical techniques for building high-speed Power BI solutions Rating: 0 out of 5 stars0 ratingsAdvanced SQL Queries: Writing Efficient Code for Big Data Rating: 5 out of 5 stars5/5SQL All-in-One For Dummies Rating: 4 out of 5 stars4/5SQL Fundamentals for New Developers: A Practical Guide with Examples Rating: 0 out of 5 stars0 ratingsSQL Mastery: From Novice Queries to Advanced Database Wizardry Rating: 0 out of 5 stars0 ratingsSQL Made Simple Rating: 0 out of 5 stars0 ratingsSQL Server Functions and tutorials 50 examples Rating: 1 out of 5 stars1/5SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL Rating: 4 out of 5 stars4/5Mastering SQL and Database: From Basics to Expert Proficiency Rating: 0 out of 5 stars0 ratingsSQL Demystified: A Beginner's Roadmap to Data Retrieval and Management Rating: 0 out of 5 stars0 ratingsSQL Database Mastery: Advanced Techniques for Database Management Rating: 0 out of 5 stars0 ratingsSQL Made Easy: Tips and Tricks to Mastering SQL Programming Rating: 0 out of 5 stars0 ratingsLearn Microsoft Fabric: A practical guide to performing data analytics in the era of artificial intelligence Rating: 0 out of 5 stars0 ratingsSQL Mastermind: Unleashing the Power of Advanced Database Programming Rating: 2 out of 5 stars2/5Beginning T-SQL with Microsoft SQL Server 2005 and 2008 Rating: 3 out of 5 stars3/5Exploring Data with Access 2019 Rating: 0 out of 5 stars0 ratingsSQL Essentials For Dummies Rating: 0 out of 5 stars0 ratings
Computers For You
Creating Online Courses with ChatGPT | A Step-by-Step Guide with Prompt Templates Rating: 4 out of 5 stars4/5Mastering ChatGPT: 21 Prompts Templates for Effortless Writing Rating: 4 out of 5 stars4/5Algorithms For Dummies Rating: 4 out of 5 stars4/5The ChatGPT Millionaire Handbook: Make Money Online With the Power of AI Technology Rating: 4 out of 5 stars4/5Data Visualization with Excel Dashboards and Reports Rating: 4 out of 5 stars4/5The Innovators: How a Group of Hackers, Geniuses, and Geeks Created the Digital Revolution Rating: 4 out of 5 stars4/5Elon Musk Rating: 4 out of 5 stars4/5CompTIA Security+ Get Certified Get Ahead: SY0-701 Study Guide Rating: 5 out of 5 stars5/5The Self-Taught Computer Scientist: The Beginner's Guide to Data Structures & Algorithms Rating: 0 out of 5 stars0 ratingsSQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL Rating: 4 out of 5 stars4/5Learning the Chess Openings Rating: 5 out of 5 stars5/5Data Analytics for Beginners: Introduction to Data Analytics Rating: 4 out of 5 stars4/5Procreate for Beginners: Introduction to Procreate for Drawing and Illustrating on the iPad Rating: 5 out of 5 stars5/5Storytelling with Data: Let's Practice! Rating: 4 out of 5 stars4/5UX/UI Design Playbook Rating: 4 out of 5 stars4/5Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are Rating: 4 out of 5 stars4/5Quantum Computing For Dummies Rating: 3 out of 5 stars3/5Deep Search: How to Explore the Internet More Effectively Rating: 5 out of 5 stars5/5Becoming a Data Head: How to Think, Speak, and Understand Data Science, Statistics, and Machine Learning Rating: 5 out of 5 stars5/5Excel 101: A Beginner's & Intermediate's Guide for Mastering the Quintessence of Microsoft Excel (2010-2019 & 365) in no time! Rating: 0 out of 5 stars0 ratingsMicrosoft Azure For Dummies Rating: 0 out of 5 stars0 ratingsThe Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling Rating: 0 out of 5 stars0 ratings
Reviews for Learn T-SQL Querying
0 ratings0 reviews
Book preview
Learn T-SQL Querying - Pedro Lopes
Learn T-SQL Querying
Copyright © 2024 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Kaustubh Manglurkar
Publishing Product Manager: Heramb Bhavsar
Book Project Manager: Hemangi Lotlikar
Content Development Editor: Joseph Sunil
Technical Editor: Rahul Limbachiya
Copy Editor: Safis Editing
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Production Designer: Prafulla Nikalje
DevRel Marketing Executive: Nivedita Singh
First published: May 2019
Second edition: February 2024
Production reference: 2010324
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK
ISBN 978-1-83763-899-4
www.packtpub.com
To my wife and life partner, Sandra, and to my esteemed friends, mentors, and former colleagues in Azure Data who develop the SQL Database Engine and keep pushing the boundaries of excellence – sorry, I can’t list you all here! To the unique people I had the privilege of working with – Amit Banerjee, Bob Ward, Conor Cunningham, Hanuma Kodavalla, and Slava Oks – for inspiring me to always move forward and do better, and to everyone who keeps developing and supporting applications on this most-scalable RDBMS.
– Pedro Lopes
To Andrew and Linus, for spending countless nights and weekends without me. To the entire #SQLFamily, who continue to inspire me, support me (and each other), and drive me to be better every day. To my computer-illiterate friends, Jodie, Liza, and Erin, who I know will proudly display this book on their shelves despite having no idea what any of this means. And to my mom, who bought me my first computer when I was 8 years old and said Sure!
when I decided that adding computer science as a second major in my junior year of college seemed like a good idea.
– Pam Lahoud
Foreword
When I first met Pedro Lopes and Pam Lahoud, I already knew that they had both achieved recognition as experts in SQL Server, especially in areas such as query processing and performance. As I started working with them, I quickly realized that not only was the reputation warranted but I also came to see their characteristics of professionalism, thoroughness, and presentation skills.
All these traits come out in this book, and you gain all the benefits. I love how this book is organized. If I want to read the entire book end to end, I will first learn the fundamentals and mechanics of the optimizer from the perspective of writing T-SQL queries. Then, I will get practical advice on how to write effective queries for maximum performance on topics such as indexing. And then, I’m able to dive deep into detailed query troubleshooting techniques using the full capabilities of SQL Server. This organization of the book also allows me to jump to any section aligning with my skills and knowledge. This powerful story is now brought to life in the second edition of this book, bringing in enhancements from SQL Server 2019, SQL Server 2022, and Azure SQL designed to make your applications faster with no code changes.
Even if you believe that you understand query processing with SQL Server, you will benefit from this book. Using visual flows and examples, the first part of the book gives you a great perspective on how queries in SQL Server are compiled, executed, and cached. This part also includes key details of query processing such as cardinality estimation, optimization phases, and methods to control query optimization.
The second part is the crown jewel of the book. Pedro and Pam pour in their years of experience to give you the advice you need on topics such as analyzing query plans, proper indexing, best practices for crafting T-SQL queries, and the often-overlooked area of anti-pattern queries. These chapters are full of rich advice and examples for you to try out yourself.
Finish off the book by learning how to get faster to tune and troubleshoot query performance using powerful tools such as Query Profiling, Query Store, and Extended Events. The power of the T-SQL language comes to life as you learn how to write queries to debug the queries from your application. As readers, you get the benefit of unique information throughout the book because the authors have directly worked on these parts of the product.
SQL Server and Azure SQL have evolved over the years to provide more automation and simplify the requirements to build and manage successful database applications. However, understanding how to use the power of T-SQL is critical to achieving maximum performance and efficiency. Furthermore, to take your game to the next level, you need to understand the nuances and mechanics of the query optimizer and query execution with T-SQL in the engine. This book provides it all in a manner that you can easily understand, with all the latest updates, and in a format that you can use as a reference for years to come.
– Bob Ward
Principal architect, Microsoft
Contributors
About the authors
Pedro Lopes is a senior director of engineering at Salesforce, based in WA, USA, leading the organization responsible for the Marketing Cloud’s DB Management Plane. Previously, he was a principal architect in Microsoft Azure Data, leading the SQL Server 2022 release until its public preview. He has 20+ years of industry experience and was with Microsoft for 12+ years. He has extensive experience with query performance troubleshooting and has been a speaker at numerous conferences, such as SQLBits, PASS Summit, SQLIntersection, Microsoft Ignite, and Microsoft Build. He has written multiple technical articles about SQL that are currently in the product documentation, including several on Engine internals at https://aka.ms/sqlserverguides.
I want to thank the people who have been close to me and supported me throughout, especially my wife, Sandra. And special thanks to my writing partner and good friend Pam Lahoud, who is both a technical powerhouse and an outstanding human being, without whom this project would not have been nearly as rewarding to complete.
Pam Lahoud is a principal PM manager in Microsoft Azure Data, based in Seattle, WA, USA. She has been with Microsoft since 2006 and currently leads the SQL Server in Azure Virtual Machines product manager team. She is passionate about SQL Server performance and has focused on performance tuning and optimization, particularly from the developers’ perspective, throughout her career. She is a SQL 2008 Microsoft Certified Master with over 20 years of experience working with SQL Server and has been a speaker at several global events such as the PASS Summit, SQLBits, Microsoft Build, and Microsoft Ignite.
A big thank you to my partner, Andrew, and our son, Linus, for putting up with countless Sundays at home with Mama locked up in the office. Thank you to my writing partner, Pedro; we are the classic example of better together, and this is a life achievement I never would have reached without you! Forever your work-spouse, no matter which companies we’re at :-)
About the reviewer
Sergey Ten has been working in the database space for over 20 years, primarily on the Microsoft SQL Server and Oracle database servers. His main areas of expertise are data processing, programming language design, and high availability.
Sergey currently works at Microsoft as a principal software engineer, working in the SQL Server team in the areas of query processing and high availability. Prior to that, he worked at Quest Software developing various Oracle management solutions, and at Guidance Software working on computer forensics and eDiscovery products.
Table of Contents
Preface
Part 1: Query Processing Fundamentals
1
Understanding Query Processing
Technical requirements
Logical statement processing flow
Query compilation essentials
Query optimization essentials
Query execution essentials
Plan caching and reuse
Stored procedures
Ad hoc plan caching
Parameterization
The sp_executesql procedure
Prepared statements
How query processing impacts plan reuse
The importance of parameters
Security
Performance
Parameter sniffing
To cache or not to cache
Summary
2
Mechanics of the Query Optimizer
Technical requirements
Introducing the Cardinality Estimator
Understanding the query optimization workflow
The Trivial Plan stage
The Exploration stage
The Transaction Processing phase
The Quick Plan phase
The Full Optimization phase
Knobs for query optimization
Summary
Part 2: Dos and Don’ts of T-SQL
3
Exploring Query Execution Plans
Technical requirements
What is a query plan?
Accessing a query plan
Navigating a query plan
Query plan operators of interest
Blocking versus non-blocking operators
Data access operators
Joins
Spools
Sort and aggregation operators
Query plan properties of interest
Plan-level properties
Operator-level properties
Summary
4
Indexing for T-SQL Performance
Technical requirements
Understanding predicate SARGability
Data access using indexes
Structure of a rowstore index
Data access using rowstore indexes
Inserting and updating data in a rowstore index
Indexing strategy using rowstore indexes
Best practices for clustered indexes
Best practices for non-clustered indexes
Index maintenance
Summary
5
Writing Elegant T-SQL Queries
Technical requirements
Best practices for T-SQL querying
Referencing objects
Joining tables
Using NOLOCK
Using cursors
The perils of SELECT *
Functions in our predicate
Deconstructing table-valued functions
Complex expressions
Optimizing OR logic
NULL means unknown
Fuzzy string matching
Inequality logic
EXECUTE versus sp_executesql
Composable logic
Summary
6
Discovering T-SQL Anti- Patterns in Depth
Technical requirements
Implicit conversions
Avoiding unnecessary sort operations
UNION ALL versus UNION
SELECT DISTINCT
Avoiding UDF pitfalls
Avoiding unnecessary overhead with stored procedures
Pitfalls of complex views
Pitfalls of correlated sub-queries
Properly storing intermediate results
Using table variables and temporary tables
Using Common Table Expressions (CTEs)
Summary
Part 3: Assembling Our Query Troubleshooting Toolbox
7
Building Diagnostic Queries Using DMVs and DMFs
Technical requirements
Introducing DMVs
Exploring query execution DMVs
sys.dm_exec_sessions
sys.dm_exec_requests
sys.dm_exec_sql_text
sys.dm_os_waiting_tasks
Exploring query plan cache DMVs
sys.dm_exec_query_stats
sys.dm_exec_procedure_stats
sys.dm_exec_query_plan
sys.dm_exec_cached_plans
Troubleshooting common scenarios with DMV queries
Investigating blocking
Cached query plan issues
Single-use plans (query fingerprints)
Finding resource-intensive queries
Queries with excessive memory grants
Mining XML query plans
Plans with missing indexes
Plans with warnings
Plans with implicit conversions
Plans with lookups
Summary
8
Building XEvent Profiler Traces
Technical requirements
Introducing XEvents
Getting up and running with XEvent Profiler
Remote collection with SQL LogScout
Analyzing traces with RML Utilities
Summary
9
Comparative Analysis of Query Plans
Technical requirements
Query plan analyzer
Summary
10
Tracking Performance History with Query Store
Technical requirements
Introducing the Query Store
Inner workings of the Query Store
Configuring the Query Store
Tracking expensive queries
Fixing regressed queries
Features that rely on the Query Store
Query Store for readable secondary replicas
Query Store hinting
Parameter Sensitive Plan Optimization
Automatic Plan Correction
Degree of parallelism feedback
Optimized plan forcing
Summary
11
Troubleshooting Live Queries
Technical requirements
Using Live Query Statistics
Understanding the need for lightweight profiling
Diagnostics available with Lightweight Profiling
Activity Monitor gets new life
Summary
12
Managing Optimizer Changes
Technical requirements
Understanding where QTA and CE Feedback are needed
Understanding QTA fundamentals
Exploring the QTA workflow
Summary
Index
Other Books You May Enjoy
Preface
Experienced and novice users have always faced choices and trade-offs to achieve the very best performance when writing T-SQL code for their applications. This book is for all data professionals who want to master the art of writing efficient T-SQL code in modern SQL Server versions, as well as Azure SQL Database.
This book will start with query processing fundamentals to help you write solid, performant T-SQL queries. You will be introduced to query execution plans and how to leverage them for query troubleshooting. Later, you will learn how to identify various T-SQL patterns and anti-patterns. This will help you analyze execution plans to gain insights into current performance, as well as determine whether a query is scalable. You will learn how to build diagnostic queries using dynamic management views (DMVs) and dynamic management functions (DMFs) to unlock the secrets of T-SQL execution. Furthermore, you will learn how to leverage SQL Server’s built-in tools to shorten the time to address query performance and scalability issues. You will learn how to implement various features such as Extended Events, Query Store, Query Tuning Assistant, and more, using hands-on examples.
By the end of the book, you will be able to determine where query performance bottlenecks are and understand what anti-patterns may be in use and what you need to do to avoid such pitfalls going forward. It’s essentially all you need to know to squeeze every last bit of performance out of your T-SQL queries.
Who this book is for
This book is for database administrators, database developers, data analysts, data scientists, and T-SQL practitioners who want to master the art of writing efficient T-SQL code and troubleshooting query performance issues using practical examples. A basic understanding of T-SQL syntax, writing queries in SQL Server, and using the SQL Server Management Studio tool is helpful to get started.
What this book covers
Chapter 1, Understanding Query Processing, introduces T-SQL query optimization and execution essentials: how does SQL Server optimize and execute T-SQL? How does SQL Server use parameters? Are parameters an advantage? When and why does SQL Server cache execution plans for certain T-SQL statements but not for others? When is that an advantage and when is it a problem? This is information that any T-SQL practitioner needs to keep as a reference for proactive T-SQL query writing, as well as reactive troubleshooting and optimization purposes. This chapter will be referenced throughout the Execution Plan-related chapters, as we link architectural topics to real-world uses.
Chapter 2, Mechanics of the Query Optimizer, introduces T-SQL query optimization internals and architecture, starting with the infamous Cardinality Estimation process and its building blocks. From there, you will understand how the Query Optimizer uses that information to produce a just-in-time, good-enough execution plan. This chapter will be referenced throughout the Execution Plan-related chapters, as we bridge architectural topics to real-world uses.
Chapter 3, Exploring Query Execution Plans, shows you how to read and analyze a graphical query execution plan, where to look for relevant performance information in the plan, and how to use the plan to troubleshoot query performance issues.
Chapter 4, Indexing for T-SQL Performance, introduces guidelines to keep in mind for writing T-SQL queries that perform and scale well. Some basics of database physical design structure such as indexes will be covered, as well as how the optimizer estimates cost and chooses access methods based on how the query is written.
Chapter 5, Writing Elegant T-SQL Queries, reveals various common T-SQL patterns and anti-patterns, specifically those that should be easily identifiable just by looking at the T-SQL construct. This chapter will have more of a cookbook structure. For each of the patterns, we will show a T-SQL example that contains the pattern, learn how to rewrite the query to avoid the pattern, and examine query execution plans before and after the change to show improved performance.
Chapter 6, Discovering T-SQL Anti-Patterns in Depth, reveals various common T-SQL patterns and anti-patterns that require some more in-depth analysis to be identified – the proverbial elephant in the room. This chapter will also follow the cookbook structure introduced in Chapter 5, Writing Elegant T-SQL Queries.
Chapter 7, Building Diagnostic Queries Using DMVs and DMFs, introduces dynamic management views and functions that expose relevant just-in-time information to unlock the secrets of T-SQL execution. It includes real-world examples of how to use these artifacts to troubleshoot different poor performance scenarios, either leveraging snippets provided in this book or in GitHub, and how to build customized scripts.
Chapter 8, Building XEvent Profiler Traces, introduces Extended Events (XEvents), the lightweight infrastructure that exposes relevant just-in-time information from every component of the SQL Database Engine, focused on those related to T-SQL execution. You will get real-world examples of how to use these XEvents to troubleshoot different poor performance scenarios, leveraging collection and analysis tools such as the XEvent Profiler, LogScout, and Replay Markup Language for event analysis, and dropping a note on the infamously deprecated SQL Server Profiler.
Chapter 9, Comparative Analysis of Query Plans, introduces rich-UI tools that ship with SQL Server Management Studio to enable standalone query plan analysis or compare plans from different points in time. It then moves on to visually pinpoint the interesting parts that may provide the key to improving T-SQL query performance and scalability.
Chapter 10, Tracking Performance History with Query Store, introduces a flagship feature: Query Store. This is a practical approach to leveraging what is effectively a flight recorder for your SQL Database Engine T-SQL execution, for the purpose of trend analysis or T-SQL performance troubleshooting and analysis, through rich UI reports that ship with SQL Server Management Studio. Then, you will see how Query Store integrates with the Query Plan Comparison and Query Plan Analysis functionalities for a complete, UI-driven workflow for query performance insights. Lastly, we’ll review some of the SQL Database Engine features that rely on the data collected by Query Store.
Chapter 11, Troubleshooting Live Queries, introduces the profiling infrastructure that exposes real-time query execution plans, which enable scenarios such as production system troubleshooting. You will see a real-world example of how to leverage rich UI tools: Live Query Statistics as a standalone case or as part of the Activity Monitor functionality of SQL Server Management Studio.
Chapter 12, Managing Optimizer Changes, discusses two features – QTA (client-side) and CE Feedback (server-side) – which aim to address some of the most common causes of cardinality estimation (CE)-related performance regressions that may affect our T-SQL queries after an upgrade from an older version of the SQL Database Engine to a newer version.
To get the most out of this book
A basic understanding of using the SQL Database Engine and writing T-SQL queries will help get you started with this book. Some familiarity with SQL Server Management Studio or Azure Data Studio is also helpful for running the sample queries and viewing query execution plans.
The examples used in this book are designed for use on SQL Server 2022 and Azure SQL Database, but they should work on any version of SQL Server 2012 or later. The Developer Edition of SQL Server is free for development environments and can be used to run all the code samples. There is also a free tier of Azure SQL Database that you can use for testing at https://aka.ms/freedb.
You will need the AdventureWorks2016_EXT (referred to as AdventureWorks) and AdventureWorksDW2016_EXT (referred to as AdventureWorksDW) sample databases, which can be found on GitHub at https://github.com/Microsoft/sql-server-samples/releases/tag/adventureworks.
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
Note
This book contains many horizontally long screenshots. These have been captured to provide readers with an overview of the execution plans for various SQL queries. As a result, the text in these images may appear small at 100% zoom. Additionally, you will be able to see these plans in more depth in the output in SQL Server as you code along.
Download the example code files
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Learn-T-SQL-Querying-Second-Edition. If there’s an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system.
A block of code is set as follows:
SELECT LastName, FirstName
FROM Person.Person
WHERE FirstName = N'Andrew';
Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: Select System info from the Administration panel.
Tips or important notes
Appear like this.
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Share Your Thoughts
Once you’ve read Learn T-SQL Querying, Second Edition, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Download a free PDF copy of this book
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere?
Is your e-book purchase not compatible with the device of your choice?
Don’t worry!, Now with every Packt book, you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.
The perks don’t stop there, you can get exclusive access to discounts, newsletters, and great free content in your inbox daily
Follow these simple steps to get the benefits:
Scan the QR code or visit the following link:
https://packt.link/free-ebook/9781837638994
Submit your proof of purchase.
That’s it! We’ll send your free PDF and other benefits to your email directly.
Part 1: Query Processing Fundamentals
To understand how to write solid, performant T-SQL queries, users should know how SQL Server runs the T-SQL syntax to deliver the intended result sets in a scalable fashion. This part introduces you to concepts that are used throughout the remaining parts of the book to explain most patterns and anti-patterns, as well as mitigation strategies.
This part has the following chapters:
Chapter 1, Understanding Query Processing
Chapter 2, Mechanics of the Query Optimizer
1
Understanding Query Processing
Transact-SQL, or T-SQL as it has become commonly known, is the language used to communicate with Microsoft SQL Server and Azure SQL Database. Any actions a user wishes to perform in a server, such as retrieving or modifying data in a database, creating objects, or changing server configurations, are all done via T-SQL commands.
The first step in learning to write efficient T-SQL queries is understanding how the SQL Database Engine processes and executes the query. The Query Processor is a component, therefore a noun, should not be all lowercased includes query compilation, query optimization, and query execution essentials: how does the SQL Database Engine compile an incoming T-SQL statement? How does the SQL Database Engine optimize and execute a T-SQL statement? How does the SQL Database Engine use parameters? Are parameters an advantage? When and why does the SQL Database Engine cache execution plans for certain T-SQL statements but not for others? When is that an advantage and when is it a problem? This is information that any T-SQL practitioner needs to keep as a reference for proactive T-SQL query writing, as well as reactive troubleshooting and optimization purposes. This chapter will be referenced throughout all following chapters, as we bridge the gap between architectural topics and real-world usage.
In this chapter, we’re going to cover the following main topics:
Logical statement processing flow
Query compilation essentials
Query optimization essentials
Query execution essentials
Plan caching and reuse
The importance of parameters
Technical requirements
The examples used in this chapter are designed for use on SQL Server 2022 and Azure SQL Database, but they should work on SQL Server version 2012 or later. The Developer Edition of SQL Server is free for development environments and can be used to run all the code samples. There is also a free tier of Azure SQL Database you can use for testing at https://aka.ms/freedb.
You will need the sample database AdventureWorks2016_EXT (referred to as AdventureWorks), which can be found on GitHub at https://github.com/Microsoft/sql-server-samples/releases/tag/adventureworks. The code samples for this chapter can also be found on GitHub at https://github.com/PacktPublishing/Learn-T-SQL-Querying-Second-Edition/tree/main/ch1.
Logical statement processing flow
When writing T-SQL, it is important to be familiar with the order in which the SQL Database Engine interprets queries, to later create an execution plan. This helps anticipate possible performance issues arising from poorly written queries, as well as helping you understand cases of unintended results. The following steps outline a summarized view of the method that the Database Engine follows to process a T-SQL statement:
Process all the source and target objects stated in the FROM clause (tables, views, and TVFs), together with the intended logical operation (JOIN and APPLY) to perform on those objects.
Apply whatever pre-filters are defined in the WHERE clause to reduce the number of incoming rows from those objects.
Apply any aggregation defined in the GROUP BY or aggregate functions (for example, a MIN or MAX function).
Apply filters that can only be applied on the aggregations as defined in the HAVING clause.
Compute the logic for windowing functions such as ROW_NUMBER, RANK, NTILE, LAG, and LEAD.
Keep only the required columns for the output as specified in the SELECT clause, and if a UNION clause is present, combine the row sets.
Remove duplicates from the row set if a DISTINCTclause exists.
Order the resulting row set as specified by the ORDER BY clause.
Account for any limits stated in the TOP clause.
It becomes clearer now that properly defining how tables are joined (the logical join type) is important to any scalable T-SQL query, namely by carefully planning on which columns the tables are joined. For example, in an inner join, these join arguments are the first level of data filtering that can be enforced, because only the rows that represent the intersection of two tables are eligible for subsequent operations.
Then it also makes sense to filter out rows from the result set using a WHERE clause, rather than applying any post-filtering conditions that apply to sub-groupings using a HAVING clause. Consider these two example queries:
SELECT p.ProductNumber, AVG(sod.UnitPrice)
FROM Production.Product AS p
INNER JOIN Sales.SalesOrderDetail AS sod ON p.ProductID = sod.ProductID
GROUP BY p.ProductNumber
HAVING p.ProductNumber LIKE 'L%';
SELECT p.ProductNumber, AVG(sod.UnitPrice)
FROM Production.Product AS p
INNER JOIN Sales.SalesOrderDetail AS sod ON p.ProductID = sod.ProductID
WHERE p.ProductNumber LIKE 'L%'
GROUP BY p.ProductNumber;
While these two queries are logically equivalent, the second one is more efficient because the rows that do not have a ProductNumber starting with L will be filtered out of the results before the aggregation is calculated. This is because the SQL Database Engine evaluates a WHERE clause before a HAVING clause and can limit the row count earlier in the execution phase, translating into reduced I/O and memory requirements, and also reduced CPU usage when applying the post-filter to the group.
The following diagram summarizes the logical statement-processing flow for the building blocks discussed previously in this chapter:
Figure 1.1: Flow chart summarizing the logical statement-processing flow of a queryFigure 1.1: Flow chart summarizing the logical statement-processing flow of a query
Now that we understand the order in which the SQL Database Engine processes queries, let’s explore the essentials of query compilation.
Query compilation essentials
The main stages of query processing can be seen in the following overview diagram, which we will expand on throughout this chapter:
Figure 1.2: Flow chart representing the states of query processingFigure 1.2: Flow chart representing the states of query processing
The Query Processor is the component inside the SQL Database Engine that is responsible for compiling a query. In this section, we will focus on the highlighted steps of the following diagram that handle query compilation:
Figure 1.3: States of query processing related to query compilationFigure 1.3: States of query processing related to query compilation
The first stage of query processing is generally known as query compilation and includes a series of tasks that will eventually lead to the creation of a query plan. When an incoming T-SQL statement is parsed to perform syntax validations and ensure that it is correct T-SQL, a query hash value is generated that represents the statement text exactly as it was written. If that query hash is already mapped to a cached query plan, then it can just attempt to reuse that plan. However, if a query plan for the incoming query is not already found in the cache, query compilation proceeds with the following tasks:
Perform binding, which is the process of verifying that the referenced tables and columns exist in the database schema.
References to a view are replaced with the definition of that view (this is called expanding the view).
Load metadata for the referenced tables and columns. This metadata is as follows:
The definition of tables, indexes, views, constraints, and so on, that apply to the query.
Data distribution statistics on the applicable schema object.
Verify whether data conversions are required for the query.
Note
When the query compilation process is complete, a structure that can be used by the Query Optimizer is produced, known as the algebrizer tree or query tree.
The following diagram further details these compilation tasks:
Figure 1.4: Flow of compilation tasks for T-SQL statementsFigure 1.4: Flow of compilation tasks for T-SQL statements
If the T-SQL statement is a Data Definition Language (DDL) statement, there’s no possible optimization, and so a plan is produced immediately. However, if the T-SQL statement is a Data Manipulation Language (DML) statement, the SQL Database Engine will move to an exploratory process known as query optimization, which we will explore in the next section.
Query optimization essentials
The Query Processor is also the component inside the SQL Database Engine that is responsible for query optimization. This is the second stage of query processing and its goal is to produce a query plan that can then be cached for all subsequent uses of the same query. In this section, we will focus on the highlighted sections of the following diagram that handle query optimization:
Figure 1.5: States of query processing related to query optimizationFigure 1.5: States of query processing related to query optimization
The SQL Database Engine uses cost-based optimization, which means that the Query Optimizer is driven mostly by estimations of the required cost to access and transform data (such as joins and aggregations) that will produce the intended result set. The purpose of the optimization process is to reasonably minimize the I/O, memory, and compute resources needed to execute a query in the fastest way possible. But it is also a time-bound process and can time out. This means that the Query Optimizer may not iterate through all the possible optimization permutations of a given T-SQL statement, but rather stops itself after finding an estimated good enough
compromise between low resource usage and faster execution times.
For this, the Query Optimizer takes several inputs to later produce what is called a query execution plan. These inputs are the following:
The incoming T-SQL statement, including any input