Explore 1.5M+ audiobooks & ebooks free for days

Only $12.99 CAD/month after trial. Cancel anytime.

Learn T-SQL Querying: A guide to developing efficient and elegant T-SQL code
Learn T-SQL Querying: A guide to developing efficient and elegant T-SQL code
Learn T-SQL Querying: A guide to developing efficient and elegant T-SQL code
Ebook817 pages6 hours

Learn T-SQL Querying: A guide to developing efficient and elegant T-SQL code

Rating: 0 out of 5 stars

()

Read preview
LanguageEnglish
PublisherPackt Publishing
Release dateFeb 29, 2024
ISBN9781837636839
Learn T-SQL Querying: A guide to developing efficient and elegant T-SQL code
Author

Pedro Lopes

Pedro Lopes is a Program Manager in the Database Systems group, based in Redmond, WA, USA. He has over 19 years of industry experience and has been with Microsoft for 9 years. He is currently responsible for program management of Database Engine features for in-market and vNext versions of SQL Server, with a special focus on the Relational Engine. He has extensive experience with query performance troubleshooting and is a regular speaker at numerous conferences such as SQLBits, PASS Summit, SQLIntersection, Microsoft Ignite, and Microsoft Build. He blogs about SQL on the SQL Server Team blog. He has authored several tools in the Tiger toolbox on GitHub: AdaptiveIndexDefrag maintenance solution, BPCheck, and usp_WhatsUp.

Related to Learn T-SQL Querying

Related ebooks

Computers For You

View More

Reviews for Learn T-SQL Querying

Rating: 0 out of 5 stars
0 ratings

0 ratings0 reviews

What did you think?

Tap to rate

Review must be at least 10 words

    Book preview

    Learn T-SQL Querying - Pedro Lopes

    Cover.png

    Learn T-SQL Querying

    Copyright © 2024 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Group Product Manager: Kaustubh Manglurkar

    Publishing Product Manager: Heramb Bhavsar

    Book Project Manager: Hemangi Lotlikar

    Content Development Editor: Joseph Sunil

    Technical Editor: Rahul Limbachiya

    Copy Editor: Safis Editing

    Proofreader: Safis Editing

    Indexer: Tejal Daruwale Soni

    Production Designer: Prafulla Nikalje

    DevRel Marketing Executive: Nivedita Singh

    First published: May 2019

    Second edition: February 2024

    Production reference: 2010324

    Published by Packt Publishing Ltd.

    Grosvenor House

    11 St Paul’s Square

    Birmingham

    B3 1RB, UK

    ISBN 978-1-83763-899-4

    www.packtpub.com

    To my wife and life partner, Sandra, and to my esteemed friends, mentors, and former colleagues in Azure Data who develop the SQL Database Engine and keep pushing the boundaries of excellence – sorry, I can’t list you all here! To the unique people I had the privilege of working with – Amit Banerjee, Bob Ward, Conor Cunningham, Hanuma Kodavalla, and Slava Oks – for inspiring me to always move forward and do better, and to everyone who keeps developing and supporting applications on this most-scalable RDBMS.

    – Pedro Lopes

    To Andrew and Linus, for spending countless nights and weekends without me. To the entire #SQLFamily, who continue to inspire me, support me (and each other), and drive me to be better every day. To my computer-illiterate friends, Jodie, Liza, and Erin, who I know will proudly display this book on their shelves despite having no idea what any of this means. And to my mom, who bought me my first computer when I was 8 years old and said Sure! when I decided that adding computer science as a second major in my junior year of college seemed like a good idea.

    – Pam Lahoud

    Foreword

    When I first met Pedro Lopes and Pam Lahoud, I already knew that they had both achieved recognition as experts in SQL Server, especially in areas such as query processing and performance. As I started working with them, I quickly realized that not only was the reputation warranted but I also came to see their characteristics of professionalism, thoroughness, and presentation skills.

    All these traits come out in this book, and you gain all the benefits. I love how this book is organized. If I want to read the entire book end to end, I will first learn the fundamentals and mechanics of the optimizer from the perspective of writing T-SQL queries. Then, I will get practical advice on how to write effective queries for maximum performance on topics such as indexing. And then, I’m able to dive deep into detailed query troubleshooting techniques using the full capabilities of SQL Server. This organization of the book also allows me to jump to any section aligning with my skills and knowledge. This powerful story is now brought to life in the second edition of this book, bringing in enhancements from SQL Server 2019, SQL Server 2022, and Azure SQL designed to make your applications faster with no code changes.

    Even if you believe that you understand query processing with SQL Server, you will benefit from this book. Using visual flows and examples, the first part of the book gives you a great perspective on how queries in SQL Server are compiled, executed, and cached. This part also includes key details of query processing such as cardinality estimation, optimization phases, and methods to control query optimization.

    The second part is the crown jewel of the book. Pedro and Pam pour in their years of experience to give you the advice you need on topics such as analyzing query plans, proper indexing, best practices for crafting T-SQL queries, and the often-overlooked area of anti-pattern queries. These chapters are full of rich advice and examples for you to try out yourself.

    Finish off the book by learning how to get faster to tune and troubleshoot query performance using powerful tools such as Query Profiling, Query Store, and Extended Events. The power of the T-SQL language comes to life as you learn how to write queries to debug the queries from your application. As readers, you get the benefit of unique information throughout the book because the authors have directly worked on these parts of the product.

    SQL Server and Azure SQL have evolved over the years to provide more automation and simplify the requirements to build and manage successful database applications. However, understanding how to use the power of T-SQL is critical to achieving maximum performance and efficiency. Furthermore, to take your game to the next level, you need to understand the nuances and mechanics of the query optimizer and query execution with T-SQL in the engine. This book provides it all in a manner that you can easily understand, with all the latest updates, and in a format that you can use as a reference for years to come.

    Bob Ward

    Principal architect, Microsoft

    Contributors

    About the authors

    Pedro Lopes is a senior director of engineering at Salesforce, based in WA, USA, leading the organization responsible for the Marketing Cloud’s DB Management Plane. Previously, he was a principal architect in Microsoft Azure Data, leading the SQL Server 2022 release until its public preview. He has 20+ years of industry experience and was with Microsoft for 12+ years. He has extensive experience with query performance troubleshooting and has been a speaker at numerous conferences, such as SQLBits, PASS Summit, SQLIntersection, Microsoft Ignite, and Microsoft Build. He has written multiple technical articles about SQL that are currently in the product documentation, including several on Engine internals at https://aka.ms/sqlserverguides.

    I want to thank the people who have been close to me and supported me throughout, especially my wife, Sandra. And special thanks to my writing partner and good friend Pam Lahoud, who is both a technical powerhouse and an outstanding human being, without whom this project would not have been nearly as rewarding to complete.

    Pam Lahoud is a principal PM manager in Microsoft Azure Data, based in Seattle, WA, USA. She has been with Microsoft since 2006 and currently leads the SQL Server in Azure Virtual Machines product manager team. She is passionate about SQL Server performance and has focused on performance tuning and optimization, particularly from the developers’ perspective, throughout her career. She is a SQL 2008 Microsoft Certified Master with over 20 years of experience working with SQL Server and has been a speaker at several global events such as the PASS Summit, SQLBits, Microsoft Build, and Microsoft Ignite.

    A big thank you to my partner, Andrew, and our son, Linus, for putting up with countless Sundays at home with Mama locked up in the office. Thank you to my writing partner, Pedro; we are the classic example of better together, and this is a life achievement I never would have reached without you! Forever your work-spouse, no matter which companies we’re at :-)

    About the reviewer

    Sergey Ten has been working in the database space for over 20 years, primarily on the Microsoft SQL Server and Oracle database servers. His main areas of expertise are data processing, programming language design, and high availability.

    Sergey currently works at Microsoft as a principal software engineer, working in the SQL Server team in the areas of query processing and high availability. Prior to that, he worked at Quest Software developing various Oracle management solutions, and at Guidance Software working on computer forensics and eDiscovery products.

    Table of Contents

    Preface

    Part 1: Query Processing Fundamentals

    1

    Understanding Query Processing

    Technical requirements

    Logical statement processing flow

    Query compilation essentials

    Query optimization essentials

    Query execution essentials

    Plan caching and reuse

    Stored procedures

    Ad hoc plan caching

    Parameterization

    The sp_executesql procedure

    Prepared statements

    How query processing impacts plan reuse

    The importance of parameters

    Security

    Performance

    Parameter sniffing

    To cache or not to cache

    Summary

    2

    Mechanics of the Query Optimizer

    Technical requirements

    Introducing the Cardinality Estimator

    Understanding the query optimization workflow

    The Trivial Plan stage

    The Exploration stage

    The Transaction Processing phase

    The Quick Plan phase

    The Full Optimization phase

    Knobs for query optimization

    Summary

    Part 2: Dos and Don’ts of T-SQL

    3

    Exploring Query Execution Plans

    Technical requirements

    What is a query plan?

    Accessing a query plan

    Navigating a query plan

    Query plan operators of interest

    Blocking versus non-blocking operators

    Data access operators

    Joins

    Spools

    Sort and aggregation operators

    Query plan properties of interest

    Plan-level properties

    Operator-level properties

    Summary

    4

    Indexing for T-SQL Performance

    Technical requirements

    Understanding predicate SARGability

    Data access using indexes

    Structure of a rowstore index

    Data access using rowstore indexes

    Inserting and updating data in a rowstore index

    Indexing strategy using rowstore indexes

    Best practices for clustered indexes

    Best practices for non-clustered indexes

    Index maintenance

    Summary

    5

    Writing Elegant T-SQL Queries

    Technical requirements

    Best practices for T-SQL querying

    Referencing objects

    Joining tables

    Using NOLOCK

    Using cursors

    The perils of SELECT *

    Functions in our predicate

    Deconstructing table-valued functions

    Complex expressions

    Optimizing OR logic

    NULL means unknown

    Fuzzy string matching

    Inequality logic

    EXECUTE versus sp_executesql

    Composable logic

    Summary

    6

    Discovering T-SQL Anti- Patterns in Depth

    Technical requirements

    Implicit conversions

    Avoiding unnecessary sort operations

    UNION ALL versus UNION

    SELECT DISTINCT

    Avoiding UDF pitfalls

    Avoiding unnecessary overhead with stored procedures

    Pitfalls of complex views

    Pitfalls of correlated sub-queries

    Properly storing intermediate results

    Using table variables and temporary tables

    Using Common Table Expressions (CTEs)

    Summary

    Part 3: Assembling Our Query Troubleshooting Toolbox

    7

    Building Diagnostic Queries Using DMVs and DMFs

    Technical requirements

    Introducing DMVs

    Exploring query execution DMVs

    sys.dm_exec_sessions

    sys.dm_exec_requests

    sys.dm_exec_sql_text

    sys.dm_os_waiting_tasks

    Exploring query plan cache DMVs

    sys.dm_exec_query_stats

    sys.dm_exec_procedure_stats

    sys.dm_exec_query_plan

    sys.dm_exec_cached_plans

    Troubleshooting common scenarios with DMV queries

    Investigating blocking

    Cached query plan issues

    Single-use plans (query fingerprints)

    Finding resource-intensive queries

    Queries with excessive memory grants

    Mining XML query plans

    Plans with missing indexes

    Plans with warnings

    Plans with implicit conversions

    Plans with lookups

    Summary

    8

    Building XEvent Profiler Traces

    Technical requirements

    Introducing XEvents

    Getting up and running with XEvent Profiler

    Remote collection with SQL LogScout

    Analyzing traces with RML Utilities

    Summary

    9

    Comparative Analysis of Query Plans

    Technical requirements

    Query plan analyzer

    Summary

    10

    Tracking Performance History with Query Store

    Technical requirements

    Introducing the Query Store

    Inner workings of the Query Store

    Configuring the Query Store

    Tracking expensive queries

    Fixing regressed queries

    Features that rely on the Query Store

    Query Store for readable secondary replicas

    Query Store hinting

    Parameter Sensitive Plan Optimization

    Automatic Plan Correction

    Degree of parallelism feedback

    Optimized plan forcing

    Summary

    11

    Troubleshooting Live Queries

    Technical requirements

    Using Live Query Statistics

    Understanding the need for lightweight profiling

    Diagnostics available with Lightweight Profiling

    Activity Monitor gets new life

    Summary

    12

    Managing Optimizer Changes

    Technical requirements

    Understanding where QTA and CE Feedback are needed

    Understanding QTA fundamentals

    Exploring the QTA workflow

    Summary

    Index

    Other Books You May Enjoy

    Preface

    Experienced and novice users have always faced choices and trade-offs to achieve the very best performance when writing T-SQL code for their applications. This book is for all data professionals who want to master the art of writing efficient T-SQL code in modern SQL Server versions, as well as Azure SQL Database.

    This book will start with query processing fundamentals to help you write solid, performant T-SQL queries. You will be introduced to query execution plans and how to leverage them for query troubleshooting. Later, you will learn how to identify various T-SQL patterns and anti-patterns. This will help you analyze execution plans to gain insights into current performance, as well as determine whether a query is scalable. You will learn how to build diagnostic queries using dynamic management views (DMVs) and dynamic management functions (DMFs) to unlock the secrets of T-SQL execution. Furthermore, you will learn how to leverage SQL Server’s built-in tools to shorten the time to address query performance and scalability issues. You will learn how to implement various features such as Extended Events, Query Store, Query Tuning Assistant, and more, using hands-on examples.

    By the end of the book, you will be able to determine where query performance bottlenecks are and understand what anti-patterns may be in use and what you need to do to avoid such pitfalls going forward. It’s essentially all you need to know to squeeze every last bit of performance out of your T-SQL queries.

    Who this book is for

    This book is for database administrators, database developers, data analysts, data scientists, and T-SQL practitioners who want to master the art of writing efficient T-SQL code and troubleshooting query performance issues using practical examples. A basic understanding of T-SQL syntax, writing queries in SQL Server, and using the SQL Server Management Studio tool is helpful to get started.

    What this book covers

    Chapter 1, Understanding Query Processing, introduces T-SQL query optimization and execution essentials: how does SQL Server optimize and execute T-SQL? How does SQL Server use parameters? Are parameters an advantage? When and why does SQL Server cache execution plans for certain T-SQL statements but not for others? When is that an advantage and when is it a problem? This is information that any T-SQL practitioner needs to keep as a reference for proactive T-SQL query writing, as well as reactive troubleshooting and optimization purposes. This chapter will be referenced throughout the Execution Plan-related chapters, as we link architectural topics to real-world uses.

    Chapter 2, Mechanics of the Query Optimizer, introduces T-SQL query optimization internals and architecture, starting with the infamous Cardinality Estimation process and its building blocks. From there, you will understand how the Query Optimizer uses that information to produce a just-in-time, good-enough execution plan. This chapter will be referenced throughout the Execution Plan-related chapters, as we bridge architectural topics to real-world uses.

    Chapter 3, Exploring Query Execution Plans, shows you how to read and analyze a graphical query execution plan, where to look for relevant performance information in the plan, and how to use the plan to troubleshoot query performance issues.

    Chapter 4, Indexing for T-SQL Performance, introduces guidelines to keep in mind for writing T-SQL queries that perform and scale well. Some basics of database physical design structure such as indexes will be covered, as well as how the optimizer estimates cost and chooses access methods based on how the query is written.

    Chapter 5, Writing Elegant T-SQL Queries, reveals various common T-SQL patterns and anti-patterns, specifically those that should be easily identifiable just by looking at the T-SQL construct. This chapter will have more of a cookbook structure. For each of the patterns, we will show a T-SQL example that contains the pattern, learn how to rewrite the query to avoid the pattern, and examine query execution plans before and after the change to show improved performance.

    Chapter 6, Discovering T-SQL Anti-Patterns in Depth, reveals various common T-SQL patterns and anti-patterns that require some more in-depth analysis to be identified – the proverbial elephant in the room. This chapter will also follow the cookbook structure introduced in Chapter 5, Writing Elegant T-SQL Queries.

    Chapter 7, Building Diagnostic Queries Using DMVs and DMFs, introduces dynamic management views and functions that expose relevant just-in-time information to unlock the secrets of T-SQL execution. It includes real-world examples of how to use these artifacts to troubleshoot different poor performance scenarios, either leveraging snippets provided in this book or in GitHub, and how to build customized scripts.

    Chapter 8, Building XEvent Profiler Traces, introduces Extended Events (XEvents), the lightweight infrastructure that exposes relevant just-in-time information from every component of the SQL Database Engine, focused on those related to T-SQL execution. You will get real-world examples of how to use these XEvents to troubleshoot different poor performance scenarios, leveraging collection and analysis tools such as the XEvent Profiler, LogScout, and Replay Markup Language for event analysis, and dropping a note on the infamously deprecated SQL Server Profiler.

    Chapter 9, Comparative Analysis of Query Plans, introduces rich-UI tools that ship with SQL Server Management Studio to enable standalone query plan analysis or compare plans from different points in time. It then moves on to visually pinpoint the interesting parts that may provide the key to improving T-SQL query performance and scalability.

    Chapter 10, Tracking Performance History with Query Store, introduces a flagship feature: Query Store. This is a practical approach to leveraging what is effectively a flight recorder for your SQL Database Engine T-SQL execution, for the purpose of trend analysis or T-SQL performance troubleshooting and analysis, through rich UI reports that ship with SQL Server Management Studio. Then, you will see how Query Store integrates with the Query Plan Comparison and Query Plan Analysis functionalities for a complete, UI-driven workflow for query performance insights. Lastly, we’ll review some of the SQL Database Engine features that rely on the data collected by Query Store.

    Chapter 11, Troubleshooting Live Queries, introduces the profiling infrastructure that exposes real-time query execution plans, which enable scenarios such as production system troubleshooting. You will see a real-world example of how to leverage rich UI tools: Live Query Statistics as a standalone case or as part of the Activity Monitor functionality of SQL Server Management Studio.

    Chapter 12, Managing Optimizer Changes, discusses two features – QTA (client-side) and CE Feedback (server-side) – which aim to address some of the most common causes of cardinality estimation (CE)-related performance regressions that may affect our T-SQL queries after an upgrade from an older version of the SQL Database Engine to a newer version.

    To get the most out of this book

    A basic understanding of using the SQL Database Engine and writing T-SQL queries will help get you started with this book. Some familiarity with SQL Server Management Studio or Azure Data Studio is also helpful for running the sample queries and viewing query execution plans.

    The examples used in this book are designed for use on SQL Server 2022 and Azure SQL Database, but they should work on any version of SQL Server 2012 or later. The Developer Edition of SQL Server is free for development environments and can be used to run all the code samples. There is also a free tier of Azure SQL Database that you can use for testing at https://aka.ms/freedb.

    You will need the AdventureWorks2016_EXT (referred to as AdventureWorks) and AdventureWorksDW2016_EXT (referred to as AdventureWorksDW) sample databases, which can be found on GitHub at https://github.com/Microsoft/sql-server-samples/releases/tag/adventureworks.

    If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

    Note

    This book contains many horizontally long screenshots. These have been captured to provide readers with an overview of the execution plans for various SQL queries. As a result, the text in these images may appear small at 100% zoom. Additionally, you will be able to see these plans in more depth in the output in SQL Server as you code along.

    Download the example code files

    You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Learn-T-SQL-Querying-Second-Edition. If there’s an update to the code, it will be updated in the GitHub repository.

    We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Conventions used

    There are a number of text conventions used throughout this book.

    Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system.

    A block of code is set as follows:

    SELECT LastName, FirstName

    FROM Person.Person

    WHERE FirstName = N'Andrew';

    Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: Select System info from the Administration panel.

    Tips or important notes

    Appear like this.

    Get in touch

    Feedback from our readers is always welcome.

    General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

    Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

    Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

    If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

    Share Your Thoughts

    Once you’ve read Learn T-SQL Querying, Second Edition, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

    Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

    Download a free PDF copy of this book

    Thanks for purchasing this book!

    Do you like to read on the go but are unable to carry your print books everywhere?

    Is your e-book purchase not compatible with the device of your choice?

    Don’t worry!, Now with every Packt book, you get a DRM-free PDF version of that book at no cost.

    Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

    The perks don’t stop there, you can get exclusive access to discounts, newsletters, and great free content in your inbox daily

    Follow these simple steps to get the benefits:

    Scan the QR code or visit the following link:

    https://packt.link/free-ebook/9781837638994

    Submit your proof of purchase.

    That’s it! We’ll send your free PDF and other benefits to your email directly.

    Part 1: Query Processing Fundamentals

    To understand how to write solid, performant T-SQL queries, users should know how SQL Server runs the T-SQL syntax to deliver the intended result sets in a scalable fashion. This part introduces you to concepts that are used throughout the remaining parts of the book to explain most patterns and anti-patterns, as well as mitigation strategies.

    This part has the following chapters:

    Chapter 1, Understanding Query Processing

    Chapter 2, Mechanics of the Query Optimizer

    1

    Understanding Query Processing

    Transact-SQL, or T-SQL as it has become commonly known, is the language used to communicate with Microsoft SQL Server and Azure SQL Database. Any actions a user wishes to perform in a server, such as retrieving or modifying data in a database, creating objects, or changing server configurations, are all done via T-SQL commands.

    The first step in learning to write efficient T-SQL queries is understanding how the SQL Database Engine processes and executes the query. The Query Processor is a component, therefore a noun, should not be all lowercased includes query compilation, query optimization, and query execution essentials: how does the SQL Database Engine compile an incoming T-SQL statement? How does the SQL Database Engine optimize and execute a T-SQL statement? How does the SQL Database Engine use parameters? Are parameters an advantage? When and why does the SQL Database Engine cache execution plans for certain T-SQL statements but not for others? When is that an advantage and when is it a problem? This is information that any T-SQL practitioner needs to keep as a reference for proactive T-SQL query writing, as well as reactive troubleshooting and optimization purposes. This chapter will be referenced throughout all following chapters, as we bridge the gap between architectural topics and real-world usage.

    In this chapter, we’re going to cover the following main topics:

    Logical statement processing flow

    Query compilation essentials

    Query optimization essentials

    Query execution essentials

    Plan caching and reuse

    The importance of parameters

    Technical requirements

    The examples used in this chapter are designed for use on SQL Server 2022 and Azure SQL Database, but they should work on SQL Server version 2012 or later. The Developer Edition of SQL Server is free for development environments and can be used to run all the code samples. There is also a free tier of Azure SQL Database you can use for testing at https://aka.ms/freedb.

    You will need the sample database AdventureWorks2016_EXT (referred to as AdventureWorks), which can be found on GitHub at https://github.com/Microsoft/sql-server-samples/releases/tag/adventureworks. The code samples for this chapter can also be found on GitHub at https://github.com/PacktPublishing/Learn-T-SQL-Querying-Second-Edition/tree/main/ch1.

    Logical statement processing flow

    When writing T-SQL, it is important to be familiar with the order in which the SQL Database Engine interprets queries, to later create an execution plan. This helps anticipate possible performance issues arising from poorly written queries, as well as helping you understand cases of unintended results. The following steps outline a summarized view of the method that the Database Engine follows to process a T-SQL statement:

    Process all the source and target objects stated in the FROM clause (tables, views, and TVFs), together with the intended logical operation (JOIN and APPLY) to perform on those objects.

    Apply whatever pre-filters are defined in the WHERE clause to reduce the number of incoming rows from those objects.

    Apply any aggregation defined in the GROUP BY or aggregate functions (for example, a MIN or MAX function).

    Apply filters that can only be applied on the aggregations as defined in the HAVING clause.

    Compute the logic for windowing functions such as ROW_NUMBER, RANK, NTILE, LAG, and LEAD.

    Keep only the required columns for the output as specified in the SELECT clause, and if a UNION clause is present, combine the row sets.

    Remove duplicates from the row set if a DISTINCTclause exists.

    Order the resulting row set as specified by the ORDER BY clause.

    Account for any limits stated in the TOP clause.

    It becomes clearer now that properly defining how tables are joined (the logical join type) is important to any scalable T-SQL query, namely by carefully planning on which columns the tables are joined. For example, in an inner join, these join arguments are the first level of data filtering that can be enforced, because only the rows that represent the intersection of two tables are eligible for subsequent operations.

    Then it also makes sense to filter out rows from the result set using a WHERE clause, rather than applying any post-filtering conditions that apply to sub-groupings using a HAVING clause. Consider these two example queries:

    SELECT p.ProductNumber, AVG(sod.UnitPrice)

    FROM Production.Product AS p

    INNER JOIN Sales.SalesOrderDetail AS sod ON p.ProductID = sod.ProductID

    GROUP BY p.ProductNumber

    HAVING p.ProductNumber LIKE 'L%';

    SELECT p.ProductNumber, AVG(sod.UnitPrice)

    FROM Production.Product AS p

    INNER JOIN Sales.SalesOrderDetail AS sod ON p.ProductID = sod.ProductID

    WHERE p.ProductNumber LIKE 'L%'

    GROUP BY p.ProductNumber;

    While these two queries are logically equivalent, the second one is more efficient because the rows that do not have a ProductNumber starting with L will be filtered out of the results before the aggregation is calculated. This is because the SQL Database Engine evaluates a WHERE clause before a HAVING clause and can limit the row count earlier in the execution phase, translating into reduced I/O and memory requirements, and also reduced CPU usage when applying the post-filter to the group.

    The following diagram summarizes the logical statement-processing flow for the building blocks discussed previously in this chapter:

    Figure 1.1: Flow chart summarizing the logical statement-processing flow of a query

    Figure 1.1: Flow chart summarizing the logical statement-processing flow of a query

    Now that we understand the order in which the SQL Database Engine processes queries, let’s explore the essentials of query compilation.

    Query compilation essentials

    The main stages of query processing can be seen in the following overview diagram, which we will expand on throughout this chapter:

    Figure 1.2: Flow chart representing the states of query processing

    Figure 1.2: Flow chart representing the states of query processing

    The Query Processor is the component inside the SQL Database Engine that is responsible for compiling a query. In this section, we will focus on the highlighted steps of the following diagram that handle query compilation:

    Figure 1.3: States of query processing related to query compilation

    Figure 1.3: States of query processing related to query compilation

    The first stage of query processing is generally known as query compilation and includes a series of tasks that will eventually lead to the creation of a query plan. When an incoming T-SQL statement is parsed to perform syntax validations and ensure that it is correct T-SQL, a query hash value is generated that represents the statement text exactly as it was written. If that query hash is already mapped to a cached query plan, then it can just attempt to reuse that plan. However, if a query plan for the incoming query is not already found in the cache, query compilation proceeds with the following tasks:

    Perform binding, which is the process of verifying that the referenced tables and columns exist in the database schema.

    References to a view are replaced with the definition of that view (this is called expanding the view).

    Load metadata for the referenced tables and columns. This metadata is as follows:

    The definition of tables, indexes, views, constraints, and so on, that apply to the query.

    Data distribution statistics on the applicable schema object.

    Verify whether data conversions are required for the query.

    Note

    When the query compilation process is complete, a structure that can be used by the Query Optimizer is produced, known as the algebrizer tree or query tree.

    The following diagram further details these compilation tasks:

    Figure 1.4: Flow of compilation tasks for T-SQL statements

    Figure 1.4: Flow of compilation tasks for T-SQL statements

    If the T-SQL statement is a Data Definition Language (DDL) statement, there’s no possible optimization, and so a plan is produced immediately. However, if the T-SQL statement is a Data Manipulation Language (DML) statement, the SQL Database Engine will move to an exploratory process known as query optimization, which we will explore in the next section.

    Query optimization essentials

    The Query Processor is also the component inside the SQL Database Engine that is responsible for query optimization. This is the second stage of query processing and its goal is to produce a query plan that can then be cached for all subsequent uses of the same query. In this section, we will focus on the highlighted sections of the following diagram that handle query optimization:

    Figure 1.5: States of query processing related to query optimization

    Figure 1.5: States of query processing related to query optimization

    The SQL Database Engine uses cost-based optimization, which means that the Query Optimizer is driven mostly by estimations of the required cost to access and transform data (such as joins and aggregations) that will produce the intended result set. The purpose of the optimization process is to reasonably minimize the I/O, memory, and compute resources needed to execute a query in the fastest way possible. But it is also a time-bound process and can time out. This means that the Query Optimizer may not iterate through all the possible optimization permutations of a given T-SQL statement, but rather stops itself after finding an estimated good enough compromise between low resource usage and faster execution times.

    For this, the Query Optimizer takes several inputs to later produce what is called a query execution plan. These inputs are the following:

    The incoming T-SQL statement, including any input

    Enjoying the preview?
    Page 1 of 1