Querying Databricks with Spark SQL: Leverage SQL to query and analyze Big Data for insights (English Edition)
Ebook · 1,065 pages · 21 hours


About this ebook

Databricks stands out as a widely embraced platform dedicated to the creation of data lakes. Within its framework, it extends support to a specialized version of Structured Query Language (SQL) known as Spark SQL. If you are interested in learning more about how to use Spark SQL to analyze data in a data lake, then this book is for you.

The book covers everything from basic queries to complex data-processing tasks. It begins with an introduction to SQL and Spark. It then covers the basics of SQL, including data types, operators, and clauses. The next few chapters focus on filtering, aggregation, and calculation. Subsequent chapters cover dates and times, formatting output, and using logic in your queries, as well as joining tables, subqueries, derived tables, and common table expressions. Later chapters discuss correlated subqueries, joining and filtering datasets, using SQL in calculations, segmenting and classifying data, rolling analysis, and analyzing data over time. The book concludes with a chapter on advanced data presentation.

By the end of the book, you will be able to use Spark SQL to perform complex data analysis tasks on data lakes.
Language: English
Publisher: BPB Online LLP
Release date: Oct 5, 2023
ISBN: 9789355518026

    Book preview

    Querying Databricks with Spark SQL - Adam Aspin

    CHAPTER 1

    Writing Basic SQL Queries

    Introduction

    Welcome to Databricks and the new world of data analysis that you are about to experience. As you are standing on the threshold of this voyage into the realm of serverless analytics, you could be feeling a little apprehensive. Well, do not worry; your journey will be as simple and comprehensible as possible. This chapter will start you on your adventure, first by outlining the software that you will be using and then by explaining what Databricks is. Then, it will show you how to take an initial look at the data itself. As you progress, you will learn how to be more selective about the data that you analyze.

    It may seem obvious, but you will need some data in an accessible dataset before you can start your analysis. Throughout this book, you will be developing your analytical skills with the aid of a sample dataset named Prestige Cars. This dataset contains a small amount of data concerning sales of vehicles by a fictitious British car reseller. If you want to try the examples in this chapter, you will have to download the sample data from the BPB website and install it into a version of Databricks. So, it follows that now could be a good time to set up the sample dataset as described in Appendix A unless you have already done so. Of course, you can install the sample data only if Databricks is already available. So, if you are not in an enterprise environment where Databricks is already accessible, you will need to set up an account for the Databricks community edition before anything else.

    Structure

    In this chapter, you will learn about the following topics:

    Databricks

    The Databricks Web interface

    Getting started with Databricks notebooks

    Cells

    Spark SQL

    Databricks databases (schemas) and tables

    Activating a database

    Displaying the data in a table

    Help writing SQL

    Limiting the number of records displayed

    Displaying data from a specific field

    Finding the columns in a table

    Displaying data from a specific set of fields

    Modifying the field name in the output

    Removing duplicates from query output

    Sorting data

    Sorting data in reverse order

    Applying multiple sort criteria

    Running SQL queries

    Displaying the available tables

    Finding all the views in a database

    Resizing notebook cells

    Overriding the 1,000-row output limit

    Assuming that you have all the prerequisites in place, it is time to move on to the core focus of this chapter and start querying Databricks data.

    Objectives

    In this chapter, you will learn how to:

    Query Databricks databases using Databricks notebooks

    List the contents of tables

    Select only certain fields in tables to display

    Display only a few records from a table

    Give columns new names in your query output

    Sort your output

    Note: If you know a little about SQL and if you have a basic knowledge of databases, then feel free to skip past the first few sections of this chapter (or even the first few chapters) until you find the parts that are new to you. If this is not the case, it is preferable to start from the beginning, as the early sections provide all the information that you are likely to need to get the most out of your SQL learning experience.

    Databricks

    Databricks is a cloud data platform that delivers serverless analytics. This means that all the data and processing take place in the cloud (currently, Amazon Web Services, Microsoft Azure, or Google Cloud). All the analytics and data querying that you will do is carried out in a Web browser. Unlike traditional databases, Databricks does not load data into a database but stores data in files in a cloud-based data lake—hence the term serverless analytics.

    Another area where Databricks breaks with the traditional database paradigm is by separating data storage from processing. This approach allows for much greater scalability and the ability to handle huge amounts of data, making Databricks a leader in the area of Big Data.

    The technical approaches that underpin Databricks are extremely complex, but fortunately, the Databricks interface hides virtually all the complexity. All you have to do is load data (or access data that is already loaded) and start querying.

    The Databricks Web interface

    To start querying data in Databricks, you first need to log on. If you have a login to an enterprise version of Databricks, then you will have been given a corporate login to use. If not (and assuming that you have created a community account), then all you need to do is to enter the following URL in your Web browser:

    https://community.cloud.databricks.com/login.html

    This will take you to the Databricks startup page, as shown in Figure 1.1:

    Figure 1.1: The Databricks startup page

    Hover over the left sidebar to expand it. This is the main menu where you can find all the core Databricks options. You can see this in Figure 1.2:

    Figure 1.2: The Databricks main menu

    Note: This book is only focused on one aspect of Databricks: querying data with Spark SQL. Consequently, there are huge swathes of Databricks that we will not be looking at. If you need to delve into other aspects of the platform, then the Databricks documentation is a good place to start.

    Getting started with Databricks notebooks

    Interaction with data in Databricks takes place in Databricks notebooks. You can consider these to be a kind of programming window where you will write your code to analyze and return data. The first thing to do is to create a new notebook. To do this, follow these steps:

    Hover over the plus symbol in the left pane and select Notebook in the popup pane. You can see this in Figure 1.3:

    Figure 1.3: The Create pane

    In the dialog that appears, enter a name for your notebook, ensure that the default language is set to SQL, and choose the cluster that you will be using to compute results. You can see this in Figure 1.4:

    Figure 1.4: The Create Notebook dialog

    Click Create, and a new notebook will be created. You can see this in Figure 1.5:

    Figure 1.5: A Databricks Notebook

    It is as simple as that. You can now start typing in SQL code to query your data.

    Notebooks are fundamental to the user experience and are extremely simple to use. Their simplicity belies their power, however. As the idea is to start using Spark SQL to query data, it is probably best not to overwhelm you with interface considerations for the moment. Nonetheless, to begin, you need to be aware of the following points:

    A notebook must be attached to a compute cluster. If you are working in a corporate environment, then your IT department will indicate which cluster to use. If you are using the Databricks Community edition, then you will need to set up a cluster and connect this cluster to your notebooks.

    Notebooks are saved automatically. You do not have to manually save the code that you enter in a notebook.

    Notebooks can contain code written in SQL, Python, Scala, or R. The notebook needs to know which language you will be using to process code. In our case, this will be SQL, which is why you selected SQL as the default language. You can change the default language at any time by clicking on the language popup at the top of the notebook.

    A notebook name can contain virtually any combination of letters, numbers, spaces, and even special characters.

    Notebooks are stored in workspaces, which are a kind of container for your (or your colleagues’) interactions with Databricks.

    Cells

    All the SQL that you write to query data in Databricks will be written in a cell inside a notebook. You can create as many cells as you like and can reorder them inside the notebook. Cells can be copied and deleted, and each cell can run code in any of the languages that Databricks uses.

    This allows notebooks to become complex programs, where the code contained in sequences of cells can be run from end to end. For the moment, it is easiest (at least when starting out with Databricks notebooks) to use a single cell for your SQL and replace the cell contents with the code that you are writing for each code sample in the book.

    Note: In a corporate environment (but not, unfortunately, when using the community edition of Databricks), you will also see an option for SQL Warehouse in the menu bar on the left. This also lets you enter SQL commands and offers a powerful SQL editor. The results will, in any case, be the same as those when using notebooks to write SQL queries.

    Spark SQL

    Databricks lets you use several languages to manipulate data. This book (as you can guess from the title) focuses on only one of these languages: Structured Query Language, or SQL. The version of SQL that Databricks uses is known as Spark SQL. Databricks itself is built on top of the Spark open-source technology stack. Spark SQL is a full-fledged and robust variant of SQL that you can use to extract and shape data stored in Databricks data lakes. Spark SQL will generally be referred to simply as SQL in this book.

    Databricks databases (schemas) and tables

    As you have decided to learn and use Spark SQL queries to analyze data stored in Databricks, you first need to know a few of the basic concepts that underpin SQL and databases. This is how Databricks classifies the data that you will be querying in Spark SQL:

    At the lowest layer are the files that contain data. As far as the sample data you will be using is concerned, these are simple comma-separated lists.

    These lists are attributed to a metadata layer (a conceptual overlay) that gives each file a recognizable structure. This structure is called a table. All the rows in a table consist of the same number of columns. So, a table is really nothing more than a well-structured list—rather like the ones that you have probably encountered in Excel.

    Tables are grouped (logically, at least) into databases or, as Databricks prefers to call them, schemas.

    To summarize, then, a Databricks database (or schema) is a collection of lists (tables) containing columns (fields) of data in a set of rows (records). These elements can then be accessed independently or joined together to deliver the analysis you are looking for.

    Conceptually, a database looks something like the structure shown in Figure 1.6:

    Figure 1.6: Conceptualizing data in Databricks

    It is worth grasping from the outset that Databricks does not copy the source files into tables like a traditional database does. The data remains in the source files. What Databricks does is add an internal overlay to the data in the files so that the data can be accessed as if it were in a database, which means that you can use SQL (the classic database query language) to query the data.

    Note: Please note that all the exercises in this book use Databricks’s default database. The sample data is visible in the default database in the Databricks Community edition.

    Activating a database

    Before running any queries, you need to ensure that your notebook is using the correct database. To switch to (or to keep using) the default database where the sample data is stored, simply enter the following command in a notebook cell:

    USE DATABASE default;

    Then press Ctrl + Enter to run the command. Databricks will simply display OK beneath the cell to indicate that the database is active.
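    If you ever need to double-check which database is currently active, Spark SQL has a built-in current_database() function that you can query directly, as in this short sketch:

    SELECT current_database();

    Running this in a cell should return default if the command above worked.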

    Displaying the data in a table

    For your first query, let us suppose that you have loaded the sample data as described in Appendix B and that you want to see the details of vehicles sold by Prestige Cars Ltd:

    In a blank notebook cell, type in the following short piece of SQL:

    SELECT  *

    FROM    allsales;

    Press Ctrl + Enter to run the code and show the results. The results will appear under the code, as shown in Figure 1.7:

    Figure 1.7: Displaying all the data in a table with SELECT *

    How does it work

    In just four words, you have said to Databricks, "Show me the complete contents of the AllSales table, including all the rows and all the columns of data." All you needed to know was which table you wanted to look at. SQL did the rest.

    This code snippet is incredibly simple—four words in all—but that is enough to show you how SQL works in practice. If you issue the right command, then you will get back the data you want to see. The point of the command is to let you see everything that is stored in a table of data.

    Of course, you will need to know which table contains the data you want to display when you are dealing with your own data. If you are working in an enterprise environment, this may involve talking to the people in your organization who developed or maintained the databases. Alternatively, there may be documentation that you can read to find the information you require.

    If there is no one you can ask and no documentation available, then you can still acquaint yourself with the data by running a SELECT * FROM query against each data table that you can find in Databricks. However, before doing this, it is best to continue a little further in this book and learn how to limit the number of records output by a query. Later in this chapter, you will learn how to display all the tables in a database.

    So, what exactly have you done here? Let us take a closer look at what you have written. Figure 1.8 breaks down the SQL statement into its constituent parts:

    Figure 1.8: The anatomy of a simple SQL statement

    These constituent parts are as follows:

    Keywords: SQL is built using a set of keywords that you combine to produce commands. These commands (or snippets or phrases, if you want to call them that) are in English and follow a fairly rigorous syntax. Learning to understand and apply the grammar of SQL is what this book is all about.

    Clauses: These are a set of short phrases composed of keywords and data elements that make up a SQL statement.

    This was a simple SQL command, but the following are nonetheless a few key points that you will need to remember:

    As mentioned previously, Databricks can contain (or host, if you prefer) dozens of databases. This means you always have to tell Databricks which database you want to interrogate. When using the community edition, it is simpler to use the default database. In a corporate environment, you should probably not use the default database.

    If you do not indicate the correct database to use, Databricks will either return no data at all or, worse, return the wrong data from another database.

    Ctrl + Enter runs (or executes, if you prefer) the SQL code that you have typed or copied into a notebook cell. There are other ways to run a query that you will learn a little later in this chapter.

    You can click inside the table of data returned by a query to scroll left and right or up and down through the data.

    It can happen (even when you are an experienced data analyst) that executing a query returns nothing more than an irritating error message like the following:

    Error in SQL statement: AnalysisException: Table or view not found: allsales; line 1 pos 14;.

    This probably means you have not specified correctly which database you want to query or that you have simply mistyped the table name.

    In this book, the data tables that you need to use will be explained as you meet them. Here you have seen the first of the small set of tables that make up the sample data. As its name suggests, the AllSales table contains the details of cars that have been sold by Prestige Cars Ltd.

    By default, only 1,000 rows are output from a query. This can be overridden, as you will learn a little later in this chapter.

    Under the output data, you can also see how long it took to process the data, when it was last refreshed, the number of rows returned by the query, and how long since the query was run.

    Help writing SQL

    While Databricks notebooks cannot write SQL for you, they can help you enter the names of tables, fields, and SQL keywords. As an example, clear the contents of the current cell and then do the following:

    Start typing SELECT.

    After three characters, press the Tab key.

    You will see that the notebook has found only one matching element, whether a keyword, table, or field name (here, the SELECT keyword), and has completed the word for you. To continue the example, clear the contents of the current cell and then enter:

    SELECT * FROM al

    Now press the Tab key. You will see a popup menu with any possible elements beginning with al that you could use, as shown in Figure 1.9:

    Figure 1.9: The choice of available elements using autoprompt

    Click the required item in the list, and it will be added to the SQL code.

    Limiting the number of records displayed

    Databricks tables can contain millions—or even billions—of records. Each time you run a query, all the records you see are sent from the server to the querying application. It follows that displaying all the rows in a huge table can place an unnecessary strain on the server that stores the data and on the network that connects the server to your workstation. So, in the real world, it is a good idea to display only a small number of records when examining a table. This is particularly true when looking at a table for the first time.

    Fortunately, SQL has a way of limiting the number of rows returned by a query. This is nothing more than a simple extension of the command that you used in the previous section. Here you can see how to tweak a piece of SQL code to display only a few records in the query output.

    Delete any SQL that might be in the query window (or open a new query window), and enter the following code snippet:

    SELECT  *

    FROM    allsales

    LIMIT    5; 

    Run the SQL code by pressing Ctrl + Enter.

    How does it work

    Executing this command will display all the columns in the table—but only the first five records. This will look like the output shown in Figure 1.7, limited to the first five rows.

    You may well be wondering which records you are seeing when you ask SQL to display only five rows (or indeed any other number of records). The answer is that you might not even see the same set of records when you run this command at different times. Databricks does not necessarily return the first (or last) five records that were added to a table. Neither does it display the last five to be updated, viewed, or printed.

    This apparently minor question reveals an important fact about Databricks data. Records are stored in no particular order (or, at least, simply in the order they were loaded). There is no immediate way of telling, in a table, whether the data is stored in any sequence, nor does it matter to SQL. Remember this when querying tables, and never trust the data to be returned in any specific order unless you have asked for this to happen. How to do this is explained later in this chapter.
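    If you do need a predictable subset of rows, the answer is to combine LIMIT with the ORDER BY clause that is introduced later in this chapter. As a foretaste, this short sketch returns the five cheapest sales:

    SELECT  *
    FROM    allsales
    ORDER BY SalePrice
    LIMIT   5;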

    The following are a few points that are worth mentioning at this juncture:

    There are several ways to run the SQL code in a query window. Pressing Ctrl + Enter is only one method. Another is to select Run > Run Cell from the notebook menu. Yet another alternative is to click the small triangle at the top right of the notebook cell and click on Run Cell.

    You can enter the keywords and the table and column names in either uppercase or lowercase or even a mixture of the two. In this book, however, we will always enter keywords in uppercase to help them stand out in the code snippet. Hopefully, this will make them easier to learn.

    You do not have to write queries in a purely linear fashion, starting with the SELECT clause and continuing to the end of the query. You can start with any part of the query and build it up as you want.

    Displaying data from a specific field

    A Databricks table can contain hundreds of columns. Most of the time, you will want to display data from only a few of the available columns in a table. The next piece of SQL shows how you choose a single column to display instead of all the available columns in a table. Specifically, it shows how to list only the customer names as follows:

    SELECT CustomerName

    FROM  allsales;

    Executing this piece of code (using any of the techniques that were pointed out at the end of the previous section) will show you something similar to the output shown in Figure 1.10:

    Figure 1.10: Choosing a column from a table

    How does it work

    By replacing the star (or asterisk if you prefer) in your SQL with a specific column name, you have told Databricks that it is this column—and this column only—that you want to display. This example also makes the point that SQL is an extremely modular and extensible language, and you can easily extend or modify a simple command to make it more specific.

    Of course, this approach relies on your knowing the exact name of a column and also typing it exactly as it appears in the table. Fortunately, Databricks has ways of displaying column names and using them in your SQL, as you will discover in the next section.

    Even simple SQL commands have their subtleties. When listing data, you need to remember the following:

    You can write short SQL commands like this one on a single line if you prefer. Alternatively, you could write the following instead of placing each clause on a separate line:

    SELECT CustomerName FROM allsales;

    This is entirely a question of personal choice. As we are presuming that you are new to SQL, we will keep the core SQL statements on separate lines in this book to accentuate the underlying logic of the language. In your queries, you can write the SQL any way you want, as long as it works. All that Databricks wants is that the grammar of the command is technically accurate and that keywords are separated by spaces or returns.

    Finding the columns in a table

    You may well be wondering how on Earth you can be expected to remember all the columns in each table so that you can type them into your SQL queries. Fortunately, Databricks notebooks can help you here by displaying all the columns in a table with a single command.

    Displaying the columns in a table is really easy. All you have to do (in an empty cell in a notebook) is type the following command:

    DESCRIBE allsales;

    You should see something like the output shown in Figure 1.11 (you may see fewer rows for the moment):

    Figure 1.11: Displaying the columns in a table

    When you display the list of columns in a table, you can also see some more technical information concerning the type of data that the column contains. We will explain what datatypes are in later chapters.
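    For reference, the DESCRIBE output contains three columns: col_name, data_type, and comment. The exact fields and types depend on the table, but the output is shaped something like the following (the values shown here are purely illustrative):

    col_name        data_type    comment
    CountryName     string       null
    CustomerName    string       null
    SalePrice       double       null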

    SQL writing style

    You can format the SQL that you write using multiple styles. You may prefer to write short SQL statements on a single line. Alternatively, you may find it clearer if you place the core keywords such as SELECT, FROM, and ORDER BY on separate lines. The choice is yours.

    All you have to remember is that keywords must be separated by a space, a tab character, or a line break. Otherwise, the presentation of your SQL is entirely up to you.
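    For example, the following two statements are equally valid and return identical results; the difference is purely one of presentation:

    SELECT CustomerName FROM allsales ORDER BY CustomerName;

    SELECT    CustomerName
    FROM      allsales
    ORDER BY  CustomerName;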

    Displaying data from a specific set of fields

    SQL does not limit you to displaying all the fields—or only one field—from a table. You can choose not only the fields from a table that you want to display but also the order in which they will appear in the output from the query. The following piece of code shows you how to select two fields (CountryName and CustomerName) from the AllSales table:

    SELECT CountryName, CustomerName

    FROM  allsales;

    Executing this piece of code will show you something similar to the output in Figure 1.12:

    Figure 1.12: Displaying multiple fields from a table

    How does it work

    Here again, you have extended the base SQL that you used at the start of the chapter. Specifically, you have developed the SELECT statement to include the field names containing the data that you want to view. All you had to do was to separate each field name with a comma and place the field names in the order that you want to see their data in the output from left to right.

    There is only one major trick to remember when listing a specific set of fields:

    Remember not to add a comma after the final field in a list of fields in the SELECT clause.

    Columns or fields

    Whereas spreadsheets prefer the term columns, databases tend to use the word fields. So, despite the fact that we have used the term column up until now, we will switch to using the term field from now on. Both words, however, describe the same thing and can generally be used interchangeably.

    Modifying the field name in the output using aliases

    Many datasets have cryptic—or frankly incomprehensible—field names. While, as an analyst or data guru, you might get used to this, it is not always a good idea to present information to users in a way that makes the data harder to understand than is necessary. So, SQL allows you to output the data under a different field header to enhance readability. In this example, you will display the CountryISO3 field under another name. Start by taking a look at the following code snippet and then at the output it returns in Figure 1.13:

    SELECT CountryName, CountryISO3 AS IsoCode FROM allsales;

    Figure 1.13: Changing a field name using an alias

    How does it work

    In the query output, the original field name is replaced by the name you have chosen—IsoCode in this example. This technique is called aliasing; you are giving the field another name in the query. Applying an alias has no effect at all on the underlying data and does not change the underlying field name. What it does do is apply a different (and ideally more comprehensible) name in the query output.

    Aliases have their own particular set of rules that must be adhered to if they are to work correctly. Essentially you need to remember the following points:

    You may have noticed that all the table and field names that have been used so far contain neither spaces nor special (that is, non-alphanumeric) characters. This is because SQL requires you to specify the names of what it calls objects—that is, fields and tables, among other things—in a specific way. However, we do not want to make things appear over-complicated here, so as a starting point, let us just say you are better off avoiding all non-alphanumeric characters when creating an alias for a field.

    If you want to add a space or special characters to an alias (suppose in the example used in this section, you want to see ISO Code as the field heading), then you must place the alias inside backquotes. That is, you can write the following to add a space to the alias IsoCode that you want to use instead of the real field name:

    SELECT CountryName, CountryISO3 AS `Iso Code` FROM allsales;

    In practice, many data people advise that you avoid spaces and nonstandard characters if you can, as this can get painful when writing complex queries. To encourage this, we have stuck to aliases without spaces or non-alphanumeric characters in this book.

    An alias cannot be more than 122 characters long.

    Aliases can contain underscores.

    You can also add an alias without the AS keyword using the following code:

    SELECT CountryName, CountryISO3 `Iso Code` FROM allsales;

    However, we prefer not to use this technique and so will not be taking this approach when aliasing columns. Finally, a few points about backquotes:

    Backquotes are also known as backticks.

    Backquotes are not the same as single quotes. There are cases in SQL when they may be used interchangeably, but aliases with spaces or special characters must use backquotes, not single or double quotes.

    You cannot use double quotes or single quotes in aliases.
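    To draw these rules together, here is a short, illustrative query (the alias names are arbitrary) that uses an underscore in one alias and backquotes for an alias containing a space:

    SELECT CountryName AS Country_Name, CountryISO3 AS `ISO Code` FROM allsales;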

    Removing duplicates from query output

    As befits a company that sells its products across the globe, the CEO of Prestige Cars needs to know where its presence is felt the most. It follows that she wants you to produce a list of the countries where the company’s customers can be found. A quick look at the AllSales table reveals a CountryName field that you can use to list countries.

    Here is the SQL that will produce the list you require:

    SELECT DISTINCT  CountryName

    FROM            allsales;

    This query returns the results shown in Figure 1.14:

    Figure 1.14: Returning distinct elements from the data

    How does it work

    This query applies the DISTINCT keyword to remove any duplicates from the list of countries. This keyword is incredibly useful because it removes all duplicates from the query output before you even see the query results.
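    Note that DISTINCT applies to the combination of all the fields in the SELECT clause, not just the first one. For instance, the following sketch returns each unique pairing of country and make rather than unique countries alone:

    SELECT DISTINCT CountryName, MakeName FROM allsales;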

    Undo and redo

    A Databricks notebook is like many other desktop applications and Web interfaces in that you can undo changes and redo them if you want to.

    Ctrl + Z will undo one or more modifications.

    Ctrl + Y will redo modifications that you have undone.

    Sorting data

    Now that you can select the fields that contain the data you want to see in the sequence that you want to see them, you probably also want to sort the data. For example, you might want to sort car sales by increasing sale price. To do this, just run the following snippet and take a look at the numbers in the SalePrice field in Figure 1.15. They are now sorted from lowest to highest.

    SELECT    MakeName, ModelName, SalePrice

    FROM      allsales

    ORDER BY  SalePrice;

    Figure 1.15: Sorting data in ascending order

    How does it work

    To sort the data returned by a query, just add the ORDER BY keyword (it is considered to be a single keyword even if it is really two words) after the FROM clause of the SQL command. Then you add the field that you are sorting the data on. This creates an ORDER BY clause.

    Equally important is that writing SQL is all about making simple extensions to existing code. So, you do not have to produce instant reams of code that work the first time. You can start with a small snippet of code, test it, and then extend it until it does exactly what you want it to do.

    The following are several key points to remember here:

    The ORDER BY keyword can be used on text (in which case it sorts in alphabetical order), on numbers (where it sorts from lowest to highest), or on dates (in which case it places the dates in order from earliest to latest).

    If you want, you can add the ASC keyword after the sort field name to force an ORDER BY statement to sort the data in ascending order. However, Databricks sorts data in ascending order out of the box. So, the result would be the same even if you wrote the following:

    SELECT * FROM allsales ORDER BY SalePrice ASC;

    If you test this, you will see the same result that the SQL snippet at the start of this section produced without the ASC keyword. Techies refer to this as the default sort order.

    If you find that reiterating field names in the ORDER BY clause is somewhat laborious, then you can always apply a shortcut. Instead of using a field name, you can use a number to represent it. So, you could write the SQL query at the start of this section as follows:

    SELECT    MakeName, ModelName, SalePrice

    FROM      allsales

    ORDER BY  3;

    The number that you use in the ORDER BY clause stands for the position of the field in the SELECT clause of the SQL statement. So, in this example, 1 represents MakeName (the first field in the SELECT clause), 2 represents ModelName, and 3 represents SalePrice.

    Note: We will use keywords to describe core Databricks terms even when the keyword consists of several individual words.

    Sorting data in reverse order

    As you saw in the previous example, data can be sorted from lowest to highest really quickly and easily. You can also sort data from highest to lowest (or Z to A). Changing the sort order is as simple as replacing the ASC (short for ascending) keyword that you just met with the DESC (short for descending) keyword. You can see this in the output shown in Figure 1.16, which is the result of the following code snippet that sorts the sales in descending order of SalePrice:

    SELECT    MakeName, ModelName, SalePrice

    FROM      allsales

    ORDER BY  SalePrice DESC;

    Figure 1.16: Sorting data in descending order

    How does it work

    Switching the sort order is as easy as adding the DESC keyword to the end of the ORDER BY clause of a SQL command (or replacing the ASC keyword with DESC if you had added it). Using DESC forces SQL to sort the results from highest to lowest (if they are numbers), Z to A (if they are text), or latest to earliest (if they are dates).

    Applying multiple sort criteria

    Larger data sets can require that you sort data according to multiple criteria. Suppose, for instance, that you want to list all the cars sold first by country, then by make (per country), and finally by model (if there are several models of the same make sold in a specific country).

    This is easy to do in SQL because it is a simple extension of the techniques that you have seen in the previous two sections. Take a look at the following code snippet:

    SELECT    CountryName, MakeName, ModelName

    FROM      allsales

    ORDER BY  CountryName, MakeName, ModelName;

    Executing this query will return output similar to that shown in Figure 1.17:

    Figure 1.17: Sorting sales using multiple fields

    How does it work

    Sometimes, you will be faced with a table where some fields contain the same data elements repeated several times. A telephone directory is like this. You may have many pages of people named Smith, although nearly all may have different first names. Even if the last name and the first name are the same, they may have different middle initials. In these cases, you need to sort the data on successive fields so that (to continue the telephone directory analogy) you sort first by last name, then by first name, and finally by middle initial.

    Entering several fields after the ORDER BY keyword tells SQL to sort the data progressively on the fields you entered. In this example, it means the following:

    First, by the country.

    Then, by the make (if there is more than one record with the same country).

    Finally, by the model (if there is more than one record with the same country and make).

    All you have to do is enter the field names separated by a comma in the ORDER BY clause.

    Applying a multiple-sort order has its own specific set of core requirements. These include the following:

    While there may be technical limits to the number of fields you can sort on, in practice, you do not need to worry and can extend the field list that you use in the ORDER BY statement to include as many fields as you want.

    In this example, you used the same fields in the SELECT statement that you used in the ORDER BY statement. This is not compulsory in SQL because there is no obligation in a basic SQL query to display the same fields that you use for ordering the data. However, when you are testing your SQL skills (or ensuring that the data looks like you think it should), it can be a good idea to use the same groups of fields in both clauses. This way, you can see whether the output is what you expect.

    Sorting query results requires a lot of computing horsepower when you are dealing with large tables running on small clusters. We recommend that you sort data only if it is really necessary.

    As was the case for SELECT clauses, you should not add a comma after the final field name in the ORDER BY clause.
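    It is also worth knowing that you can set the sort direction independently for each field in the ORDER BY clause. The following illustrative snippet sorts countries alphabetically and then, within each country, shows the most expensive sales first:

    SELECT    CountryName, MakeName, SalePrice
    FROM      allsales
    ORDER BY  CountryName ASC, SalePrice DESC;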

    Running SQL queries

    So far in this chapter, you have always run queries in the time-honored way by pressing Ctrl + Enter or selecting Run > Run Cell from the notebook menu. Databricks does, however, offer you several ways to run SQL queries. These methods are explained in Table 1.1:

    Table 1.1: Code execution scope

    Displaying the available tables

    All the data in Databricks is stored in files but accessed in SQL using tables. A Databricks database can consist of dozens—or even hundreds—of tables of data that have been carefully designed and created by database professionals. The first thing that you will have to do when faced with any database that is new to you is to take a look at the tables it contains. To do this, enter the following Spark SQL in a blank cell:

    SHOW TABLES;

    You should see something like the list of tables shown in Figure 1.18:

    Figure 1.18: Displaying the tables contained in a database

    As this sample database is small, it contains only a handful of tables. As you progress through this book, you will learn what data each table contains.
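    If a database contains a large number of tables, you can also filter the list using a pattern. For instance, this illustrative command lists only the tables whose names begin with all:

    SHOW TABLES LIKE 'all*';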

    Finding all the views in a database

    One of the aims of this book is to introduce you progressively to the
