Commit 8b258f6

Author: tarasha
Message: sqlonlinux code snacks commit
1 parent 9a901b1 · commit 8b258f6

File tree: 1,378 files changed (+57,276 −1 lines changed)

(Note: large commits have some content hidden by default, so several file entries below appear without their file paths.)

samples/features/in-memory/code-snacks/SQLonLinux/In-Memory-Columnstore/test.txt

Lines changed: 0 additions & 1 deletion
This file was deleted.
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
```
# use glob syntax.
syntax: glob
*.ser
*.class
*~
*.bak
#*.off
*.old

# eclipse conf file
.settings
.classpath
.project
.manager
.scala_dependencies

# idea
.idea
*.iml

#visual studio
.suo

# building
target
build
null
tmp*
temp*
dist
test-output
build.log
out
packages

# other scm
.svn
.CVS
.hg*

# switch to regexp syntax.
# syntax: regexp
# ^\.pc/

#SHITTY output not in target directory
build.log

#dropbox
.DS_Store
```
[Two binary image files added, 128 KB each; previews not shown — likely the screenshots referenced in the README below.]
Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@

# In-Memory & Columnar Store Code Snack
In this code snack, developers experience the benefit of real-time operational analytics enabled by combining a memory-optimized table with a columnstore index. The Visual Studio project contains a load generator that simulates a write-heavy workload. Developers first run the generator against a disk-based table with a clustered index (b-tree), take note of the rows inserted per second, and measure the performance of a provided analytics query while the system is under the heavy write load. They then author the T-SQL to create the memory-optimized table with a columnstore index, point the load generator at the memory-optimized table, and observe the improved performance characteristics. This code snack targets SQL Server on Linux.

## Requirements
- Visual Studio 2015 with Update 3 (or later)
- [SQL Server Data Tools for Visual Studio 2015](https://msdn.microsoft.com/en-us/mt186501)
- [SQL Server on Linux](https://www.microsoft.com/en-us/sql-server/sql-server-on-linux) running in [Docker](https://docs.docker.com/engine/installation/#/on-macos-and-windows)
- Your SQL Server on Linux host should have at least 4 GB of RAM

## Clone the provided project
Clone this repo onto your local machine.
The recommended path is C:\In-Memory and Columnar\

## Download the sample data
This project requires a sample set of data that you will load into SQL Server.
Download the data from: [http://bit.ly/2envb8m](http://bit.ly/2envb8m)

## Copy the sample data to your SQL on Linux host
1. On your Docker host machine, first grab the container ID of your SQL Server on Linux container. The container ID is the value in the first column.
```
docker ps
```
2. Copy the sample data from the host into the container running SQL Server on Linux by using docker cp as follows (replace [ContainerID] with the container ID you retrieved). Note that the path /var/opt/mssql will map to C: in any T-SQL scripts executed against this server.
```
docker cp datapoints.bcp [ContainerID]:/var/opt/mssql/data/datapoints.bcp
```
3. Verify that the copy was successful by connecting to bash within the container (substitute your container ID in the command below) and listing the files:
```
docker exec -t -i [ContainerID] /bin/bash
ls /var/opt/mssql/data -1 -s -h
```
4. You should see datapoints.bcp in the listing:
```
-rw-r--r-- 1 root root 256 Nov 13 22:49 Entropy.bin
-rw-r--r-- 1 root root 14M Nov 13 22:50 MSDBData.mdf
-rw-r--r-- 1 root root 768K Nov 14 22:48 MSDBLog.ldf
-rw-r--r-- 1 root root 161M Nov 11 04:18 datapoints.bcp
-rw-r--r-- 1 root root 4.0M Nov 14 22:55 master.mdf
-rw-r--r-- 1 root root 768K Nov 14 23:07 mastlog.ldf
-rw-r--r-- 1 root root 8.0M Nov 14 23:00 model.mdf
-rw-r--r-- 1 root root 8.0M Nov 14 23:00 modellog.ldf
-rw-r--r-- 1 root root 264M Nov 14 23:07 tempdb.mdf
-rw-r--r-- 1 root root 8.0M Nov 14 23:07 templog.ldf
drwxr-xr-x 3 root root 4.0K Nov 14 23:01 xtp
```
5. You are all set to continue the lab in Visual Studio 2015.

## Create the database and tables
1. Open the SqlLoadGenerator solution using Visual Studio 2015.
2. From Solution Explorer, expand the SqlLoadGenerator solution, then the SQL Resources folder, and open "Create Database.sql".
3. Adjust the file paths for the FILENAME attributes if you installed SQL Server to a different location.
4. Select the Execute button.
5. In the Connect dialog, provide your server name, authentication mode, username, and password (as appropriate).
6. Wait for the script to complete successfully.
7. Within Visual Studio, open "Create Table- Disk Based.sql".
8. Execute the script to create the DataPointsDiskBased table.
This table will be used to store simulated IoT device telemetry in a traditional disk-based table, with clustered and nonclustered indexes on the fields commonly used in both point queries and analytic queries.
9. Within Visual Studio, open "Create Table- In Memory.sql".
10. Execute the script to create the DataPointsInMem table.
This table will be used to store the same simulated IoT device telemetry, but this time in a memory-optimized table with a clustered columnstore index over all fields (which supports analytic queries) and a nonclustered hash index on the Id field (which supports the point lookups common to transactional queries).
```
CREATE TABLE [DataPointsInMem] (
    -- Id should be a primary key; fields may have b-tree or hash indexes
    Id bigint IDENTITY NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 30000000),
    [Value] decimal(18,5),
    [TimestampUtc] datetime,
    DeviceId int,
    -- This table should have a columnar index
    INDEX Transactions_CCI CLUSTERED COLUMNSTORE
) WITH (
    -- This should be an in-memory table
    MEMORY_OPTIMIZED = ON
);

-- In-memory tables should auto-elevate their transaction level to Snapshot
ALTER DATABASE CURRENT SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT = ON;
```
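For context, the nonclustered hash index is what serves transactional point lookups while the clustered columnstore serves the analytics. A minimal sketch of such a lookup, not part of the lab scripts (the Id value is illustrative):
```
-- Hypothetical point lookup: the hash index on Id resolves this
-- with a single bucket probe rather than a scan.
SELECT Id, [Value], TimestampUtc, DeviceId
FROM dbo.DataPointsInMem
WHERE Id = 1234567;
```
As a rule of thumb, BUCKET_COUNT is sized at one to two times the expected number of unique key values, so the 30,000,000 buckets above leave ample headroom over the 4 million seeded rows plus the load generator's inserts.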

## Load initial data

1. Within Visual Studio, open "Load Sample Data.sql".
2. Adjust the path to the datapoints.bcp file so it matches the location of your BCP file within the SQL on Linux container (if necessary) and save the script. Recall that the path /var/opt/mssql will map to C: in any T-SQL scripts executed against SQL Server on Linux.
3. Execute the script to load each table with 4 million rows of sample data. This will take some time to complete. A quick sanity check is sketched below.
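A minimal check after the load completes, assuming the default dbo schema (this query is not part of the provided scripts; each table should report roughly 4 million rows):
```
SELECT COUNT(*) AS DiskBasedRows FROM dbo.DataPointsDiskBased;
SELECT COUNT(*) AS InMemRows FROM dbo.DataPointsInMem;
```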

## Execute the sample analytics query

1. Within Visual Studio, open "SampleQueries - DiskBased.sql".
2. Execute the script to summarize the time-series data stored in the disk-based table.
3. When the script completes, observe that 334 rows were returned. Take note of how long the query took to execute. Query time is shown in the bottom right of the document window in Visual Studio.
![alt text][Disk Based Results]

[Disk Based Results]: images/DiskBasedResults.png "Disk Based Results"
4. Now open "SampleQueries - InMemory.sql" and execute the script to summarize the time-series data stored in the memory-optimized table.
When the script completes, observe that 334 rows were returned. Take note of how long the query took to execute.
You should notice that the query against the memory-optimized table runs 2x-10x faster than the same query against the same data stored in the disk-based table. Query time is shown in the bottom right of the document window in Visual Studio.
![alt text][In-Memory Results]

[In-Memory Results]: images/InMemoryResults.png "In-Memory Results"
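For timings more precise than the document-window readout, standard T-SQL session statistics can be enabled around either sample query (optional; not part of the lab scripts):
```
SET STATISTICS TIME ON;
-- ... run either sample query here ...
SET STATISTICS TIME OFF;
```
CPU time and elapsed time will then appear in the Messages output.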

## Execute the queries under load

1. Within Visual Studio's Solution Explorer, expand the SqlLoadGenerator project and open "App.config".
2. Locate the connection string named "SqlConnection" and modify it so it points to your instance of SQL Server on Linux (a sample is sketched below).
3. Save App.config.
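The exact value depends on your environment, but a typical connection string for a Dockerized SQL Server on Linux instance looks something like the following sketch (server address, port, and credentials are placeholders, not values from this lab):
```
Server=tcp:<docker host address>,1433;Database=Telemetry;User Id=sa;Password=<your SA password>;
```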
4. From the Debug menu, select Start Without Debugging.
5. At the prompt, choose option 1 to target the disk-based table.
You should see a log entry for every 1,000 rows inserted.
Leave the console running (it should run for about 3 minutes) and return to Visual Studio.
6. Open "SampleQueries - DiskBased.sql".
7. Execute the script to summarize the time-series data stored in the disk-based table.
8. Observe that more than 334 rows were returned. Take note of how long the query took to execute.
9. Repeat the query a few times, waiting a few seconds between runs, to get a sense of how long the query takes even as new rows are inserted by the load generator.
10. Close the console load generator.
11. Run the SqlLoadGenerator again.
This time, at the prompt, choose option 2 to target the memory-optimized table.
You should see a log entry for every 1,000 rows inserted.
12. Leave the console running (it should run for about 3 minutes) and return to Visual Studio.
13. Open "SampleQueries - InMemory.sql".
14. Execute the script to summarize the time-series data stored in the memory-optimized table.
15. Observe that more than 334 rows were returned. Take note of how long the query took to execute.
Repeat the query a few times, waiting a few seconds between runs, to get a sense of how long the query takes even as new rows are inserted by the load generator.
16. Close the console load generator. If you are curious how much memory the table is using after the run, see the optional check below.
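As an optional check, not part of the lab scripts, the memory consumed by the memory-optimized table can be inspected through the sys.dm_db_xtp_table_memory_stats DMV:
```
SELECT OBJECT_NAME(object_id) AS TableName,
       memory_allocated_for_table_kb,
       memory_used_by_table_kb
FROM sys.dm_db_xtp_table_memory_stats
WHERE object_id = OBJECT_ID('dbo.DataPointsInMem');
```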

## Conclusion
You should observe that while neither query was affected by the heavy insert load, the analytics query against the memory-optimized table continued to run 2x-10x faster than the same query against the disk-based table.
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
```
-- For SQL Server 2016, create the Database with a master data file (mdf),
-- the log data file (ldf) and a separate filegroup to support memory-optimized tables.
CREATE DATABASE [Telemetry]
CONTAINMENT = NONE
ON PRIMARY
( NAME = N'Telemetry', FILENAME = N'C:\data\Telemetry.mdf' , SIZE = 128MB , MAXSIZE = UNLIMITED, FILEGROWTH = 65536KB ),
FILEGROUP [TelemetryInMem] CONTAINS MEMORY_OPTIMIZED_DATA DEFAULT
( NAME = N'Telemetry_mem', FILENAME = N'C:\data\Telemetry_mem' , MAXSIZE = UNLIMITED)
LOG ON
( NAME = N'Telemetry_log', FILENAME = N'C:\data\Telemetry_log.ldf' , SIZE = 128MB , MAXSIZE = 2048GB , FILEGROWTH = 65536KB )
GO

ALTER DATABASE [Telemetry] SET COMPATIBILITY_LEVEL = 130
GO
```
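Note that the memory-optimized filegroup container (Telemetry_mem) is given a folder path rather than a file name, which is why its FILENAME has no extension: SQL Server stores the checkpoint files for memory-optimized data inside that directory.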
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
```
USE Telemetry;
GO

DROP INDEX IF EXISTS dbo.DataPointsDiskBased.IX_DeviceId;
DROP INDEX IF EXISTS dbo.DataPointsDiskBased.IX_Timestamp;
DROP TABLE IF EXISTS dbo.DataPointsDiskBased;

CREATE TABLE [DataPointsDiskBased] (
    Id bigint IDENTITY NOT NULL PRIMARY KEY CLUSTERED,
    [Value] decimal(18,5),
    [TimestampUtc] datetime,
    DeviceId int
);

CREATE NONCLUSTERED INDEX IX_DeviceId
    ON dbo.DataPointsDiskBased (DeviceId);
GO

CREATE NONCLUSTERED INDEX IX_Timestamp
    ON dbo.DataPointsDiskBased (TimestampUtc);
GO
```
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
```
USE Telemetry;
GO

DROP INDEX IF EXISTS dbo.DataPointsInMem.IX_DeviceId;
DROP INDEX IF EXISTS dbo.DataPointsInMem.IX_Timestamp;
DROP TABLE IF EXISTS dbo.DataPointsInMem;

CREATE TABLE [DataPointsInMem] (
    -- Id should be a primary key; fields may have b-tree or hash indexes
    Id bigint IDENTITY NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 30000000),
    [Value] decimal(18,5),
    [TimestampUtc] datetime,
    DeviceId int,
    -- This table should have a columnar index
    INDEX Transactions_CCI CLUSTERED COLUMNSTORE
) WITH (
    -- This should be an in-memory table
    MEMORY_OPTIMIZED = ON
);

-- In-memory tables should auto-elevate their transaction level to Snapshot
ALTER DATABASE CURRENT SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT = ON;
```
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
```
USE Telemetry;
GO

BULK INSERT Telemetry.dbo.DataPointsDiskBased
FROM 'C:\data\datapoints.bcp'
WITH (
    DATAFILETYPE = 'native'
);
GO

BULK INSERT Telemetry.dbo.DataPointsInMem
FROM 'C:\data\datapoints.bcp'
WITH (
    DATAFILETYPE = 'native'
);
GO
```
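DATAFILETYPE = 'native' tells BULK INSERT that datapoints.bcp uses SQL Server's native binary format (as produced by the bcp utility with the -n option) rather than character data.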
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
```
USE Telemetry;

SELECT
    Count(*) Counted,
    Max([Value]) MaxValue,
    Avg([Value]) AvgValue,
    Min([Value]) MinValue,
    DatePart(YYYY, TimestampUtc) [year],
    DatePart(MM, TimestampUtc) [month],
    DatePart(DD, TimestampUtc) [day],
    DatePart(hh, TimestampUtc) [hour],
    DatePart(mi, TimestampUtc) [minute],
    DatePart(ss, TimestampUtc) [second]
FROM DataPointsDiskBased
GROUP BY
    DatePart(YYYY, TimestampUtc),
    DatePart(MM, TimestampUtc),
    DatePart(DD, TimestampUtc),
    DatePart(hh, TimestampUtc),
    DatePart(mi, TimestampUtc),
    DatePart(ss, TimestampUtc)
```
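The companion script "SampleQueries - InMemory.sql" is among the files hidden in this commit view. Per the README it runs the same aggregation against the memory-optimized table, so presumably it differs only in the FROM clause; a sketch under that assumption:
```
-- Hypothetical reconstruction of the in-memory variant:
-- identical per-second aggregation against DataPointsInMem.
USE Telemetry;

SELECT
    Count(*) Counted,
    Max([Value]) MaxValue,
    Avg([Value]) AvgValue,
    Min([Value]) MinValue,
    DatePart(YYYY, TimestampUtc) [year],
    DatePart(MM, TimestampUtc) [month],
    DatePart(DD, TimestampUtc) [day],
    DatePart(hh, TimestampUtc) [hour],
    DatePart(mi, TimestampUtc) [minute],
    DatePart(ss, TimestampUtc) [second]
FROM DataPointsInMem
GROUP BY
    DatePart(YYYY, TimestampUtc),
    DatePart(MM, TimestampUtc),
    DatePart(DD, TimestampUtc),
    DatePart(hh, TimestampUtc),
    DatePart(mi, TimestampUtc),
    DatePart(ss, TimestampUtc);
```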
