# In-Memory & Columnar Store Code Snack
In this code snack, you will experience the benefits of real-time operational analytics enabled by combining a memory-optimized table with a columnstore index. The Visual Studio project contains a load generator that simulates a write-heavy workload. You will first run the generator against a disk-based table with a clustered (B-tree) index, note the rows inserted per second, and measure the performance of a provided analytics query while the system is under heavy write load. You will then author the T-SQL to create a memory-optimized table with a columnstore index, point the load generator at the memory-optimized table, and observe the improved performance characteristics. This code snack targets SQL Server on Linux.

## Requirements
- Visual Studio 2015 with Update 3 (or later)
- [SQL Server Data Tools for Visual Studio 2015](https://msdn.microsoft.com/en-us/mt186501)
- [SQL Server on Linux](https://www.microsoft.com/en-us/sql-server/sql-server-on-linux) running in [Docker](https://docs.docker.com/engine/installation/#/on-macos-and-windows)
- Your SQL Server on Linux host should have at least 4 GB of RAM
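
If your container is not already running, a typical way to start one looks like the following; the image name reflects the Docker Hub image for SQL Server on Linux at the time of writing, and the password is illustrative (substitute your own):
```
docker run -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=YourStrong!Passw0rd' \
    -p 1433:1433 -m 4g -d microsoft/mssql-server-linux
```
The `-m 4g` flag gives the container the 4 GB of RAM called for above.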

## Clone the provided project
Clone this repo onto your local machine.
The recommended path is C:\In-Memory and Columnar\
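
For example (replace [RepoURL] with this repo's clone URL):
```
git clone [RepoURL] "C:\In-Memory and Columnar"
```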

## Download the sample data
This project requires a sample set of data you will load into SQL Server.
Download the data from: [http://bit.ly/2envb8m](http://bit.ly/2envb8m)

## Copy the sample data to your SQL on Linux host
1. On your Docker host machine, first grab the container ID of your SQL Server on Linux container. The container ID is the value in the first column:
```
docker ps
```
2. Copy the sample data from the host to the container running SQL Server on Linux by using docker cp as follows (replace [ContainerID] with the container ID you retrieved). Note that the path /var/opt/mssql will map to C: in any T-SQL scripts executed against this server.
```
docker cp datapoints.bcp [ContainerID]:/var/opt/mssql/data/datapoints.bcp
```
3. Verify that the copy was successful by connecting to bash within the container. Connect to your container (substitute your container ID in the command below) and list the files:
```
docker exec -t -i [ContainerID] /bin/bash
ls /var/opt/mssql/data -1 -s -h
```
4. You should see datapoints.bcp in the listing:
```
-rw-r--r-- 1 root root 256 Nov 13 22:49 Entropy.bin
-rw-r--r-- 1 root root 14M Nov 13 22:50 MSDBData.mdf
-rw-r--r-- 1 root root 768K Nov 14 22:48 MSDBLog.ldf
-rw-r--r-- 1 root root 161M Nov 11 04:18 datapoints.bcp
-rw-r--r-- 1 root root 4.0M Nov 14 22:55 master.mdf
-rw-r--r-- 1 root root 768K Nov 14 23:07 mastlog.ldf
-rw-r--r-- 1 root root 8.0M Nov 14 23:00 model.mdf
-rw-r--r-- 1 root root 8.0M Nov 14 23:00 modellog.ldf
-rw-r--r-- 1 root root 264M Nov 14 23:07 tempdb.mdf
-rw-r--r-- 1 root root 8.0M Nov 14 23:07 templog.ldf
drwxr-xr-x 3 root root 4.0K Nov 14 23:01 xtp
```
5. You are all set to continue the lab in Visual Studio 2015.

## Create the database and tables
1. Open the SqlLoadGenerator solution using Visual Studio 2015.
2. From Solution Explorer, expand the SqlLoadGenerator solution, then the SQL Resources folder, and open "Create Database.sql".
3. Adjust the file paths for the FILENAME attributes if you installed SQL Server to a different location.
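For reference, a minimal sketch of what "Create Database.sql" likely contains; the database name IoTDemo and file names here are assumptions, not the script's actual contents. Note the filegroup with CONTAINS MEMORY_OPTIMIZED_DATA, which memory-optimized tables require, and recall that C:\ maps to /var/opt/mssql on SQL Server on Linux:
```
-- Hypothetical sketch; your script's database and file names may differ.
CREATE DATABASE IoTDemo
ON PRIMARY
    (NAME = N'IoTDemo_data', FILENAME = N'C:\data\IoTDemo_data.mdf'),
-- Memory-optimized tables require a memory-optimized data filegroup;
-- its FILENAME is a directory, not a file.
FILEGROUP IoTDemo_mod CONTAINS MEMORY_OPTIMIZED_DATA
    (NAME = N'IoTDemo_mod', FILENAME = N'C:\data\IoTDemo_mod')
LOG ON
    (NAME = N'IoTDemo_log', FILENAME = N'C:\data\IoTDemo_log.ldf');
```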
4. Select the Execute button.
5. In the Connect dialog, provide your server name, authentication mode, username, and password (as appropriate).
6. Wait for the script to complete successfully.
7. Within Visual Studio, open "Create Table- Disk Based.sql".
8. Execute the script to create the DataPointsDiskBased table.
This table will be used to store simulated IoT device telemetry, using a traditional disk-based table with clustered and non-clustered indexes on the fields commonly used in both point queries and analytic queries.
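As a rough sketch, assuming columns matching the memory-optimized table shown later (the provided script may differ in its index choices), the disk-based table might look like this:
```
-- Hypothetical sketch of the disk-based table; the provided script may differ.
CREATE TABLE [DataPointsDiskBased] (
    -- Clustered B-tree index on the key
    Id bigint IDENTITY NOT NULL PRIMARY KEY CLUSTERED,
    [Value] decimal(18,5),
    [TimestampUtc] datetime,
    DeviceId int
);

-- Non-clustered index on fields commonly used in point and analytic queries
CREATE NONCLUSTERED INDEX IX_DataPointsDiskBased_Device
    ON [DataPointsDiskBased] (DeviceId, [TimestampUtc]);
```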
9. Within Visual Studio, open "Create Table- In Memory.sql".
10. Execute the script to create the DataPointsInMem table.
This table will be used to store the same simulated IoT device telemetry, but this time using a memory-optimized table with a clustered columnstore index across all fields (which will support analytic queries) and a non-clustered hash index on the Id field (which will support point lookups common to transactional queries).
```
CREATE TABLE [DataPointsInMem] (
    -- ID should be a Primary Key, fields with a b-tree or hash index
    Id bigint IDENTITY NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 30000000),
    [Value] decimal(18,5),
    [TimestampUtc] datetime,
    DeviceId int,
    -- This table should have a columnar index
    INDEX Transactions_CCI CLUSTERED COLUMNSTORE
) WITH (
    -- This should be an in-memory table
    MEMORY_OPTIMIZED = ON
);

-- In-memory tables should auto-elevate their transaction level to Snapshot
ALTER DATABASE CURRENT SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT = ON;
```
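
Once the table exists, the non-clustered hash index makes point lookups by Id very cheap. A quick illustrative check (the Id value is arbitrary):
```
-- Point lookup served by the hash index on Id (example value)
SELECT [Value], [TimestampUtc], DeviceId
FROM [DataPointsInMem]
WHERE Id = 42;
```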

## Load initial data

1. Within Visual Studio, open "Load Sample Data.sql".
2. Adjust the path to the datapoints.bcp file so it matches the location of your BCP file within the SQL on Linux container (if necessary) and save the script. Recall that the path /var/opt/mssql will map to C: in any T-SQL scripts executed against SQL Server on Linux.
3. Execute the script to load each table with 4 million rows of sample data. This will take some time to complete.
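
The load script most likely uses BULK INSERT (or an equivalent) against the native-format BCP file. A hedged sketch, assuming the file sits at /var/opt/mssql/data (which the server sees as C:\data); the provided script may use additional options such as KEEPIDENTITY:
```
-- Hypothetical sketch of the load; repeated for each target table.
BULK INSERT [DataPointsDiskBased]
FROM 'C:\data\datapoints.bcp'
WITH (DATAFILETYPE = 'native');

-- Sanity check: each table should report roughly 4 million rows after loading
SELECT COUNT(*) AS RowsLoaded FROM [DataPointsDiskBased];
```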

## Execute the sample analytics query

1. Within Visual Studio, open "SampleQueries - DiskBased.sql".
2. Execute the script to summarize the time series data stored in the disk based table (a sketch of the general query shape appears after this list).
3. When the script completes, observe that 334 rows were returned. Take note of how long the query took to execute. Query time is shown in the bottom right of the document window in Visual Studio.
![alt text][Disk Based Results]

[Disk Based Results]: images/DiskBasedResults.png "Disk Based Results"
4. Now open and execute "SampleQueries - InMemory.sql" to summarize the time series data stored in the memory-optimized table.
When the script completes, observe that 334 rows were returned. Take note of how long the query took to execute.
You should notice that the query against the memory-optimized table runs 2x-10x faster than the same query against the same data stored in a disk-based table. Query time is shown in the bottom right of the document window in Visual Studio.
![alt text][In-Memory Results]

[In-Memory Results]: images/InMemoryResults.png "In-Memory Results"
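
The exact queries ship with the solution, but a time-series summary of roughly this shape is representative; the hourly grouping and aggregates here are assumptions, not the scripts' actual contents:
```
-- Hypothetical sketch of the analytics query; the provided scripts may differ.
SELECT DeviceId,
       CAST([TimestampUtc] AS date)   AS [Day],
       DATEPART(hour, [TimestampUtc]) AS [Hour],
       AVG([Value])                   AS AvgValue,
       MIN([Value])                   AS MinValue,
       MAX([Value])                   AS MaxValue
FROM [DataPointsDiskBased]  -- or [DataPointsInMem] for the in-memory run
GROUP BY DeviceId, CAST([TimestampUtc] AS date), DATEPART(hour, [TimestampUtc]);
```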

## Execute the queries under load

1. Within Visual Studio, in Solution Explorer, expand the SqlLoadGenerator project and then open "App.config".
2. Locate the connection string named "SqlConnection" and modify it so it points to your instance of SQL Server on Linux.
3. Save the App.config.
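For reference, the entry you are editing typically looks something like the following; the server address, database name, and credentials here are placeholders, not the project's actual values:
```
<connectionStrings>
  <!-- Hypothetical example; substitute your server's address and credentials -->
  <add name="SqlConnection"
       connectionString="Server=tcp:localhost,1433;Database=IoTDemo;User Id=sa;Password=[YourPassword];" />
</connectionStrings>
```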
4. From the Debug menu, select Start Without Debugging.
5. At the prompt, choose option 1 to target the disk based table.
You should see a log entry for every 1,000 rows inserted.
Leave the console running (it should run for about 3 minutes) and return to Visual Studio.
6. Open "SampleQueries - DiskBased.sql".
7. Execute the script to summarize the time series data stored in the disk based table.
8. Observe that more than 334 rows were returned. Take note of how long the query took to execute.
9. Repeat the query a few times, waiting a few seconds between queries, to get a sense of how long the query takes even as new rows are inserted by the load generator.
10. Close the console load generator.
11. Run the SqlLoadGenerator again.
This time at the prompt, choose option 2 to target the memory-optimized table.
You should see a log entry for every 1,000 rows inserted.
12. Leave the console running (it should run for about 3 minutes) and return to Visual Studio.
13. Open "SampleQueries - InMemory.sql".
14. Execute the script to summarize the time series data stored in the memory-optimized table.
15. Observe that more than 334 rows were returned. Take note of how long the query took to execute.
Repeat the query a few times, waiting a few seconds between queries, to get a sense of how long the query takes even as new rows are inserted by the load generator.
16. Close the console load generator.

## Conclusion
You should observe that while neither query was significantly affected by the heavy insert load, the analytics query against the memory-optimized table continued to run 2x-10x faster than the same query against the disk-based table.