Our hypertable needs to have a primary dimension. In the example, we show the classic time-series use case, with the `time` column as the primary dimension used for partitioning. Besides that, we have two value columns, `cpu` and `disk_io`, which capture measurements over time. There is also a `device_id` column that serves as a lookup key, designating which device the captured values belong to at a given point in time.
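Since the example table itself is not shown in this excerpt, here is a minimal sketch of what such a schema could look like. The table name `metrics`, the column types, and the `device_type` column (mentioned below) are assumptions, not the original definition:

```sql
-- Hypothetical schema for the example hypertable.
CREATE TABLE metrics (
    time        TIMESTAMPTZ NOT NULL,   -- primary (partitioning) dimension
    device_id   INTEGER     NOT NULL,   -- lookup key
    device_type TEXT,                   -- filter column, e.g. 'SSD'
    cpu         DOUBLE PRECISION,       -- value column
    disk_io     DOUBLE PRECISION        -- value column
);

-- Partition the table on the time column.
SELECT create_hypertable('metrics', 'time');
```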
Columns can be used in a few different ways:

- You can use the values in a column as a lookup key. In the example above, `device_id` is a typical example of such a column.
- You can use a column for partitioning the table. This is typically a time column, but it is possible to partition the table using other columns as well.
- You can use a column as a filter to narrow down what data you select. The column `device_type` is an example of such a column, where you can decide to only look at, for example, solid-state drives (SSDs).
- The remaining columns are typically the values or metrics you are querying for, usually aggregated or presented in other ways. The columns `cpu` and `disk_io` are typical examples of such columns.
An example query using the value columns and filtering on time and device type could look like this:
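This is a sketch against the hypothetical `metrics` schema above; the actual table and column names may differ:

```sql
-- Hourly averages of cpu and disk_io for SSD devices over the last week.
SELECT time_bucket('1 hour', time) AS hour,
       avg(cpu)     AS avg_cpu,
       avg(disk_io) AS avg_disk_io
FROM metrics
WHERE time > now() - INTERVAL '1 week'
  AND device_type = 'SSD'
GROUP BY hour
ORDER BY hour;
```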
When chunks in a hypertable are compressed, the data stored in them is reorganized and stored in column order rather than row order. As a result, the chunk can no longer use the uncompressed schema, and a different schema must be created. TimescaleDB handles this automatically, but it has a few implications:
- The compression ratio and query performance are highly dependent on the order and structure of the compressed data, so some consideration is needed when setting up compression.
- Indexes on the hypertable cannot always be used in the same manner for the compressed data.
Based on the previous schema, data is filtered over a certain time period and analyzed at device granularity. This data access pattern lends itself to a data layout that is well suited for compression.
### Segmenting and ordering
Segmenting the compressed data should be based on the way you access the data: you want to segment it so that your queries can fetch the right data at the right time. In other words, your queries should dictate how you segment the data, so that they can be optimized and yield even better query performance.
For example, if you want to access a single device using a specific `device_id` value (either all records, or records for a specific time range), you would have to filter all those records one by one at row access time. To get around this, you can use the `device_id` column for segmenting, as sketched below. This allows analytical queries that look for specific device IDs to run much faster on compressed data.
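A minimal sketch of such a setup, again assuming the hypothetical `metrics` hypertable:

```sql
-- Enable compression and segment compressed data by device_id, so that
-- each compressed batch only contains rows for a single device.
ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id'
);
```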
Ordering the data has a great impact on the compression ratio, since you want rows that change over a dimension (most likely time) to be close to each other. Most of the time, data changes in a predictable fashion, following a certain trend, and you can exploit this fact to encode the data so that it takes less space to store. For example, if you order the records by time, they are compressed in that order and subsequently also accessed in the same order.
This makes the time column a perfect candidate for ordering your data, since the measurements evolve as time goes on. If you were to use this as your only compression setting, you would most likely get a good enough compression ratio to save a lot of storage. However, how effectively you can access the data depends on your use case and your queries: with this setup, you would always have to access the data through the time dimension and then filter the rows on any other criteria.
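A sketch of such an ordering-only setup, under the same assumptions:

```sql
-- Enable compression with time ordering as the only compression setting;
-- rows within each compressed batch are stored in descending time order.
ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_orderby = 'time DESC'
);
```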
For example, a query like the following sketch reads the compressed chunks in time order and then filters the rows on any other criteria; running it with `EXPLAIN` shows how the compressed chunks are scanned:
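```sql
-- Hypothetical query against the compressed metrics hypertable: the
-- orderby setting lets the compressed batches be read in time order,
-- while the device_type filter is applied row by row.
EXPLAIN (ANALYZE)
SELECT time, cpu, disk_io
FROM metrics
WHERE time > now() - INTERVAL '1 day'
  AND device_type = 'SSD'
ORDER BY time DESC;
```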