Skip to content

Commit 204f7f2

Browse files
committed
Updating documentation to use by_range and by_hash
The new generalized hypertable API uses a drastically simplified calling convention where you can provide a dimension object to the `create_hypertable` and `add_dimension` function. The documentation is changed to reflect this. The old interface is still available in the documentation.
1 parent 07263dc commit 204f7f2

File tree

14 files changed

+562
-147
lines changed

14 files changed

+562
-147
lines changed

api/add_dimension.md

Lines changed: 23 additions & 45 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,16 @@
11
## add_dimension()
22

33
Add an additional partitioning dimension to a TimescaleDB hypertable.
4-
The column selected as the dimension can either use interval
5-
partitioning (e.g., for a second time partition) or hash partitioning.
4+
The column selected as the dimension can either use range partitioning
5+
(e.g., for a second time partition) or hash partitioning.
6+
7+
**Note: this reference describes the new generalized hypertable
8+
API. The [old interface for `add_dimension` is also
9+
available](add_dimension_old.md).**
10+
611

712
<highlight type="warning">
8-
The `add_dimension` command can only be executed after a table has been
13+
The `add_dimension` command can only be executed after a table has been
914
converted to a hypertable (via `create_hypertable`), but must similarly
1015
be run only on an empty hypertable.
1116
</highlight>
@@ -72,69 +77,42 @@ queries.
7277

7378
### Required Arguments
7479

75-
|Name|Type|Description|
76-
|---|---|---|
77-
| `hypertable` | REGCLASS | Hypertable to add the dimension to.|
78-
| `column_name` | TEXT | Column to partition by.|
80+
| Name | Type | Description |
81+
|--------------|----------------|-------------------------------------|
82+
| `hypertable` | REGCLASS | Hypertable to add the dimension to. |
83+
| `dimension` | DIMENSION_INFO | Dimension to partition by. |
7984

8085
### Optional Arguments
8186

8287
|Name|Type|Description|
8388
|---|---|---|
84-
| `number_partitions` | INTEGER | Number of hash partitions to use on `column_name`. Must be > 0.|
85-
| `chunk_time_interval` | INTERVAL | Interval that each chunk covers. Must be > 0.|
86-
| `partitioning_func` | REGCLASS | The function to use for calculating a value's partition (see `create_hypertable` [instructions](/hypertable/create_hypertable)).|
8789
| `if_not_exists` | BOOLEAN | Set to true to avoid throwing an error if a dimension for the column already exists. A notice is issued instead. Defaults to false. |
8890

8991
### Returns
9092

91-
|Column|Type|Description|
92-
|---|---|---|
93-
| `dimension_id` | INTEGER |ID of the dimension in the TimescaleDB internal catalog. |
94-
| `schema_name` | TEXT | Schema name of the hypertable.|
95-
| `table_name` | TEXT | Table name of the hypertable. |
96-
| `column_name` | TEXT | Column name of the column to partition by. |
97-
| `created` | BOOLEAN | True if the dimension was added, false when `if_not_exists` is true and no dimension was added. |
98-
99-
When executing this function, either `number_partitions` or
100-
`chunk_time_interval` must be supplied, which will dictate if the
101-
dimension will use hash or interval partitioning.
102-
103-
The `chunk_time_interval` should be specified as follows:
104-
105-
- If the column to be partitioned is a TIMESTAMP, TIMESTAMPTZ, or
106-
DATE, this length should be specified either as an INTERVAL type or
107-
an integer value in *microseconds*.
108-
109-
- If the column is some other integer type, this length
110-
should be an integer that reflects
111-
the column's underlying semantics (e.g., the
112-
`chunk_time_interval` should be given in milliseconds if this column
113-
is the number of milliseconds since the UNIX epoch).
114-
115-
<highlight type="warning">
116-
Supporting more than **one** additional dimension is currently
117-
experimental. For any production environments, users are recommended
118-
to use at most one "space" dimension.
119-
</highlight>
93+
| Column | Type | Description |
94+
|----------------|---------|----------------------------------------------------------|
95+
| `dimension_id` | INTEGER | ID of the dimension in the TimescaleDB internal catalog. |
96+
| `created` | BOOLEAN | True if the dimension was added, false otherwise. |
12097

12198
### Sample Usage
12299

123100
First convert table `conditions` to hypertable with just time
124101
partitioning on column `time`, then add an additional partition key on `location` with four partitions:
125102
```sql
126103
SELECT create_hypertable('conditions', by_range('time'));
127-
SELECT add_dimension('conditions', 'location', number_partitions => 4);
104+
SELECT add_dimension('conditions', by_hash('location', 4));
128105
```
129106

130107
Convert table `conditions` to hypertable with time partitioning on `time` and
131108
space partitioning (2 partitions) on `location`, then add two additional dimensions.
132109

133110
```sql
134-
SELECT create_hypertable('conditions', 'time', 'location', 2);
135-
SELECT add_dimension('conditions', 'time_received', chunk_time_interval => INTERVAL '1 day');
136-
SELECT add_dimension('conditions', 'device_id', number_partitions => 2);
137-
SELECT add_dimension('conditions', 'device_id', number_partitions => 2, if_not_exists => true);
111+
SELECT create_hypertable('conditions', by_range('time'));
112+
SELECT add_dimension('conditions', by_hash('location', 2));
113+
SELECT add_dimension('conditions', by_range('time_received', INTERVAL '1 day'));
114+
SELECT add_dimension('conditions', by_hash('device_id', 2));
115+
SELECT add_dimension('conditions', by_hash('device_id', 2), if_not_exists => true);
138116
```
139117

140118
Now in a multi-node example for distributed hypertables with a cluster
@@ -148,5 +126,5 @@ with two partitions (as the number of the attached data nodes).
148126
SELECT add_data_node('dn1', host => 'dn1.example.com');
149127
SELECT add_data_node('dn2', host => 'dn2.example.com');
150128
SELECT create_distributed_hypertable('conditions', 'time');
151-
SELECT add_dimension('conditions', 'location', number_partitions => 2);
129+
SELECT add_dimension('conditions', by_hash('location', 2));
152130
```

api/add_dimension_old.md

Lines changed: 152 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,152 @@
1+
## add_dimension()
2+
3+
Add an additional partitioning dimension to a TimescaleDB hypertable.
4+
The column selected as the dimension can either use interval
5+
partitioning (e.g., for a second time partition) or hash partitioning.
6+
7+
<highlight type="warning">
8+
The `add_dimension` command can only be executed after a table has been
9+
converted to a hypertable (via `create_hypertable`), but must similarly
10+
be run only on an empty hypertable.
11+
</highlight>
12+
13+
**Space partitions**: Using space partitions is highly recommended
14+
for [distributed hypertables](/distributed-hypertables/create_distributed_hypertable) to achieve
15+
efficient scale-out performance. For [regular hypertables](/hypertable/create_hypertable)
16+
that exist only on a single node, additional partitioning can be used
17+
for specialized use cases and not recommended for most users.
18+
19+
Space partitions use hashing: Every distinct item is hashed to one of
20+
*N* buckets. Remember that we are already using (flexible) time
21+
intervals to manage chunk sizes; the main purpose of space
22+
partitioning is to enable parallelization across multiple
23+
data nodes (in the case of distributed hypertables) or
24+
across multiple disks within the same time interval
25+
(in the case of single-node deployments).
26+
27+
### Parallelizing queries across multiple data nodes
28+
29+
In a distributed hypertable, space partitioning enables inserts to be
30+
parallelized across data nodes, even while the inserted rows share
31+
timestamps from the same time interval, and thus increases the ingest rate.
32+
Query performance also benefits by being able to parallelize queries
33+
across nodes, particularly when full or partial aggregations can be
34+
"pushed down" to data nodes (e.g., as in the query
35+
`avg(temperature) FROM conditions GROUP BY hour, location`
36+
when using `location` as a space partition). Please see our
37+
[best practices about partitioning in distributed hypertables][distributed-hypertable-partitioning-best-practices]
38+
for more information.
39+
40+
### Parallelizing disk I/O on a single node
41+
42+
Parallel I/O can benefit in two scenarios: (a) two or more concurrent
43+
queries should be able to read from different disks in parallel, or
44+
(b) a single query should be able to use query parallelization to read
45+
from multiple disks in parallel.
46+
47+
Thus, users looking for parallel I/O have two options:
48+
49+
1. Use a RAID setup across multiple physical disks, and expose a
50+
single logical disk to the hypertable (i.e., via a single tablespace).
51+
52+
1. For each physical disk, add a separate tablespace to the
53+
database. TimescaleDB allows you to actually add multiple tablespaces
54+
to a *single* hypertable (although under the covers, a hypertable's
55+
chunks are spread across the tablespaces associated with that hypertable).
56+
57+
We recommend a RAID setup when possible, as it supports both forms of
58+
parallelization described above (i.e., separate queries to separate
59+
disks, single query to multiple disks in parallel). The multiple
60+
tablespace approach only supports the former. With a RAID setup,
61+
*no spatial partitioning is required*.
62+
63+
That said, when using space partitions, we recommend using 1
64+
space partition per disk.
65+
66+
TimescaleDB does *not* benefit from a very large number of space
67+
partitions (such as the number of unique items you expect in partition
68+
field). A very large number of such partitions leads both to poorer
69+
per-partition load balancing (the mapping of items to partitions using
70+
hashing), as well as much increased planning latency for some types of
71+
queries.
72+
73+
### Required Arguments
74+
75+
|Name|Type|Description|
76+
|---|---|---|
77+
| `hypertable` | REGCLASS | Hypertable to add the dimension to.|
78+
| `column_name` | TEXT | Column to partition by.|
79+
80+
### Optional Arguments
81+
82+
|Name|Type|Description|
83+
|---|---|---|
84+
| `number_partitions` | INTEGER | Number of hash partitions to use on `column_name`. Must be > 0.|
85+
| `chunk_time_interval` | INTERVAL | Interval that each chunk covers. Must be > 0.|
86+
| `partitioning_func` | REGCLASS | The function to use for calculating a value's partition (see `create_hypertable` [instructions](/hypertable/create_hypertable)).|
87+
| `if_not_exists` | BOOLEAN | Set to true to avoid throwing an error if a dimension for the column already exists. A notice is issued instead. Defaults to false. |
88+
89+
### Returns
90+
91+
|Column|Type|Description|
92+
|---|---|---|
93+
| `dimension_id` | INTEGER |ID of the dimension in the TimescaleDB internal catalog. |
94+
| `schema_name` | TEXT | Schema name of the hypertable.|
95+
| `table_name` | TEXT | Table name of the hypertable. |
96+
| `column_name` | TEXT | Column name of the column to partition by. |
97+
| `created` | BOOLEAN | True if the dimension was added, false when `if_not_exists` is true and no dimension was added. |
98+
99+
When executing this function, either `number_partitions` or
100+
`chunk_time_interval` must be supplied, which will dictate if the
101+
dimension will use hash or interval partitioning.
102+
103+
The `chunk_time_interval` should be specified as follows:
104+
105+
- If the column to be partitioned is a TIMESTAMP, TIMESTAMPTZ, or
106+
DATE, this length should be specified either as an INTERVAL type or
107+
an integer value in *microseconds*.
108+
109+
- If the column is some other integer type, this length
110+
should be an integer that reflects
111+
the column's underlying semantics (e.g., the
112+
`chunk_time_interval` should be given in milliseconds if this column
113+
is the number of milliseconds since the UNIX epoch).
114+
115+
<highlight type="warning">
116+
Supporting more than **one** additional dimension is currently
117+
experimental. For any production environments, users are recommended
118+
to use at most one "space" dimension.
119+
</highlight>
120+
121+
### Sample Usage
122+
123+
First convert table `conditions` to hypertable with just time
124+
partitioning on column `time`, then add an additional partition key on `location` with four partitions:
125+
```sql
126+
SELECT create_hypertable('conditions', by_range('time'));
127+
SELECT add_dimension('conditions', 'location', number_partitions => 4);
128+
```
129+
130+
Convert table `conditions` to hypertable with time partitioning on `time` and
131+
space partitioning (2 partitions) on `location`, then add two additional dimensions.
132+
133+
```sql
134+
SELECT create_hypertable('conditions', 'time', 'location', 2);
135+
SELECT add_dimension('conditions', 'time_received', chunk_time_interval => INTERVAL '1 day');
136+
SELECT add_dimension('conditions', 'device_id', number_partitions => 2);
137+
SELECT add_dimension('conditions', 'device_id', number_partitions => 2, if_not_exists => true);
138+
```
139+
140+
Now in a multi-node example for distributed hypertables with a cluster
141+
of one access node and two data nodes, configure the access node for
142+
access to the two data nodes. Then, convert table `conditions` to
143+
a distributed hypertable with just time partitioning on column `time`,
144+
and finally add a space partitioning dimension on `location`
145+
with two partitions (as the number of the attached data nodes).
146+
147+
```sql
148+
SELECT add_data_node('dn1', host => 'dn1.example.com');
149+
SELECT add_data_node('dn2', host => 'dn2.example.com');
150+
SELECT create_distributed_hypertable('conditions', 'time');
151+
SELECT add_dimension('conditions', 'location', number_partitions => 2);
152+
```

api/chunks.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -36,7 +36,7 @@ Get information about the chunks of a hypertable.
3636
CREATE TABLESPACE tablespace1 location '/usr/local/pgsql/data1';
3737

3838
CREATE TABLE hyper_int (a_col integer, b_col integer, c integer);
39-
SELECT table_name from create_hypertable('hyper_int', 'a_col', chunk_time_interval=> 10);
39+
SELECT table_name from create_hypertable('hyper_int', by_range('a_col', 10));
4040
CREATE OR REPLACE FUNCTION integer_now_hyper_int() returns int LANGUAGE SQL STABLE as $$ SELECT coalesce(max(a_col), 0) FROM hyper_int $$;
4141
SELECT set_integer_now_func('hyper_int', 'integer_now_hyper_int');
4242

0 commit comments

Comments
 (0)