## add_dimension()

Add an additional partitioning dimension to a TimescaleDB hypertable.
The column selected as the dimension can either use interval
partitioning (e.g., for a second time partition) or hash partitioning.

<highlight type="warning">
 The `add_dimension` command can only be executed after a table has been
converted to a hypertable (via `create_hypertable`), but must similarly
be run only on an empty hypertable.
</highlight>

**Space partitions**: Using space partitions is highly recommended
for [distributed hypertables](/distributed-hypertables/create_distributed_hypertable) to achieve
efficient scale-out performance. For [regular hypertables](/hypertable/create_hypertable)
that exist only on a single node, additional partitioning can be used
for specialized use cases, but it is not recommended for most users.

Space partitions use hashing: Every distinct item is hashed to one of
*N* buckets. Remember that we are already using (flexible) time
intervals to manage chunk sizes; the main purpose of space
partitioning is to enable parallelization across multiple
data nodes (in the case of distributed hypertables) or
across multiple disks within the same time interval
(in the case of single-node deployments).
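
The bucketing idea can be sketched with PostgreSQL's built-in `hashtext`
function. This is an illustration only: TimescaleDB uses its own internal
partitioning hash function, not `hashtext`.

```sql
-- Illustration only: map a distinct value to one of N = 4 buckets
-- by hashing. TimescaleDB's actual partitioning hash differs.
SELECT abs(hashtext('device_42')) % 4 AS bucket;
```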

### Parallelizing queries across multiple data nodes

In a distributed hypertable, space partitioning enables inserts to be
parallelized across data nodes, even when the inserted rows share
timestamps within the same time interval, which increases the ingest rate.
Query performance also benefits from the ability to parallelize queries
across nodes, particularly when full or partial aggregations can be
"pushed down" to data nodes (e.g., as in the query
`avg(temperature) FROM conditions GROUP BY hour, location`
when using `location` as a space partition). See our
[best practices about partitioning in distributed hypertables][distributed-hypertable-partitioning-best-practices]
for more information.
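
Written out in full, such a push-down-friendly aggregation might look as
follows (the column names are assumed from the `conditions` examples later
on this page):

```sql
-- Each data node can compute partial averages for the location
-- partitions it owns; the access node combines the partial results.
SELECT time_bucket('1 hour', time) AS hour, location,
       avg(temperature)
FROM conditions
GROUP BY hour, location;
```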

### Parallelizing disk I/O on a single node

Parallel I/O can help in two scenarios: (a) two or more concurrent
queries should be able to read from different disks in parallel, or
(b) a single query should be able to use query parallelization to read
from multiple disks in parallel.

Thus, users looking for parallel I/O have two options:

1. Use a RAID setup across multiple physical disks, and expose a
single logical disk to the hypertable (i.e., via a single tablespace).

1. For each physical disk, add a separate tablespace to the
database. TimescaleDB allows you to add multiple tablespaces
to a *single* hypertable (under the covers, a hypertable's
chunks are spread across the tablespaces associated with that hypertable).
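
The second option can be sketched as follows, assuming two disks mounted
at hypothetical paths; `attach_tablespace` associates each tablespace
with the hypertable:

```sql
-- Hypothetical mount points; adjust to your environment.
CREATE TABLESPACE disk1 LOCATION '/mnt/disk1/pgdata';
CREATE TABLESPACE disk2 LOCATION '/mnt/disk2/pgdata';

-- New chunks of 'conditions' are spread across both tablespaces.
SELECT attach_tablespace('disk1', 'conditions');
SELECT attach_tablespace('disk2', 'conditions');
```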

We recommend a RAID setup when possible, as it supports both forms of
parallelization described above (i.e., separate queries to separate
disks, and a single query to multiple disks in parallel). The multiple
tablespace approach only supports the former. With a RAID setup,
*no spatial partitioning is required*.

That said, when using space partitions, we recommend using one
space partition per disk.

TimescaleDB does *not* benefit from a very large number of space
partitions (such as the number of unique items you expect in the
partition field). A very large number of such partitions leads both to
poorer per-partition load balancing (the mapping of items to partitions
using hashing), and to much-increased planning latency for some types
of queries.

### Required Arguments

|Name|Type|Description|
|---|---|---|
| `hypertable` | REGCLASS | Hypertable to add the dimension to.|
| `column_name` | TEXT | Column to partition by.|

### Optional Arguments

|Name|Type|Description|
|---|---|---|
| `number_partitions` | INTEGER | Number of hash partitions to use on `column_name`. Must be > 0.|
| `chunk_time_interval` | INTERVAL | Interval that each chunk covers. Must be > 0.|
| `partitioning_func` | REGCLASS | The function to use for calculating a value's partition (see `create_hypertable` [instructions](/hypertable/create_hypertable)).|
| `if_not_exists` | BOOLEAN | Set to true to avoid throwing an error if a dimension for the column already exists. A notice is issued instead. Defaults to false. |

### Returns

|Column|Type|Description|
|---|---|---|
| `dimension_id` | INTEGER |ID of the dimension in the TimescaleDB internal catalog. |
| `schema_name` | TEXT | Schema name of the hypertable.|
| `table_name` | TEXT | Table name of the hypertable. |
| `column_name` | TEXT | Column name of the column to partition by. |
| `created` | BOOLEAN | True if the dimension was added, false when `if_not_exists` is true and no dimension was added. |

When executing this function, either `number_partitions` or
`chunk_time_interval` must be supplied, which dictates whether the
dimension uses hash or interval partitioning.

The `chunk_time_interval` should be specified as follows:

- If the column to be partitioned is a TIMESTAMP, TIMESTAMPTZ, or
DATE, this length should be specified either as an INTERVAL type or
an integer value in *microseconds*.

- If the column is an integer type, this length
should be an integer that reflects
the column's underlying semantics (e.g., the
`chunk_time_interval` should be given in milliseconds if this column
is the number of milliseconds since the UNIX epoch).
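
For example, for a hypothetical hypertable `events` whose
`received_at_ms` column stores milliseconds since the UNIX epoch, a
one-day chunk interval is given as an integer number of milliseconds
(table and column names here are illustrative, not from this page):

```sql
-- Hypothetical names; 86,400,000 ms = 1 day.
SELECT add_dimension('events', 'received_at_ms',
       chunk_time_interval => 86400000);
```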

<highlight type="warning">
 Supporting more than **one** additional dimension is currently
 experimental. For production environments, we recommend using at
 most one "space" dimension.
</highlight>

### Sample Usage

First convert table `conditions` to a hypertable with only time
partitioning on column `time`, then add an additional partitioning
dimension on `location` with four partitions:

```sql
SELECT create_hypertable('conditions', by_range('time'));
SELECT add_dimension('conditions', 'location', number_partitions => 4);
```

Convert table `conditions` to a hypertable with time partitioning on
`time` and space partitioning (2 partitions) on `location`, then add
two additional dimensions:

```sql
SELECT create_hypertable('conditions', 'time', 'location', 2);
SELECT add_dimension('conditions', 'time_received', chunk_time_interval => INTERVAL '1 day');
SELECT add_dimension('conditions', 'device_id', number_partitions => 2);
SELECT add_dimension('conditions', 'device_id', number_partitions => 2, if_not_exists => true);
```

For a multi-node example with a cluster of one access node and two
data nodes, first configure the access node for access to the two
data nodes. Then convert table `conditions` to a distributed
hypertable with only time partitioning on column `time`, and finally
add a space partitioning dimension on `location` with two partitions
(matching the number of attached data nodes):

```sql
SELECT add_data_node('dn1', host => 'dn1.example.com');
SELECT add_data_node('dn2', host => 'dn2.example.com');
SELECT create_distributed_hypertable('conditions', 'time');
SELECT add_dimension('conditions', 'location', number_partitions => 2);
```