Commit 1493930

committed
Move historical data into decompress
1 parent 13a3d50 commit 1493930

File tree

2 files changed: +112 −3 lines changed

timescaledb/how-to-guides/compression/decompress-chunks.md

Lines changed: 111 additions & 1 deletion
@@ -12,7 +12,7 @@ additional storage capacity for decompressing chunks if you need to.
These are the main steps for decompressing chunks in preparation for inserting
or backfilling data:
1.  Temporarily turn off any existing compression policy. This stops the policy
    from trying to compress chunks that you are currently working on.
1.  Decompress chunks.
1.  Perform the insertion or backfill.
1.  Re-enable the compression policy. This re-compresses the chunks you worked on.
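
The four steps above can be sketched as a single SQL session. This is a hedged illustration only: the job id `1000`, the chunk name, and the `metrics` column values are placeholders, not values from your database.

```sql
-- Sketch of the decompress-then-backfill workflow; all literals are
-- placeholders for illustration.
SELECT alter_job(1000, scheduled => false);                           -- 1. pause the compression policy
SELECT decompress_chunk('_timescaledb_internal._hyper_72_37_chunk');  -- 2. decompress the target chunk
INSERT INTO metrics VALUES ('2021-01-01 00:00:00+00', 1, 0.5);        -- 3. insert or backfill the data
SELECT alter_job(1000, scheduled => true);                            -- 4. re-enable the policy
```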
@@ -45,3 +45,113 @@ SELECT tableoid::regclass FROM metrics
------------------------------------------
 _timescaledb_internal._hyper_72_37_chunk
```

# Backfill historical data on compressed chunks

When you backfill data, you insert data with a timestamp well in the past, into
a chunk that has already been compressed. If you need to insert a batch of
backfilled data, the [TimescaleDB extras][timescaledb-extras] GitHub repository
includes functions for [backfilling batch data to compressed
chunks][timescaledb-extras-backfill].

<highlight type="warning">
Compression alters data on your disk, so always back up before you start!
</highlight>

In the example below, we backfill data into a temporary table. Temporary tables
are short-lived and only exist for the duration of the database session. If you
backfill regularly, you might instead use a normal table, which allows multiple
writers to insert into the table at the same time before the
`decompress_backfill` process.

To use this procedure:

1.  Create a table with the same schema as the hypertable (in this example,
    `cpu`) that you are backfilling into:

    ```sql
    CREATE TEMPORARY TABLE cpu_temp AS SELECT * FROM cpu WITH NO DATA;
    ```

1.  Insert data into the backfill table.

1.  Use the supplied backfill procedure to perform the remaining steps: halt the
    compression policy, identify the compressed chunks that the backfilled data
    corresponds to, decompress those chunks, insert data from the backfill table
    into the main hypertable, and then re-enable the compression policy:

    ```sql
    CALL decompress_backfill(staging_table => 'cpu_temp', destination_hypertable => 'cpu');
    ```

If you use a temporary table, the table is automatically dropped at the end of
your database session. If you use a normal table, after you have backfilled the
data successfully, you probably want to truncate the table in preparation for
the next backfill, or drop it completely.
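
Putting the steps together, a complete session might look like this sketch. The `cpu` column names and sample values are assumptions for illustration, and the `decompress_backfill` procedure must already be installed from the `backfill.sql` file in timescaledb-extras:

```sql
-- Hedged sketch: assumes a hypertable cpu(time timestamptz, device text, usage float)
-- and that backfill.sql from timescaledb-extras is loaded in this database.
CREATE TEMPORARY TABLE cpu_temp AS SELECT * FROM cpu WITH NO DATA;

-- Stage the historical rows; these timestamps fall in already-compressed chunks.
INSERT INTO cpu_temp (time, device, usage) VALUES
    ('2021-01-01 00:00:00+00', 'dev1', 0.5),
    ('2021-01-01 00:05:00+00', 'dev1', 0.7);

-- Move the staged rows into the hypertable, decompressing and
-- re-compressing the affected chunks as needed.
CALL decompress_backfill(staging_table => 'cpu_temp', destination_hypertable => 'cpu');
```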

## Manually decompressing chunks for backfill

To perform these steps manually, first identify and turn off the compression
policy, then decompress the chunks. Start by finding the job_id of the policy:

```sql
SELECT s.job_id
FROM timescaledb_information.jobs j
  INNER JOIN timescaledb_information.job_stats s ON j.job_id = s.job_id
WHERE j.proc_name = 'policy_compression' AND s.hypertable_name = '<target table>';
```

Next, pause the job:

```sql
SELECT alter_job(<job_id>, scheduled => false);
```

The compression policy is now paused for the hypertable, leaving you free to
decompress the chunks that you need to modify via backfill or update. To
decompress the chunks you will be modifying, run the following for each chunk:

```sql
SELECT decompress_chunk('_timescaledb_internal._hyper_2_2_chunk');
```

Similarly, you can decompress a set of chunks based on a time range by first
looking up the set of chunks with `show_chunks`:

```sql
SELECT decompress_chunk(i) FROM show_chunks('conditions', newer_than, older_than) i;
```
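
For example, to decompress every chunk of the `conditions` hypertable that holds data from the past week (the one-week interval is an assumption for illustration):

```sql
-- Decompress all chunks of 'conditions' containing data newer than one
-- week ago; if_compressed => true skips chunks that are not compressed
-- instead of raising an error.
SELECT decompress_chunk(i, if_compressed => true)
FROM show_chunks('conditions', newer_than => now() - INTERVAL '1 week') i;
```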

<highlight type="tip">
You need to run `decompress_chunk` for each chunk that is impacted by your
INSERT or UPDATE statement when backfilling data. Once the chunks you need are
decompressed, you can proceed with your data backfill operations.
</highlight>

Once your backfill and update operations are complete, re-enable the
compression policy job:

```sql
SELECT alter_job(<job_id>, scheduled => true);
```

The next time the job runs, it re-compresses any chunks that were decompressed
during your backfilling operation. To re-compress immediately, execute the job
with [`run_job`][run-job]:

```sql
CALL run_job(<job_id>);
```

## Future work [](future-work)

One current limitation of TimescaleDB is that once chunks are converted into
compressed columnar form, updates and deletes of the data, and changes to the
schema, are not allowed without manual decompression, except as noted
[above][compression-schema-changes]. In other words, chunks are partially
immutable in compressed form. Attempts to modify the chunks' data in those
cases either error or fail silently, as preferred by users. We plan to remove
this limitation in future releases.

[timescaledb-extras]: https://github.com/timescale/timescaledb-extras
[compression-schema-changes]: /how-to-guides/compression/modify-a-schema/
[timescaledb-extras-backfill]: https://github.com/timescale/timescaledb-extras/blob/master/backfill.sql
[run-job]: /api/:currentVersion:/actions-and-automation/run_job/

timescaledb/how-to-guides/compression/manually-compress-chunks.md

Lines changed: 1 addition & 2 deletions
@@ -14,7 +14,7 @@ than three days.

### Procedure: Selecting chunks to compress
1.  At the psql prompt, select all chunks in the table `example` that are older
    than three days:

    ```sql
    SELECT show_chunks('example', older_than => INTERVAL '3 days');
    ```
@@ -24,7 +24,6 @@ than three days:
|1|_timescaledb_internal_hyper_1_2_chunk|
|2|_timescaledb_internal_hyper_1_3_chunk|

When you are happy with the list of chunks, you can use the chunk names to manually compress each one.

### Procedure: Compressing chunks manually
