New hypertable api #2732

mkindahl · 2023-10-05T15:28:35Z

Description

Update documentation to use the new generalized hypertable API

Links

Fixes #[insert issue link, if any]

Writing help

For information about style and word usage, see the style guide

Review checklists

Reviewers: use this section to ensure you have checked everything before approving this PR:

Subject matter expert (SME) review checklist

Is the content technically accurate?
Is the content complete?
Is the content presented in a logical order?
Does the content use appropriate names for features and products?
Does the content provide relevant links to further information?

Documentation team review checklist

Is the content free from typos?
Does the content use plain English?
Does the content contain clear sections for concepts, tasks, and references?
Have any images been uploaded to the correct location, and are resolvable?
If the page index was updated, are redirects required
and have they been implemented?
Have you checked the built version of this content?

CLAassistant · 2023-10-05T15:28:41Z

All committers have signed the CLA.

mkindahl · 2023-10-06T07:45:43Z

It seems it is based on the wrong branch, rebasing it.

github-actions · 2023-10-06T11:53:26Z

Allow 10 minutes from last push for the staging site to build. If the link doesn't work, try using incognito mode instead. For internal reviewers, check web-documentation repo actions for staging build status. Link to build for this PR: http://docs-dev.timescale.com/docs-new-hypertable-api

api/create_distributed_hypertable.md

api/dimension_info.md

self-hosted/migration/schema-then-data.md

nikkhils

I am sure, you have been thorough, but would be good to check for space dimension references in the entire documentation :-)

mkindahl · 2023-10-12T08:42:42Z

I am sure, you have been thorough, but would be good to check for space dimension references in the entire documentation :-)

Actually, I have not. I have tried to capture the most glaring cases, but there is just too much.

There are several chapters that need a rewrite, especially around distributed hypertables, but I want to avoid snowballing this pull request and take that separately. In particular, I think we need to do an overhaul of all the chapters in use-timescale and re-focus them so they do not only discuss time as a partitioning column.

erimatnor

I am not sure it makes sense to simply replace "space" with "hash", because the words aren't referring to exactly the same thing. In my mind, hash partitioning is technique to implement space partitioning. In other words, space partitioning refers to horizontal partitioning that spread data across, e.g., nodes or tables spaces. This can be achieved with different techniques, where "hash" partitioning is one implementation.

So, what I am trying to say is that I think something gets lost when simply replacing "space" with "hash". It also becomes very "implementation"-oriented.

Perhaps we need to rethink the docs more broadly when introducing a more general API?

I think there needs to be a discussion about when to apply the different partitioning techniques. For example, we need to let people know when using by_hash() is appropriate (and not only for distributed hypertables).

api/add_dimension.md

erimatnor · 2023-10-16T10:32:12Z

_partials/_create-hypertable-blockchain.md


    ```sql
-    SELECT create_hypertable('transactions', 'time');
+    SELECT create_hypertable('transactions', by_range('time'));


Do we want this to be the premier way of doing things going forward? Even when there's a clear time-oriented use case?

Shouldn't we explain somewhere that, e.g., SELECT create_hypertable('transactions', 'time'); is a shortcut for SELECT create_hypertable('transactions', by_range('time')); and that by_range is the default for the first dimension?

Do we want this to be the premier way of doing things going forward? Even when there's a clear time-oriented use case?

The new interface is more robust to changes going forward and allow overloading and specialization in various ways. It also support all the existing use-cases, so I see no reason to not use the new interface also for time-oriented use-cases.

Shouldn't we explain somewhere that, e.g., SELECT create_hypertable('transactions', 'time'); is a shortcut for SELECT create_hypertable('transactions', by_range('time')); and that by_range is the default for the first dimension?

I have no strong opinion on how to formulate the distinction or what overloads to keep and am open to suggestions. The only matter that I think is important is that we need to do an overhaul of the documentation to make it more useful and approachable for wider use-cases.

I am worried that we go all in on rewriting the documentation for the new API, users will be confused and they don't understand how the existing API relates to the new one.

I think, for most cases, where you only have a time column, it seems more simple and convenient to use the existing API. So, it seems reasonable to think of it as an abbreviated version of the more general API rather than saying it is legacy.

I think, for most cases, where you only have a time column, it seems more simple and convenient to use the existing API. So, it seems reasonable to think of it as an abbreviated version of the more general API rather than saying it is legacy.

I'm fine with this. It is going to be the first step anyway, so we can use this for now. Can you point to the places where you think we should update the PR in that case?

_partials/_create-hypertable-nyctaxis.md

mkindahl · 2023-10-16T10:52:44Z

I am not sure it makes sense to simply replace "space" with "hash", because the words aren't referring to exactly the same thing. In my mind, hash partitioning is technique to implement space partitioning. In other words, space partitioning refers to horizontal partitioning that spread data across, e.g., nodes or tables spaces. This can be achieved with different techniques, where "hash" partitioning is one implementation.

I agree with this, but I tried to make minimal changes to the documentation with this pull request since a bigger rewrite is necessary.

So, what I am trying to say is that I think something gets lost when simply replacing "space" with "hash". It also becomes very "implementation"-oriented.

Yes, I think we need to go over it. Right now I am focusing on minimal changes, so if you have suggestions for better formulations in this PR, please add them.

Perhaps we need to rethink the docs more broadly when introducing a more general API?

I think there needs to be a discussion about when to apply the different partitioning techniques. For example, we need to let people know when using by_hash() is appropriate (and not only for distributed hypertables).

nikkhils

LGTM in general.

josesahad · 2023-10-26T23:02:28Z

IMHO, given we are trying to improve our positioning outside of a time column, it would be great to have better examples in "Getting started" and "Use Timescale". Maybe I missed them, but I think that just explaining the new API won't help with that purpose.

mkindahl · 2023-10-27T06:13:57Z

IMHO, given we are trying to improve our positioning outside of a time column, it would be great to have better examples in "Getting started" and "Use Timescale". Maybe I missed them, but I think that just explaining the new API won't help with that purpose.

That was the intention, but this PR is just fixing the existing description. We need to update documentation in these sections as well, but I would like to take them as separate PRs.

Just make minimal changes to document the new API. Sections on using the new API will be following. Co-authored-by: Erik Nordström <[email protected]>

This reverts commit 5726611.

* Revert "Fix broken links in distributed hypertable docs (#2767)" This reverts commit 7655b73. * Revert "Document new hypertable API (#2732)" This reverts commit 5726611.

Just make minimal changes to document the new API. Sections on using the new API will be following. Co-authored-by: Erik Nordström <[email protected]>

* Document new hypertable API (#2732) Just make minimal changes to document the new API. Sections on using the new API will be following. Co-authored-by: Erik Nordström <[email protected]> * Fix broken links in distributed hypertable docs (#2767) * Fix links to dimension builders * Update api/add_dimension.md Co-authored-by: James Guthrie <[email protected]> Signed-off-by: Mats Kindahl <[email protected]> --------- Signed-off-by: Mats Kindahl <[email protected]> Co-authored-by: Erik Nordström <[email protected]> Co-authored-by: James Guthrie <[email protected]>

mkindahl force-pushed the new-hypertable-api branch from 204f7f2 to 9d6182e Compare October 6, 2023 11:53

mkindahl requested a review from nikkhils October 6, 2023 13:43

leeshyan assigned nikkhils Oct 10, 2023