Skip to content

Conversation

@mkindahl
Copy link
Contributor

@mkindahl mkindahl commented Oct 5, 2023

Description

Update documentation to use the new generalized hypertable API

Links

Fixes #[insert issue link, if any]

Writing help

For information about style and word usage, see the style guide

Review checklists

Reviewers: use this section to ensure you have checked everything before approving this PR:

Subject matter expert (SME) review checklist

  • Is the content technically accurate?
  • Is the content complete?
  • Is the content presented in a logical order?
  • Does the content use appropriate names for features and products?
  • Does the content provide relevant links to further information?

Documentation team review checklist

  • Is the content free from typos?
  • Does the content use plain English?
  • Does the content contain clear sections for concepts, tasks, and references?
  • Have any images been uploaded to the correct location, and are resolvable?
  • If the page index was updated, are redirects required
    and have they been implemented?
  • Have you checked the built version of this content?

@CLAassistant
Copy link

CLAassistant commented Oct 5, 2023

CLA assistant check
All committers have signed the CLA.

@mkindahl
Copy link
Contributor Author

mkindahl commented Oct 6, 2023

It seems it is based on the wrong branch, rebasing it.

@mkindahl mkindahl force-pushed the new-hypertable-api branch from 204f7f2 to 9d6182e Compare October 6, 2023 11:53
@github-actions
Copy link

github-actions bot commented Oct 6, 2023

Allow 10 minutes from last push for the staging site to build. If the link doesn't work, try using incognito mode instead. For internal reviewers, check web-documentation repo actions for staging build status. Link to build for this PR: http://docs-dev.timescale.com/docs-new-hypertable-api

Copy link
Contributor

@nikkhils nikkhils left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sure, you have been thorough, but would be good to check for space dimension references in the entire documentation :-)

@mkindahl
Copy link
Contributor Author

I am sure, you have been thorough, but would be good to check for space dimension references in the entire documentation :-)

Actually, I have not. I have tried to capture the most glaring cases, but there is just too much.

There are several chapters that need a rewrite, especially around distributed hypertables, but I want to avoid snowballing this pull request and take that separately. In particular, I think we need to do an overhaul of all the chapters in use-timescale and re-focus them so they do not only discuss time as a partitioning column.

@mkindahl mkindahl requested a review from nikkhils October 12, 2023 11:28
Copy link
Member

@erimatnor erimatnor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure it makes sense to simply replace "space" with "hash", because the words aren't referring to exactly the same thing. In my mind, hash partitioning is technique to implement space partitioning. In other words, space partitioning refers to horizontal partitioning that spread data across, e.g., nodes or tables spaces. This can be achieved with different techniques, where "hash" partitioning is one implementation.

So, what I am trying to say is that I think something gets lost when simply replacing "space" with "hash". It also becomes very "implementation"-oriented.

Perhaps we need to rethink the docs more broadly when introducing a more general API?

I think there needs to be a discussion about when to apply the different partitioning techniques. For example, we need to let people know when using by_hash() is appropriate (and not only for distributed hypertables).


```sql
SELECT create_hypertable('transactions', 'time');
SELECT create_hypertable('transactions', by_range('time'));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want this to be the premier way of doing things going forward? Even when there's a clear time-oriented use case?

Shouldn't we explain somewhere that, e.g., SELECT create_hypertable('transactions', 'time'); is a shortcut for SELECT create_hypertable('transactions', by_range('time')); and that by_range is the default for the first dimension?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want this to be the premier way of doing things going forward? Even when there's a clear time-oriented use case?

The new interface is more robust to changes going forward and allow overloading and specialization in various ways. It also support all the existing use-cases, so I see no reason to not use the new interface also for time-oriented use-cases.

Shouldn't we explain somewhere that, e.g., SELECT create_hypertable('transactions', 'time'); is a shortcut for SELECT create_hypertable('transactions', by_range('time')); and that by_range is the default for the first dimension?

I have no strong opinion on how to formulate the distinction or what overloads to keep and am open to suggestions. The only matter that I think is important is that we need to do an overhaul of the documentation to make it more useful and approachable for wider use-cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried that we go all in on rewriting the documentation for the new API, users will be confused and they don't understand how the existing API relates to the new one.

I think, for most cases, where you only have a time column, it seems more simple and convenient to use the existing API. So, it seems reasonable to think of it as an abbreviated version of the more general API rather than saying it is legacy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, for most cases, where you only have a time column, it seems more simple and convenient to use the existing API. So, it seems reasonable to think of it as an abbreviated version of the more general API rather than saying it is legacy.

I'm fine with this. It is going to be the first step anyway, so we can use this for now. Can you point to the places where you think we should update the PR in that case?

@mkindahl
Copy link
Contributor Author

I am not sure it makes sense to simply replace "space" with "hash", because the words aren't referring to exactly the same thing. In my mind, hash partitioning is technique to implement space partitioning. In other words, space partitioning refers to horizontal partitioning that spread data across, e.g., nodes or tables spaces. This can be achieved with different techniques, where "hash" partitioning is one implementation.

I agree with this, but I tried to make minimal changes to the documentation with this pull request since a bigger rewrite is necessary.

So, what I am trying to say is that I think something gets lost when simply replacing "space" with "hash". It also becomes very "implementation"-oriented.

Yes, I think we need to go over it. Right now I am focusing on minimal changes, so if you have suggestions for better formulations in this PR, please add them.

Perhaps we need to rethink the docs more broadly when introducing a more general API?

I think there needs to be a discussion about when to apply the different partitioning techniques. For example, we need to let people know when using by_hash() is appropriate (and not only for distributed hypertables).

@mkindahl mkindahl requested a review from erimatnor October 16, 2023 13:15
@mkindahl mkindahl assigned mkindahl and unassigned nikkhils Oct 17, 2023
Copy link
Contributor

@nikkhils nikkhils left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general.

@horzsolt horzsolt added this to the TimescaleDB 2.13.0 milestone Oct 25, 2023
@josesahad
Copy link
Contributor

IMHO, given we are trying to improve our positioning outside of a time column, it would be great to have better examples in "Getting started" and "Use Timescale". Maybe I missed them, but I think that just explaining the new API won't help with that purpose.

@mkindahl
Copy link
Contributor Author

IMHO, given we are trying to improve our positioning outside of a time column, it would be great to have better examples in "Getting started" and "Use Timescale". Maybe I missed them, but I think that just explaining the new API won't help with that purpose.

That was the intention, but this PR is just fixing the existing description. We need to update documentation in these sections as well, but I would like to take them as separate PRs.

Just make minimal changes to document the new API. Sections on using
the new API will be following.

Co-authored-by: Erik Nordström <[email protected]>
@mkindahl mkindahl merged commit 5726611 into latest Oct 27, 2023
@mkindahl mkindahl deleted the new-hypertable-api branch October 27, 2023 07:56
mkindahl added a commit that referenced this pull request Oct 30, 2023
mkindahl added a commit that referenced this pull request Oct 30, 2023
* Revert "Fix broken links in distributed hypertable docs (#2767)"

This reverts commit 7655b73.

* Revert "Document new hypertable API (#2732)"

This reverts commit 5726611.
mkindahl added a commit that referenced this pull request Oct 30, 2023
Just make minimal changes to document the new API. Sections on using
the new API will be following.

Co-authored-by: Erik Nordström <[email protected]>
mkindahl added a commit that referenced this pull request Dec 5, 2023
* Document new hypertable API (#2732)

Just make minimal changes to document the new API. Sections on using
the new API will be following.

Co-authored-by: Erik Nordström <[email protected]>

* Fix broken links in distributed hypertable docs (#2767)

* Fix links to dimension builders

* Update api/add_dimension.md

Co-authored-by: James Guthrie <[email protected]>
Signed-off-by: Mats Kindahl <[email protected]>

---------

Signed-off-by: Mats Kindahl <[email protected]>
Co-authored-by: Erik Nordström <[email protected]>
Co-authored-by: James Guthrie <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants