Skip to content

Feature Request: Pagination Handlers #2618

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
xorima opened this issue Jan 5, 2023 · 24 comments · Fixed by #3504
Closed

Feature Request: Pagination Handlers #2618

xorima opened this issue Jan 5, 2023 · 24 comments · Fixed by #3504
Assignees

Comments

@xorima
Copy link
Contributor

xorima commented Jan 5, 2023

Hey 👋

I notice how most resources already have a way of handling pagination but we leave it to the users to implement the pagination methods themseleves.

Would it be to use to have additional appropriate functions/methods to handle the pagination for the user, so they can just focus on the data they want to get back and not how to handle the pagination. (potentially with some offsets/config for max objects returned)

I'm happy to code this up, just after feedback if this is a feature people want

@gmlewis
Copy link
Collaborator

gmlewis commented Jan 5, 2023

Hi @xorima !

Yes, there is interest - see #1935 - but note that that issue was closed due to lack of volunteers. 😄

Also note that many endpoints handle pagination differently due to the organic growth of the GitHub v3 API.

If you could come up with a way to support paginating through all results that supports all the different kinds of pagination schemes, that would be great.

However, there is also one slightly tricky part - spoken from experience - and that is that you MUST respect the rate limits (or risk your access token being blocked for a period of "cool off" time). This means that the rate limit values must be monitored during your pagination and you must ensure that you let your pagination loop cool down if you get close to exceeding the rate limits.

With all that in mind, you are welcome to put together a PR and we can explore this topic further.

Thank you, @xorima !

@gal-legit
Copy link
Contributor

gal-legit commented Jan 18, 2023

Adding complexity to the discussion:

  1. Configurability:
    • One should need to opt into the auto-pagination feature on a per-request basis (perhaps via the opts argument) and optionally also as a default behavior for the client.
    • One should still be able to pass the per-page value on a per-request basis.
  2. Sync vs Async: One might prefer a blocking (wait) or non-blocking (return) approach for pagination. The async approach is essential here because one might want to start processing results before the end of the pagination.
    • either way: should return the results found up to the rate limit, and return a meaningful error to the caller (so the user can use the results gathered so far and also continue the request from the same point if they decide).
  3. Rate limiting: there's currently no cooldown support in the client. If such a behavior is implemented for pagination, it should also be made configurable (block=cooldown or non-block=return).
    • Blocking: one might want to wait only if the wait time is below a threshold. Especially relevant for secondary rate limits (one might not want to wait an hour or so but would wait minutes).
  4. Return value: the pagination-related response fields should reflect the state of the cursor.

IMO #1 is a must-have for the feature (to avoid breaking backward compatibility), whilst #2 and #3 aren't, but would make the built-in pagination feature relevant only for some specific use cases.

Finally, sharing a handler that can be used on a per-request basis (given the current status - without the pagination feature):

func PaginateResults(api func(opts *github.ListOptions) (*github.Response, error)) error {
	var opts github.ListOptions

	for {
		resp, err := api(&opts)

		if err != nil {
			return err
		}

		if resp.NextPage == 0 {
			return nil
		}

		opts.Page = resp.NextPage
	}
}

which is then used like that:

	err := PaginateResults(func(opts *github.ListOptions) (*github.Response, error) {
	        // optionally edit opts to change the per-page value
	        // note that you can change PaginateResults to receive per-page as a parameter and set it there
	        
		res, resp, err := client.SomeCall(A, B, C, opts)
		if err != nil {
			return resp, err
		}

		go func() {
		   // do something with res
		}

		return resp, nil
	})

p.s. @gofri is my personal account.

@xorima
Copy link
Contributor Author

xorima commented Jan 19, 2023

Thanks for the criteria @gal-legit, I think it's very solid.

My initial idea was to have a separate function to call to have it auto handle pagiantion, similar to how the go-tfe module handles pagination, but this is a much cleaner approach.

I am aiming to go with a version 1 works, version 2 works better approach, so I suspect whatever I include in v1 will change a lot by the time we are calling this ready to go.

I should be able to get something up next week with it being CNY out here so I have plenty of time off to focus on this

@danielhochman
Copy link

@xorima hi, just checking to see if you have been able to make any progress on the pagination helpers. i would be interested in helping.

@xorima
Copy link
Contributor Author

xorima commented Jul 29, 2023

So I know I've been poor at responding on this.
So far I'm testing the latest iteration of paginatiors I've made which looks like the below.

Still need more tests on it before I'm happy to contribute it into the codebase but it's here for comments on design and approach.

package pagination

import (
	"context"
	"github.com/google/go-github/v53/github"
)

// ListFunc is an interfact that returns a list of items for the given type
// for example it could be:
//
//	type listFunc struct {
//		client *github.Client
//	}
//
//	func (l *listFunc) List(ctx context.Context, opt *github.ListOptions) ([]*github.Repository, *github.Response, error) {
//		t, r, err := l.client.Apps.ListRepos(ctx, opt)
//		return t.Repositories, r, err
//	}
type ListFunc[T any] interface {
	List(ctx context.Context, opt *github.ListOptions) ([]T, *github.Response, error)
}

// ProcessFunc is a function that processes an item for the given type
// this is optional as the user may not want to process the items so
// they can input a skip function that does nothing
// example:
//
//	type processFunc struct {
//		client *github.Client
//	}
//
//	func (p *processFunc) Process(ctx context.Context, item *github.Repository) error {
//		fmt.Println(item.GetName())
//		return nil
//	}
type ProcessFunc[T any] interface {
	Process(ctx context.Context, item T) error
}

// RateLimitFunc is a function that handles rate limiting
// it returns a bool to indicate if the pagination should continue
// and an error if the users wishes to return more information/errors
// example:
//
//	type rateLimitFunc struct {
//	}
//
//	func (r *rateLimitFunc) RateLimit(ctx context.Context, resp *github.Response) (bool, error) {
//		return true, nil
//	}
type RateLimitFunc interface {
	RateLimit(ctx context.Context, resp *github.Response) (bool, error)
}

func Paginator[T any](ctx context.Context, listFunc ListFunc[T], processFunc ProcessFunc[T], rateLimitFunc RateLimitFunc) ([]T, error) {

	opts := &github.ListOptions{PerPage: 100, Page: 1}

	var allItems []T

	for {
		items, resp, err := listFunc.List(ctx, opts)
		if err != nil {
			return allItems, err
		}
		allItems = append(allItems, items...)
		for _, item := range items {
			if err = processFunc.Process(ctx, item); err != nil {
				return allItems, err
			}
		}

		// Handle rate limits
		shouldContinue, err := rateLimitFunc.RateLimit(ctx, resp)
		if err != nil {
			return allItems, err
		}
		if !shouldContinue {
			break
		}

		if resp.NextPage == 0 {
			break
		}
		opts.Page = resp.NextPage
	}
	return allItems, nil
}

@gmlewis
Copy link
Collaborator

gmlewis commented Aug 1, 2023

Correct me if I'm wrong, but this could easily reside in an external repo, right?
I'm thinking that since this would be the first introduction of generics (and therefore require Go 1.18+), maybe it would make more sense to move this to an external helper/support repo.
We did the same thing for #1150.

@xorima
Copy link
Contributor Author

xorima commented Aug 3, 2023

that's what I'm thinking, making it an external repo people can use and then a conversation about joining it into the main repo in future when go 1.18 is the base requirements for other reasons

@xorima
Copy link
Contributor Author

xorima commented Aug 19, 2023

Repository is up now: https://github.com/xorima/go-api-pagination

note: the repo itself may be moved in a few weeks while I go through some work processes to have it as part of the main foss offerings from $dayjob

@gmlewis
Copy link
Collaborator

gmlewis commented Aug 19, 2023

Hey, looks great, @xorima !

I'll leave this issue open for you to create a PR that will update the README.md file in this repo with a section that points to your repo with a short blurb about why/how/when you would use it. Your PR description can say Fixes: #2618.

@gofri
Copy link
Contributor

gofri commented Jan 10, 2024

hey, I know it's been a while and the pagination wrapper-function probably work fine for many users,
but I just figured that I could solve this in an elegant maaner with a round-tripper.
I hope I'll find the time to get down to writing it soon, but am leaving the note here anyway so that anyone can ping me ([email protected]) if you find it interesting / wanna suggest requirements/features for that.

@gofri
Copy link
Contributor

gofri commented Jan 26, 2024

I had a few hours today and I made some progress. Mainly, I solved merging the response json arrays/dictionaries in a smooth manner, so I only have to add the pagination headers stuff and glue it together to get a basic round-tripper going.

In that context, I have a sort of a legal question.
I'd love to use the code from populatePageValues() with slight modifications, because it's already proven to work.
I wanted to copy and adjust it (and leave a "shamelessly stolen from" note), but I'm not sure what the exact BSD license instructions/limitations are.
Any chance that you can advise on that?

@gmlewis
Copy link
Collaborator

gmlewis commented Jan 26, 2024

NOTE: I am NOT a lawyer and I do NOT speak for Google, so I cannot officially answer this question.

However, in my humble opinion (that you can take with a grain of salt), it seems totally reasonable to me (with my limited understanding of the BSD license) to copy the function with a GitHub permalink comment added that says where you copied it from and also the exact phrase Copyright (c) 2013 The go-github AUTHORS. All rights reserved. along with a full copy of the Google go-github BSD license appended to your own LICENSE file in your repo.

So, for example, it might look like this:

// populatePageValues parses the HTTP Link response headers and populates the
// various pagination link values in the Response.
//
// Copied from: https://github.com/google/go-github/blob/20fc90190626bc9ff64da10ffdbb3be9f1ba85be/github/github.go#L674-L741
// Copyright (c) 2013 The go-github AUTHORS. All rights reserved.
func (r *Response) populatePageValues() {
...
}

...and then concatenate this repo's LICENSE file to the end of your own repo's LICENSE file ... thereby your LICENSE file would actually contain two complete license declarations.

Does that make sense?

@gmlewis
Copy link
Collaborator

gmlewis commented Jan 26, 2024

Reading my reply, maybe // Adapted from: ... might be better since you may have to change the function signature like the receiver or maybe change some of the internals to refer to structs properly, of course. In other words, you may have to edit it, but I would think "Adapted from" might be more appriopriate than "Copied from", but your call... I think either is probably fine as long as you have the copyright message in there.

@gofri
Copy link
Contributor

gofri commented Jan 26, 2024

@gmlewis makes total sense, that's exactly what I was thinking would be appropriate. Thanks!

@gofri
Copy link
Contributor

gofri commented Jan 31, 2024

So I have a first version going and it seems to work well:
https://github.com/gofri/go-github-pagination
Please do check it out!
I'm gonna add some improvements soon (see the known-limitation section in the readme).

p.s. @gmlewis, it turned out that I couldn't use the discussed code after all. The implementation here sort of relies on the user to pick the right field from the response and place it in the relevant field for the specific request.
For my implementation I needed it to be solved automatically, so I had to be more careful about it.
I'd be glad to contribute this logic to this repo, but I'm afraid that it's gonna be hard to do it without breaking compatability. Anyway, let me know if you think there's anything I can do to improve that part here.

p.s.2. now that I'm deep in the ocean of pagination, I think I can add some useful explanation to the readme (e.g., for the "since" param). do you think it's worth a PR?

p.s.3. I'd be glad to add a reference to go-github-pagination to the README here, once I finish the extra bits I want the round tripper to have. let me know if that sounds ok.

@gmlewis
Copy link
Collaborator

gmlewis commented Jan 31, 2024

Hey @gofri - looks great! (I might try to reduce the stutter by rearranging the directories a bit and put unit tests within the same dir as the code, but those are really nit issues, so take that with a grain of salt.)

p.s. ...

No problem.

p.s.2. ...

Sure, improvements to our documentation are always welcome. If you are talking about your repo's README, then it probably wouldn't hurt.

p.s.3. ...

Sure, improvements to our documentation are always welcome.

Thank you, @gofri !

@gofri
Copy link
Contributor

gofri commented Jan 31, 2024

@gmlewis thanks ! :)
thanks for the comment about the tests too!
tbh I always go this extra mile with the unit tests to make sure that they only use the public API, instead of testing implementation details.

Sure, improvements to our documentation is always welcome. If you are talking about your repo's README, then it probably wouldn't hurt.

Actually I was referring to the README of this repo.
Specifically, there's the since=NextPageToken thing (for list organizations) which I found most confusing as a user.

@gmlewis
Copy link
Collaborator

gmlewis commented Jan 31, 2024

tbh I always go this extra mile with the unit tests to make sure that they only use the public API, instead of testing implementation details.

You know that you can keep it in the same directory by changing the package from "package mypackage" to "package mypackage_test", right?

@gofri
Copy link
Contributor

gofri commented Jan 31, 2024

@gmlewis I do now 😅 (and feel real stupid lol, I didn't know about the {pkg_name}_test exception for package name collisions)

@gmlewis
Copy link
Collaborator

gmlewis commented Jan 31, 2024

No worries! I learn new stuff every day, and hopefully will continue that trend. 😄

@gofri
Copy link
Contributor

gofri commented Mar 5, 2024

Hey all (@jnewland , @xorima , @danriedl)
I just published v1.0.0 for go-github-pagination.
It now supports:

  • Sync pagination (seamless).
  • Async pagination using a helper function (designed for go-github's users).
  • General and per-request configurations.

I attached a sample for using the async wrapper with go-github below.
@gmlewis I'll soon open a PR in this repo to reference it as we disucssed before :)

  handler := func(resp *http.Response, repos []*github.Repository) error {
    fmt.Printf("found repos: %+v\n", repos)
    return nil
  }
  ctx := githubpagination.WithOverrideConfig(context.Background(),
    githubpagination.WithMaxNumOfPages(3), // e.g, limit number of pages for this request
  )
  async := githubpagination.NewAsync(handler)
  err := async.Paginate(client.Repositories.ListByUser, ctx, "gofri", nil)
  if err != nil {
    panic(err)
  }

@xorima
Copy link
Contributor Author

xorima commented Mar 13, 2024

Fantastic @danielhochman

I've also finally moved my variant to the final location: https://github.com/hsbc/go-api-pagination

@gmlewis are you happy for an update to the readme pointing to the two different pagination handler options out there for this repository?

@gmlewis
Copy link
Collaborator

gmlewis commented Mar 13, 2024

@gmlewis are you happy for an update to the readme pointing to the two different pagination handler options out there for this repository?

Yes, absolutely, I think that makes sense.

@lrstanley
Copy link
Contributor

lrstanley commented Mar 9, 2025

Looked at both repos mentioned here and wasn't too happy with their complexity and that all data would be fetched in one go. Using this in my project (ofc adding Paginate2, Paginate3 as needed for list functions that have more than one non-option input), using iterators so it only fetches data as I consume it. Posting here in case it inspires improvements with other solutions, or this repo.

func Paginate1[T, O, V1 any](
	ctx context.Context,
	fn func(context.Context, V1, *O) ([]*T, *github.Response, error),
	v1 V1,
	opts *O,
	perPage int,
) iter.Seq2[*T, error] {
	if opts == nil {
		opts = new(O)
	}

	return func(yield func(*T, error) bool) {
		page := 1
		for {
			v := reflect.ValueOf(opts).Elem()
			v.FieldByName("Page").Set(reflect.ValueOf(page))
			v.FieldByName("PerPage").Set(reflect.ValueOf(perPage))

			results, resp, err := fn(ctx, v1, opts)
			if err != nil {
				yield(nil, err)
				return
			}

			for _, item := range results {
				if !yield(item, nil) {
					return
				}
			}
			if resp.NextPage == 0 {
				return
			}
			page = resp.NextPage
		}
	}
}

And an example of consuming using this:

repoIter := Paginate1(
	ctx,
	os.rest.Repositories.ListByOrg,
	"foo-bar-baz",
	&github.RepositoryListByOrgOptions{Sort: "full_name"},
)

for repo, err := range repoIter {
	if err != nil {
		return fmt.Errorf("failed to list repositories: %w", err)
	}

	fmt.Println(repo.GetName())
}

Haven't tested it with all endpoints, but quite a few at this point. handling rate limiting through the HTTP client automatically.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants