Fails to pull a public image from: ghcr.io/matejvasek/builder-ubi8-base:latest #30

Open
cmoulliard opened this issue May 6, 2025 · 12 comments
Assignees
Labels
bug v1.0.0-beta.1 Features & bug fixes to be released in version 1.0.0-alpha.1

Comments

@cmoulliard

Issue

I can pull the image from the ghcr.io registry using podman pull ghcr.io/matejvasek/builder-ubi8-base, but the same pull fails with this Python library.

The registry creds file exists.
Image ref: ghcr.io/matejvasek/builder-ubi8-base:latest
api base url: https://ghcr.io/v2/matejvasek/builder-ubi8-base/manifests/latest
-----------START-----------
GET https://ghcr.io/v2/matejvasek/builder-ubi8-base/manifests/latest
Accept: application/vnd.docker.distribution.manifest.list.v2+json,application/vnd.docker.distribution.manifest.v2+json,application/vnd.oci.image.index.v1+json,application/vnd.oci.image.manifest.v1+json,application/vnd.docker.distribution.manifest.v1+json,application/vnd.docker.distribution.manifest.v1+prettyjws
None
Traceback (most recent call last):
  File "/tekton/scripts/script-0-pwrkx", line 51, in <module>
    manifest = builder_image.get_manifest(auth=AUTH)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/image/containerimage.py", line 179, in get_manifest
    ContainerImageRegistryClient.get_manifest(self, auth)
  File "/opt/app-root/lib64/python3.11/site-packages/image/client.py", line 384, in get_manifest
    res = ContainerImageRegistryClient.query_manifest(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/image/client.py", line 360, in query_manifest
    res.raise_for_status()
  File "/opt/app-root/lib64/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://ghcr.io/v2/matejvasek/builder-ubi8-base/manifests/latest
@matejvasek

It appears that the Python client does not obtain a token the same way the Go client does.

@matejvasek

package main

import (
	"fmt"
	
	"github.com/google/go-containerregistry/pkg/name"
	"github.com/google/go-containerregistry/pkg/v1/remote"
)

func main() {
	img, err := remote.Image(name.MustParseReference("ghcr.io/matejvasek/builder-ubi8-base"))
	if err != nil {
		panic(err)
	}
	fmt.Println(img)
}

"just works"

@cmoulliard
Author

cmoulliard commented May 6, 2025

Yep. We should update the code to get a token when we access ghcr.io packages

#!/usr/bin/env bash

USER_IMAGE=matejvasek/builder-ubi8-base

# get token ('{"token":"***"}' -> '***')
TOKEN="$(curl -s "https://ghcr.io/token?scope=repository:${USER_IMAGE}:pull" | awk -F'"' '$0=$4')"

_curl() {
  curl -H "Authorization: Bearer ${TOKEN}" "$1"
}

# get manifest of the latest image
_curl "https://ghcr.io/v2/${USER_IMAGE}/manifests/latest"
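For readers working in Python rather than shell, the same two steps (fetch an anonymous pull token, then request the manifest with it) can be sketched with only the standard library. This is a rough sketch and not the library's code: `token_url`, `extract_token`, and `fetch_manifest` are hypothetical helper names, the `https://ghcr.io/token` endpoint and the `{"token": "..."}` response shape are taken from the script above, and the `Accept` values are a subset of those in the request log at the top of this issue.

```python
import json
import urllib.parse
import urllib.request


def token_url(repository: str, registry: str = "ghcr.io") -> str:
    """Build the anonymous pull-token URL used by the shell script above."""
    # Keep ':' and '/' unescaped so the URL matches the curl invocation.
    query = urllib.parse.urlencode(
        {"scope": f"repository:{repository}:pull"}, safe=":/"
    )
    return f"https://{registry}/token?{query}"


def extract_token(body: bytes) -> str:
    """Pull the token out of a '{"token": "..."}' JSON response body."""
    return json.loads(body)["token"]


def fetch_manifest(repository: str, reference: str = "latest",
                   registry: str = "ghcr.io") -> dict:
    """Fetch a token anonymously, then request the manifest with it."""
    with urllib.request.urlopen(token_url(repository, registry)) as res:
        token = extract_token(res.read())
    req = urllib.request.Request(
        f"https://{registry}/v2/{repository}/manifests/{reference}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": ",".join([
                "application/vnd.oci.image.index.v1+json",
                "application/vnd.oci.image.manifest.v1+json",
                "application/vnd.docker.distribution.manifest.list.v2+json",
                "application/vnd.docker.distribution.manifest.v2+json",
            ]),
        },
    )
    with urllib.request.urlopen(req) as res:
        return json.loads(res.read())
```

Calling `fetch_manifest("matejvasek/builder-ubi8-base")` performs the two network requests; the two smaller helpers can be exercised offline.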

@cmoulliard
Author

Reaching the same level as this code will take time, but it may nevertheless help anyone interested in developing or maintaining this Python registry client: https://github.com/google/go-containerregistry/tree/main/pkg/v1/remote/transport ;-)

@whatsacomputertho
Collaborator

Thanks @cmoulliard, there is probably a gap here.

Right now I have implemented the CNCF specification for client logic involving a container image registry with a remote "token auth" server

  • Source:
    @staticmethod
    def get_auth_token(
            res: requests.Response,
            reg_auth: str
        ) -> Tuple[str, str]:
        """
        The response from the distribution registry API, which MUST be a 401
        response, and MUST include the www-authenticate header

        Args:
            res (Type[requests.Response]): The response from the registry API
            reg_auth (str): The auth retrieved for the registry

        Returns:
            str: The auth scheme for the token
            str: The token retrieved from the auth service
        """
        # Get the www-authenticate header, split into components
        www_auth_header = res.headers['www-authenticate']
        auth_components = www_auth_header.split(" ")

        # Parse the auth scheme from the header
        auth_scheme = auth_components[0]

        # Parse each key-value pair into a dict
        query_params = {}
        query_param_components = auth_components[1].split(",")
        for param in query_param_components:
            param_components = param.split("=")
            query_params[param_components[0]] = param_components[1].replace("\"", "")

        # Pop the realm value out of the dict and encode as a query string
        # Format into the auth service URL to request
        realm = query_params.pop("realm")
        query_string = urllib.parse.urlencode(query_params)
        auth_url = f"{realm}?{query_string}"

        # Send the request to the auth service, parse the token from the
        # response
        headers = {
            'Authorization': f"Basic {reg_auth}"
        }
        token_res = requests.get(auth_url, headers=headers)
        token_res.raise_for_status()
        token_json = token_res.json()
        token = token_json['token']
        return auth_scheme, token
  • Spec: https://distribution.github.io/distribution/spec/auth/jwt/
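One detail worth noting about the parsing in the source above: splitting the header on `","` and `"="` would break apart a quoted value that itself contains a comma, such as a scope ending in `pull,push`. A minimal sketch of a more tolerant parser follows, assuming every parameter value is double-quoted. `parse_www_authenticate` and `build_token_url` are hypothetical names, not part of the library, and the header in the usage note is an illustrative example rather than a captured response.

```python
import re
import urllib.parse
from typing import Dict, Tuple

# Matches key="value" pairs; commas inside the quotes stay part of the value.
_PARAM_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_www_authenticate(header: str) -> Tuple[str, Dict[str, str]]:
    """Split a www-authenticate header into (scheme, params)."""
    scheme, _, rest = header.partition(" ")
    params = {key.lower(): value for key, value in _PARAM_RE.findall(rest)}
    return scheme, params


def build_token_url(params: Dict[str, str]) -> str:
    """Turn the parsed params into the auth service URL (realm + query)."""
    query = {key: value for key, value in params.items() if key != "realm"}
    return f'{params["realm"]}?{urllib.parse.urlencode(query)}'
```

For example, `parse_www_authenticate('Bearer realm="https://ghcr.io/token",service="ghcr.io",scope="repository:user/image:pull,push"')` keeps `pull,push` together in the `scope` value, and `build_token_url` then percent-encodes it into the query string.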

I do see there is another page in the spec on OAuth2, but at first glance I'm not sure that's what you're doing above: I only see one token being fetched, not a refresh token that is then used to generate an access token.

@whatsacomputertho
Collaborator

whatsacomputertho commented May 7, 2025

@cmoulliard Is there any chance you're still dealing with this issue:

There is some probing we can do to confirm this, but my current understanding is that it's probably skipping past the whole auth dance in this if statement

# Send the request to the distribution registry API
# If it fails with a 401 response code and auth given, do OAuth dance
res = requests.get(api_url, headers=headers)
if res.status_code == 401 and found and \
        'www-authenticate' in res.headers.keys():
    # Do Oauth dance if basic auth fails
    # Ref: https://distribution.github.io/distribution/spec/auth/token/
    scheme, token = ContainerImageRegistryClient.get_auth_token(
        res, reg_auth
    )
    headers['Authorization'] = f'{scheme} {token}'
    res = requests.get(api_url, headers=headers)
# Raise exceptions on error status codes
res.raise_for_status()

There are a few possibilities here that could cause this

  1. No auth token was found for ghcr.io from your auth dict - this would be the case if you are still dealing with your other issue
  2. ghcr.io returned a 401 response without a www-authenticate header (This would mean either ghcr.io doesn't respect the CNCF spec or my client logic has some subtle gap that I'm missing)
  3. ghcr.io returned a 401 response with a www-authenticate header, retrieved the auth token, but then hit another 401 on the second attempt to query the manifest (I think it's fair to say this is unlikely at this point but possible)

@whatsacomputertho
Collaborator

@cmoulliard OK, I see the issue now that I've gone through it once locally: I only ever do the auth dance when basic auth creds are found. I had assumed the auth dance is only relevant when a user attempts basic auth against the registry and that fails. I just need to remove that condition (whether basic auth creds were found in your auth dict) from the if statement around the auth dance, and this should work. That way we at least try the auth dance any time we get a 401 with a www-authenticate header.
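The relaxed condition described here could look roughly like the sketch below. It is not the code from the eventual fix: `needs_token_auth` and `token_request_headers` are hypothetical helpers, and omitting the `Basic` header when no creds were found is an assumption, chosen to mirror the anonymous token request in the shell script earlier in this thread.

```python
from typing import Dict, Mapping, Optional


def needs_token_auth(status_code: int, headers: Mapping[str, str]) -> bool:
    """True for any 401 that carries a www-authenticate header.

    Whether basic auth creds were found is deliberately not consulted.
    """
    has_challenge = any(key.lower() == "www-authenticate" for key in headers)
    return status_code == 401 and has_challenge


def token_request_headers(reg_auth: Optional[str]) -> Dict[str, str]:
    """Headers for the token request; anonymous when no creds were found."""
    # Assumption: with no creds, send no Authorization header at all.
    if reg_auth:
        return {"Authorization": f"Basic {reg_auth}"}
    return {}
```

With this shape, a public image on ghcr.io would take the token path even when the auth dict is empty, while registries that do receive creds keep sending them to the auth service.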

@whatsacomputertho
Collaborator

@cmoulliard I removed the found condition from each auth dance and was able to work ahead, but still hit an error on the manifest schema

% python3.9 examples/quick-example.py
Traceback (most recent call last):
  File "/Users/ethanbalcik/Desktop/containerimage-py/examples/quick-example.py", line 25, in <module>
    my_image.get_size_formatted(auth={}) # 499.91 MB
  File "/Users/ethanbalcik/Desktop/containerimage-py/image/containerimage.py", line 267, in get_size_formatted
    return ByteUnit.format_size_bytes(self.get_size(auth))
  File "/Users/ethanbalcik/Desktop/containerimage-py/image/containerimage.py", line 250, in get_size
    manifest = self.get_manifest(auth)
  File "/Users/ethanbalcik/Desktop/containerimage-py/image/containerimage.py", line 178, in get_manifest
    return ContainerImageManifestFactory.create(
  File "/Users/ethanbalcik/Desktop/containerimage-py/image/manifestfactory.py", line 128, in create
    raise ContainerImageError(
image.errors.ContainerImageError: Invalid schema, not v2s2 or OCI manifest or list: {"schemaVersion": 2, "mediaType": "application/vnd.oci.image.index.v1+json", "manifests": [{"mediaType": "application/vnd.docker.distribution.manifest.v2+json", "digest": "sha256:05253df7fd44665ce1cb95953665b861c46becdbc8dc0ab18331db8ce79e1978", "size": 8754, "platform": {"architecture": "amd64", "os": "linux"}}, {"mediaType": "application/vnd.docker.distribution.manifest.v2+json", "digest": "sha256:c4c9a29818d859418884948820b4a1fce77a03891224b646d54aedca23265a47", "size": 8754, "platform": {"architecture": "arm64", "os": "linux"}}]}

I think this currently fails because the manifest list is an OCI index but its per-architecture manifest entries are all Docker v2s2. I strictly expect OCI manifests when I get an OCI index, and vice versa for Docker v2. That check might be too heavy-handed on my end.
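One way a less strict check could look is sketched below: it accepts an index of either flavor whose entries are any mix of OCI and Docker v2s2 manifests. This is only an illustration under stated assumptions, not the library's validation logic. `is_acceptable_index` is a hypothetical name, the media types are the ones that appear in this issue's request log and error message, and nested indexes are not handled.

```python
INDEX_TYPES = {
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
}
MANIFEST_TYPES = {
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.docker.distribution.manifest.v2+json",
}


def is_acceptable_index(doc: dict) -> bool:
    """Accept an index/list whose entries mix OCI and Docker v2s2 types."""
    if doc.get("schemaVersion") != 2:
        return False
    if doc.get("mediaType") not in INDEX_TYPES:
        return False
    entries = doc.get("manifests")
    if not isinstance(entries, list):
        return False
    # Each entry only needs a known manifest type, independent of the
    # media type of the index that contains it.
    return all(
        isinstance(entry, dict) and entry.get("mediaType") in MANIFEST_TYPES
        for entry in entries
    )
```

Under this check, the index in the error above (an OCI index with two Docker v2s2 entries) would pass, while a document with an unknown media type would still be rejected.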

@whatsacomputertho whatsacomputertho self-assigned this May 7, 2025
@whatsacomputertho whatsacomputertho added bug v1.0.0-beta.1 Features & bug fixes to be released in version 1.0.0-alpha.1 labels May 7, 2025
@whatsacomputertho
Collaborator

@cmoulliard @matejvasek Can you review this PR for me? This should fix your initial issue (well, not in its entirety as mentioned above, but we can hash that out in a separate issue)

@whatsacomputertho
Collaborator

Moved into review. The fix is merged into main & release-1.0. Will leave in review until we have a v1.0.0-beta.1 release available.

@whatsacomputertho
Collaborator

@cmoulliard @matejvasek This PR should unblock you guys, please take a look when you get the chance

It is tracked under this separate issue

I will get this into the v1.0.0-beta.1 release soon

@whatsacomputertho
Collaborator

Going to go ahead and merge that PR now

3 participants