Skip to content

Conversation

helmiazizm
Copy link
Contributor

@helmiazizm helmiazizm commented Apr 16, 2025

Rationale for this change

This fix addressed issue #1922 and changed the behavior for both _oss_fs and _s3_fs to be able to parse s3.force-virtual-addressing correctly.

Are these changes tested?

No unit tests are given to this change yet

Are there any user-facing changes?

No

"secret_key": get_first_property_value(self.properties, S3_SECRET_ACCESS_KEY, AWS_SECRET_ACCESS_KEY),
"session_token": get_first_property_value(self.properties, S3_SESSION_TOKEN, AWS_SESSION_TOKEN),
"region": get_first_property_value(self.properties, S3_REGION, AWS_REGION),
"force_virtual_addressing": property_as_bool(self.properties, S3_FORCE_VIRTUAL_ADDRESSING, True),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can remove region?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it breaks the tests:

                # For oss scheme, user provided region is used instead
>               assert pyarrow_file_io.new_input(f"oss://{bucket_region[0]}/path/to/file")._filesystem.region == user_provided_region
E               AssertionError: assert 'us-east-1' == 'ap-southeast-1'
E                 - ap-southeast-1
E                 + us-east-1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can remove region?

For OSS authentication, it's not really necessary since they only need endpoint, but perhaps you're right that it's nicer to keep it.

Copy link
Contributor

@Fokko Fokko Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@helmiazizm Thoughts on changing the default?

Also, below we have False the default:

        if self.properties.get(S3_FORCE_VIRTUAL_ADDRESSING) is not None:
            client_kwargs["force_virtual_addressing"] = property_as_bool(self.properties, S3_FORCE_VIRTUAL_ADDRESSING, False)

Copy link
Contributor Author

@helmiazizm helmiazizm Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike S3 authentication, virtual addressing is mandatory in OSS. Not having it set to True will always throw access denied error, hence why setting it to True by default would make authentication easier for users in case if they ever forget to set s3.force-virtual-addressing parameter to True. I also don't think this setting for OSS will go away anytime soon as other S3-compatible object storages are also using the same method, so this change should be safe.

I still keep the property_as_bool function to make the authentication feel as graceful in case if users insist on setting it to False.

Copy link
Contributor

@Fokko Fokko Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@helmiazizm Thanks for the explanation, as I don't have any hands-on experience with Alibaba Cloud. Lets move this forward 👍

@helmiazizm helmiazizm requested a review from Fokko April 17, 2025 14:28
…ssing is set to True by default for OSS authentication
@Fokko Fokko merged commit 831170d into apache:main Apr 17, 2025
8 checks passed
@helmiazizm helmiazizm deleted the fix-virtual-addressing branch April 19, 2025 03:31
@Fokko Fokko added this to the PyIceberg 0.9.1 milestone Apr 20, 2025
@helmiazizm helmiazizm restored the fix-virtual-addressing branch April 22, 2025 02:36
@helmiazizm helmiazizm deleted the fix-virtual-addressing branch April 22, 2025 02:47
@kevinjqliu kevinjqliu removed this from the PyIceberg 0.9.1 milestone Jun 12, 2025
gabeiglio pushed a commit to Netflix/iceberg-python that referenced this pull request Aug 13, 2025
<!--
Thanks for opening a pull request!
-->

<!-- In the case this PR will resolve an issue, please replace
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
<!-- Closes #${GITHUB_ISSUE_ID} -->

# Rationale for this change

This fix addressed issue apache#1922 and changed the behavior for both
`_oss_fs` and `_s3_fs` to be able to parse `s3.force-virtual-addressing`
correctly.

# Are these changes tested?

No unit tests are given to this change yet

# Are there any user-facing changes?

No

<!-- In the case of user-facing changes, please add the changelog label.
-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants