Skip to content

Conversation

kevinjqliu
Copy link
Contributor

@kevinjqliu kevinjqliu commented Jun 2, 2025

Closes #2052

Rationale for this change

Aligns catalog behavior with the java reference implementation.

HiveCatalog, DynamoDbCatalog, and GlueCatalog all use .db suffix in warehouse location
JdbcCatalog, HadoopCatalog, and InMemoryCatalog do not use .db suffix in warehouse location

Are these changes tested?

Yes tests for sql catalog are modified to remove .db

Are there any user-facing changes?

@jayceslesar
Copy link
Contributor

could do something fancy with os.path.splitext but that seems like it would lead down a path where we are assuming what the user wants

@kevinjqliu kevinjqliu force-pushed the kevinjqliu/remove-.db-suffix branch from 217d912 to f393c38 Compare June 3, 2025 02:28
@kevinjqliu kevinjqliu changed the title remove .db suffix in warehouse location only add .db suffix in warehouse location for dynamo/hive/glue catalogs Jun 3, 2025
@kevinjqliu kevinjqliu marked this pull request as ready for review June 3, 2025 02:44
@kevinjqliu
Copy link
Contributor Author

@jayceslesar made it so aligns with the java implementation

@kevinjqliu kevinjqliu requested a review from Fokko June 3, 2025 02:46
catalog=self,
)

def _get_default_warehouse_location(self, database_name: str, table_name: str) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we're repeating this a couple of times, should we move this into pyiceberg/catalog/__init__.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call, done

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I think this is a good change since we don't want to leak Hive legacy until eternity :)

@kevinjqliu kevinjqliu changed the title only add .db suffix in warehouse location for dynamo/hive/glue catalogs change catalog default warehouse location to not use hive-style warehouse location Jun 14, 2025
@kevinjqliu kevinjqliu merged commit 41ff734 into apache:main Jun 14, 2025
17 of 18 checks passed
@kevinjqliu kevinjqliu deleted the kevinjqliu/remove-.db-suffix branch June 14, 2025 18:16
amitgilad3 pushed a commit to amitgilad3/iceberg-python that referenced this pull request Jul 7, 2025
…ouse location (apache#2059)

<!--
Thanks for opening a pull request!
-->

<!-- In the case this PR will resolve an issue, please replace
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
<!-- Closes #${GITHUB_ISSUE_ID} -->
Closes apache#2052

# Rationale for this change
Aligns catalog behavior with the java reference implementation.


[HiveCatalog](https://github.com/apache/iceberg/blob/7537c3c3a2a6491abcf0c3ef58cc4d5dc6ac4bae/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java#L697-L736),
[DynamoDbCatalog](https://github.com/apache/iceberg/blob/7537c3c3a2a6491abcf0c3ef58cc4d5dc6ac4bae/aws/src/main/java/org/apache/iceberg/aws/dynamodb/DynamoDbCatalog.java#L185),
and
[GlueCatalog](https://github.com/apache/iceberg/blob/7537c3c3a2a6491abcf0c3ef58cc4d5dc6ac4bae/aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java#L291)
all use `.db` suffix in warehouse location

[JdbcCatalog](https://github.com/apache/iceberg/blob/7537c3c3a2a6491abcf0c3ef58cc4d5dc6ac4bae/core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java#L268),
[HadoopCatalog](https://github.com/apache/iceberg/blob/7537c3c3a2a6491abcf0c3ef58cc4d5dc6ac4bae/core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java#L233-L245),
and
[InMemoryCatalog](https://github.com/apache/iceberg/blob/7537c3c3a2a6491abcf0c3ef58cc4d5dc6ac4bae/core/src/main/java/org/apache/iceberg/inmemory/InMemoryCatalog.java#L106-L117)
do not use `.db` suffix in warehouse location


# Are these changes tested?
Yes tests for sql catalog are modified to remove `.db`

# Are there any user-facing changes?

<!-- In the case of user-facing changes, please add the changelog label.
-->
gabeiglio pushed a commit to Netflix/iceberg-python that referenced this pull request Aug 13, 2025
…ouse location (apache#2059)

<!--
Thanks for opening a pull request!
-->

<!-- In the case this PR will resolve an issue, please replace
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
<!-- Closes #${GITHUB_ISSUE_ID} -->
Closes apache#2052

# Rationale for this change
Aligns catalog behavior with the java reference implementation.


[HiveCatalog](https://github.com/apache/iceberg/blob/7537c3c3a2a6491abcf0c3ef58cc4d5dc6ac4bae/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java#L697-L736),
[DynamoDbCatalog](https://github.com/apache/iceberg/blob/7537c3c3a2a6491abcf0c3ef58cc4d5dc6ac4bae/aws/src/main/java/org/apache/iceberg/aws/dynamodb/DynamoDbCatalog.java#L185),
and
[GlueCatalog](https://github.com/apache/iceberg/blob/7537c3c3a2a6491abcf0c3ef58cc4d5dc6ac4bae/aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java#L291)
all use `.db` suffix in warehouse location

[JdbcCatalog](https://github.com/apache/iceberg/blob/7537c3c3a2a6491abcf0c3ef58cc4d5dc6ac4bae/core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java#L268),
[HadoopCatalog](https://github.com/apache/iceberg/blob/7537c3c3a2a6491abcf0c3ef58cc4d5dc6ac4bae/core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java#L233-L245),
and
[InMemoryCatalog](https://github.com/apache/iceberg/blob/7537c3c3a2a6491abcf0c3ef58cc4d5dc6ac4bae/core/src/main/java/org/apache/iceberg/inmemory/InMemoryCatalog.java#L106-L117)
do not use `.db` suffix in warehouse location


# Are these changes tested?
Yes tests for sql catalog are modified to remove `.db`

# Are there any user-facing changes?

<!-- In the case of user-facing changes, please add the changelog label.
-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

pyiceberg write an extra .db in the schema path

3 participants