chalmerlowe
Collaborator

This PR adds the StorageDescriptor class and the associated tests, plus minor tweaks to support both of those changes.

@chalmerlowe chalmerlowe requested review from a team as code owners January 13, 2025 20:09
@product-auto-label product-auto-label bot added the "size: m" label (Pull request size is medium.) Jan 13, 2025
@product-auto-label product-auto-label bot added the "api: bigquery" label (Issues related to the googleapis/python-bigquery API.) Jan 13, 2025
@chalmerlowe chalmerlowe assigned tswast and Linchin and unassigned PhongChuong Jan 13, 2025
Comment on lines 653 to 665
inputFormat (Optional[str]): Specifies the fully qualified class name of
the InputFormat (e.g.
"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"). The maximum
length is 128 characters.
locationUri (Optional[str]): The physical location of the table (e.g.
'gs://spark-dataproc-data/pangea-data/case_sensitive/' or
'gs://spark-dataproc-data/pangea-data/'). The maximum length is
2056 bytes.
outputFormat (Optional[str]): Specifies the fully qualified class name
of the OutputFormat (e.g.
"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"). The maximum
length is 128 characters.
serdeInfo (Union[SerDeInfo, dict, None]): Serializer and deserializer information.
Contributor

Suggested change
- inputFormat (Optional[str]): Specifies the fully qualified class name of
-     the InputFormat (e.g.
-     "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"). The maximum
-     length is 128 characters.
- locationUri (Optional[str]): The physical location of the table (e.g.
-     'gs://spark-dataproc-data/pangea-data/case_sensitive/' or
-     'gs://spark-dataproc-data/pangea-data/'). The maximum length is
-     2056 bytes.
- outputFormat (Optional[str]): Specifies the fully qualified class name
-     of the OutputFormat (e.g.
-     "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"). The maximum
-     length is 128 characters.
- serdeInfo (Union[SerDeInfo, dict, None]): Serializer and deserializer information.
+ input_format (Optional[str]): Specifies the fully qualified class name of
+     the InputFormat (e.g.
+     "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"). The maximum
+     length is 128 characters.
+ location_uri (Optional[str]): The physical location of the table (e.g.
+     'gs://spark-dataproc-data/pangea-data/case_sensitive/' or
+     'gs://spark-dataproc-data/pangea-data/'). The maximum length is
+     2056 bytes.
+ output_format (Optional[str]): Specifies the fully qualified class name
+     of the OutputFormat (e.g.
+     "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"). The maximum
+     length is 128 characters.
+ serde_info (Union[SerDeInfo, dict, None]): Serializer and deserializer information.

Collaborator Author

Resolved.
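
For readers following the rename, here is a minimal sketch of the pattern the suggestion implies: snake_case Python attributes layered over the camelCase keys of the API resource. The class name `StorageDescriptorSketch`, its constructor signature, and the `to_api_repr` body are assumptions made for illustration and are not taken from the merged code; only the `self._properties["outputFormat"] = value` storage style is visible in this PR.

```python
from typing import Any, Dict, Optional


class StorageDescriptorSketch:
    """Hypothetical stand-in for illustration; not the class added in this PR."""

    def __init__(
        self,
        input_format: Optional[str] = None,
        location_uri: Optional[str] = None,
        output_format: Optional[str] = None,
    ) -> None:
        # Values are stored under the camelCase keys used by the API resource.
        self._properties: Dict[str, Any] = {}
        self.input_format = input_format
        self.location_uri = location_uri
        self.output_format = output_format

    @property
    def input_format(self) -> Optional[str]:
        return self._properties.get("inputFormat")

    @input_format.setter
    def input_format(self, value: Optional[str]) -> None:
        self._properties["inputFormat"] = value

    @property
    def location_uri(self) -> Optional[str]:
        return self._properties.get("locationUri")

    @location_uri.setter
    def location_uri(self, value: Optional[str]) -> None:
        self._properties["locationUri"] = value

    @property
    def output_format(self) -> Optional[str]:
        return self._properties.get("outputFormat")

    @output_format.setter
    def output_format(self, value: Optional[str]) -> None:
        self._properties["outputFormat"] = value

    def to_api_repr(self) -> Dict[str, Any]:
        # Return a shallow copy so callers cannot mutate internal state.
        return dict(self._properties)
```

With this layout the docstring can name the arguments exactly as callers pass them (`input_format`, `location_uri`, and so on), while the camelCase spelling stays an internal detail of the stored resource.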

        self._properties["outputFormat"] = value

    @property
    def serde_info(self) -> Union[SerDeInfo, dict, None]:
Contributor

We'd never return a dict though, just SerDeInfo or None, right?

Collaborator Author

Resolved.

mypy sometimes gets confused when a setter accepts A, B, or C but the paired getter can only return A or C.

I added a typing.cast() call at ~line 680 to help mypy out, along with a comment at that point explaining why typing.cast is needed.
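
As a rough illustration of the setter/getter asymmetry described above, the sketch below pairs a setter that accepts an object, a dict, or None with a getter that only returns an object or None, and uses `typing.cast` on the return value. The class names `SerDeInfoSketch` and `StorageDescriptorSketch` and all method bodies are assumptions for illustration; the actual placement of the cast in the merged code may differ.

```python
import typing
from typing import Any, Dict, Optional, Union


class SerDeInfoSketch:
    """Hypothetical stand-in for SerDeInfo."""

    def __init__(self, serialization_library: str) -> None:
        self.serialization_library = serialization_library

    def to_api_repr(self) -> Dict[str, Any]:
        return {"serializationLibrary": self.serialization_library}

    @classmethod
    def from_api_repr(cls, resource: Dict[str, Any]) -> "SerDeInfoSketch":
        return cls(resource["serializationLibrary"])


class StorageDescriptorSketch:
    """Hypothetical stand-in for StorageDescriptor."""

    def __init__(
        self, serde_info: Union[SerDeInfoSketch, dict, None] = None
    ) -> None:
        self._properties: Dict[str, Any] = {}
        self.serde_info = serde_info

    @property
    def serde_info(self) -> Optional[SerDeInfoSketch]:
        # The getter never returns a dict: stored dicts are converted first.
        prop = self._properties.get("serDeInfo")
        if prop is not None:
            prop = SerDeInfoSketch.from_api_repr(prop)
        # typing.cast does nothing at runtime; it only tells the type checker
        # which type to assume for the returned value.
        return typing.cast(Optional[SerDeInfoSketch], prop)

    @serde_info.setter
    def serde_info(self, value: Union[SerDeInfoSketch, dict, None]) -> None:
        # The setter accepts the wider set of types and normalizes to a dict.
        if isinstance(value, SerDeInfoSketch):
            value = value.to_api_repr()
        self._properties["serDeInfo"] = value
```

Because the cast has no runtime effect, it does not validate anything; it is only as trustworthy as the conversion that precedes it.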


prop = _helpers._get_sub_prop(self._properties, ["serDeInfo"])
if prop is not None:
    prop = SerDeInfo("PLACEHOLDER").from_api_repr(prop)
Contributor

from_api_repr should be a class method, so an instance shouldn't be required.

Suggested change
- prop = SerDeInfo("PLACEHOLDER").from_api_repr(prop)
+ prop = SerDeInfo.from_api_repr(prop)

Collaborator Author

Resolved.
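
To show why no placeholder instance is needed, here is a minimal sketch of `from_api_repr` written as a `@classmethod`, so it can be called on the class itself. `SerDeInfoSketch`, its constructor arguments, and the `"serializationLibrary"` and `"name"` keys are assumptions for illustration and may not match the real SerDeInfo class.

```python
from typing import Any, Dict, Optional


class SerDeInfoSketch:
    """Hypothetical stand-in for SerDeInfo."""

    def __init__(
        self, serialization_library: str, name: Optional[str] = None
    ) -> None:
        self._properties: Dict[str, Any] = {
            "serializationLibrary": serialization_library
        }
        if name is not None:
            self._properties["name"] = name

    @property
    def serialization_library(self) -> str:
        return self._properties["serializationLibrary"]

    @classmethod
    def from_api_repr(cls, resource: Dict[str, Any]) -> "SerDeInfoSketch":
        # `cls` is the class itself, so the factory builds the instance
        # directly from the resource instead of relying on a throwaway
        # object created by the caller.
        instance = cls(resource["serializationLibrary"])
        # Copy the resource so later edits to the input do not leak in.
        instance._properties = dict(resource)
        return instance


# Called on the class, matching the suggested change above.
info = SerDeInfoSketch.from_api_repr(
    {"serializationLibrary": "example.SerDe", "name": "demo"}
)
```

A class method also behaves sensibly under subclassing, since `cls` resolves to whichever subclass the call was made on.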

Contributor

@Linchin Linchin left a comment

Thank you, LGTM.

@chalmerlowe chalmerlowe merged commit 6be0272 into main Jan 14, 2025
19 checks passed
@chalmerlowe chalmerlowe deleted the feat-b358215039-adds-storagedescriptor-class branch January 14, 2025 20:48
chalmerlowe added a commit that referenced this pull request Jan 22, 2025
* feat: adds StorageDescriptor and tests

* updates attr names, corrects type hinting
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. size: m Pull request size is medium.
4 participants