Beam AnalyzeAndTransformDataset runs expensive transformation _InstanceDictInputToTFXIOInput Twice #296

@michaelwsherman

AnalyzeAndTransformDataset should not run _InstanceDictInputToTFXIOInput twice.

AnalyzeAndTransformDataset runs AnalyzeDataset and TransformDataset back-to-back. AnalyzeDataset runs _InstanceDictInputToTFXIOInput and TransformDataset also runs _InstanceDictInputToTFXIOInput.

But when running AnalyzeAndTransformDataset, the _InstanceDictInputToTFXIOInput call in TransformDataset is unnecessary, since it was already run in AnalyzeDataset.

The _InstanceDictInputToTFXIOInput transformation is expensive, and this redundant call meaningfully increases runtime and cost.
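
To illustrate the redundancy described above, here is a toy model in plain Python. It is not tensorflow_transform code: `expensive_convert`, `analyze`, `transform`, and both `analyze_and_transform_*` functions are hypothetical stand-ins, and the counter only mimics how often the conversion step would run. It is a sketch of the idea (convert once, reuse the result for both phases), not a proposed patch.

```python
# Toy model only; every name here is hypothetical, not the library's API.
calls = {"convert": 0}


def expensive_convert(instances):
    """Stand-in for the costly _InstanceDictInputToTFXIOInput step."""
    calls["convert"] += 1
    return [tuple(sorted(d.items())) for d in instances]


def analyze(converted):
    """Toy analysis phase: compute the mean of feature 'x'."""
    xs = [dict(row)["x"] for row in converted]
    return sum(xs) / len(xs)


def transform(converted, mean):
    """Toy transform phase: center feature 'x' on the analyzed mean."""
    return [dict(row)["x"] - mean for row in converted]


def analyze_and_transform_current(instances):
    # Mirrors the behavior reported in this issue: each phase converts
    # the raw input on its own, so the conversion runs twice.
    mean = analyze(expensive_convert(instances))
    return transform(expensive_convert(instances), mean)


def analyze_and_transform_shared(instances):
    # Assumed alternative: convert once and feed both phases.
    converted = expensive_convert(instances)
    mean = analyze(converted)
    return transform(converted, mean)


if __name__ == "__main__":
    data = [{"x": 1.0}, {"x": 3.0}]
    for fn in (analyze_and_transform_current, analyze_and_transform_shared):
        calls["convert"] = 0
        result = fn(data)
        print(fn.__name__, result, "conversions:", calls["convert"])
```

Both variants produce the same output; only the number of conversions differs. Whether the real implementation can share the converted input this way depends on internals not described in this issue.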
