Add `AudioDecoder.get_samples_played_in_range()` public method #555

NicolasHug · 2025-03-13T12:30:58Z

Towards #549


NicolasHug · 2025-03-13T12:42:16Z

src/torchcodec/decoders/_audio_decoder.py

+
+    def get_samples_played_in_range(
+        self, start_seconds: float, stop_seconds: Optional[float] = None
+    ) -> AudioSamples:


I feel like start_seconds should default to the stream's beginning, but that can be done later.

If we do that, we should also mirror it in the video API. We could try the same trick that range() uses, where the semantics of the first parameter change based on the number of arguments, but I'd (softly) rather not do that.

Agreed, I'll update #150 eventually with more of these desirable default behaviors. That would be a good onboarding PR.
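
For reference, the two options discussed in this thread could look roughly like the sketch below. Both helper names and the `stream_begin_seconds` argument are hypothetical and are not part of torchcodec; the sketch only shows how the arguments would be interpreted, not how decoding works.

```python
from typing import Optional, Tuple


def resolve_with_default(
    stream_begin_seconds: float,
    start_seconds: Optional[float] = None,
    stop_seconds: Optional[float] = None,
) -> Tuple[float, Optional[float]]:
    """Option A: each parameter keeps a single meaning.

    A missing start falls back to the stream's beginning (hypothetical helper).
    """
    if start_seconds is None:
        start_seconds = stream_begin_seconds
    return start_seconds, stop_seconds


def resolve_range_style(
    stream_begin_seconds: float, *args: float
) -> Tuple[float, Optional[float]]:
    """Option B: like range(), a lone positional argument is read as the stop."""
    if len(args) == 1:
        return stream_begin_seconds, args[0]
    if len(args) == 2:
        return args[0], args[1]
    raise TypeError(f"expected 1 or 2 positional arguments, got {len(args)}")
```

With option B, `resolve_range_style(0.0, 2.0)` treats `2.0` as the stop while `resolve_range_style(0.0, 1.0, 2.0)` treats `1.0` as the start, which is the shift in meaning the comment above would rather avoid.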

NicolasHug · 2025-03-13T12:44:14Z

src/torchcodec/_frame.py

@@ -114,3 +114,28 @@ def __len__(self):

    def __repr__(self):
        return _frame_repr(self)
+
+


Below: the audio decoding API returns the new AudioSamples class rather than a pure Tensor. I think it has the following benefits:

- users can keep track of the sample_rate within that struct without having to handle it separately
- for some edge cases, like our mp3 test asset, the stream's beginning isn't 0, so we also return pts_seconds, which may not always be equal to start_seconds.

I think this makes sense, and it mirrors the video API. In the video API, __getitem__() returns just the tensor, but the named methods return Frame or FrameBatch. I think we should probably do that for audio as well.
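
As a rough illustration of the struct described above, here is a minimal, dependency-free sketch. `AudioSamplesSketch` is a made-up name, nested lists stand in for the Tensor, and the `__post_init__` normalization and the `duration_seconds` property are illustrative assumptions rather than the code in this PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AudioSamplesSketch:
    """Hypothetical stand-in for the AudioSamples struct."""

    data: List[List[float]]  # (num_channels, num_samples); the real class holds a Tensor
    pts_seconds: float  # presentation timestamp of the first returned sample
    sample_rate: int  # travels with the samples so callers need not track it

    def __post_init__(self):
        # Runs right after the generated __init__(); used here only to
        # normalize field types (an assumption for this sketch).
        self.pts_seconds = float(self.pts_seconds)
        self.sample_rate = int(self.sample_rate)

    @property
    def duration_seconds(self) -> float:
        # Duration derived from the sample count and the carried sample rate.
        if not self.data:
            return 0.0
        return len(self.data[0]) / self.sample_rate


# Made-up values: a stream whose first sample is not at 0 seconds.
samples = AudioSamplesSketch(data=[[0.0] * 8000], pts_seconds=0.138, sample_rate=16000)
print(samples.pts_seconds, samples.sample_rate, samples.duration_seconds)
```

The caller asked for a start time, but reads the actual timestamp of the first sample from `pts_seconds`, which is the second benefit listed above.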

test/decoders/test_ops.py

NicolasHug · 2025-03-13T13:56:04Z

src/torchcodec/_frame.py

+    sample_rate: int
+
+    def __post_init__(self):
+        # This is called after __init__() when a Frame is created. We can run


Suggested change

-        # This is called after __init__() when a Frame is created. We can run
+        # This is called after __init__() when an AudioSamples instance is created. We can run

NicolasHug · 2025-03-13T13:57:42Z

src/torchcodec/decoders/_core/VideoDecoder.cpp

@@ -854,7 +854,7 @@ VideoDecoder::AudioFramesOutput VideoDecoder::getFramesPlayedInRangeAudio(

  if (startSeconds == stopSeconds) {
    // For consistency with video
-    return AudioFramesOutput{torch::empty({0}), 0.0};
+    return AudioFramesOutput{torch::empty({0, 0}), 0.0};


Drive-by, the video APIs return something of shape (0, C, H, W) (where C H W are from the metadata). Here, (0, 0) is the best we can do, at least it preserves the number of dimensions.
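
One practical consequence of preserving the number of dimensions, sketched with plain tuples standing in for tensor shapes: code that reads the sample count from the second dimension keeps working on the empty result. This is an illustration under that assumption, not the motivation stated in the PR, and `num_samples_from_shape` is a hypothetical helper.

```python
def num_samples_from_shape(shape: tuple) -> int:
    """Read the sample count from a (num_channels, num_samples) shape."""
    return shape[1]


# New empty result: two dimensions, zero samples.
print(num_samples_from_shape((0, 0)))

# Old empty result: one dimension, so there is no second axis to index.
try:
    num_samples_from_shape((0,))
except IndexError as exc:
    print(f"IndexError: {exc}")
```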

scotts · 2025-03-13T14:15:04Z

src/torchcodec/decoders/_audio_decoder.py

+            output_pts_seconds = first_pts
+
+        num_samples = frames.shape[1]
+        last_pts = first_pts + num_samples / self.metadata.sample_rate


Won't last_pts be a float here? Is that your intention, or should we call round()?

Yes, it's a float. I didn't add the `_seconds` suffix everywhere, but they're all in seconds. I can do it if you think it improves clarity.

We do call round() just below, which I think is enough? Or is there an edge-case I'm missing?
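
To make the float-versus-int question concrete, here is a minimal sketch of how timestamps in seconds can stay floats while the slicing offsets are rounded to whole samples. `trim_offsets` and its parameters are hypothetical; this shows the general idea under those assumptions and is not the exact code in `_audio_decoder.py`.

```python
from typing import Optional, Tuple


def trim_offsets(
    first_pts: float,
    num_samples: int,
    sample_rate: int,
    start_seconds: float,
    stop_seconds: Optional[float] = None,
) -> Tuple[int, int, float]:
    """Return (offset_beginning, offset_end, output_pts_seconds).

    All *_pts and *_seconds values are floats in seconds; only the
    offsets, which index into the samples, are rounded to integers.
    """
    offset_beginning = 0
    output_pts_seconds = first_pts
    if first_pts < start_seconds:
        # Drop the samples decoded before the requested start.
        offset_beginning = round((start_seconds - first_pts) * sample_rate)
        output_pts_seconds = start_seconds

    # Float on purpose: the end timestamp of the decoded samples, in seconds.
    last_pts = first_pts + num_samples / sample_rate

    offset_end = num_samples
    if stop_seconds is not None and stop_seconds < last_pts:
        # Drop the samples decoded after the requested stop.
        offset_end -= round((last_pts - stop_seconds) * sample_rate)

    return offset_beginning, offset_end, output_pts_seconds


# One second of 16 kHz audio starting at 0.0, trimmed to [0.25, 0.75).
print(trim_offsets(0.0, 16000, 16000, start_seconds=0.25, stop_seconds=0.75))
```

Because `round()` is applied only when converting a duration into a sample count, `last_pts` itself never needs to be an integer.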

NicolasHug added 6 commits March 12, 2025 20:07

Add get_samples_played_in_range public method

4495150

Nit

277fac2

WIP

0f9e14d

Add pts test for audio

00bb28d

Merge commit '00bb28d23f6615fe4e45282adbcacf87ffb13845' into audio_pu…

9a00c91

…blic

WIP

a7b67d5

facebook-github-bot added the CLA Signed label Mar 13, 2025

NicolasHug mentioned this pull request Mar 12, 2025

Audio decoding TODOs #549

Closed

7 tasks

Fix Fing mypy

02067cf

NicolasHug commented Mar 13, 2025


Merge branch 'main' of github.com:pytorch/torchcodec into audio_public

e5c9831

NicolasHug commented Mar 13, 2025


scotts reviewed Mar 13, 2025


scotts approved these changes Mar 13, 2025


NicolasHug merged commit 1fd20b2 into pytorch:main Mar 13, 2025
46 checks passed
