-
Notifications
You must be signed in to change notification settings - Fork 36
Add AudioDecoder.get_samples_played_in_range()
public method
#555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
def get_samples_played_in_range( | ||
self, start_seconds: float, stop_seconds: Optional[float] = None | ||
) -> AudioSamples: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like start_seconds
should default to the stream's beginning, but that can be done later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do that, we should also mirror that in the video API. We could try to play the same trick that range()
does, where the semantics of the first parameter changes based on the number of parameters, but I'd (softly) rather not do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I'll update #150 eventually with more of these desirable default behaviors. That would be a good onboarding PR.
@@ -114,3 +114,28 @@ def __len__(self): | |||
|
|||
def __repr__(self): | |||
return _frame_repr(self) | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below: the audio decoding API returns the new AudioSamples
class rather than a pure Tensor. I think it has the following benefits:
- users can keep track of the sample_rate within that struct without having to handle it separately
- for some edge-cases, like with our mp3 test asset, the stream's beginning isn't 0. So we also returns
pts_seconds
, which may not always be equal tostart_seconds
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense, and it mirrors the video API. In the video API, __getitem__()
returns just the tensor, but the named methods return Frame
or FrameBatch
. I think we should probably do that for audio as well.
sample_rate: int | ||
|
||
def __post_init__(self): | ||
# This is called after __init__() when a Frame is created. We can run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# This is called after __init__() when a Frame is created. We can run | |
# This is called after __init__() when an AudioSamples instance is created. We can run |
@@ -854,7 +854,7 @@ VideoDecoder::AudioFramesOutput VideoDecoder::getFramesPlayedInRangeAudio( | |||
|
|||
if (startSeconds == stopSeconds) { | |||
// For consistency with video | |||
return AudioFramesOutput{torch::empty({0}), 0.0}; | |||
return AudioFramesOutput{torch::empty({0, 0}), 0.0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by, the video APIs return something of shape (0, C, H, W)
(where C H W are from the metadata). Here, (0, 0) is the best we can do, at least it preserves the number of dimensions.
output_pts_seconds = first_pts | ||
|
||
num_samples = frames.shape[1] | ||
last_pts = first_pts + num_samples / self.metadata.sample_rate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't last_pts
be a float here? Is that your intention, or should we call round()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it's a float. I didn't add the _seconds
everywhere, but they're all in seconds. I can do it if you think it improves clarity.
We do call round()
just below, which I think is enough? Or is there an edge-case I'm missing?
Towards #549