File downloads should not fail silently #2647

Open
@jewettaijfc

Description

@yaugenst, please feel free to correct my description of the problem, or delete this issue if another issue describes it better. You know more about this problem than I do.

This issue overlaps with issues 2600 and 2648, which are being actively worked on. If fixing those issues mostly resolves this problem, I will close this issue.

Feature request

  • Users should always be clearly informed about incomplete/unsuccessful file downloads.

Details

We are using S3.Client.download_file() from Amazon's boto3 module to download all of our files. Supposedly, this function raises an error whenever the transfer fails for any reason.

Problem: In some scenarios, we are not catching those exceptions and displaying those errors to the user in a way that is easy for them to understand.

I'm not sure why the users don't notice the failure. We might need to reach out to the users to ask them what they were doing and what they saw when the download failed.

Priority

At least 2 users have complained about file download failures in the last 2 weeks. A few days ago, one of them realized that the download had failed only after noticing that their file was empty.

File integrity guarantees offered by boto3

On a related note, we want to be certain whether or not the download was successful. Fortunately, that work is already done for us. S3.Client.download_file() verifies checksums and automatically raises exceptions when downloads fail. It also automatically splits large files (> 8 MB) into chunks and reassembles them afterwards. By default, it automatically retries downloads of files (or chunks) that fail (5 times, with a 60-second timeout, according to Google Gemini), and it only raises an exception if all retries fail. If download_file() returns without raising an exception, the file has downloaded successfully. We just need to catch any exceptions when they are raised and display them in a way the user can understand.
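To make the "catch and display" step concrete, here is a minimal sketch, not the project's actual code. The names `DownloadError` and `download_or_report` are hypothetical, and `client` is only assumed to behave like a boto3 S3 client (it exposes `download_file(Bucket, Key, Filename)` and raises on failure). The sketch catches broadly on purpose so that nothing fails silently; in real use the exceptions would likely be things like botocore's `ClientError` or an `OSError`, but that is an assumption.

```python
import os


class DownloadError(RuntimeError):
    """Hypothetical error type that carries a user-readable message."""


def download_or_report(client, bucket, key, dest):
    """Download one object and never fail silently.

    `client` is assumed to behave like a boto3 S3 client: it exposes
    download_file(Bucket, Key, Filename) and raises if the transfer fails.
    """
    try:
        client.download_file(bucket, key, dest)
    except Exception as exc:  # deliberately broad: any failure must reach the user
        # Remove a partial or empty file so it cannot be mistaken for a result.
        if os.path.exists(dest):
            os.remove(dest)
        raise DownloadError(
            f"Download of '{key}' to '{dest}' failed: {exc}"
        ) from exc
    return dest
```

A caller would wrap `download_or_report(...)` in its own `try`/`except DownloadError` and show the message to the user. Deleting the partial file addresses the case described under Priority, where a user only noticed the failure because the file was empty.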

(NOTE: S3.Client.download_file() does not read the file back from the user's computer after downloading it to verify its integrity. It trusts that the user's storage hardware is reliable. I did not intend to check for this kind of problem when I opened this issue. But it would not be hard to read the file, calculate the SHA256 checksum locally, and verify that it matches head_response.get("ChecksumSHA256").)
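The local check mentioned in the note could look roughly like the sketch below. The function names are hypothetical. It assumes the expected value is the base64-encoded SHA-256 digest of the whole object, which is how S3 reports `ChecksumSHA256` for objects uploaded in a single part; objects uploaded in multiple parts may carry a composite value that this comparison would not match. It also assumes the object was stored with a SHA-256 checksum in the first place, and that the HEAD request was made in a way that returns it.

```python
import base64
import hashlib


def sha256_base64(path, chunk_size=1024 * 1024):
    """Return the base64-encoded SHA-256 digest of a local file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def matches_expected_checksum(path, expected):
    """Compare a local file against an expected checksum string.

    `expected` would be something like head_response.get("ChecksumSHA256").
    Returns None when no checksum is available, so that "could not verify"
    stays distinguishable from "verified" and "mismatch".
    """
    if not expected:
        return None
    return sha256_base64(path) == expected
```

Returning `None` for a missing checksum keeps the caller from reporting a file as verified when there was nothing to compare against.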
