Description
Environment details
- Python version: 3.9.18
- pip version: 23.2.1
- oci version: 2.111.0
Issue
We are comparing the download performance of the OCI Python SDK and boto3 (AWS SDK). For the same objects stored in an OCI bucket, we’ve observed that the OCI SDK is approximately 20% to 50% slower than boto3 when downloading to memory.
Methods Tested with OCI SDK
- Using response.data.content:
response = self._oci_client.get_object(
    namespace_name=self._namespace, bucket_name=bucket, object_name=key, range=bytes_range
)
return response.data.content
- Using response.data.raw.stream
We got the idea from this issue. This method is ~60% faster than method 1 but still ~20% slower than boto3:
response = self._oci_client.get_object(
    namespace_name=self._namespace, bucket_name=bucket, object_name=key, range=bytes_range
)
content = bytearray()
for chunk in response.data.raw.stream(1024 * 1024, decode_content=False):  # 1MB chunks
    content.extend(chunk)
return bytes(content)
Note: We tested various chunk sizes, but they did not yield further improvements.
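For discussion, here is a variant that avoids the final `bytes(content)` copy in method 2 by reading into one preallocated buffer. It is only a sketch and is not one of the measured methods. It assumes the object size is known up front (for example from the `Content-Length` header) and that the stream object supports `readinto`; whether `response.data.raw` does is an assumption. The helper name `read_stream_into_buffer` is hypothetical.

```python
import io


def read_stream_into_buffer(stream, size, chunk_size=1024 * 1024):
    """Read up to `size` bytes from a file-like `stream` into one preallocated buffer.

    Hypothetical helper: `stream` only needs a `readinto(buffer)` method.
    Returns a bytearray holding the bytes actually read.
    """
    buf = bytearray(size)
    offset = 0
    with memoryview(buf) as view:
        while offset < size:
            # Fill the next slice in place instead of allocating a new chunk.
            n = stream.readinto(view[offset:offset + chunk_size])
            if not n:
                break  # stream ended before `size` bytes arrived
            offset += n
    # Trim only if the stream turned out shorter than expected.
    return buf if offset == size else buf[:offset]


# Stand-in stream for illustration; a real call would pass the HTTP body stream.
payload = b"abc" * 1000
result = read_stream_into_buffer(io.BytesIO(payload), len(payload), chunk_size=7)
```

The caller gets a `bytearray` rather than `bytes`, which is enough for most in-memory consumers and skips one full copy of the object.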
boto3 Baseline Implementation
response = s3_client.get_object(Bucket=bucket_name, Key=key)
return response['Body'].read()
Performance Results
With ThreadPoolExecutor(max_workers=16), I got the following average throughput when downloading 1000 objects of 64MB each from the same OCI bucket to memory:
- boto3 get_object: 9.8 Gbps
- OCI SDK response.data.content: 4.1 Gbps
- OCI SDK response.data.raw.stream: 6.8 Gbps
The gap remains consistent across multiple test runs, including various multithreaded and multiprocessed setups.
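A harness along the following lines can be used to reproduce this kind of measurement. It is a minimal sketch, not the exact benchmark script: `measure_throughput_gbps` and `fake_download` are hypothetical names, and throughput is assumed to mean total bits downloaded divided by wall-clock seconds.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput_gbps(download_fn, keys, max_workers=16):
    """Run `download_fn(key)` for every key on a thread pool and return Gbps.

    `download_fn` must return the downloaded object as a bytes-like value.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in order; only the sizes are kept.
        total_bytes = sum(len(data) for data in pool.map(download_fn, keys))
    elapsed = time.perf_counter() - start
    # bytes -> bits, then scale to gigabits per second
    return total_bytes * 8 / elapsed / 1e9


def fake_download(key):
    # Stand-in for one of the SDK download methods above.
    return b"x" * 1024


gbps = measure_throughput_gbps(fake_download, [f"obj-{i}" for i in range(100)])
```

Swapping `fake_download` for each download method keeps the pool size and the key list identical across the SDKs being compared.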
Questions
- Is this performance gap expected?
- Are there any recommended optimizations or best practices for improving download performance with the OCI Python SDK?
- Are there any internal differences in how OCI handles downloads through its S3-compatible API that might explain the performance gap?
Thanks!