-
Notifications
You must be signed in to change notification settings - Fork 382
Tracking cache requests #1566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tracking cache requests #1566
Conversation
Hi @martindurant! I think this PR is ready for a review in case you have time this week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this looks good on the whole, particularly the set of tests.
I have some comments for you to consider.
sstart = i * self.blocksize | ||
send = min(sstart + self.blocksize, self.size) | ||
logger.debug(f"MMap get block #{i} ({sstart}-{send}") | ||
self.total_requested_bytes += send - sstart | ||
logger.debug(f"MMap get block #{i} ({sstart}-{send})") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few of these (not your code, but shows up in the diff due to small corrections). Debug of the form ("test %s", arg) is preferred, so that the string doesn't need to be evaluated in the case that debug is not required.
part = self.cache[start:end] | ||
if end > self.blocksize: | ||
self.total_requested_bytes += end - self.blocksize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment: isn't the number of bytes returned by fetcher
the more important number? If we request 100bytes, we might only get 10 back; probably the calling code will request more. OTOH, the various implementations of the request code might keep making new requests until enough bytes arrive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, perhaps we are missing another variable to keep track of the total_bytes_returned
?
fsspec/tests/test_caches.py
Outdated
@@ -15,26 +21,124 @@ def test_cache_getitem(Cache_imp): | |||
|
|||
|
|||
def test_block_cache_lru(): | |||
cache = BlockCache(4, letters_fetcher, len(string.ascii_letters), maxblocks=2) | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a comment, not a docstring
I can't fix your linter failure, since the PR is from master branch. You can run precommit locally, e.g.,
|
Thanks @martindurant! I'll work on some minor changes now and will run linting locally before the next push. About modifying code on external forks, it's possible! I think you could pull/edit the PR directly (as long as the PR author checks the |
Ok now it should pass and I also addressed the comments on formatting the logs for cache implementations that do nothing @martindurant |
As discussed in #1527 this PR adds a few extra variables in the cache implementation to keep track of bytes requested vs cache hits. The numbers can be accessed directly or via the logs in debug mode.
This is also useful to benchmark the I/O behavior of the different cache implementations.