Skip to content

Cache input data read during init_separators to avoid seek/rewind. #45

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 18, 2018

Conversation

vyznev
Copy link
Contributor

@vyznev vyznev commented Sep 11, 2018

Here's a basic patch that avoids calling seek or rewind after row_sep autodetection (#44) by saving the sample data and wrapping it in a StringIO object for re-reading. It passes all existing tests.

It could also be a good idea to add some new tests, e.g. using a dummy IO wrapper like in my online demonstration, to verify that the new implementation works properly with non-seekable input streams, but I have not done that yet in this patch.

Note that this patch also removes several special cases from init_separators. In particular, row_sep autodetection should now work when reading from a pipe or from ARGF or STDIN. This should probably also have some tests added for it.

I have not run any benchmarks to compare this new implementation with the previous one, but I suspect that they should be pretty close. In some cases the new code might even be faster, e.g. if (re)reading the input data is really slow for some reason.

@kou
Copy link
Member

kou commented Sep 13, 2018

I'll merge this after tests for this changes are added.
Can you work on adding tests?

@vyznev
Copy link
Contributor Author

vyznev commented Sep 16, 2018

I've added some tests for reading from a non-seekable StringIO wrapper, as in the demonstration code I linked to above. I could add some tests for reading from pipes as well, but these might not be fully portable (since the obvious way to implement them would be using fork). Let me know if you think that would also be useful. In any case, the code on this branch no longer treats pipes or STDIN as a special case, although the old code did.

@kou kou merged commit 9ccdfe3 into ruby:master Sep 18, 2018
@kou
Copy link
Member

kou commented Sep 18, 2018

Thanks.
I've merged this.

Test for pipe isn't needed because it's covered by the DummyIO case.

@vyznev vyznev deleted the wip-seekless branch September 19, 2018 07:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants