gh-73123: Add a keepempty argument to string, bytes and bytearray split methods #26222

Open
wants to merge 28 commits into
base: main

@MarkCBell MarkCBell commented May 18, 2021

This PR adds an optional keepempty argument to str.split (and similarly to bytes.split, bytearray.split and UserString.split). As described in issue bpo-28937:

  • If keepempty is true, empty strings are never stripped from the result list.
  • If keepempty is false, empty strings are always stripped from the result list.
  • If keepempty is None (the default), the current behaviour is kept: empty strings are stripped from the result list if and only if the separator is None.
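
For reference, the current behaviour that keepempty=None is meant to preserve can be checked on any existing Python 3 without this PR. The sketch below uses only the existing str.split; drop_empty_split is a hypothetical helper name, not part of the PR, showing the usual workaround that approximates keepempty=False when an explicit separator is given.

```python
def drop_empty_split(string, sep):
    """Hypothetical helper (not in the PR): split on an explicit
    separator, then filter out the empty strings afterwards."""
    return [part for part in string.split(sep) if part]


# Separator is None: runs of whitespace are merged, empties are stripped.
print("a  b".split())        # ['a', 'b']
print("".split())            # []

# Explicit separator: every empty string between separators is kept.
print("a  b".split(" "))     # ['a', '', 'b']
print("".split(","))         # ['']

# Workaround that behaves like the proposed keepempty=False.
print(drop_empty_split("a,,b,", ","))   # ['a', 'b']
```

The proposed argument would make the filtering step unnecessary and, unlike the workaround, is meant to interact correctly with maxsplit.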

To do this it uses a new splitting algorithm which has been designed to be compatible with the existing maxsplit argument. This is roughly:

def split(string, sep=None, maxsplit=None, keepempty=None):

    prune = sep is None if keepempty is None else not keepempty
    if sep is None:
        # Ok, the real implementation actually matches on any whitespace,
        # but matching on ' ' is good enough for this toy example.
        sep = ' '

    results = []
    count = 0
    i = 0
    j = string.find(sep, i)
    while j >= 0:
        if j > i or not prune:
            if maxsplit is not None and count >= maxsplit:
                break
            results.append(string[i:j])
            count += 1
        i = j + len(sep)
        j = string.find(sep, i)

    if i < len(string) or not prune:
        results.append(string[i:])

    return results
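
To see how the three keepempty modes and maxsplit interact, the toy algorithm above can be exercised directly. The block below repeats the toy function unchanged so that it runs on its own; it is a sketch of the description in this PR, not the C implementation, and it still only treats a single ' ' as whitespace when sep is None.

```python
def split(string, sep=None, maxsplit=None, keepempty=None):
    # prune == True means empty strings are dropped from the result.
    prune = sep is None if keepempty is None else not keepempty
    if sep is None:
        sep = ' '  # toy simplification: the real code matches any whitespace

    results = []
    count = 0
    i = 0
    j = string.find(sep, i)
    while j >= 0:
        if j > i or not prune:
            if maxsplit is not None and count >= maxsplit:
                break
            results.append(string[i:j])
            count += 1
        i = j + len(sep)
        j = string.find(sep, i)

    if i < len(string) or not prune:
        results.append(string[i:])

    return results


# Default (keepempty=None) mirrors the existing str.split behaviour.
print(split("a,,b,", ","))                    # ['a', '', 'b', '']
print(split(" a  b "))                        # ['a', 'b']

# keepempty=False prunes empties even with an explicit separator.
print(split("a,,b,", ",", keepempty=False))   # ['a', 'b']

# keepempty=True keeps empties even when sep is None.
print(split(" a  b ", keepempty=True))        # ['', 'a', '', 'b', '']

# maxsplit counts only the pieces that are actually emitted.
print(split("a,,b,c", ",", maxsplit=1, keepempty=False))  # ['a', 'b,c']
```

In the last call the empty piece between the two commas is skipped without consuming the single allowed split, so the remainder starts at "b".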

A number of tests have been added to check the correct behaviour.

https://bugs.python.org/issue28937

Member

@tiran tiran left a comment

PyUnicode_Split() and PyUnicode_RSplit() are in the stable ABI and API. You cannot modify the function arguments. Instead you have to create additional functions.

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@MarkCBell MarkCBell requested a review from a team as a code owner May 19, 2021 17:05
@MarkCBell
Author

MarkCBell commented May 19, 2021

Thank you for letting me know about the API / ABI. I have changed this PR so that it leaves PyUnicode_Split() and PyUnicode_RSplit() alone and adds two new functions, PyUnicode_SplitWithKeepempty() and PyUnicode_RSplitWithKeepempty(), that provide access to the new functionality instead.

I have since decided that the C API interfaces add little extra value, so I have removed them. If they later prove useful, they can be re-added by reverting commit f95b254.

Now that I have made the requested changes; please review again.

@bedevere-bot

Thanks for making the requested changes!

@tiran: please review the changes made to this pull request.

@bedevere-bot bedevere-bot requested a review from tiran May 19, 2021 17:51
@rhettinger rhettinger removed their request for review May 21, 2021 01:18
@MarkCBell
Author

Could a maintainer please approve running the last two jobs of the CI workflow for me?

@akulakov
Contributor

akulakov commented Jun 2, 2021

I've left a comment in the issue tracker; I think there may be a problem with the API.

@github-actions

github-actions bot commented Jul 3, 2021

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Jul 3, 2021
@github-actions github-actions bot removed the stale Stale PR or inactive for long period of time. label Aug 8, 2022
@erlend-aasland erlend-aasland changed the title bpo-28937 Adds a keepempty argument to string, bytes and bytearray split methods gh-73123: Add a keepempty argument to string, bytes and bytearray split methods Jan 11, 2024
6 participants