Support for CRLF line breaks #3

Merged: 2 commits, merged on Sep 10, 2014

Daniel-Diaz

Like many others, I write parsers that run on different systems. Windows uses CRLF line breaks, while Unix uses just LF. Currently, the newline parser only succeeds on LF line breaks. This leads to parsers that work on one system but fail on the other, which is annoying.

In my software, I frequently add the parsers attached to this pull request, and I thought they might be useful to other parsec users as well. Basically, I am adding CRLF line break support and an anyNewline parser, which succeeds for both LF and CRLF line breaks.

If there is any concern about the names, they can be modified.
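The attached code is not reproduced on this page. As a rough sketch of what the description asks for (a CRLF line-break parser plus an anyNewline parser that accepts both LF and CRLF), assuming parsec's Text.Parsec API: the name crlfNewline is hypothetical, and this is not the PR's actual code.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical name: matches a Windows-style "\r\n" line break.
-- Returns '\n' so it can be used wherever 'newline' is used.
crlfNewline :: Parser Char
crlfNewline = char '\r' *> char '\n' <?> "crlf new-line"

-- Matches either a Unix "\n" or a Windows "\r\n" line break.
-- 'newline' fails without consuming input on '\r', so '<|>' goes on
-- to try 'crlfNewline'.
anyNewline :: Parser Char
anyNewline = newline <|> crlfNewline <?> "new-line"
```

Returning '\n' for both kinds of line break keeps the result type identical to newline's, so anyNewline can be swapped in without other changes.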

@aslatter
Collaborator

How are you getting your strings into your parser?

For some reason I was under the impression that some of the file IO APIs normalize newlines.

@UnkindPartition
Contributor

Text IO (openFile, Data.Text) indeed performs newline conversion.
Binary IO (openBinaryFile, Data.ByteString) doesn't.
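One detail worth adding about text IO in GHC: the translation is governed by the handle's NewlineMode. openFile defaults to nativeNewlineMode, which only maps "\r\n" to '\n' when running on Windows; on Unix, a CRLF file read in text mode still contains '\r'. A minimal sketch that requests CRLF-to-LF translation on every platform, using System.IO's hSetNewlineMode and universalNewlineMode (readNormalized is a hypothetical helper, not part of any library):

```haskell
import System.IO

-- Hypothetical helper: read a file as text, translating "\r\n" to '\n'
-- on input regardless of the platform's native newline convention.
readNormalized :: FilePath -> IO String
readNormalized path = do
  h <- openFile path ReadMode
  -- universalNewlineMode sets inputNL = CRLF, so "\r\n" is read as '\n';
  -- a lone '\n' is passed through unchanged.
  hSetNewlineMode h universalNewlineMode
  contents <- hGetContents h
  -- Force the lazy contents before closing the handle.
  length contents `seq` hClose h
  return contents
```

Binary handles (openBinaryFile, or Data.ByteString's file functions) use noNewlineTranslation, which is why bytes read that way keep their '\r' characters.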

@Daniel-Diaz
Author

I read files as ByteStrings and then use a decoding function (such as decodeLatin1 or decodeUtf8) to turn them into Text. Which decoding function I use varies from case to case.
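To make the problem concrete: decoding bytes with decodeUtf8 (or decodeLatin1) does no newline translation, so a "\r\n" in the file survives as two characters, and a parser built on newline stops at the '\r'. A small sketch, assuming parsec's Text stream instance (Text.Parsec.Text); decodeInput and lineParser are hypothetical names, and in-memory bytes stand in for a file read:

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8)
import Text.Parsec
import Text.Parsec.Text (Parser)

-- Hypothetical stand-in for "read bytes, then decode": decoding keeps
-- every character, including '\r'.
decodeInput :: BC.ByteString -> T.Text
decodeInput = decodeUtf8

-- Hypothetical line-oriented parser: each line must be terminated by
-- 'newline', which matches only '\n'.
lineParser :: Parser [String]
lineParser = many (many (noneOf "\r\n") <* newline) <* eof
```

With LF input this yields the lines; with CRLF input the first line's '\r' is not matched by newline after characters were already consumed, so the whole parse fails.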

@aslatter
Collaborator

It's a pity that newline has a return value (it returns the Char '\n'). Otherwise we could redefine newline as:

newline = (char '\n' >> return ()) <|> (string "\r\n" >> return ()) <?> "new-line"
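For reference, the ()-returning definition quoted in this comment compiles as written. A sketch under a hypothetical name (newlineUnit), chosen so it does not clash with Text.Parsec's own newline, which returns the Char '\n':

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical name for the ()-returning variant quoted above.
-- 'char' fails without consuming input on '\r', so '<|>' falls through
-- to the "\r\n" alternative.
newlineUnit :: Parser ()
newlineUnit = (char '\n' >> return ()) <|> (string "\r\n" >> return ()) <?> "new-line"
```

The catch is the type: any existing code that uses newline's Char result would stop compiling if newline itself were changed this way.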

@aslatter
Collaborator

Alright, after thinking for a bit it seems like we could either:

  1. Redefine the type of newline
  2. Steal endOfLine from attoparsec, giving it the definition we'd like newline to have

Option (1) might break existing code, so I'm leaning towards (2) (even though I have no data).

@Daniel-Diaz
Author

I think very few people use the value returned by newline. In any case, I'd rather avoid option 1, since it may break existing code. So, yes, I would go with option 2 as well.

@Daniel-Diaz
Author

Actually, some people may be using newline in a parser that reads text containing line breaks and keeps that text as-is after parsing. Something like:

parseSomeText :: Parser String
parseSomeText = many $ alphaNum <|> char ' ' <|> newline

So I take back my claim that very few people use the value returned by newline; I might be wrong there. In fact, I am probably doing it somewhere myself!
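A runnable version of the example above, assuming String input via Text.Parsec.String. It shows why the Char result matters (newline sits in a chain of Char parsers) and how swapping in parsec's endOfLine, which also returns '\n', keeps the types unchanged while accepting CRLF. parseSomeTextCRLF is a hypothetical name.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The example from the comment: keeps letters, digits, spaces and LF breaks.
-- Every alternative returns a Char, so 'newline' must return one too.
parseSomeText :: Parser String
parseSomeText = many $ alphaNum <|> char ' ' <|> newline

-- Hypothetical CRLF-tolerant variant: 'endOfLine' accepts "\n" or "\r\n"
-- and returns '\n' in both cases, so CRLF input is normalized to LF.
parseSomeTextCRLF :: Parser String
parseSomeTextCRLF = many $ alphaNum <|> char ' ' <|> endOfLine
```

Note that on CRLF input the original stops at the first '\r' and returns the prefix rather than failing, since no alternative consumes input there; following it with eof would turn that into an explicit parse error.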

@Daniel-Diaz
Author

I have renamed anyNewline to endOfLine to adapt this pull request to (2). Any other changes needed?
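In current parsec releases, Text.Parsec.Char exports endOfLine (matching LF or CRLF and returning '\n') alongside crlf, while newline keeps its original LF-only meaning. A short usage sketch of a line splitter that accepts either convention; lineItems is a hypothetical name:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical example: each line is a run of non-line-break characters,
-- terminated by either "\n" or "\r\n" via parsec's 'endOfLine'.
lineItems :: Parser [String]
lineItems = many (many (noneOf "\r\n") <* endOfLine) <* eof
```

The same input can now use LF, CRLF, or a mix of both, and produces the same list of lines.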

aslatter added a commit that referenced this pull request Sep 10, 2014
Support for CRLF line breaks
@aslatter aslatter merged commit e6ab087 into haskell:master Sep 10, 2014
hvr added a commit that referenced this pull request Dec 30, 2017
This is a provisional measure until `notFollowedBy` gets fixed.

See #3 for more details
int-index pushed a commit to int-index/parsec that referenced this pull request Sep 18, 2020

3 participants