[doc] perlpacktut #19203
Comments
Thank you. I agree with your first point, though it could be made even clearer by using strings containing the hex digits A-F in the example. For point 2, the Unicode section probably needs to be rewritten, as it's overly abstraction-dependent, similar to your "use bytes" example, which breaks the Perl string abstraction. Otherwise, I'm not sure exactly what you're suggesting the problem is there.
For point 2, I would like to see some clarifications in the tutorial. I agree that some sections may need to be rewritten. When I read the tutorial, I had these questions. Q1: Can a Unicode string be unpacked? If it is not recommended, the tutorial could state clearly: "do not unpack Unicode strings". Q2: The example in the tutorial seems to suggest that it is fine to unpack a Unicode string into "strings". If a Unicode string can be unpacked in some cases, when does it work?
It's a bit complex. The Perl string abstraction is simply a sequence of codepoints - not Unicode, nor bytes, until something interprets it as such. The 'a' and 'A' patterns for example will pass through a codepoint whether or not it fits in a byte, but other patterns like 'C' which are defined to operate on bytes have less obvious behavior (and unfortunately don't warn that you're doing something strange). And your example has an additional complication. Unless you pass
Thanks for the explanation. To summarize, a string may contain a codepoint that consists of more than one byte. The 'a' and 'A' patterns work with those codepoints, while some other patterns work with bytes only.
It's more accurate to say it may have a codepoint which cannot represent a byte because it is higher than 255. What it's represented by internally is immaterial (unless using "use bytes", which is why that is problematic).
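As a rough illustration of the distinction discussed above (a sketch only, not taken from the tutorial): the string below holds one codepoint above 255 followed by 'b'. The variable names and the choice of U+263A are assumptions made for the example. 'a' and 'W' see codepoints, while 'C' is applied here only after the string has been explicitly encoded to UTF-8 octets with Encode.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A character string with two codepoints: U+263A (above 255) and 'b'.
my $str = "\x{263A}b";

# 'a' passes each codepoint through unchanged, even when it does not fit in a byte.
my @chars = unpack('a1 a1', $str);

# 'W' returns character values, which may be greater than 255.
my @codes = unpack('W*', $str);

# To work with bytes, encode explicitly first; 'C' then sees plain octets.
my $bytes  = encode('UTF-8', $str);
my @octets = unpack('C*', $bytes);

print join(' ', map { sprintf 'U+%04X', $_ } @codes), "\n";   # U+263A U+0062
print join(' ', map { sprintf '%02X', $_ } @octets), "\n";    # E2 98 BA 62
```

Encoding before unpacking with byte-oriented patterns keeps the code independent of how Perl happens to store the string internally.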
I've never fully understood Looking @zhijieshi 's first example, I would think that if it were changed to
things would be clear. But instead this comes out
And if we make the first value in the range into a number containing a hex-only digit, we get
So, the numbers And this is near the beginning of a tutorial, talking about beginner level stuff
Where
https://perldoc.perl.org/perlpacktut
Description
Issue 1:
The example at the end of "The Basic Principle" packs "byte contents from a string of hexadecimal digits".
The code is pack( 'H2' x 10, 30..39 ). It is not straightforward to see 30 as "hexadecimal digits". Why make it unnecessarily confusing?
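For reference, a small sketch of what that call computes, next to a variant whose input is visibly hexadecimal. The second call and both variable names are illustrative assumptions, not text from the tutorial.

```perl
use strict;
use warnings;

# The tutorial's example: each number in 30..39 is stringified ("30", "31", ...)
# and read as two hex digits, giving the bytes 0x30..0x39, i.e. ASCII '0'..'9'.
my $digits = pack('H2' x 10, 30..39);

# A variant with the hex digits A-F, which makes the hexadecimal reading explicit:
# 0x4a, 0x4b, 0x4c are ASCII 'J', 'K', 'L'.
my $letters = pack('H2' x 3, '4a', '4b', '4c');

print "$digits\n";    # 0123456789
print "$letters\n";   # JKL
```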
The following would be easier for beginners, avoiding "misunderstanding", which is the purpose of this tutorial.
Issue 2:
Since there are Unicode strings and byte strings, it is not clear what can be unpacked. It seems that unpacking Unicode strings may produce unexpected results.
The output is: