[doc] perlpacktut #19203

zhijieshi · 2021-10-19T16:38:41Z

Where

https://perldoc.perl.org/perlpacktut

Description

Issue 1:

The example at the end of "The Basic Principle" packs "byte contents from a string of hexadecimal digits".
The code is pack( 'H2' x 10, 30..39 ). It is not obvious that the number 30 is meant to be read as "hexadecimal digits".
Why make it unnecessarily confusing?

The following would be easier for beginners and would avoid "misunderstanding", which is the purpose of this tutorial.

my $s = pack( 'H2' x 10, '30'..'39');
print "$s\n";
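
As a quick check that the two spellings are equivalent, here is a minimal sketch. The variable names are only illustrative; it relies on the numbers 30..39 being stringified to two-digit strings before the 'H2' format reads them.

```perl
use strict;
use warnings;

# Numeric range: each number is stringified to two digits before 'H2' sees it.
my $from_numbers = pack('H2' x 10, 30 .. 39);

# Quoted range: the same list, but the digits visibly start out as strings.
my $from_strings = pack('H2' x 10, '30' .. '39');

print "$from_numbers\n";    # 0123456789
print $from_numbers eq $from_strings ? "identical\n" : "different\n";
```

Both calls build the same ten bytes, so the quoted form changes only how the example reads, not what it does.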

Issue 2:

Since there are Unicode strings and byte strings, it is not clear what can be unpacked. It seems that unpacking Unicode strings may give unexpected results.

#!/usr/bin/perl -w
use v5.34;
use utf8;
use strict;
use warnings;
use Encode qw(encode decode);

my $s = "0123456789😀";
my $b = encode "UTF8", $s;

say "Unpack unicode string 1: ", unpack( '(H2)*', $s);
say "Unpack unicode string 2: ", unpack( 'H*', $s);
say "Unpack bytes:            ", unpack( 'H*', $b);

{
    use bytes;
    say "Unpack unicode string 3: ", unpack( 'H*', $s);
}

The output is:

Character in 'H' format wrapped in unpack at .\t.pl line 11.
Unpack unicode string 1: 3031323334353637383900
Character in 'H' format wrapped in unpack at .\t.pl line 12.
Unpack unicode string 2: 3031323334353637383900
Unpack bytes:            30313233343536373839f09f9880
Unpack unicode string 3: 30313233343536373839f09f9880
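
One way to get the byte-level result without "use bytes" is to encode explicitly before unpacking. The following is a minimal sketch under that assumption; $text, $has_wide, $bytes and $hex are names chosen for illustration, and the regex check is just one possible way to detect code points above 255.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "0123456789\x{1F600}";   # same characters as $s above, written with an escape

# A code point above 255 cannot be treated as a single byte.
my $has_wide = $text =~ /[^\x00-\xFF]/ ? 1 : 0;

# Encode to UTF-8 bytes first, then unpack the bytes.
my $bytes = encode('UTF-8', $text);
my $hex   = unpack('H*', $bytes);

print "has wide characters: $has_wide\n";
print "hex of UTF-8 bytes:  $hex\n";
```

This reproduces the "Unpack bytes" line of the output and raises no "wrapped" warning, because every element handed to unpack fits in a byte.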


Grinnz · 2021-10-19T17:10:12Z

Thank you. I agree with your first point, though it may be made even clearer by using strings containing hex digits A-F in the example.

For point 2, the Unicode section probably needs to be rewritten as it's overly abstraction dependent, similar to your "use bytes" example which breaks the Perl string abstraction. I'm not sure exactly what you're suggesting is the problem there otherwise.

zhijieshi · 2021-10-19T20:45:04Z

For point 2, I would like to see some clarifications in the tutorial. I agree that some sections may need to be rewritten. When I read the tutorial, I had these questions.

Q1: Can a Unicode string be unpacked? If it is not recommended, then the tutorial could say clearly "do not unpack Unicode strings".

Q2: The example in the tutorial seems to suggest that it is fine to unpack a Unicode string into "strings". If a Unicode string can be unpacked in some cases, when does it work?

while (<>) {
    my ($date, $desc, $income, $expend) =
        unpack("A10xA27xA7xA*", $_);
    $tot_income += $income;
    $tot_expend += $expend;
}

Grinnz · 2021-10-19T21:07:03Z

It's a bit complex. The Perl string abstraction is simply a sequence of codepoints - not Unicode, nor bytes, until something interprets it as such. The 'a' and 'A' patterns for example will pass through a codepoint whether or not it fits in a byte, but other patterns like 'C' which are defined to operate on bytes have less obvious behavior (and unfortunately don't warn that you're doing something strange).
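
To illustrate the 'A' case only, here is a small sketch. The record contents and variable names are made up for the example, and it deliberately leaves out 'C', whose behaviour on such strings is the less obvious part.

```perl
use strict;
use warnings;

# Five characters; the middle one is U+263A, which is above 255.
my $record = "ab\x{263A}cd";

# 'A2' takes two characters, 'A1' one, 'A*' the rest.
my ($left, $mid, $right) = unpack('A2 A1 A*', $record);

printf "left=%s mid=U+%04X right=%s\n", $left, ord($mid), $right;
```

The field widths are counted in characters here, so the code point above 255 comes through as a single one-character field.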

And your example has an additional complication. Unless you pass -CSD or add a decoding layer to STDIN or the files you are reading from, <> will return encoded bytes, not Unicode strings. So in that example unpack is likely receiving a byte string.
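
The difference a decoding layer makes can be sketched with in-memory filehandles standing in for a real file. The sample line and all variable names below are assumptions for illustration, not something taken from the tutorial.

```perl
use strict;
use warnings;

# UTF-8 encoded bytes for "caf<e-acute> 12.50\n", as they would sit in a file.
my $file_bytes = "caf\xC3\xA9 12.50\n";

# No layer: the line comes back as bytes.
open(my $raw_fh, '<', \$file_bytes) or die "open failed: $!";
my $raw_line = <$raw_fh>;
close $raw_fh;

# With a decoding layer: the line comes back as characters.
open(my $txt_fh, '<:encoding(UTF-8)', \$file_bytes) or die "open failed: $!";
my $txt_line = <$txt_fh>;
close $txt_fh;

# 'A4' means four bytes in one case and four characters in the other.
my ($raw_name) = unpack('A4', $raw_line);
my ($txt_name) = unpack('A4', $txt_line);

printf "raw:     %d units, last code %02X\n", length($raw_name), ord(substr($raw_name, -1));
printf "decoded: %d units, last code %02X\n", length($txt_name), ord(substr($txt_name, -1));
```

Without the layer the fixed-width field cuts the two-byte sequence for the accented letter in half; with the layer the field holds the four intended characters.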

zhijieshi · 2021-10-20T00:25:25Z

Thanks for the explanation. To summarize, a string may have a codepoint that consists of more than one byte. The 'a' and 'A' patterns work with those codepoints, while some other patterns work with bytes only.

Grinnz · 2021-10-20T00:45:37Z

It's more accurate to say it may have a codepoint which cannot represent a byte because it is higher than 255. What it's represented by internally is immaterial (unless using "use bytes", which is why that is problematic).
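
A short sketch of that distinction, using the core utf8::upgrade and utf8::downgrade functions; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $native   = "\xE9";      # one code point, value 233
my $upgraded = $native;
utf8::upgrade($upgraded);   # same code point, different internal storage

# Same logical string, so unpack sees the same single byte value.
my $hex_native   = unpack('H*', $native);
my $hex_upgraded = unpack('H*', $upgraded);

# A code point above 255 has no byte value; downgrade with FAIL_OK reports that.
my $wide = "\x{1F600}";
my $fits_in_bytes = utf8::downgrade($wide, 1) ? 1 : 0;

print "$hex_native $hex_upgraded $fits_in_bytes\n";
```

The two hex results agree even though the strings are stored differently, while the last value shows that the code point above 255 cannot be turned into a byte at all.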

khwilliamson · 2025-05-04T17:09:13Z

I've never fully understood pack and unpack, and now I don't think it's just me.

Looking at @zhijieshi's first example, I would think that if it were changed to

my $s = pack( 'H2' x 26, '41'..'5A' );

things would be clear. But instead this comes out

ABCDEFGHIPQRSTUVWXY`abcdef

And if we make the first value in the range into a number containing a hex-only digit, we get

my $s = pack( 'H2' x 6, '4A'..'4F' );
Argument "4A" isn't numeric in range (or flop)

So, the numbers 30..39 are interpreted as hex, but not all hex numbers can be used here.

And this is near the beginning of a tutorial, talking about beginner-level stuff.
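
The .. operator does not count in hexadecimal, so one possible workaround is to count in numbers and format each as a hex string. This is only a sketch, not something proposed in the tutorial; @hex_pairs and $alphabet are illustrative names.

```perl
use strict;
use warnings;

# Count in numbers, then format each one as a two-digit hex string.
my @hex_pairs = map { sprintf '%02X', $_ } 0x41 .. 0x5A;

# One 'H2' per hex string; the array in scalar context gives the count.
my $alphabet = pack('H2' x @hex_pairs, @hex_pairs);

print "@hex_pairs[0, 9, 25]\n";   # 41 4A 5A
print "$alphabet\n";              # ABCDEFGHIJKLMNOPQRSTUVWXYZ
```

Here the range runs over plain integers and the hex digits A-F only appear in the strings handed to pack, which sidesteps the string-range behaviour shown above.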

zhijieshi added documentation Needs Triage labels Oct 19, 2021

Grinnz removed the Needs Triage label Oct 19, 2021
