=head1 NAME

perlpacktut - tutorial on C<pack> and C<unpack>

=head1 DESCRIPTION

C<pack> and C<unpack> are two functions for transforming data according
to a user-defined template, between the guarded way Perl stores values
and some well-defined representation as might be required in the
environment of a Perl program. Unfortunately, they're also two of
the most misunderstood and most often overlooked functions that Perl
provides. This tutorial will demystify them for you.


=head1 The Basic Principle

Most programming languages don't shelter the memory where variables are
stored. In C, for instance, you can take the address of some variable,
and the C<sizeof> operator tells you how many bytes are allocated to
the variable. Using the address and the size, you may access the storage
to your heart's content.

In Perl, you just can't access memory at random, but the structural and
representational conversion provided by C<pack> and C<unpack> is an
excellent alternative. The C<pack> function converts values to a byte
sequence containing representations according to a given specification,
the so-called "template" argument. C<unpack> is the reverse process,
deriving some values from the contents of a string of bytes. (Be cautioned,
however, that not all that has been packed together can be neatly unpacked -
a very common experience as seasoned travellers are likely to confirm.)

Why, you may ask, would you need a chunk of memory containing some values
in binary representation? One good reason is input and output accessing
some file, a device, or a network connection, whereby this binary
representation is either forced on you or will give you some benefit
in processing. Another cause is passing data to some system call that
is not available as a Perl function: C<syscall> requires you to provide
parameters laid out the way a C program would store them. Even text processing
(as shown in the next section) may be simplified with judicious usage
of these two functions.

To see how (un)packing works, we'll start with a simple template
code where the conversion is in low gear: between the contents of a byte
sequence and a string of hexadecimal digits. Let's use C<unpack>, since
this is likely to remind you of a dump program, or some desperate last
message unfortunate programs are wont to throw at you before they expire
into the wild blue yonder. Assuming that the variable C<$mem> holds a
sequence of bytes that we'd like to inspect without assuming anything
about its meaning, we can write

   my( $hex ) = unpack( 'H*', $mem );
   print "$hex\n";

whereupon we might see something like this, with each pair of hex digits
corresponding to a byte:

   41204d414e204120504c414e20412043414e414c2050414e414d41

What was in this chunk of memory? Numbers, characters, or a mixture of
both? Assuming that we're on a computer where ASCII (or some similar)
encoding is used: hexadecimal values in the range C<0x41> - C<0x5A>
indicate an uppercase letter, and C<0x20> encodes a space. So we might
assume it is a piece of text, which some are able to read like a tabloid;
but others will have to get hold of an ASCII table and relive that
first-grader feeling. Not caring too much about which way to read this,
we note that C<unpack> with the template code C<H> converts the contents
of a sequence of bytes into the customary hexadecimal notation. Since
"a sequence of" is a pretty vague indication of quantity, C<H> has been
defined to convert just a single hexadecimal digit unless it is followed
by a repeat count. An asterisk for the repeat count means to use whatever
remains.

The inverse operation - packing byte contents from a string of hexadecimal
digits - is just as easily written. For instance:

   my $s = pack( 'H2' x 10, map { "3$_" } ( 0..9 ) );
   print "$s\n";

Since we feed a list of ten 2-digit hexadecimal strings to C<pack>, the
pack template should contain ten pack codes. If this is run on a computer
with ASCII character coding, it will print C<0123456789>.

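To try both directions at once, here is a small sketch. It assumes an
ASCII platform and takes the sample bytes to be the text hidden in the
hex dump above; C<$back> and C<$first> are just illustrative names.

    use strict;
    use warnings;

    # The bytes behind the hex dump above (ASCII assumed).
    my $mem = 'A MAN A PLAN A CANAL PANAMA';

    # unpack 'H*': each byte becomes two hex digits, high nybble first.
    my ( $hex ) = unpack( 'H*', $mem );
    print "$hex\n";

    # pack 'H*' reverses the conversion.
    my $back = pack( 'H*', $hex );
    print "$back\n";

    # Without a repeat count, 'H' converts a single hex digit.
    my ( $first ) = unpack( 'H', $mem );   # '4', the high nybble of 0x41
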

=head1 Packing Text

Let's suppose you've got to read in a data file like this:

    Date      |Description                | Income|Expenditure
    01/24/2001 Ahmed's Camel Emporium                  1147.99
    01/28/2001 Flea spray                                24.99
    01/29/2001 Camel rides to tourists     1235.00

How do we do it? You might think first to use C<split>; however, since
C<split> collapses blank fields, you'll never know whether a record was
income or expenditure. Oops. Well, you could always use C<substr>:

    while (<>) {
        my $date   = substr($_,  0, 11);
        my $desc   = substr($_, 12, 27);
        my $income = substr($_, 40,  7);
        my $expend = substr($_, 52,  7);
        ...
    }

It's not really a barrel of laughs, is it? In fact, it's worse than it
may seem; the eagle-eyed may notice that the first field should only be
10 characters wide, and the error has propagated right through the other
numbers - which we've had to count by hand. So it's error-prone as well
as horribly unfriendly.

Or maybe we could use regular expressions:

    while (<>) {
        my($date, $desc, $income, $expend) =
            m|(\d\d/\d\d/\d{4}) (.{27}) (.{7})(.*)|;
        ...
    }

Urgh. Well, it's a bit better, but - well, would you want to maintain
that?

Hey, isn't Perl supposed to make this sort of thing easy? Well, it does,
if you use the right tools. C<pack> and C<unpack> are designed to help
you out when dealing with fixed-width data like the above. Let's have a
look at a solution with C<unpack>:

    while (<>) {
        my($date, $desc, $income, $expend) = unpack("A10xA27xA7xA*", $_);
        ...
    }

That looks a bit nicer; but we've got to take apart that weird template.
Where did I pull that out of?

OK, let's have a look at some of our data again; in fact, we'll include
the headers, and a handy ruler so we can keep track of where we are.

             1         2         3         4         5
    1234567890123456789012345678901234567890123456789012345678
    Date      |Description                | Income|Expenditure
    01/28/2001 Flea spray                                24.99
    01/29/2001 Camel rides to tourists     1235.00

From this, we can see that the date column stretches from column 1 to
column 10 - ten characters wide. The C<pack>-ese for "character" is
C<A>, and ten of them are C<A10>. So if we just wanted to extract the
dates, we could say this:

    my($date) = unpack("A10", $_);

OK, what's next? Between the date and the description is a blank column;
we want to skip over that. The C<x> template means "skip forward", so we
want one of those. Next, we have another batch of characters, from 12 to
38. That's 27 more characters, hence C<A27>. (Don't make the fencepost
error - there are 27 characters between 12 and 38, not 26. Count 'em!)

Now we skip another character and pick up the next 7 characters:

    my($date,$description,$income) = unpack("A10xA27xA7", $_);

Now comes the clever bit. Lines in our ledger which are just income and
not expenditure might end at column 46. Hence, we don't want to tell our
C<unpack> pattern that we B<need> to find another 12 characters; we'll
just say "if there's anything left, take it". As you might guess from
regular expressions, that's what the C<*> means: "use everything
remaining".

=over 3

=item *

Be warned, though, that unlike regular expressions, C<unpack> doesn't
search for a match: if the template doesn't fit the incoming data, Perl
may scream and die (an C<x> that would skip past the end of the string,
for instance, is fatal).

=back


Hence, putting it all together:

    my($date,$description,$income,$expend) = unpack("A10xA27xA7xA*", $_);

Now, that's our data parsed. I suppose what we might want to do now is
total up our income and expenditure, and add another line to the end of
our ledger - in the same format - saying how much we've brought in and
how much we've spent:

    use POSIX ();   # for POSIX::strftime

    my ($tot_income, $tot_expend) = (0, 0);
    while (<>) {
        my($date, $desc, $income, $expend) = unpack("A10xA27xA7xA*", $_);
        $tot_income += $income;
        $tot_expend += $expend;
    }

    $tot_income = sprintf("%.2f", $tot_income); # Get them into
    $tot_expend = sprintf("%.2f", $tot_expend); # "financial" format

    my $date = POSIX::strftime("%m/%d/%Y", localtime);

    # OK, let's go:

    print pack("A10xA27xA7xA*", $date, "Totals", $tot_income, $tot_expend);

Oh, hmm. That didn't quite work. Let's see what happened:

    01/24/2001 Ahmed's Camel Emporium                  1147.99
    01/28/2001 Flea spray                                24.99
    01/29/2001 Camel rides to tourists     1235.00
    03/23/2001Totals                     1235.001172.98

OK, it's a start, but what happened to the spaces? We put C<x>, didn't
we? Shouldn't it skip forward? Let's look at what L<perlfunc/pack> says:

    x   A null byte.

Urgh. No wonder. There's a big difference between "a null byte",
character zero, and "a space", character 32. Perl's put something
between the date and the description - but unfortunately, we can't see
it!

What we actually need to do is expand the width of the fields. The C<A>
format pads any non-existent characters with spaces, so we can use the
additional spaces to line up our fields, like this:

    print pack("A11 A28 A8 A*", $date, "Totals", $tot_income, $tot_expend);

(Note that you can put spaces in the template to make it more readable,
but they don't translate to spaces in the output.) Here's what we got
this time:

    01/24/2001 Ahmed's Camel Emporium                  1147.99
    01/28/2001 Flea spray                                24.99
    01/29/2001 Camel rides to tourists     1235.00
    03/23/2001 Totals                      1235.00 1172.98

That's a bit better, but we still have that last column which needs to
be moved further over. There's an easy way to fix this up:
unfortunately, we can't get C<pack> to right-justify our fields, but we
can get C<sprintf> to do it, padding the expenditure total to the
11-character width of its column:

    $tot_income = sprintf("%.2f", $tot_income);
    $tot_expend = sprintf("%11.2f", $tot_expend);
    $date = POSIX::strftime("%m/%d/%Y", localtime);
    print pack("A11 A28 A8 A*", $date, "Totals", $tot_income, $tot_expend);

This time we get the right answer:

    01/28/2001 Flea spray                                24.99
    01/29/2001 Camel rides to tourists     1235.00
    03/23/2001 Totals                      1235.00     1172.98

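Here is the whole exercise as one self-contained sketch. So that it runs
without an input file, it rebuilds the sample records with C<sprintf>
instead of reading them with C<< <> >>, and it uses a fixed, hypothetical
date instead of C<POSIX::strftime>; the Totals line should then line up
with the data columns.

    use strict;
    use warnings;

    # Sample records from the ledger above, rebuilt with sprintf (an
    # assumption for this sketch) so that the column widths are exact.
    my @records = (
        [ '01/24/2001', "Ahmed's Camel Emporium",  '',        '1147.99' ],
        [ '01/28/2001', 'Flea spray',              '',        '24.99'   ],
        [ '01/29/2001', 'Camel rides to tourists', '1235.00', ''        ],
    );
    my @lines = map { sprintf( "%-10s %-27s %7s %11s\n", @$_ ) } @records;

    my ( $tot_income, $tot_expend ) = ( 0, 0 );
    for my $line (@lines) {
        # A strips trailing blanks, so empty columns come back as ''.
        my ( $date, $desc, $income, $expend ) =
            unpack( 'A10xA27xA7xA*', $line );
        $tot_income += $income || 0;
        $tot_expend += $expend || 0;
    }

    # A fixed (hypothetical) date keeps the output reproducible.
    my $totals = pack( 'A11 A28 A8 A*', '03/23/2001', 'Totals',
                       sprintf( '%.2f',   $tot_income ),
                       sprintf( '%11.2f', $tot_expend ) );

    print @lines, "$totals\n";
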
So that's how we consume and produce fixed-width data. Let's recap what
we've seen of C<pack> and C<unpack> so far:

=over 3

=item *

Use C<pack> to go from several pieces of data to one fixed-width
version; use C<unpack> to turn a fixed-width-format string into several
pieces of data.

=item *

The pack format C<A> means "any character"; if you're C<pack>ing and
the data is shorter than the field, C<pack> will fill the rest up with
spaces.

=item *

C<x> means "skip a byte" when C<unpack>ing; when C<pack>ing, it means
"introduce a null byte" - that's probably not what you mean if you're
dealing with plain text.

=item *

You can follow the formats with numbers to say how many characters
should be affected by that format: C<A12> means "take 12 characters";
C<x6> means "skip 6 bytes" or "character 0, 6 times".

=item *

Instead of a number, you can use C<*> to mean "consume everything else
left".

B<Warning>: when packing multiple pieces of data, C<*> only means
"consume all of the current piece of data". That's to say

    pack("A*A*", $one, $two)

packs all of C<$one> into the first C<A*> and then all of C<$two> into
the second. This is a general principle: each format character
corresponds to one piece of data to be C<pack>ed.

=back

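A few one-liners make these rules concrete. The strings are arbitrary
sample data, and the comments show the values the sketch expects:

    use strict;
    use warnings;

    # 'A5' pads a short value with spaces; 'x' inserts a real NUL byte.
    my $padded = pack( 'A5', 'ab' );              # "ab   "
    my $nulled = pack( 'A2 x A2', 'ab', 'cd' );   # "ab\0cd"

    # Each 'A*' takes all of exactly one argument.
    my $joined = pack( 'A*A*', 'camel', 'flea' ); # "camelflea"

    # When unpacking, 'x' skips a byte and 'A' strips trailing blanks.
    my ( $w1, $w2 ) = unpack( 'A5 x A*', "spray rides   \n" );
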


=head1 Packing Numbers

So much for textual data. Let's get onto the meaty stuff that C<pack>
and C<unpack> are best at: handling binary formats for numbers. There is,
of course, not just one binary format - life would be too simple - but
Perl will do all the finicky labor for you.


=head2 Integers

Packing and unpacking numbers implies conversion to and from some
I<specific> binary representation. Leaving floating point numbers
aside for the moment, the salient properties of any such representation
are:

=over 4

=item *

the number of bytes used for storing the integer,

=item *

whether the contents are interpreted as a signed or unsigned number,

=item *

the byte ordering: whether the first byte is the least or most
significant byte (or: little-endian or big-endian, respectively).

=back

So, for instance, to pack 20302 into a signed 16-bit integer in your
computer's representation you write

   my $ps = pack( 's', 20302 );

Again, the result is a string, now containing 2 bytes. If you print
this string (which is, generally, not recommended) you might see
C<ON> or C<NO> (depending on your system's byte ordering) - or something
entirely different if your computer doesn't use ASCII character encoding.
Unpacking C<$ps> with the same template returns the original integer value:

   my( $s ) = unpack( 's', $ps );

This is true for all numeric template codes. But don't expect miracles:
if the packed value exceeds the allotted byte capacity, high order bits
are silently discarded, and unpack certainly won't be able to pull them
back out of some magic hat. And, when you pack using a signed template
code such as C<s>, an excess value may result in the sign bit
getting set, and unpacking this will smartly return a negative value.

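The following sketch checks both points on whatever machine runs it: the
round trip of 20302, and what happens to 40000 (an arbitrary value too
big for a signed 16-bit integer). It borrows the big-endian code C<n>,
described further below, only to find out the native byte order.

    use strict;
    use warnings;

    # 20302 is 0x4F4E, i.e. the bytes 'O' and 'N' in ASCII.
    my $ps = pack( 's', 20302 );
    my ( $s ) = unpack( 's', $ps );          # 20302 again

    # 's' uses native byte order; comparing with the explicit
    # big-endian 'n' tells us which order this machine uses.
    my $order = $ps eq pack( 'n', 20302 ) ? 'big-endian' : 'little-endian';

    # 40000 does not fit in a signed 16-bit integer: the surviving bits
    # have the sign bit set, so unpacking yields 40000 - 65536.
    my ( $wrapped ) = unpack( 's', pack( 's', 40000 ) );   # -25536

    printf "%d %s %d\n", $s, $order, $wrapped;
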
16 bits won't get you too far with integers, but there are C<l> and C<L>
for signed and unsigned 32-bit integers. And if this is not enough and
your system supports 64 bit integers you can push the limits much closer
to infinity with pack codes C<q> and C<Q>. A notable exception to these
fixed sizes is provided by pack codes C<i> and C<I> for signed and
unsigned integers of the "local custom" variety: Such an integer will
take up as many bytes as a local C compiler returns for C<sizeof(int)>,
but it'll use I<at least> 32 bits.

Each of the integer pack codes C<sSlLqQ> results in a fixed number of bytes,
no matter where you execute your program. This may be useful for some
applications, but it does not provide for a portable way to pass data
structures between Perl and C programs (bound to happen when you call
XS extensions or the Perl function C<syscall>), or to read or write
binary files shared with C programs. What you'll need in this case are
template codes that depend on what your local C compiler compiles when
you code C<short> or C<unsigned long>, for instance. These codes and
their corresponding byte lengths are shown in the table below. Since the
C standard leaves much leeway with respect to the relative sizes of these
data types, actual values may vary, and that's why the values are given
as expressions in C and Perl. (If you'd like to use values from
C<%Config> in your program you have to import it with C<use Config>.)

   signed unsigned  byte length in C   byte length in Perl
     s!     S!      sizeof(short)      $Config{shortsize}
     i!     I!      sizeof(int)        $Config{intsize}
     l!     L!      sizeof(long)       $Config{longsize}
     q!     Q!      sizeof(long long)  $Config{longlongsize}

The C<i!> and C<I!> codes aren't different from C<i> and C<I>; they are
tolerated for completeness' sake.

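To see the table's values for your own perl, you can compare the length
of a packed zero with the matching C<%Config> entry. This sketch leaves
out C<q!>, which, like C<q>, only works when your perl supports 64-bit
integers.

    use strict;
    use warnings;
    use Config;

    # Native-size codes and the C type sizes perl was configured with.
    my %native = (
        's!' => $Config{shortsize},
        'i!' => $Config{intsize},
        'l!' => $Config{longsize},
    );
    for my $code ( sort keys %native ) {
        my $bytes = length pack( $code, 0 );
        printf "%-3s %d byte(s), Config says %d\n",
            $code, $bytes, $native{$code};
    }
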

=head2 Unpacking a Stack Frame

Requesting a particular byte ordering may be necessary when you work with
binary data coming from some specific architecture whereas your program could
run on a totally different system. As an example, assume you have 24 bytes
containing a stack frame as it happens on an Intel 8086:

      +---------+        +----+----+               +---------+
 TOS: |   IP    |  TOS+4:| FL | FH | FLAGS  TOS+14:|   SI    |
      +---------+        +----+----+               +---------+
      |   CS    |        | AL | AH | AX            |   DI    |
      +---------+        +----+----+               +---------+
                         | BL | BH | BX            |   BP    |
                         +----+----+               +---------+
                         | CL | CH | CX            |   DS    |
                         +----+----+               +---------+
                         | DL | DH | DX            |   ES    |
                         +----+----+               +---------+

First, we note that this time-honored 16-bit CPU uses little-endian order,
and that's why the low order byte is stored at the lower address. To
unpack such an (unsigned) short we'll have to use code C<v>. A repeat
count unpacks all 12 shorts:

   my( $ip, $cs, $flags, $ax, $bx, $cx, $dx, $si, $di, $bp, $ds, $es ) =
     unpack( 'v12', $frame );

Alternatively, we could have used C<C> to unpack the individually
accessible byte registers FL, FH, AL, AH, etc.:

   my( $fl, $fh, $al, $ah, $bl, $bh, $cl, $ch, $dl, $dh ) =
     unpack( 'C10', substr( $frame, 4, 10 ) );

It would be nice if we could do this in one fell swoop: unpack a short,
back up a little, and then unpack 2 bytes. Since Perl I<is> nice, it
proffers the template code C<X> to back up one byte. Putting this all
together, we may now write:

   my( $ip, $cs,
       $flags,$fl,$fh,
       $ax,$al,$ah, $bx,$bl,$bh, $cx,$cl,$ch, $dx,$dl,$dh,
       $si, $di, $bp, $ds, $es ) =
   unpack( 'v2' . ('vXXCC' x 5) . 'v5', $frame );

(The clumsy construction of the template can be avoided - just read on!)

We've taken some pains to construct the template so that it matches
the contents of our frame buffer. Otherwise we'd either get undefined values,
or C<unpack> would leave part of the buffer unread. If C<pack> runs out of
items, it will supply null strings (which are coerced into zeroes whenever
the pack code says so).

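Since a genuine 8086 stack dump is hard to come by, the sketch below
fabricates one with C<pack 'v12'> (all register values are made up) and
takes it apart again with the template above. Each word register should
equal its high byte times 256 plus its low byte.

    use strict;
    use warnings;

    # A made-up 8086 stack frame: twelve 16-bit little-endian words in the
    # order IP, CS, FLAGS, AX, BX, CX, DX, SI, DI, BP, DS, ES.
    my $frame = pack( 'v12', 0x0100, 0x2000, 0x0246, 0x1234, 0xABCD,
                      0x0005, 0xFF00, 1, 2, 3, 4, 5 );

    my ( $ip, $cs,
         $flags, $fl, $fh,
         $ax, $al, $ah, $bx, $bl, $bh, $cx, $cl, $ch, $dx, $dl, $dh,
         $si, $di, $bp, $ds, $es ) =
        unpack( 'v2' . ( 'vXXCC' x 5 ) . 'v5', $frame );

    # Prints "AX=1234 AH=12 AL=34".
    printf "AX=%04X AH=%02X AL=%02X\n", $ax, $ah, $al;
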

=head2 How to Eat an Egg on a Net

The pack code for big-endian (high order byte at the lowest address) is
C<n> for 16 bit and C<N> for 32 bit integers. You use these codes
if you know that your data comes from a compliant architecture, but,
surprisingly enough, you should also use these pack codes if you
exchange binary data, across the network, with some system that you
know next to nothing about. The simple reason is that this
order has been chosen as the I<network order>, and all standard-fearing
programs ought to follow this convention. (This is, of course, a stern
backing for one of the Lilliputian parties and may well influence the
political development there.) So, if the protocol expects you to send
a message by sending the length first, followed by just so many bytes,
you could write:

   my $buf = pack( 'N', length( $msg ) ) . $msg;

or even:

   my $buf = pack( 'NA*', length( $msg ), $msg );

and pass C<$buf> to your send routine. Some protocols demand that the
count should include the length of the count itself: then just add 4
to the data length. (But make sure to read L<"Lengths and Widths"> before
you really code this!)

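Here is a small sketch of both ends of such a length-prefixed exchange.
The message text is made up, and the sketch assumes the whole buffer is
already in memory rather than arriving piecemeal from a socket.

    use strict;
    use warnings;

    # Sender: a 4-byte big-endian length followed by the data.
    my $msg  = "Hello, Lilliput";
    my $buf  = pack( 'N', length( $msg ) ) . $msg;

    # The one-template variant produces the same bytes.
    my $buf2 = pack( 'NA*', length( $msg ), $msg );

    # Receiver: read the length first, then exactly that many bytes.
    my $len  = unpack( 'N', substr( $buf, 0, 4 ) );
    my $body = substr( $buf, 4, $len );
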


=head2 Floating point Numbers

For packing floating point numbers you have the choice between the
pack codes C<f> and C<d> which pack into (or unpack from) single-precision or
double-precision representation as it is provided by your system. (There
is no such thing as a network representation for reals, so if you want
to send your real numbers across computer boundaries, you'd better stick
to ASCII representation, unless you're absolutely sure what's on the other
end of the line.)

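A quick experiment shows the difference in precision, assuming the usual
IEEE 754 single and double formats. The value 0.1 has no exact binary
representation, so its single-precision copy no longer compares equal,
while a value such as 1.5 survives even the C<f> trip:

    use strict;
    use warnings;

    # 'f' squeezes a value into single precision, 'd' into double precision.
    my $single = pack( 'f', 0.1 );
    my $double = pack( 'd', 0.1 );

    my ( $back_f ) = unpack( 'f', $single );   # close to, but not, 0.1
    my ( $back_d ) = unpack( 'd', $double );

    # 1.5 is exactly representable in binary, so nothing is lost.
    my ( $exact ) = unpack( 'f', pack( 'f', 1.5 ) );

    printf "%.17g\n%.17g\n", $back_f, $back_d;
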


=head1 Exotic Templates


=head2 Bit Strings

Bits are the atoms in the memory world. Access to individual bits may
have to be used either as a last resort or because it is the most
convenient way to handle your data. Bit string (un)packing converts
between strings containing a series of C<0> and C<1> characters and
a sequence of bytes each containing a group of 8 bits. This is almost
as simple as it sounds, except that there are two ways the contents of
a byte may be written as a bit string. Let's have a look at an annotated
byte:

     7 6 5 4 3 2 1 0
   +-----------------+
   | 1 0 0 0 1 1 0 0 |
   +-----------------+
    MSB           LSB

It's egg-eating all over again: Some think that as a bit string this should
be written "10001100" i.e. beginning with the most significant bit, others
insist on "00110001". Well, Perl isn't biased, so that's why we have two bit
string codes:

   $byte = pack( 'B8', '10001100' ); # start with MSB
   $byte = pack( 'b8', '00110001' ); # start with LSB

It is not possible to pack or unpack bit fields - just integral bytes.
C<pack> always starts at the next byte boundary and "rounds up" to the
next multiple of 8 by adding zero bits as required. (If you do want bit
fields, there is L<perlfunc/vec>. Or you could implement bit field
handling at the character string level, using split, substr, and
concatenation on unpacked bit strings.)

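Packing and unpacking the byte from the diagram with both codes shows
that they describe the same bits, just spelled in opposite orders. The
sketch also shows the zero-bit padding mentioned above:

    use strict;
    use warnings;

    # Both templates describe the same byte, 0x8C.
    my $msb_first = pack( 'B8', '10001100' );
    my $lsb_first = pack( 'b8', '00110001' );

    # Unpacking that byte with each code gives the two spellings back.
    my $as_B = unpack( 'B8', "\x8C" );   # '10001100'
    my $as_b = unpack( 'b8', "\x8C" );   # '00110001'

    # A short bit string is padded with zero bits up to a whole byte.
    my $padded = pack( 'B*', '101' );    # "\xA0", i.e. 10100000
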
To illustrate unpacking for bit strings, we'll decompose a simple
status register (a "-" stands for a "reserved" bit):

   +-----------------+-----------------+
   | S Z - A - P - C | - - - - O D I T |
   +-----------------+-----------------+
    MSB           LSB MSB           LSB

Converting these two bytes to a string can be done with the unpack
template C<'b16'>. To obtain the individual bit values from the bit
string we use C<split> with the "empty" separator pattern which dissects
into individual characters. Bit values from the "reserved" positions are
simply assigned to C<undef>, a convenient notation for "I don't care where
this goes".

   ($carry, undef, $parity, undef, $auxcarry, undef, $zero, $sign,
    $trace, $interrupt, $direction, $overflow) =
      split( //, unpack( 'b16', $status ) );

We could have used an unpack template C<'b12'> just as well, since the
last 4 bits can be ignored anyway.

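To try the decomposition without a real CPU, the sketch below fabricates
a status word (the chosen flags, carry, zero and interrupt, are an
arbitrary example) stored little-endian, and feeds it through the same
C<split>/C<unpack> combination:

    use strict;
    use warnings;

    # Hypothetical FLAGS value: CF (bit 0), ZF (bit 6) and IF (bit 9)
    # set, stored low byte first as the 8086 would push it.
    my $status = pack( 'v', ( 1 << 0 ) | ( 1 << 6 ) | ( 1 << 9 ) );

    my ( $carry, undef, $parity, undef, $auxcarry, undef, $zero, $sign,
         $trace, $interrupt, $direction, $overflow ) =
        split( //, unpack( 'b16', $status ) );

    print "carry=$carry zero=$zero interrupt=$interrupt\n";
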

=head2 Uuencoding

Another odd-man-out in the template alphabet is C<u>, which packs a
"uuencoded string". ("uu" is short for Unix-to-Unix.) Chances are that
you won't ever need this encoding technique, which was invented to overcome
the shortcomings of old-fashioned transmission media that do not support
anything other than simple ASCII data. The essential recipe is simple: Take
three bytes, or 24 bits. Split them into 4 six-packs, adding a space (0x20) to
each. Repeat until all of the data is blended. Fold groups of 4 bytes into
lines no longer than 60 and garnish them in front with the original byte count
(incremented by 0x20) and a C<"\n"> at the end. - The C<pack> chef will
prepare this for you, a la minute, when you select pack code C<u> on the menu:

   my $uubuf = pack( 'u', $bindat );

A repeat count after C<u> sets the number of bytes to put into a
uuencoded line, which is the maximum of 45 by default, but could be
set to some (smaller) integer multiple of three. C<unpack> simply ignores
the repeat count.

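The sketch below encodes the three bytes of C<"Cat">, which can be
checked by hand (length character C<chr(32 + 3)>, then four 6-bit values
offset by 0x20), and also shows a longer, made-up input being folded into
lines of at most 45 bytes:

    use strict;
    use warnings;

    # 3 bytes: '#' (32 + 3), then the 24 bits as four 6-bit values + 0x20.
    my $uubuf = pack( 'u', 'Cat' );          # "#0V%T\n"

    # 100 sample bytes are folded into lines of 45, 45 and 10 bytes.
    my $bindat = join '', map { chr } 0 .. 99;
    my $long   = pack( 'u', $bindat );
    my @lines  = split /\n/, $long;

    # unpack 'u' restores the original bytes.
    my $restored = unpack( 'u', $long );
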

=head2 Doing Sums

An even stranger template code is C<%>E<lt>I<number>E<gt>. First, because
it's used as a prefix to some other template code. Second, because it
cannot be used in C<pack> at all, and third, in C<unpack>, it doesn't
return the data as defined by the template code it precedes. Instead it'll
give you an integer of I<number> bits that is computed from the data value
by doing sums. For numeric unpack codes, no big feat is achieved:

    my $buf = pack( 'iii', 100, 20, 3 );
    print unpack( '%32i3', $buf ), "\n";  # prints 123

For string values, C<%> returns the sum of the byte values, saving
you the trouble of a sum loop with C<substr> and C<ord>:

    print unpack( '%32A*', "\x01\x10" ), "\n";  # prints 17

Although the C<%> code is documented as returning a "checksum",
don't put your trust in such values! Even when applied to a small number
of bytes, they won't guarantee a noticeable Hamming distance.

In connection with C<b> or C<B>, C<%> simply adds bits, and this can be put
to good use to count set bits efficiently:

    my $bitcount = unpack( '%32b*', $mask );

And an even parity bit can be determined like this:

    my $evenparity = unpack( '%1b*', $mask );

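The bit-counting trick can be checked against a plain loop. The mask is
sample data with 10 bits set:

    use strict;
    use warnings;

    my $mask = "\xFF\x01\x80";    # 8 + 1 + 1 = 10 set bits

    # Counting set bits with the checksum prefix...
    my $bitcount = unpack( '%32b*', $mask );

    # ...and the same count done the long way, bit by bit.
    my $manual = 0;
    $manual += $_ for split //, unpack( 'b*', $mask );

    # A 1-bit "checksum" is the bit count modulo 2: the even parity bit.
    my $evenparity = unpack( '%1b*', $mask );

    # The byte sum of a string, truncated to 32 bits.
    my $bytesum = unpack( '%32C*', 'abc' );   # 97 + 98 + 99 = 294
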

=head2 Unicode

Unicode is a character set that can represent most characters in most of
the world's languages, providing room for over one million different
characters. Unicode 3.1 specifies 94,140 characters: The Basic Latin
characters are assigned to the numbers 0 - 127. The Latin-1 Supplement with
characters that are used in several European languages is in the next
range, up to 255. After some more Latin extensions we find the character
sets from languages using non-Roman alphabets, interspersed with a
variety of symbol sets such as currency symbols, Zapf Dingbats or Braille.
(You might want to visit L<http://www.unicode.org/> for a look at some of
them - my personal favourites are Telugu and Kannada.)

The Unicode character set associates characters with integers. Encoding
these numbers in an equal number of bytes would more than double the
requirements for storing texts written in Latin alphabets.
The UTF-8 encoding avoids this by storing the most common (from a western
point of view) characters in a single byte while encoding the rarer
ones in two or more bytes.

So what has this got to do with C<pack>? Well, if you want to convert
between a Unicode number and its UTF-8 representation you can do so by
using template code C<U>. As an example, let's produce the UTF-8
representation of the Euro currency symbol (code number 0x20AC):

   $UTF8{Euro} = pack( 'U', 0x20AC );

Inspecting C<$UTF8{Euro}> shows that it contains a single character,
number 0x20AC, which Perl stores internally as the 3 bytes
"\xe2\x82\xac". The round trip can be completed with C<unpack>:

   $Unicode{Euro} = unpack( 'U', $UTF8{Euro} );

More often you'll want to pack or unpack whole UTF-8 strings:

   # pack and unpack the Hebrew alphabet
   my $alefbet = pack( 'U*', 0x05d0..0x05ea );
   my @hebrew = unpack( 'U*', $alefbet );

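Since C<pack 'U'> yields a character string, looking at the actual UTF-8
bytes needs an explicit conversion. This sketch uses the core function
C<utf8::encode> on a copy for that purpose, then repeats the Hebrew round
trip:

    use strict;
    use warnings;

    my $euro = pack( 'U', 0x20AC );      # one character, code point 0x20AC

    # To look at the UTF-8 bytes themselves, encode a copy explicitly.
    my $bytes = $euro;
    utf8::encode( $bytes );              # now the 3 bytes E2 82 AC
    my $hex = unpack( 'H*', $bytes );    # 'e282ac'

    # Round trip through the Hebrew letters alef (0x05D0) to tav (0x05EA).
    my $alefbet = pack( 'U*', 0x05d0 .. 0x05ea );
    my @hebrew  = unpack( 'U*', $alefbet );
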

=head2 Another Portable Binary Encoding

The pack code C<w> has been added to support a portable binary data
encoding scheme that goes way beyond simple integers. (Details can
be found at Casbah.org, the Scarab project.) A BER (Binary Encoded
Representation) compressed unsigned integer stores base 128
digits, most significant digit first, with as few digits as possible.
Bit eight (the high bit) is set on each byte except the last. There
is no size limit to BER encoding, but Perl won't go to extremes.

   my $berbuf = pack( 'w*', 1, 128, 128+1, 128*128+127 );

A hex dump of C<$berbuf>, with spaces inserted at the right places,
shows 01 8100 8101 81807F. Since the last byte is always less than
128, C<unpack> knows where to stop.

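To see the encoding rule at work, here is a hand-written encoder for the
same scheme, a sketch for illustration only (C<ber_encode> is a made-up
name, not a Perl builtin), compared against C<pack 'w*'> on the values
above:

    use strict;
    use warnings;

    # Emit base-128 digits, most significant first, with the high bit
    # set on every byte except the last.
    sub ber_encode {
        my ( $n ) = @_;
        my @digits = ( $n % 128 );                  # last byte: high bit clear
        while ( $n >= 128 ) {
            $n = int( $n / 128 );
            unshift @digits, 0x80 | ( $n % 128 );   # earlier bytes: high bit set
        }
        return pack( 'C*', @digits );
    }

    my @values = ( 1, 128, 128+1, 128*128+127 );
    my $mine   = join '', map { ber_encode($_) } @values;
    my $berbuf = pack( 'w*', @values );
    my @back   = unpack( 'w*', $berbuf );
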

=head1 Template Grouping

Prior to Perl 5.8, repetitions of templates had to be made by
C<x>-multiplication of template strings. Now there is a better way:
we may use the pack codes C<(> and C<)> combined with a repeat count.
The C<unpack> template from the Stack Frame example can simply
be written like this:

   unpack( 'v2 (vXXCC)5 v5', $frame )

Let's explore this feature a little more. We'll begin with the equivalent of

   join( '', map( substr( $_, 0, 1 ), @str ) )

which returns a string consisting of the first character from each string.
Using pack, we can write

   pack( '(A)'.@str, @str )

or, because a repeat count C<*> means "repeat as often as required",
simply

   pack( '(A)*', @str )

(Note that the template C<A*> would only have packed C<$str[0]> in full
length.)

To pack dates stored as triplets ( day, month, year ) in an array C<@dates>
into a sequence of byte, byte, short integer we can write

   $pd = pack( '(CCS)*', map( @$_, @dates ) );

To swap pairs of characters in a string (with even length) one could use
several techniques. First, let's use C<x> and C<X> to skip forward and back:

   $s = pack( '(A)*', unpack( '(xAXXAx)*', $s ) );

We can also use C<@> to jump to an offset, with 0 being the position where
we were when the last C<(> was encountered:

   $s = pack( '(A)*', unpack( '(@1A @0A @2)*', $s ) );

Finally, there is also an entirely different approach by unpacking big
endian shorts and packing them in the reverse byte order:

   $s = pack( '(v)*', unpack( '(n)*', $s ) );

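Both examples can be checked in one go. The dates are sample values, and
unpacking C<(CCS)*> returns a flat list that has to be regrouped into
triplets; the three swapping techniques should agree on a sample string:

    use strict;
    use warnings;

    # Dates as [ day, month, year ] triplets (sample values).
    my @dates = ( [ 24, 1, 2001 ], [ 28, 1, 2001 ], [ 29, 1, 2001 ] );
    my $pd    = pack( '(CCS)*', map( @$_, @dates ) );

    # Unpack the groups and regroup the flat list into triplets.
    my @flat = unpack( '(CCS)*', $pd );
    my @back;
    push @back, [ splice( @flat, 0, 3 ) ] while @flat;

    # The three pair-swapping techniques from above.
    my $s  = 'badcfe';
    my $s1 = pack( '(A)*', unpack( '(xAXXAx)*', $s ) );
    my $s2 = pack( '(A)*', unpack( '(@1A @0A @2)*', $s ) );
    my $s3 = pack( '(v)*', unpack( '(n)*', $s ) );
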
1N/A
1N/A=head1 Lengths and Widths
1N/A
1N/A=head2 String Lengths
1N/A
1N/AIn the previous section we've seen a network message that was constructed
1N/Aby prefixing the binary message length to the actual message. You'll find
1N/Athat packing a length followed by so many bytes of data is a
1N/Afrequently used recipe since appending a null byte won't work
1N/Aif a null byte may be part of the data. Here is an example where both
1N/Atechniques are used: after two null terminated strings with source and
1N/Adestination address, a Short Message (to a mobile phone) is sent after
1N/Aa length byte:
1N/A
1N/A   my $msg = pack( 'Z*Z*CA*', $src, $dst, length( $sm ), $sm );
1N/A
1N/AUnpacking this message can be done with the same template:
1N/A
1N/A   ( $src, $dst, $len, $sm ) = unpack( 'Z*Z*CA*', $msg );
1N/A
1N/AThere's a subtle trap lurking in the offing: Adding another field after
1N/Athe Short Message (in variable C<$sm>) is all right when packing, but this
1N/Acannot be unpacked naively:
1N/A
1N/A   # pack a message
1N/A   my $msg = pack( 'Z*Z*CA*C', $src, $dst, length( $sm ), $sm, $prio );
1N/A
1N/A   # unpack fails - $prio remains undefined!
1N/A   ( $src, $dst, $len, $sm, $prio ) = unpack( 'Z*Z*CA*C', $msg );
1N/A
1N/AThe pack code C<A*> gobbles up all remaining bytes, and C<$prio> remains
1N/Aundefined! Before we let disappointment dampen the morale: Perl's got
1N/Athe trump card to make this trick too, just a little further up the sleeve.
1N/AWatch this:
1N/A
1N/A   # pack a message: ASCIIZ, ASCIIZ, length/string, byte
1N/A   my $msg = pack( 'Z* Z* C/A* C', $src, $dst, $sm, $prio );
1N/A
1N/A   # unpack
1N/A   ( $src, $dst, $sm, $prio ) = unpack( 'Z* Z* C/A* C', $msg );
1N/A
1N/ACombining two pack codes with a slash (C</>) associates them with a single
1N/Avalue from the argument list. In C<pack>, the length of the argument is
1N/Ataken and packed according to the first code while the argument itself
1N/Ais added after being converted with the template code after the slash.
1N/AThis saves us the trouble of inserting the C<length> call, but it is
1N/Ain C<unpack> where we really score: The value of the length byte marks the
1N/Aend of the string to be taken from the buffer. Since this combination
1N/Adoesn't make sense except when the second pack code isn't C<a*>, C<A*>
1N/Aor C<Z*>, Perl won't let you.
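
The following sketch contrasts the naive template with the slash version on one message. The field values are hypothetical sample data. The point is only that C<A*> swallows the priority byte in the first case, while C<C/A*> stops after the number of bytes recorded in the length byte.

```perl
   use strict;
   use warnings;

   # Hypothetical sample values for the Short Message example
   my ( $src, $dst, $sm, $prio ) = ( '0123', '4567', 'Hi there', 3 );

   # Naive template: A* takes everything after the length byte
   my $naive = pack( 'Z*Z*CA*C', $src, $dst, length( $sm ), $sm, $prio );
   my @naive = unpack( 'Z*Z*CA*C', $naive );

   # Slash template: the length byte limits what A* takes
   my $counted = pack( 'Z* Z* C/A* C', $src, $dst, $sm, $prio );
   my @counted = unpack( 'Z* Z* C/A* C', $counted );

   print 'naive:   ', ( defined $naive[4] ? "prio=$naive[4]" : 'prio lost' ), "\n";
   print "counted: prio=$counted[3]\n";
```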

The pack code preceding C</> may be anything that's fit to represent a
number: all the numeric binary pack codes, and even text codes such as
C<A4> or C<Z*>:

   # pack/unpack a string preceded by its length in ASCII
   my $buf = pack( 'A4/A*', "Humpty-Dumpty" );
   # unpack $buf: '13  Humpty-Dumpty'
   my $txt = unpack( 'A4/A*', $buf );

C</> is not implemented in Perls before 5.6, so if your code is required to
work on older Perls you'll need to C<unpack( 'Z* Z* C')> to get the length,
then use it to make a new unpack string. For example:

   # pack a message: ASCIIZ, ASCIIZ, length, string, byte (without using /)
   my $msg = pack( 'Z* Z* C A* C', $src, $dst, length $sm, $sm, $prio );

   # unpack in two steps: first get the length, then build the template
   ( undef, undef, $len ) = unpack( 'Z* Z* C', $msg );
   ( $src, $dst, $sm, $prio ) = unpack( "Z* Z* x A$len C", $msg );

But that second C<unpack> is rushing ahead. It isn't using a simple literal
string for the template. So maybe we should introduce...

=head2 Dynamic Templates

So far, we've seen literals used as templates. If the list of pack
items doesn't have a fixed length, an expression constructing the
template is required (whenever, for some reason, C<()*> cannot be used).
Here's an example: To store named string values in a way that can be
conveniently parsed by a C program, we create a sequence of names and
null-terminated ASCII strings, with C<=> between the name and the value,
followed by an additional delimiting null byte. Here's how:

   my $env = pack( '(A*A*Z*)' . keys( %Env ) . 'C',
                   map( { ( $_, '=', $Env{$_} ) } keys( %Env ) ), 0 );

Let's examine the cogs of this byte mill, one by one. There's the C<map>
call, creating the items we intend to stuff into the C<$env> buffer:
to each key (in C<$_>) it adds the C<=> separator and the hash entry value.
Each triplet is packed with the template code sequence C<A*A*Z*> that
is repeated according to the number of keys. (Yes, that's what the C<keys>
function returns in scalar context.) To get the very last null byte,
we add a C<0> at the end of the C<pack> list, to be packed with C<C>.
(Attentive readers may have noticed that we could have omitted the 0.)

For the reverse operation, we'll have to determine the number of items
in the buffer before we can let C<unpack> rip it apart:

   my $n = $env =~ tr/\0// - 1;
   my %env = map( split( /=/, $_, 2 ), unpack( "(Z*)$n", $env ) );

The C<tr> counts the null bytes; subtracting one discounts the final
delimiting null byte, leaving the number of name-value strings. The
C<unpack> call returns a list of C<name=value> strings, each of which is
taken apart into a name and a value by the C<map> expression. The limit
of 2 passed to C<split> keeps any C<=> inside a value intact.
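
Here is a sketch that puts both halves of the round trip together. The contents of C<%Env> are made-up sample data, and the C<OPTS> entry deliberately contains a C<=> to show why C<split> is given a limit of 2.

```perl
   use strict;
   use warnings;

   # Made-up sample data; one value deliberately contains '='
   my %Env = ( HOME => '/home/joe', OPTS => 'a=1', SHELL => '/bin/sh' );

   # Pack: name, '=', value and a null byte for each key, then a final null
   my $env = pack( '(A*A*Z*)' . keys( %Env ) . 'C',
                   map( { ( $_, '=', $Env{$_} ) } keys( %Env ) ), 0 );

   # Unpack: count the name=value strings, then split each at the first '='
   my $n = $env =~ tr/\0// - 1;
   my %env = map( split( /=/, $_, 2 ), unpack( "(Z*)$n", $env ) );

   print "$_ => $env{$_}\n" for sort keys %env;
```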


=head2 Counting Repetitions

Rather than storing a sentinel at the end of a data item (or a list of items),
we could precede the data with a count. Again, we pack keys and values of
a hash, preceding each with an unsigned short length count, and up front
we store the number of pairs:

   my $env = pack( 'S(S/A* S/A*)*', scalar keys( %Env ), %Env );

This simplifies the reverse operation as the number of repetitions can be
unpacked with the C</> code:

   my %env = unpack( 'S/(S/A* S/A*)', $env );

Note that this is one of the rare cases where you cannot use the same
template for C<pack> and C<unpack> because C<pack> can't determine
a repeat count for a C<()>-group.
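
A round trip with the counted templates might look like the sketch below, using a small made-up hash. The expected size is easy to verify by hand: 2 bytes for the pair count, plus a 2-byte length and the characters themselves for each key and each value.

```perl
   use strict;
   use warnings;

   my %Env = ( LANG => 'C', PATH => '/usr/bin:/bin' );    # made-up sample data

   # Pair count, then each key and value prefixed by a 16-bit length
   my $env = pack( 'S(S/A* S/A*)*', scalar keys( %Env ), %Env );

   # The leading count tells unpack how often to repeat the group
   my %env = unpack( 'S/(S/A* S/A*)', $env );

   printf "%d bytes, %d pairs\n", length( $env ), scalar keys( %env );
   # prints "32 bytes, 2 pairs"
```

Note that C<A> strips trailing spaces on unpack; if values may end in spaces, C<S/a*> is the safer choice.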


=head1 Packing and Unpacking C Structures

In previous sections we have seen how to pack numbers and character
strings. If it were not for a couple of snags we could conclude this
section right away with the terse remark that C structures don't
contain anything else, and therefore you already know all there is to it.
Sorry, no: read on, please.

=head2 The Alignment Pit

In the consideration of speed against memory requirements the balance
has been tilted in favor of faster execution. This has influenced the
way C compilers allocate memory for structures: On architectures
where a 16-bit or 32-bit operand can be moved faster between places in
memory, or to or from a CPU register, if it is aligned at an even,
multiple-of-four or even multiple-of-eight address, a C compiler
will give you this speed benefit by stuffing extra bytes into structures.
If you don't cross the C shoreline this is not likely to cause you any
grief (although you should care when you design large data structures,
or when you want your code to be portable between architectures, which
you do want, don't you?).

To see how this affects C<pack> and C<unpack>, we'll compare these two
C structures:

   typedef struct {
     char     c1;
     short    s;
     char     c2;
     long     l;
   } gappy_t;

   typedef struct {
     long     l;
     short    s;
     char     c1;
     char     c2;
   } dense_t;

Typically, a C compiler allocates 12 bytes to a C<gappy_t> variable, but
requires only 8 bytes for a C<dense_t> (assuming a 2-byte C<short> and a
4-byte C<long>). After investigating this further, we can draw memory
maps, showing where the extra 4 bytes are hidden:

   0           +4          +8          +12
   +--+--+--+--+--+--+--+--+--+--+--+--+
   |c1|xx|  s  |c2|xx|xx|xx|     l     |    xx = fill byte
   +--+--+--+--+--+--+--+--+--+--+--+--+
   gappy_t

   0           +4          +8
   +--+--+--+--+--+--+--+--+
   |     l     |  s  |c1|c2|
   +--+--+--+--+--+--+--+--+
   dense_t

And that's where the first quirk strikes: C<pack> and C<unpack>
templates have to be stuffed with C<x> codes to get those extra fill bytes.
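
To see the byte counts from the memory maps, the sketch below packs both layouts with explicit C<x> fill bytes. It deliberately uses C<s> and C<l>, which are always 16 and 32 bits, rather than the native C<s!> and C<l!>, so the sizes match the 2-byte C<short>/4-byte C<long> maps on any platform. It says nothing about what your own compiler actually does.

```perl
   use strict;
   use warnings;

   my ( $c1, $s, $c2, $l ) = ( 1, 2, 3, 4 );    # hypothetical field values

   # gappy_t: c1, 1 fill byte, s, c2, 3 fill bytes, l
   my $gappy = pack( 'c x s c x3 l', $c1, $s, $c2, $l );

   # dense_t: l, s, c1, c2 with no fill bytes at all
   my $dense = pack( 'l s c c', $l, $s, $c1, $c2 );

   print length( $gappy ), ' ', length( $dense ), "\n";    # prints "12 8"
```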

The natural question, "Why can't Perl compensate for the gaps?", warrants
an answer. One good reason is that C compilers might provide (non-ANSI)
extensions permitting all sorts of fancy control over the way structures
are aligned, even at the level of an individual structure field. And, if
this were not enough, there is an insidious thing called C<union> where
the amount of fill bytes cannot be derived from the alignment of the next
item alone.

OK, so let's bite the bullet. Here's one way to get the alignment right
by inserting template codes C<x>, which don't take a corresponding item
from the list:

  my $gappy = pack( 'cxs cxxx l!', $c1, $s, $c2, $l );

Note the C<!> after C<l>: We want to make sure that we pack a long
integer as it is compiled by our C compiler. And even now, it will only
work for the platforms where the compiler aligns things as above.
And somebody somewhere has a platform where it doesn't.
[Probably a Cray, where C<short>s, C<int>s and C<long>s are all 8 bytes. :-)]

Counting bytes and watching alignments in lengthy structures is bound to
be a drag. Isn't there a way we can create the template with a simple
program? Here's a C program that does the trick:

   #include <stdio.h>
   #include <stddef.h>

   typedef struct {
     char     fc1;
     short    fs;
     char     fc2;
     long     fl;
   } gappy_t;

   /* print "@<offset><pack code> " for one structure member */
   #define Pt(type,field,tchar) \
     printf( "@%d%s ", (int) offsetof(type,field), # tchar )

   int main(void) {
     Pt( gappy_t, fc1, c  );
     Pt( gappy_t, fs,  s! );
     Pt( gappy_t, fc2, c  );
     Pt( gappy_t, fl,  l! );
     printf( "\n" );
     return 0;
   }

The output line can be used as a template in a C<pack> or C<unpack> call:

  my $gappy = pack( '@0c @2s! @4c @8l!', $c1, $s, $c2, $l );

Gee, yet another template code, as if we didn't have plenty already. But
C<@> saves our day by enabling us to specify the offset from the beginning
of the pack buffer to the next item: This is just the value
the C<offsetof> macro (defined in C<E<lt>stddef.hE<gt>>) returns when
given a C<struct> type and one of its field names ("member-designator" in
C standardese).

Neither using offsets nor adding C<x>'s to bridge the gaps is satisfactory.
(Just imagine what happens if the structure changes.) What we really need
is a way of saying "skip as many bytes as required to the next multiple of N".
In fluent Templatese, you say this with C<x!N> where N is replaced by the
appropriate value. Here's the next version of our struct packaging:

  my $gappy = pack( 'c x!2 s c x!4 l!', $c1, $s, $c2, $l );

That's certainly better, but we still have to know how long all the
integers are, and portability is far away. Rather than C<2>,
for instance, we want to say "however long a short is". This can be
done by enclosing the appropriate pack code in brackets: C<[s]>. So, here's
the very best we can do:

  my $gappy = pack( 'c x![s] s c x![l!] l!', $c1, $s, $c2, $l );
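
The sketch below checks that the alignment codes reproduce the 12-byte C<gappy_t> layout from the memory map. As before, it substitutes the fixed-size C<s> and C<l> for C<s!> and C<l!> so that the result does not depend on the platform, and it uses the same template to unpack the fields again.

```perl
   use strict;
   use warnings;

   my ( $c1, $s, $c2, $l ) = ( 1, 2, 3, 4 );    # hypothetical field values

   # x!N pads to the next multiple of N; x![s] uses the size of an 's' (2)
   my $by_number = pack( 'c x!2 s c x!4 l', $c1, $s, $c2, $l );
   my $by_code   = pack( 'c x![s] s c x![l] l', $c1, $s, $c2, $l );

   # The same template recovers the values, skipping the fill bytes
   my @fields = unpack( 'c x![s] s c x![l] l', $by_code );

   print length( $by_number ), ' ', length( $by_code ), " @fields\n";
   # prints "12 12 1 2 3 4"
```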


=head2 Alignment, Take 2

I'm afraid that we're not quite through with the alignment catch yet. The
hydra raises another ugly head when you pack arrays of structures:

   typedef struct {
     short    count;
     char     glyph;
   } cell_t;

   typedef cell_t buffer_t[BUFLEN];

Where's the catch? Padding is neither required before the first field C<count>,
nor between this and the next field C<glyph>, so why can't we simply pack
like this:

   # something goes wrong here:
   pack( 's!a' x @buffer,
         map{ ( $_->{count}, $_->{glyph} ) } @buffer );

This packs C<3*@buffer> bytes (with a 2-byte C<short>), but it turns out
that the size of C<buffer_t> is four times C<BUFLEN>! The compiler adds a
fill byte at the end of each C<cell_t> so that the C<count> of the next
array element is aligned. The moral of the story is that
the required alignment of a structure or array is propagated to the
next higher level where we have to consider padding I<at the end>
of each component as well. Thus the correct template is:

   pack( 's!ax' x @buffer,
         map{ ( $_->{count}, $_->{glyph} ) } @buffer );
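
A quick way to see the difference is to compare the lengths produced by the two templates. In this sketch the array contents are hypothetical, and C<s> (always 16 bits) stands in for C<s!> so that the 3-versus-4 bytes per cell arithmetic holds everywhere.

```perl
   use strict;
   use warnings;

   # Hypothetical cells: counts 1..5 with glyphs 'A'..'E'
   my @buffer = map { +{ count => $_, glyph => chr( 64 + $_ ) } } 1 .. 5;
   my @items  = map { ( $_->{count}, $_->{glyph} ) } @buffer;

   # Without trailing padding: 3 bytes per cell (2-byte short + 1 char)
   my $packed3 = pack( 'sa' x @buffer, @items );

   # With one fill byte per cell, matching a 4-byte cell_t
   my $packed4 = pack( 'sax' x @buffer, @items );

   print length( $packed3 ), ' ', length( $packed4 ), "\n";    # prints "15 20"
```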

=head2 Alignment, Take 3

And even if you take all the above into account, ANSI still lets this:

   typedef struct {
     char     foo[2];
   } foo_t;

vary in size. The alignment constraint of the structure can be greater than
that of any of its elements. [And if you think that this doesn't affect anything
common, dismember the next cellphone that you see. Many have ARM cores, and
the ARM structure rules make C<sizeof (foo_t) == 4>.]

=head2 Pointers for How to Use Them

The title of this section indicates the second problem you may run into
sooner or later when you pack C structures. If the function you intend
to call expects, say, a C<void *> value, you I<cannot> simply take
a reference to a Perl variable. (Although that value certainly is a
memory address, it's not the address where the variable's contents are
stored.)

Template code C<P> promises to pack a "pointer to a fixed length string".
Isn't this what we want? Let's try:

    # allocate some storage and pack a pointer to it
    my $memory = "\x00" x $size;
    my $memptr = pack( 'P', $memory );

But wait: doesn't C<pack> just return a sequence of bytes? How can we pass this
string of bytes to some C code expecting a pointer which is, after all,
nothing but a number? The answer is simple: We have to obtain the numeric
address from the bytes returned by C<pack>.

    my $ptr = unpack( 'L!', $memptr );

Obviously this assumes that it is possible to typecast a pointer
to an unsigned long and vice versa, which frequently works but should not
be taken as a universal law. Now that we have this pointer, the next question
is: How can we put it to good use? We need a call to some C function
where a pointer is expected. The read(2) system call comes to mind:

    ssize_t read(int fd, void *buf, size_t count);

After reading L<perlfunc> explaining how to use C<syscall> we can write
this Perl function copying a file to standard output:

    require 'syscall.ph';
    sub cat($){
        my $path = shift();
        my $size = -s $path;
        my $memory = "\x00" x $size;  # allocate some memory
        my $ptr = unpack( 'L!', pack( 'P', $memory ) );
        open( my $fh, '<', $path ) || die( "$path: cannot open ($!)\n" );
        my $fd = fileno( $fh );
        my $res = syscall( &SYS_read, $fd, $ptr, $size );
        die( "$path: read failed ($!)\n" ) if $res == -1;
        print substr( $memory, 0, $res );  # only the bytes actually read
        close( $fh );
    }

This is neither a specimen of simplicity nor a paragon of portability, but
it illustrates the point: We are able to sneak behind the scenes and
access Perl's otherwise well-guarded memory! (Important note: Perl's
C<syscall> does I<not> require you to construct pointers in this roundabout
way. You simply pass a string variable, and Perl forwards the address.)

How does C<unpack> with C<P> work? Imagine some pointer in the buffer
about to be unpacked: If it isn't the null pointer (which will smartly
produce the C<undef> value) we have a start address, but then what?
Perl has no way of knowing how long this "fixed length string" is, so
it's up to you to specify the actual size as an explicit length after C<P>.

   my $mem = "abcdefghijklmn";
   print unpack( 'P5', pack( 'P', $mem ) ); # prints "abcde"

As a consequence, C<pack> ignores any number or C<*> after C<P>.
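
The following sketch shows C<P> in both directions on a string that stays in scope for the whole example. It relies on the core C<Config> module to report the pointer size, and it also unpacks an all-zero pointer to show the C<undef> case mentioned above.

```perl
   use strict;
   use warnings;
   use Config;

   my $mem = "abcdefghijklmn";    # stays in scope while its address is used

   # 'P' packs the address of $mem's string buffer
   my $packed = pack( 'P', $mem );
   print length( $packed ), " bytes; ptrsize is $Config{ptrsize}\n";

   # The length after P tells unpack how many bytes to fetch
   my $first5 = unpack( 'P5', $packed );
   print "$first5\n";    # prints "abcde"

   # A null pointer (all zero bytes) unpacks as undef
   my $null = unpack( 'P5', "\0" x length( $packed ) );
   print defined( $null ) ? "defined\n" : "undef\n";    # prints "undef"
```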

Now that we have seen C<P> at work, we might as well give C<p> a whirl.
Why do we need a second template code for packing pointers at all? The
answer lies in the simple fact that an C<unpack> with C<p> promises
a null-terminated string starting at the address taken from the buffer,
and that implies a length for the data item to be returned:

   my $buf = pack( 'p', "abc\x00efhijklmn" );
   print unpack( 'p', $buf );    # prints "abc"

This is apt to be confusing: because the length is implied by the
string's terminating null byte, a number after pack code C<p> is a repeat
count, not a length as it is after C<P>.
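
The repeat-count behaviour of C<p> can be seen in the sketch below, which packs two pointers with C<p2> and unpacks them again. The strings are kept in variables (names chosen for the example) so that their storage is still around when the pointers are followed.

```perl
   use strict;
   use warnings;

   # Two strings kept in variables so their storage outlives the pointers
   my ( $x, $y ) = ( 'first', "sec\x00ond" );

   # 'p2' packs two pointers: the 2 is a repeat count, not a length
   my $buf = pack( 'p2', $x, $y );

   # Each pointer is followed up to its first null byte on unpack
   my @strings = unpack( 'p2', $buf );
   print join( '|', @strings ), "\n";    # prints "first|sec"
```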

Using C<pack(..., $x)> with C<P> or C<p> to get the address where C<$x> is
actually stored calls for circumspection. Perl's internal machinery
considers the relation between a variable and that address as its very own
private matter and doesn't really care that we have obtained a copy. Therefore:

=over 4

=item *

Do not use C<pack> with C<p> or C<P> to obtain the address of a variable
that's bound to go out of scope (thereby freeing its memory) before you
are done with using the memory at that address.

=item *

Be very careful with Perl operations that change the value of the
variable. Appending something to the variable, for instance, might require
reallocation of its storage, leaving you with a pointer into no-man's land.

=item *

Don't think that you can get the address of a Perl variable
when it is stored as an integer or double number! C<pack('P', $x)> will
force the variable's internal representation to string, just as if you
had written something like C<$x .= ''>.

=back

It's safe, however, to P- or p-pack a string literal, because Perl simply
allocates an anonymous variable.


=head1 Pack Recipes

Here is a collection of (possibly) useful canned recipes for C<pack>
and C<unpack>:

    # Convert IP address for socket functions
    pack( "C4", split /\./, "123.4.5.6" );

    # Count the bits in a chunk of memory (e.g. a select vector)
    unpack( '%32b*', $mask );

    # Determine the endianness of your system
    $is_little_endian = unpack( 'c', pack( 's', 1 ) );
    $is_big_endian = unpack( 'xc', pack( 's', 1 ) );

    # Determine the number of bits in a native integer
    $bits = unpack( '%32b*', pack( 'I!', ~0 ) );

    # Prepare argument for the nanosleep system call
    my $timespec = pack( 'L!L!', $secs, $nanosecs );
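
Several of these recipes can be exercised in a few lines. The sketch below checks only things that hold on every platform, such as the IP address round trip and that exactly one of the endianness probes yields 1; the native integer width it prints depends on your system.

```perl
   use strict;
   use warnings;

   # IP address round trip: four bytes in, dotted quad out
   my $packed_ip = pack( "C4", split /\./, "123.4.5.6" );
   my $dotted    = join( '.', unpack( 'C4', $packed_ip ) );

   # Bit count: 0xFF has 8 set bits, 0x01 has 1
   my $set_bits = unpack( '%32b*', "\xFF\x01" );

   # Exactly one of the endianness probes yields 1
   my $is_little_endian = unpack( 'c', pack( 's', 1 ) );
   my $is_big_endian    = unpack( 'xc', pack( 's', 1 ) );

   # Native int width in bits: count the set bits of an all-ones int
   my $bits = unpack( '%32b*', pack( 'I!', ~0 ) );

   print "$dotted $set_bits $is_little_endian$is_big_endian $bits\n";
```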

For a simple memory dump we unpack some bytes into just as
many pairs of hex digits, and use C<map> to handle the traditional
spacing - 16 bytes to a line:

    my $i;
    print map( ++$i % 16 ? "$_ " : "$_\n",
               unpack( 'H2' x length( $mem ), $mem ) ),
          length( $mem ) % 16 ? "\n" : '';
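
For reuse, the dump can be wrapped in a subroutine that returns the text instead of printing it. The name C<hex_dump> is made up for this sketch; the body is the C<map> expression from above, with the counter reset on every call.

```perl
   use strict;
   use warnings;

   # hex_dump is a hypothetical helper name used only in this sketch
   sub hex_dump {
       my ( $mem ) = @_;
       my $i = 0;    # byte counter, reset for every dump
       return join( '',
                    map( ++$i % 16 ? "$_ " : "$_\n",
                         unpack( 'H2' x length( $mem ), $mem ) ),
                    length( $mem ) % 16 ? "\n" : '' );
   }

   print hex_dump( "ABC" );    # prints "41 42 43 " followed by a newline
   print hex_dump( join( '', map { chr( $_ ) } 0 .. 16 ) );
```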


=head1 Funnies Section

    # Pulling digits out of nowhere...
    print unpack( 'C', pack( 'x' ) ),
          unpack( '%B*', pack( 'A' ) ),
          unpack( 'H', pack( 'A' ) ),
          unpack( 'A', unpack( 'C', pack( 'A' ) ) ), "\n";

    # One for the road ;-)
    my $advice = pack( 'all u can in a van' );


=head1 Authors

Simon Cozens and Wolfgang Laun.
