1N/Aperldsc - Perl Data Structures Cookbook
1N/AThe single feature most sorely lacking in the Perl programming language
1N/Aprior to its 5.0 release was complex data structures. Even without direct
1N/Alanguage support, some valiant programmers did manage to emulate them, but
1N/Ait was hard work and not for the faint of heart. You could occasionally
1N/Aget away with the C<$m{$AoA,$b}> notation borrowed from B<awk> in which the
1N/Akeys are actually more like a single concatenated string C<"$AoA$b">, but
1N/Atraversal and sorting were difficult. More desperate programmers even
1N/Ahacked Perl's internal symbol table directly, a strategy that proved hard
1N/Ato develop and maintain--to put it mildly.
1N/AThe 5.0 release of Perl let us have complex data structures. You
1N/Amay now write something like this and all of a sudden, you'd have an array
1N/Awith three dimensions!
1N/AAlas, however simple this may appear, underneath it's a much more
1N/Aelaborate construct than meets the eye!
1N/AHow do you print it out? Why can't you say just C<print @AoA>? How do
1N/Ayou sort it? How can you pass it to a function or get one of these back
1N/Afrom a function? Is it an object? Can you save it to disk to read
1N/Aback later? How do you access whole rows or columns of that matrix? Do
1N/Aall the values have to be numeric?
1N/AAs you see, it's quite easy to become confused. While some small portion
1N/Aof the blame for this can be attributed to the reference-based
1N/Aimplementation, it's really more due to a lack of existing documentation with
1N/Aexamples designed for the beginner.
1N/AThis document is meant to be a detailed but understandable treatment of the
1N/Amany different sorts of data structures you might want to develop. It
1N/Ashould also serve as a cookbook of examples. That way, when you need to
1N/Acreate one of these complex data structures, you can just pinch, pilfer, or
1N/Apurloin a drop-in example from here.
1N/ALet's look at each of these possible constructs in detail. There are separate
1N/Asections on each of the following:
1N/A=item * arrays of arrays
1N/A=item * hashes of arrays
1N/A=item * arrays of hashes
1N/A=item * hashes of hashes
1N/A=item * more elaborate constructs
1N/ABut for now, let's look at general issues common to all
1N/Athese types of data structures.
1N/AThe most important thing to understand about all data structures in Perl
1N/A-- including multidimensional arrays--is that even though they might
1N/Aappear otherwise, Perl C<@ARRAY>s and C<%HASH>es are all internally
1N/Aone-dimensional. They can hold only scalar values (meaning a string,
1N/Anumber, or a reference). They cannot directly contain other arrays or
1N/Ahashes, but instead contain I<references> to other arrays or hashes.
1N/AYou can't use a reference to an array or hash in quite the same way that you
1N/Awould a real array or hash. For C or C++ programmers unused to
1N/Adistinguishing between arrays and pointers to the same, this can be
1N/Aconfusing. If so, just think of it as the difference between a structure
1N/Aand a pointer to a structure.
1N/AYou can (and should) read more about references in the perlref(1) man
1N/Apage. Briefly, references are rather like pointers that know what they
1N/Apoint to. (Objects are also a kind of reference, but we won't be needing
1N/Athem right away--if ever.) This means that when you have something which
1N/Alooks to you like an access to a two-or-more-dimensional array
and/or hash,
1N/Awhat's really going on is that the base type is
1N/Amerely a one-dimensional entity that contains references to the next
1N/Alevel. It's just that you can I<use> it as though it were a
1N/Atwo-dimensional one. This is actually the way almost all C
1N/Amultidimensional arrays work as well.
1N/A $array[7][12] # array of arrays
1N/A $array[7]{string} # array of hashes
1N/A $hash{string}[7] # hash of arrays
1N/A $hash{string}{'another string'} # hash of hashes
1N/ANow, because the top level contains only references, if you try to print
1N/Aout your array in with a simple print() function, you'll get something
1N/Athat doesn't look very nice, like this:
1N/A @AoA = ( [2, 3], [4, 5, 7], [0] );
1N/A ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0)
1N/AThat's because Perl doesn't (ever) implicitly dereference your variables.
1N/AIf you want to get at the thing a reference is referring to, then you have
1N/Ato do this yourself using either prefix typing indicators, like
1N/AC<${$blah}>, C<@{$blah}>, C<@{$blah[$i]}>, or else postfix pointer arrows,
1N/Alike C<$a-E<gt>[3]>, C<$h-E<gt>{fred}>, or even C<$ob-E<gt>method()-E<gt>[3]>.
1N/A=head1 COMMON MISTAKES
1N/AThe two most common mistakes made in constructing something like
1N/Aan array of arrays is either accidentally counting the number of
1N/Aelements or else taking a reference to the same memory location
1N/Arepeatedly. Here's the case where you just get the count instead
1N/A @array = somefunc($i);
1N/A $AoA[$i] = @array; # WRONG!
1N/AThat's just the simple case of assigning an array to a scalar and getting
1N/Aits element count. If that's what you really and truly want, then you
1N/Amight do well to consider being a tad more explicit about it, like this:
1N/A @array = somefunc($i);
1N/A $counts[$i] = scalar @array;
1N/AHere's the case of taking a reference to the same memory location
1N/A @array = somefunc($i);
1N/A $AoA[$i] = \@array; # WRONG!
1N/ASo, what's the big problem with that? It looks right, doesn't it?
1N/AAfter all, I just told you that you need an array of references, so by
1N/Agolly, you've made me one!
1N/AUnfortunately, while this is true, it's still broken. All the references
1N/Ain @AoA refer to the I<very same place>, and they will therefore all hold
1N/Awhatever was last in @array! It's similar to the problem demonstrated in
1N/Athe following C program:
1N/A struct passwd *getpwnam(), *rp, *dp;
1N/A rp = getpwnam("root");
1N/A dp = getpwnam("daemon");
1N/A printf("daemon name is %s\nroot name is %s\n",
1N/A dp->pw_name, rp->pw_name);
1N/A daemon name is daemon
1N/AThe problem is that both C<rp> and C<dp> are pointers to the same location
1N/Ain memory! In C, you'd have to remember to malloc() yourself some new
1N/Amemory. In Perl, you'll want to use the array constructor C<[]> or the
1N/Ahash constructor C<{}> instead. Here's the right way to do the preceding
1N/Abroken code fragments:
1N/A @array = somefunc($i);
1N/A $AoA[$i] = [ @array ];
1N/AThe square brackets make a reference to a new array with a I<copy>
1N/Aof what's in @array at the time of the assignment. This is what
1N/ANote that this will produce something similar, but it's
1N/A @{$AoA[$i]} = @array;
1N/AIs it the same? Well, maybe so--and maybe not. The subtle difference
1N/Ais that when you assign something in square brackets, you know for sure
1N/Ait's always a brand new reference with a new I<copy> of the data.
1N/ASomething else could be going on in this new case with the C<@{$AoA[$i]}}>
1N/Adereference on the left-hand-side of the assignment. It all depends on
1N/Awhether C<$AoA[$i]> had been undefined to start with, or whether it
1N/Aalready contained a reference. If you had already populated @AoA with
1N/A $AoA[3] = \@another_array;
1N/AThen the assignment with the indirection on the left-hand-side would
1N/Ause the existing reference that was already there:
1N/A @{$AoA[3]} = @array;
1N/AOf course, this I<would> have the "interesting" effect of clobbering
1N/A@another_array. (Have you ever noticed how when a programmer says
1N/Asomething is "interesting", that rather than meaning "intriguing",
1N/Athey're disturbingly more apt to mean that it's "annoying",
1N/A"difficult", or both? :-)
1N/ASo just remember always to use the array or hash constructors with C<[]>
1N/Aor C<{}>, and you'll be fine, although it's not always optimally
1N/ASurprisingly, the following dangerous-looking construct will
1N/Aactually work out fine:
1N/A my @array = somefunc($i);
1N/AThat's because my() is more of a run-time statement than it is a
1N/Acompile-time declaration I<per se>. This means that the my() variable is
1N/Aremade afresh each time through the loop. So even though it I<looks> as
1N/Athough you stored the same variable reference each time, you actually did
1N/Anot! This is a subtle distinction that can produce more efficient code at
1N/Athe risk of misleading all but the most experienced of programmers. So I
1N/Ausually advise against teaching it to beginners. In fact, except for
1N/Apassing arguments to functions, I seldom like to see the gimme-a-reference
1N/Aoperator (backslash) used much at all in code. Instead, I advise
1N/Abeginners that they (and most of the rest of us) should try to use the
1N/Amuch more easily understood constructors C<[]> and C<{}> instead of
1N/Arelying upon lexical (or dynamic) scoping and hidden reference-counting to
1N/Ado the right thing behind the scenes.
1N/A $AoA[$i] = [ @array ]; # usually best
1N/A $AoA[$i] = \@array; # perilous; just how my() was that array?
1N/A @{ $AoA[$i] } = @array; # way too tricky for most programmers
1N/A=head1 CAVEAT ON PRECEDENCE
1N/ASpeaking of things like C<@{$AoA[$i]}>, the following are actually the
1N/A $aref->[2][2] # clear
1N/A $$aref[2][2] # confusing
1N/AThat's because Perl's precedence rules on its five prefix dereferencers
1N/A(which look like someone swearing: C<$ @ * % &>) make them bind more
1N/Atightly than the postfix subscripting brackets or braces! This will no
1N/Adoubt come as a great shock to the C or C++ programmer, who is quite
1N/Aaccustomed to using C<*a[i]> to mean what's pointed to by the I<i'th>
1N/Aelement of C<a>. That is, they first take the subscript, and only then
1N/Adereference the thing at that subscript. That's fine in C, but this isn't C.
1N/AThe seemingly equivalent construct in Perl, C<$$aref[$i]> first does
1N/Athe deref of $aref, making it take $aref as a reference to an
1N/Aarray, and then dereference that, and finally tell you the I<i'th> value
1N/Aof the array pointed to by $AoA. If you wanted the C notion, you'd have to
1N/Awrite C<${$AoA[$i]}> to force the C<$AoA[$i]> to get evaluated first
1N/Abefore the leading C<$> dereferencer.
1N/A=head1 WHY YOU SHOULD ALWAYS C<use strict>
1N/AIf this is starting to sound scarier than it's worth, relax. Perl has
1N/Asome features to help you avoid its most common pitfalls. The best
1N/Away to avoid getting confused is to start every program like this:
1N/AThis way, you'll be forced to declare all your variables with my() and
1N/Aalso disallow accidental "symbolic dereferencing". Therefore if you'd done
1N/A [ "fred", "barney", "pebbles", "bambam", "dino", ],
1N/A [ "homer", "bart", "marge", "maggie", ],
1N/A [ "george", "jane", "elroy", "judy", ],
1N/AThe compiler would immediately flag that as an error I<at compile time>,
1N/Abecause you were accidentally accessing C<@aref>, an undeclared
1N/Avariable, and it would thereby remind you to write instead:
1N/ABefore version 5.002, the standard Perl debugger didn't do a very nice job of
1N/Aprinting out complex data structures. With 5.002 or above, the
1N/Adebugger includes several new features, including command line editing as
1N/Awell as the C<x> command to dump out complex data structures. For
1N/Aexample, given the assignment to $AoA above, here's the debugger output:
1N/A $AoA = ARRAY(0x13b5a0)
1N/APresented with little comment (these will get their own manpages someday)
1N/Ahere are short code examples illustrating access of various
1N/Atypes of data structures.
1N/A=head1 ARRAYS OF ARRAYS
1N/A=head2 Declaration of an ARRAY OF ARRAYS
1N/A [ "fred", "barney" ],
1N/A [ "george", "jane", "elroy" ],
1N/A [ "homer", "marge", "bart" ],
1N/A=head2 Generation of an ARRAY OF ARRAYS
1N/A push @AoA, [ split ];
1N/A # calling a function
1N/A for $i ( 1 .. 10 ) {
1N/A $AoA[$i] = [ somefunc($i) ];
1N/A for $i ( 1 .. 10 ) {
1N/A @tmp = somefunc($i);
1N/A $AoA[$i] = [ @tmp ];
1N/A # add to an existing row
1N/A push @{ $AoA[0] }, "wilma", "betty";
1N/A=head2 Access and Printing of an ARRAY OF ARRAYS
1N/A $AoA[0][0] = "Fred";
1N/A $AoA[1][1] =~ s/(\w)/\u$1/;
1N/A # print the whole thing with refs
1N/A for $aref ( @AoA ) {
1N/A print "\t [ @$aref ],\n";
1N/A # print the whole thing with indices
1N/A for $i ( 0 .. $#AoA ) {
1N/A print "\t [ @{$AoA[$i]} ],\n";
1N/A # print the whole thing one at a time
1N/A for $i ( 0 .. $#AoA ) {
1N/A for $j ( 0 .. $#{ $AoA[$i] } ) {
1N/A print "elt $i $j is $AoA[$i][$j]\n";
1N/A=head1 HASHES OF ARRAYS
1N/A=head2 Declaration of a HASH OF ARRAYS
1N/A flintstones => [ "fred", "barney" ],
1N/A jetsons => [ "george", "jane", "elroy" ],
1N/A simpsons => [ "homer", "marge", "bart" ],
1N/A=head2 Generation of a HASH OF ARRAYS
1N/A # flintstones: fred barney wilma dino
1N/A next unless s/^(.*?):\s*//;
1N/A $HoA{$1} = [ split ];
1N/A # reading from file; more temps
1N/A # flintstones: fred barney wilma dino
1N/A while ( $line = <> ) {
1N/A ($who, $rest) = split /:\s*/, $line, 2;
1N/A @fields = split ' ', $rest;
1N/A $HoA{$who} = [ @fields ];
1N/A # calling a function that returns a list
1N/A for $group ( "simpsons", "jetsons", "flintstones" ) {
1N/A $HoA{$group} = [ get_family($group) ];
1N/A # likewise, but using temps
1N/A for $group ( "simpsons", "jetsons", "flintstones" ) {
1N/A @members = get_family($group);
1N/A $HoA{$group} = [ @members ];
1N/A # append new members to an existing family
1N/A push @{ $HoA{"flintstones"} }, "wilma", "betty";
1N/A=head2 Access and Printing of a HASH OF ARRAYS
1N/A $HoA{flintstones}[0] = "Fred";
1N/A $HoA{simpsons}[1] =~ s/(\w)/\u$1/;
1N/A # print the whole thing
1N/A foreach $family ( keys %HoA ) {
1N/A print "$family: @{ $HoA{$family} }\n"
1N/A # print the whole thing with indices
1N/A foreach $family ( keys %HoA ) {
1N/A foreach $i ( 0 .. $#{ $HoA{$family} } ) {
1N/A print " $i = $HoA{$family}[$i]";
1N/A # print the whole thing sorted by number of members
1N/A foreach $family ( sort { @{$HoA{$b}} <=> @{$HoA{$a}} } keys %HoA ) {
1N/A print "$family: @{ $HoA{$family} }\n"
1N/A # print the whole thing sorted by number of members and name
1N/A foreach $family ( sort {
1N/A @{$HoA{$b}} <=> @{$HoA{$a}}
1N/A print "$family: ", join(", ", sort @{ $HoA{$family} }), "\n";
1N/A=head1 ARRAYS OF HASHES
1N/A=head2 Declaration of an ARRAY OF HASHES
1N/A=head2 Generation of an ARRAY OF HASHES
1N/A # format: LEAD=fred FRIEND=barney
1N/A for $field ( split ) {
1N/A ($key, $value) = split /=/, $field;
1N/A $rec->{$key} = $value;
1N/A # format: LEAD=fred FRIEND=barney
1N/A push @AoH, { split /[\s+=]/ };
1N/A # calling a function that returns a
key/value pair list, like
1N/A # "lead","fred","daughter","pebbles"
1N/A while ( %fields = getnextpairset() ) {
1N/A push @AoH, { %fields };
1N/A # likewise, but using no temp vars
1N/A push @AoH, { parsepairs($_) };
1N/A $AoH[0]{pet} = "dino";
1N/A $AoH[2]{pet} = "santa's little helper";
1N/A=head2 Access and Printing of an ARRAY OF HASHES
1N/A $AoH[0]{lead} = "fred";
1N/A $AoH[1]{lead} =~ s/(\w)/\u$1/;
1N/A # print the whole thing with refs
1N/A for $href ( @AoH ) {
1N/A for $role ( keys %$href ) {
1N/A print "$role=$href->{$role} ";
1N/A # print the whole thing with indices
1N/A for $i ( 0 .. $#AoH ) {
1N/A for $role ( keys %{ $AoH[$i] } ) {
1N/A print "$role=$AoH[$i]{$role} ";
1N/A # print the whole thing one at a time
1N/A for $i ( 0 .. $#AoH ) {
1N/A for $role ( keys %{ $AoH[$i] } ) {
1N/A print "elt $i $role is $AoH[$i]{$role}\n";
1N/A=head1 HASHES OF HASHES
1N/A=head2 Declaration of a HASH OF HASHES
1N/A "his boy" => "elroy",
1N/A=head2 Generation of a HASH OF HASHES
1N/A # flintstones: lead=fred pal=barney wife=wilma pet=dino
1N/A next unless s/^(.*?):\s*//;
1N/A for $field ( split ) {
1N/A ($key, $value) = split /=/, $field;
1N/A $HoH{$who}{$key} = $value;
1N/A # reading from file; more temps
1N/A next unless s/^(.*?):\s*//;
1N/A for $field ( split ) {
1N/A ($key, $value) = split /=/, $field;
1N/A $rec->{$key} = $value;
1N/A # calling a function that returns a key,value hash
1N/A for $group ( "simpsons", "jetsons", "flintstones" ) {
1N/A $HoH{$group} = { get_family($group) };
1N/A # likewise, but using temps
1N/A for $group ( "simpsons", "jetsons", "flintstones" ) {
1N/A %members = get_family($group);
1N/A $HoH{$group} = { %members };
1N/A # append new members to an existing family
1N/A for $what (keys %new_folks) {
1N/A $HoH{flintstones}{$what} = $new_folks{$what};
1N/A=head2 Access and Printing of a HASH OF HASHES
1N/A $HoH{flintstones}{wife} = "wilma";
1N/A $HoH{simpsons}{lead} =~ s/(\w)/\u$1/;
1N/A # print the whole thing
1N/A foreach $family ( keys %HoH ) {
1N/A print "$family: { ";
1N/A for $role ( keys %{ $HoH{$family} } ) {
1N/A print "$role=$HoH{$family}{$role} ";
1N/A # print the whole thing somewhat sorted
1N/A foreach $family ( sort keys %HoH ) {
1N/A print "$family: { ";
1N/A for $role ( sort keys %{ $HoH{$family} } ) {
1N/A print "$role=$HoH{$family}{$role} ";
1N/A # print the whole thing sorted by number of members
1N/A foreach $family ( sort { keys %{$HoH{$b}} <=> keys %{$HoH{$a}} } keys %HoH ) {
1N/A print "$family: { ";
1N/A for $role ( sort keys %{ $HoH{$family} } ) {
1N/A print "$role=$HoH{$family}{$role} ";
1N/A # establish a sort order (rank) for each role
1N/A for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i }
1N/A # now print the whole thing sorted by number of members
1N/A foreach $family ( sort { keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} } } keys %HoH ) {
1N/A print "$family: { ";
1N/A # and print these according to rank order
1N/A for $role ( sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} } ) {
1N/A print "$role=$HoH{$family}{$role} ";
1N/A=head1 MORE ELABORATE RECORDS
1N/A=head2 Declaration of MORE ELABORATE RECORDS
1N/AHere's a sample showing how to create and use a record whose fields are of
1N/Amany different sorts:
1N/A SEQUENCE => [ @old_values ],
1N/A LOOKUP => { %some_table },
1N/A THATCODE => \&some_function,
1N/A THISCODE => sub { $_[0] ** $_[1] },
1N/A print $rec->{SEQUENCE}[0];
1N/A $last = pop @ { $rec->{SEQUENCE} };
1N/A print $rec->{LOOKUP}{"key"};
1N/A ($first_k, $first_v) = each %{ $rec->{LOOKUP} };
1N/A $answer = $rec->{THATCODE}->($arg);
1N/A $answer = $rec->{THISCODE}->($arg1, $arg2);
1N/A # careful of extra block braces on fh ref
1N/A print { $rec->{HANDLE} } "a string\n";
1N/A $rec->{HANDLE}->autoflush(1);
1N/A $rec->{HANDLE}->print(" a string\n");
1N/A=head2 Declaration of a HASH OF COMPLEX RECORDS
1N/A series => "flintstones",
1N/A nights => [ qw(monday thursday friday) ],
1N/A { name => "fred", role => "lead", age => 36, },
1N/A { name => "wilma", role => "wife", age => 31, },
1N/A { name => "pebbles", role => "kid", age => 4, },
1N/A series => "jetsons",
1N/A nights => [ qw(wednesday saturday) ],
1N/A { name => "george", role => "lead", age => 41, },
1N/A { name => "jane", role => "wife", age => 39, },
1N/A { name => "elroy", role => "kid", age => 9, },
1N/A series => "simpsons",
1N/A nights => [ qw(monday) ],
1N/A { name => "homer", role => "lead", age => 34, },
1N/A { name => "marge", role => "wife", age => 37, },
1N/A { name => "bart", role => "kid", age => 11, },
1N/A=head2 Generation of a HASH OF COMPLEX RECORDS
1N/A # this is most easily done by having the file itself be
1N/A # in the raw data format as shown above. perl is happy
1N/A # to parse complex data structures if declared as data, so
1N/A # sometimes it's easiest to do that
1N/A # here's a piece by piece build up
1N/A $rec->{series} = "flintstones";
1N/A $rec->{nights} = [ find_days() ];
1N/A # assume this file in field=value syntax
1N/A %fields = split /[\s=]+/;
1N/A push @members, { %fields };
1N/A $rec->{members} = [ @members ];
1N/A # now remember the whole thing
1N/A $TV{ $rec->{series} } = $rec;
1N/A ###########################################################
1N/A # now, you might want to make interesting extra fields that
1N/A # include pointers back into the same data structure so if
1N/A # change one piece, it changes everywhere, like for example
1N/A # if you wanted a {kids} field that was a reference
1N/A # to an array of the kids' records without having duplicate
1N/A # records and thus update problems.
1N/A ###########################################################
1N/A foreach $family (keys %TV) {
1N/A $rec = $TV{$family}; # temp pointer
1N/A for $person ( @{ $rec->{members} } ) {
1N/A if ($person->{role} =~ /kid|son|daughter/) {
1N/A push @kids, $person;
1N/A # REMEMBER: $rec and $TV{$family} point to same data!!
1N/A $rec->{kids} = [ @kids ];
1N/A # you copied the array, but the array itself contains pointers
1N/A # to uncopied objects. this means that if you make bart get
1N/A $TV{simpsons}{kids}[0]{age}++;
1N/A # then this would also change in
1N/A print $TV{simpsons}{members}[2]{age};
1N/A # because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2]
1N/A # both point to the same underlying anonymous hash table
1N/A # print the whole thing
1N/A foreach $family ( keys %TV ) {
1N/A print "the $family";
1N/A print " is on during @{ $TV{$family}{nights} }\n";
1N/A print "its members are:\n";
1N/A for $who ( @{ $TV{$family}{members} } ) {
1N/A print " $who->{name} ($who->{role}), age $who->{age}\n";
1N/A print "it turns out that $TV{$family}{lead} has ";
1N/A print scalar ( @{ $TV{$family}{kids} } ), " kids named ";
1N/A print join (", ", map { $_->{name} } @{ $TV{$family}{kids} } );
1N/AYou cannot easily tie a multilevel data structure (such as a hash of
1N/Ahashes) to a dbm file. The first problem is that all but GDBM and
1N/ABerkeley DB have size limitations, but beyond that, you also have problems
1N/Awith how references are to be represented on disk. One experimental
1N/Amodule that does partially attempt to address this need is the MLDBM
1N/Amodule. Check your nearest CPAN site as described in L<perlmodlib> for
1N/Asource code to MLDBM.
1N/Aperlref(1), perllol(1), perldata(1), perlobj(1)
1N/ATom Christiansen <F<tchrist@perl.com>>
1N/AWed Oct 23 04:57:50 MET DST 1996