1N/A=head1 NAME
1N/A
1N/Aperlfaq4 - Data Manipulation ($Revision: 1.54 $, $Date: 2003/11/30 00:50:08 $)
1N/A
1N/A=head1 DESCRIPTION
1N/A
1N/AThis section of the FAQ answers questions related to manipulating
1N/Anumbers, dates, strings, arrays, hashes, and miscellaneous data issues.
1N/A
1N/A=head1 Data: Numbers
1N/A
1N/A=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
1N/A
1N/AInternally, your computer represents floating-point numbers
1N/Ain binary. Digital (as in powers of two) computers cannot
1N/Astore all numbers exactly.  Some real numbers lose precision
1N/Ain the process.  This is a problem with how computers store
1N/Anumbers and affects all computer languages, not just Perl.
1N/A
L<perlnumber> shows the gory details of number
representations and conversions.
1N/A
To limit the number of decimal places in your numbers, you
can use the printf or sprintf function.  See
L<"Floating Point Arithmetic"|perlop> for more details.
1N/A
1N/A    printf "%.2f", 10/3;
1N/A
1N/A    my $number = sprintf "%.2f", 10/3;
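
The same trick helps when comparing values.  The following is only a
sketch to illustrate the point; the variable names are invented for this
example, and the exact digits printed depend on how your perl was built.

```perl
use strict;
use warnings;

# 0.1 + 0.2 is usually not stored as exactly 0.3 in binary floating point.
my $sum = 0.1 + 0.2;

# Asking for 20 decimal places exposes the stored approximation.
printf "%.20f\n", $sum;

# An exact comparison often fails, but comparing values rounded to a
# chosen number of places behaves the way most people expect.
my $exact_equal   = ($sum == 0.3) ? 1 : 0;
my $rounded_equal = (sprintf("%.2f", $sum) eq sprintf("%.2f", 0.3)) ? 1 : 0;

print "exact: $exact_equal, rounded: $rounded_equal\n";
```

Rounding to a fixed number of places before comparing is a deliberate
choice here: it states how much precision you actually care about.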
1N/A
1N/A=head2 Why is int() broken?
1N/A
1N/AYour int() is most probably working just fine.  It's the numbers that
1N/Aaren't quite what you think.
1N/A
1N/AFirst, see the above item "Why am I getting long decimals
1N/A(eg, 19.9499999999999) instead of the numbers I should be getting
1N/A(eg, 19.95)?".
1N/A
1N/AFor example, this
1N/A
1N/A    print int(0.6/0.2-2), "\n";
1N/A
will on most computers print 0, not 1, because even such simple
numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
numbers.  What you think of above as 'three' is really more like
2.9999999999999995559.
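
If what you want is the nearest integer rather than truncation, one
possible approach (a sketch, not the only way) is to round with sprintf()
instead of calling int() directly:

```perl
use strict;
use warnings;

# Mathematically 1, but typically stored as slightly less than 1.
my $value = 0.6 / 0.2 - 2;

my $truncated = int($value);               # truncates toward zero
my $rounded   = sprintf("%.0f", $value);   # rounds to the nearest integer

print "truncated: $truncated, rounded: $rounded\n";
```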
1N/A
1N/A=head2 Why isn't my octal data interpreted correctly?
1N/A
1N/APerl only understands octal and hex numbers as such when they occur as
1N/Aliterals in your program.  Octal literals in perl must start with a
1N/Aleading "0" and hexadecimal literals must start with a leading "0x".
1N/AIf they are read in from somewhere and assigned, no automatic
1N/Aconversion takes place.  You must explicitly use oct() or hex() if you
1N/Awant the values converted to decimal.  oct() interprets hex ("0x350"),
1N/Aoctal ("0350" or even without the leading "0", like "377") and binary
1N/A("0b1010") numbers, while hex() only converts hexadecimal ones, with
1N/Aor without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
1N/AThe inverse mapping from decimal to octal can be done with either the
1N/A"%o" or "%O" sprintf() formats.
1N/A
1N/AThis problem shows up most often when people try using chmod(), mkdir(),
1N/Aumask(), or sysopen(), which by widespread tradition typically take
1N/Apermissions in octal.
1N/A
1N/A    chmod(644,  $file); # WRONG
1N/A    chmod(0644, $file); # right
1N/A
1N/ANote the mistake in the first line was specifying the decimal literal
1N/A644, rather than the intended octal literal 0644.  The problem can
1N/Abe seen with:
1N/A
1N/A    printf("%#o",644); # prints 01204
1N/A
1N/ASurely you had not intended C<chmod(01204, $file);> - did you?  If you
1N/Awant to use numeric literals as arguments to chmod() et al. then please
1N/Atry to express them as octal constants, that is with a leading zero and
1N/Awith the following digits restricted to the set 0..7.
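
When the permission bits arrive as a string (from a config file or the
command line, say), convert them with oct() before use.  In this small
sketch C<$input> is a made-up stand-in for such a value:

```perl
use strict;
use warnings;

# Hypothetical input, e.g. read from a config file or the command line.
my $input = "644";

# oct() interprets the digits as octal, with or without a leading 0.
my $mode = oct($input);

printf "decimal %d, octal %#o\n", $mode, $mode;

# $mode is now suitable for a call such as chmod($mode, $file).
```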
1N/A
1N/A=head2 Does Perl have a round() function?  What about ceil() and floor()?  Trig functions?
1N/A
1N/ARemember that int() merely truncates toward 0.  For rounding to a
1N/Acertain number of digits, sprintf() or printf() is usually the easiest
1N/Aroute.
1N/A
1N/A    printf("%.3f", 3.1415926535);   # prints 3.142
1N/A
1N/AThe POSIX module (part of the standard Perl distribution) implements
1N/Aceil(), floor(), and a number of other mathematical and trigonometric
1N/Afunctions.
1N/A
1N/A    use POSIX;
1N/A    $ceil   = ceil(3.5);            # 4
1N/A    $floor  = floor(3.5);           # 3
1N/A
1N/AIn 5.000 to 5.003 perls, trigonometry was done in the Math::Complex
1N/Amodule.  With 5.004, the Math::Trig module (part of the standard Perl
1N/Adistribution) implements the trigonometric functions. Internally it
1N/Auses the Math::Complex module and some functions can break out from
1N/Athe real axis into the complex plane, for example the inverse sine of
1N/A2.
1N/A
1N/ARounding in financial applications can have serious implications, and
1N/Athe rounding method used should be specified precisely.  In these
1N/Acases, it probably pays not to trust whichever system rounding is
1N/Abeing used by Perl, but to instead implement the rounding function you
1N/Aneed yourself.
1N/A
1N/ATo see why, notice how you'll still have an issue on half-way-point
1N/Aalternation:
1N/A
1N/A    for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}
1N/A
1N/A    0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
1N/A    0.8 0.8 0.9 0.9 1.0 1.0
1N/A
1N/ADon't blame Perl.  It's the same as in C.  IEEE says we have to do this.
1N/APerl numbers whose absolute values are integers under 2**31 (on 32 bit
1N/Amachines) will work pretty much like mathematical integers.  Other numbers
1N/Aare not guaranteed.
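
As an illustration of writing the rounding rule down yourself, here is a
minimal sketch that rounds halves away from zero.  The name
C<round_half_away> is invented for this example, and the convention it
implements is only one of several; your application may require another
(round-half-even, for instance).

```perl
use strict;
use warnings;

# Round $number to $places decimal places, with halves going away from
# zero.  This is one possible convention, not a universal rule.
sub round_half_away {
    my ($number, $places) = @_;
    my $factor = 10 ** $places;
    my $scaled = $number * $factor;
    my $nudged = $scaled + ($scaled < 0 ? -0.5 : 0.5);
    return int($nudged) / $factor;
}

print round_half_away( 2.5,   0), "\n";
print round_half_away(-2.5,   0), "\n";
print round_half_away( 0.125, 2), "\n";
```

Inputs that have no exact binary representation can still land on the
"wrong" side of a half-way point, so for money it is often safer to keep
amounts as integers (cents, say) and format them only for display.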
1N/A
1N/A=head2 How do I convert between numeric representations/bases/radixes?
1N/A
1N/AAs always with Perl there is more than one way to do it.  Below
1N/Aare a few examples of approaches to making common conversions
between number representations.  This is intended to be representative
rather than exhaustive.
1N/A
1N/ASome of the examples below use the Bit::Vector module from CPAN.
1N/AThe reason you might choose Bit::Vector over the perl built in
1N/Afunctions is that it works with numbers of ANY size, that it is
1N/Aoptimized for speed on some operations, and for at least some
1N/Aprogrammers the notation might be familiar.
1N/A
1N/A=over 4
1N/A
1N/A=item How do I convert hexadecimal into decimal
1N/A
1N/AUsing perl's built in conversion of 0x notation:
1N/A
1N/A    $dec = 0xDEADBEEF;
1N/A
1N/AUsing the hex function:
1N/A
1N/A    $dec = hex("DEADBEEF");
1N/A
1N/AUsing pack:
1N/A
1N/A    $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
1N/A
1N/AUsing the CPAN module Bit::Vector:
1N/A
1N/A    use Bit::Vector;
1N/A    $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
1N/A    $dec = $vec->to_Dec();
1N/A
1N/A=item How do I convert from decimal to hexadecimal
1N/A
1N/AUsing sprintf:
1N/A
1N/A    $hex = sprintf("%X", 3735928559); # upper case A-F
1N/A    $hex = sprintf("%x", 3735928559); # lower case a-f
1N/A
1N/AUsing unpack:
1N/A
1N/A    $hex = unpack("H*", pack("N", 3735928559));
1N/A
1N/AUsing Bit::Vector:
1N/A
1N/A    use Bit::Vector;
1N/A    $vec = Bit::Vector->new_Dec(32, -559038737);
1N/A    $hex = $vec->to_Hex();
1N/A
1N/AAnd Bit::Vector supports odd bit counts:
1N/A
1N/A    use Bit::Vector;
1N/A    $vec = Bit::Vector->new_Dec(33, 3735928559);
1N/A    $vec->Resize(32); # suppress leading 0 if unwanted
1N/A    $hex = $vec->to_Hex();
1N/A
1N/A=item How do I convert from octal to decimal
1N/A
1N/AUsing Perl's built in conversion of numbers with leading zeros:
1N/A
1N/A    $dec = 033653337357; # note the leading 0!
1N/A
1N/AUsing the oct function:
1N/A
1N/A    $dec = oct("33653337357");
1N/A
1N/AUsing Bit::Vector:
1N/A
1N/A    use Bit::Vector;
1N/A    $vec = Bit::Vector->new(32);
1N/A    $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
1N/A    $dec = $vec->to_Dec();
1N/A
1N/A=item How do I convert from decimal to octal
1N/A
1N/AUsing sprintf:
1N/A
1N/A    $oct = sprintf("%o", 3735928559);
1N/A
1N/AUsing Bit::Vector:
1N/A
1N/A    use Bit::Vector;
1N/A    $vec = Bit::Vector->new_Dec(32, -559038737);
1N/A    $oct = reverse join('', $vec->Chunk_List_Read(3));
1N/A
1N/A=item How do I convert from binary to decimal
1N/A
1N/APerl 5.6 lets you write binary numbers directly with
1N/Athe 0b notation:
1N/A
1N/A    $number = 0b10110110;
1N/A
1N/AUsing oct:
1N/A
1N/A    my $input = "10110110";
1N/A    $decimal = oct( "0b$input" );
1N/A
1N/AUsing pack and ord:
1N/A
1N/A    $decimal = ord(pack('B8', '10110110'));
1N/A
1N/AUsing pack and unpack for larger strings:
1N/A
1N/A    $int = unpack("N", pack("B32",
1N/A    substr("0" x 32 . "11110101011011011111011101111", -32)));
1N/A    $dec = sprintf("%d", $int);
1N/A
1N/A    # substr() is used to left pad a 32 character string with zeros.
1N/A
1N/AUsing Bit::Vector:
1N/A
    use Bit::Vector;
    $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    $dec = $vec->to_Dec();
1N/A
1N/A=item How do I convert from decimal to binary
1N/A
1N/AUsing sprintf (perl 5.6+):
1N/A
1N/A    $bin = sprintf("%b", 3735928559);
1N/A
1N/AUsing unpack:
1N/A
1N/A    $bin = unpack("B*", pack("N", 3735928559));
1N/A
1N/AUsing Bit::Vector:
1N/A
1N/A    use Bit::Vector;
1N/A    $vec = Bit::Vector->new_Dec(32, -559038737);
1N/A    $bin = $vec->to_Bin();
1N/A
1N/AThe remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
1N/Aare left as an exercise to the inclined reader.
1N/A
1N/A=back
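
For the inclined reader, one way to approach the remaining
transformations is to go through a plain number: parse with hex() or
oct(), then format with sprintf().  A brief sketch, with variable names
chosen only for this example:

```perl
use strict;
use warnings;

my $hex_to_oct = sprintf("%o", hex("ff"));           # hexadecimal -> octal
my $bin_to_hex = sprintf("%x", oct("0b10110110"));   # binary -> hexadecimal
my $oct_to_bin = sprintf("%b", oct("377"));          # octal -> binary

print "$hex_to_oct $bin_to_hex $oct_to_bin\n";
```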
1N/A
1N/A=head2 Why doesn't & work the way I want it to?
1N/A
The behavior of the bitwise operators (C<&>, C<|>, C<^>) depends on whether
they're used on numbers or strings.  Given two strings, the operators treat
each string as a series of bits and work with that (the string C<"3"> is the
bit pattern C<00110011>).  Given numbers, the operators work with the binary
form of the number (the number C<3> is treated as the bit pattern
C<00000011>).
1N/A
1N/ASo, saying C<11 & 3> performs the "and" operation on numbers (yielding
1N/AC<3>).  Saying C<"11" & "3"> performs the "and" operation on strings
1N/A(yielding C<"1">).
1N/A
1N/AMost problems with C<&> and C<|> arise because the programmer thinks
1N/Athey have a number but really it's a string.  The rest arise because
1N/Athe programmer says:
1N/A
1N/A    if ("\020\020" & "\101\101") {
1N/A    # ...
1N/A    }
1N/A
but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl.  To test whether any bit
is set in the result, you need:

    if ( ("\020\020" & "\101\101") =~ /[^\000]/) {
    # ...
    }
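
When the operands are strings that merely look like numbers, adding 0
forces the numeric behavior.  This sketch contrasts the two; note that
the string form is computed first, before the values have been used as
numbers.

```perl
use strict;
use warnings;

my ($x, $y) = ("11", "3");     # strings, e.g. as read from input

# Computed first, while both operands are still plain strings.
my $as_strings = $x & $y;               # bitwise AND of the characters

# Adding 0 makes each operand a number before the AND is applied.
my $as_numbers = ($x + 0) & ($y + 0);   # bitwise AND of the numeric values

print "strings: $as_strings, numbers: $as_numbers\n";
```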
1N/A
1N/A=head2 How do I multiply matrices?
1N/A
1N/AUse the Math::Matrix or Math::MatrixReal modules (available from CPAN)
1N/Aor the PDL extension (also available from CPAN).
1N/A
1N/A=head2 How do I perform an operation on a series of integers?
1N/A
1N/ATo call a function on each element in an array, and collect the
1N/Aresults, use:
1N/A
1N/A    @results = map { my_func($_) } @array;
1N/A
1N/AFor example:
1N/A
1N/A    @triple = map { 3 * $_ } @single;
1N/A
1N/ATo call a function on each element of an array, but ignore the
1N/Aresults:
1N/A
1N/A    foreach $iterator (@array) {
1N/A        some_func($iterator);
1N/A    }
1N/A
1N/ATo call a function on each integer in a (small) range, you B<can> use:
1N/A
1N/A    @results = map { some_func($_) } (5 .. 25);
1N/A
1N/Abut you should be aware that the C<..> operator creates an array of
1N/Aall integers in the range.  This can take a lot of memory for large
1N/Aranges.  Instead use:
1N/A
1N/A    @results = ();
1N/A    for ($i=5; $i < 500_005; $i++) {
1N/A        push(@results, some_func($i));
1N/A    }
1N/A
This situation has been fixed in Perl 5.005. Use of C<..> in a C<for>
loop will iterate over the range, without creating the entire range.
1N/A
1N/A    for my $i (5 .. 500_005) {
1N/A        push(@results, some_func($i));
1N/A    }
1N/A
1N/Awill not create a list of 500,000 integers.
1N/A
1N/A=head2 How can I output Roman numerals?
1N/A
1N/AGet the http://www.cpan.org/modules/by-module/Roman module.
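
If installing a module is not an option, the conversion is short enough
to write by hand.  The following is a hand-rolled sketch, not the Roman
module's interface; C<to_roman> is a name invented here, and it assumes
an integer from 1 to 3999.

```perl
use strict;
use warnings;

# Value/symbol pairs, largest first, including the subtractive forms.
my @ROMAN = (
    [1000, 'M'], [900, 'CM'], [500, 'D'], [400, 'CD'],
    [100,  'C'], [90,  'XC'], [50,  'L'], [40,  'XL'],
    [10,   'X'], [9,   'IX'], [5,   'V'], [4,   'IV'],
    [1,    'I'],
);

# Return the Roman numeral for an integer in 1..3999, or nothing
# (undef in scalar context) for anything else.
sub to_roman {
    my ($n) = @_;
    return unless defined $n && $n =~ /\A\d+\z/ && $n >= 1 && $n <= 3999;

    my $roman = '';
    for my $pair (@ROMAN) {
        my ($value, $symbol) = @$pair;
        while ($n >= $value) {
            $roman .= $symbol;
            $n     -= $value;
        }
    }
    return $roman;
}

print to_roman(1994), "\n";
```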
1N/A
1N/A=head2 Why aren't my random numbers random?
1N/A
1N/AIf you're using a version of Perl before 5.004, you must call C<srand>
1N/Aonce at the start of your program to seed the random number generator.
1N/A
1N/A     BEGIN { srand() if $] < 5.004 }
1N/A
1N/A5.004 and later automatically call C<srand> at the beginning.  Don't
1N/Acall C<srand> more than once---you make your numbers less random, rather
1N/Athan more.
1N/A
1N/AComputers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-).  The
F<random> article in the "Far More Than You Ever Wanted To Know"
collection at http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy of
Tom Phoenix, talks more about this.  John von Neumann said, ``Anyone
1N/Awho attempts to generate random numbers by deterministic means is, of
1N/Acourse, living in a state of sin.''
1N/A
1N/AIf you want numbers that are more random than C<rand> with C<srand>
1N/Aprovides, you should also check out the Math::TrulyRandom module from
1N/ACPAN.  It uses the imperfections in your system's timer to generate
1N/Arandom numbers, but this takes quite a while.  If you want a better
1N/Apseudorandom generator than comes with your operating system, look at
1N/A``Numerical Recipes in C'' at http://www.nr.com/ .
1N/A
1N/A=head2 How do I get a random number between X and Y?
1N/A
1N/AC<rand($x)> returns a number such that
1N/AC<< 0 <= rand($x) < $x >>. Thus what you want to have perl
1N/Afigure out is a random number in the range from 0 to the
1N/Adifference between your I<X> and I<Y>.
1N/A
1N/AThat is, to get a number between 10 and 15, inclusive, you
1N/Awant a random number between 0 and 5 that you can then add
1N/Ato 10.
1N/A
1N/A    my $number = 10 + int rand( 15-10+1 );
1N/A
1N/AHence you derive the following simple function to abstract
1N/Athat. It selects a random integer between the two given
integers (inclusive).  For example: C<random_int_in(50,120)>.
1N/A
1N/A   sub random_int_in ($$) {
1N/A     my($min, $max) = @_;
1N/A      # Assumes that the two arguments are integers themselves!
1N/A     return $min if $min == $max;
1N/A     ($min, $max) = ($max, $min)  if  $min > $max;
1N/A     return $min + int rand(1 + $max - $min);
1N/A   }
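
If you want a fractional number rather than an integer, the same idea
applies without the int().  A sketch along those lines follows;
C<random_float_in> is a hypothetical name, and the upper bound is
excluded because C<rand> never returns its argument.

```perl
use strict;
use warnings;

# Return a random floating-point number $n with $min <= $n < $max.
sub random_float_in {
    my ($min, $max) = @_;
    return $min if $min == $max;    # rand(0) would behave like rand(1)
    ($min, $max) = ($max, $min) if $min > $max;
    return $min + rand($max - $min);
}

my $sample = random_float_in(2.5, 7.5);
print "$sample\n";
```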
1N/A
1N/A=head1 Data: Dates
1N/A
1N/A=head2 How do I find the day or week of the year?
1N/A
The localtime function returns the day of the year (counting from 0)
as the eighth element of its list.  Without an argument localtime uses
the current time.
1N/A
1N/A    $day_of_year = (localtime)[7];
1N/A
1N/AThe POSIX module can also format a date as the day of the year or
1N/Aweek of the year.
1N/A
1N/A    use POSIX qw/strftime/;
1N/A    my $day_of_year  = strftime "%j", localtime;
1N/A    my $week_of_year = strftime "%W", localtime;
1N/A
To get the day or week of the year for any date, use the Time::Local
module to get a time in epoch seconds for the argument to localtime.
1N/A
1N/A    use POSIX qw/strftime/;
1N/A    use Time::Local;
1N/A    my $week_of_year = strftime "%W",
1N/A        localtime( timelocal( 0, 0, 0, 18, 11, 1987 ) );
1N/A
The Date::Calc module provides two functions to calculate these.

    use Date::Calc qw(Day_of_Year Week_of_Year);
1N/A    my $day_of_year  = Day_of_Year(  1987, 12, 18 );
1N/A    my $week_of_year = Week_of_Year( 1987, 12, 18 );
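
With only core modules, the day of the year for an arbitrary date can
also be read straight from the time list.  This sketch uses timegm and
gmtime rather than their local-time counterparts so that the result does
not depend on the machine's time zone; that choice is an assumption made
for the example.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# 18 December 1987; months are numbered from 0, so December is 11.
my $epoch = timegm(0, 0, 0, 18, 11, 1987);

# Element 7 of the gmtime list is the zero-based day of the year,
# so add 1 to get the conventional 1-based number.
my $day_of_year = (gmtime $epoch)[7] + 1;

print "$day_of_year\n";
```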
1N/A
1N/A=head2 How do I find the current century or millennium?
1N/A
1N/AUse the following simple functions:
1N/A
1N/A    sub get_century    {
1N/A    return int((((localtime(shift || time))[5] + 1999))/100);
1N/A    }
1N/A    sub get_millennium {
1N/A    return 1+int((((localtime(shift || time))[5] + 1899))/1000);
1N/A    }
1N/A
1N/AOn some systems, the POSIX module's strftime() function has
1N/Abeen extended in a non-standard way to use a C<%C> format,
1N/Awhich they sometimes claim is the "century".  It isn't,
1N/Abecause on most such systems, this is only the first two
1N/Adigits of the four-digit year, and thus cannot be used to
1N/Areliably determine the current century or millennium.
1N/A
1N/A=head2 How can I compare two dates and find the difference?
1N/A
1N/AIf you're storing your dates as epoch seconds then simply subtract one
1N/Afrom the other.  If you've got a structured date (distinct year, day,
1N/Amonth, hour, minute, seconds values), then for reasons of accessibility,
1N/Asimplicity, and efficiency, merely use either timelocal or timegm (from
1N/Athe Time::Local module in the standard distribution) to reduce structured
1N/Adates to epoch seconds.  However, if you don't know the precise format of
1N/Ayour dates, then you should probably use either of the Date::Manip and
1N/ADate::Calc modules from CPAN before you go hacking up your own parsing
1N/Aroutine to handle arbitrary date formats.
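
For structured dates, the reduction to epoch seconds might look like the
following sketch.  It uses timegm so that daylight saving changes do not
affect the day count; the two dates are arbitrary examples.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# timegm takes (sec, min, hour, mday, month, year), month counted from 0.
my $start = timegm(0, 0, 0, 30, 10, 2003);   # 30 November 2003
my $end   = timegm(0, 0, 0,  1,  0, 2004);   #  1 January 2004

my $seconds = $end - $start;
my $days    = $seconds / (24 * 60 * 60);

print "$days days\n";
```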
1N/A
1N/A=head2 How can I take a string and turn it into epoch seconds?
1N/A
1N/AIf it's a regular enough string that it always has the same format,
1N/Ayou can split it up and pass the parts to C<timelocal> in the standard
1N/ATime::Local module.  Otherwise, you should look into the Date::Calc
1N/Aand Date::Manip modules from CPAN.
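
For a fixed format, the splitting can be done with one pattern match.
This sketch assumes a C<YYYY-MM-DD hh:mm:ss> string in UTC (hence timegm;
use timelocal for local time); the format itself is just an example.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my $stamp = "2003-11-30 00:50:08";   # assumed to be UTC for this sketch

my $epoch;
if (my ($year, $mon, $mday, $hour, $min, $sec) =
        $stamp =~ /\A(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)\z/) {
    # timegm wants the month counted from 0.
    $epoch = timegm($sec, $min, $hour, $mday, $mon - 1, $year);
}

print defined $epoch ? "$epoch\n" : "unrecognized date\n";
```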
1N/A
1N/A=head2 How can I find the Julian Day?
1N/A
1N/AUse the Time::JulianDay module (part of the Time-modules bundle
1N/Aavailable from CPAN.)
1N/A
Before you immerse yourself too deeply in this, be sure to verify that
it is the I<Julian> Day you really want.  Are you interested in a way
of getting serial days so that you can just tell how many days apart
two dates are, or so that you can also do other date arithmetic?  If you
are interested in performing date arithmetic, this can be done using
the Date::Manip or Date::Calc modules.
1N/A
There are too many details and too much confusion on this issue to cover in
1N/Athis FAQ, but the term is applied (correctly) to a calendar now
1N/Asupplanted by the Gregorian Calendar, with the Julian Calendar failing
1N/Ato adjust properly for leap years on centennial years (among other
1N/Aannoyances).  The term is also used (incorrectly) to mean: [1] days in
1N/Athe Gregorian Calendar; and [2] days since a particular starting time
1N/Aor `epoch', usually 1970 in the Unix world and 1980 in the
1N/AMS-DOS/Windows world.  If you find that it is not the first meaning
1N/Athat you really want, then check out the Date::Manip and Date::Calc
1N/Amodules.  (Thanks to David Cassell for most of this text.)
1N/A
1N/A=head2 How do I find yesterday's date?
1N/A
1N/AIf you only need to find the date (and not the same time), you
1N/Acan use the Date::Calc module.
1N/A
1N/A    use Date::Calc qw(Today Add_Delta_Days);
1N/A
1N/A    my @date = Add_Delta_Days( Today(), -1 );
1N/A
1N/A    print "@date\n";
1N/A
1N/AMost people try to use the time rather than the calendar to
1N/Afigure out dates, but that assumes that your days are
1N/Atwenty-four hours each.  For most people, there are two days
1N/Aa year when they aren't: the switch to and from summer time
1N/Athrows this off. Russ Allbery offers this solution.
1N/A
1N/A    sub yesterday {
1N/A        my $now  = defined $_[0] ? $_[0] : time;
1N/A        my $then = $now - 60 * 60 * 24;
1N/A        my $ndst = (localtime $now)[8] > 0;
1N/A        my $tdst = (localtime $then)[8] > 0;
1N/A        $then - ($tdst - $ndst) * 60 * 60;
1N/A        }
1N/A
This should give you "this time yesterday" in seconds since the epoch,
relative to the first argument (or the current time if no argument is
given), suitable for passing to localtime or whatever else you need to do
with it.  $ndst is whether we're currently in daylight savings time; $tdst is
1N/Awhether the point 24 hours ago was in daylight savings time.  If $tdst
1N/Aand $ndst are the same, a boundary wasn't crossed, and the correction
1N/Awill subtract 0.  If $tdst is 1 and $ndst is 0, subtract an hour more
1N/Afrom yesterday's time since we gained an extra hour while going off
1N/Adaylight savings time.  If $tdst is 0 and $ndst is 1, subtract a
1N/Anegative hour (add an hour) to yesterday's time since we lost an hour.
1N/A
1N/AAll of this is because during those days when one switches off or onto
1N/ADST, a "day" isn't 24 hours long; it's either 23 or 25.
1N/A
1N/AThe explicit settings of $ndst and $tdst are necessary because localtime
1N/Aonly says it returns the system tm struct, and the system tm struct at
1N/Aleast on Solaris doesn't guarantee any particular positive value (like,
1N/Asay, 1) for isdst, just a positive value.  And that value can
1N/Apotentially be negative, if DST information isn't available (this sub
1N/Ajust treats those cases like no DST).
1N/A
1N/ANote that between 2am and 3am on the day after the time zone switches
1N/Aoff daylight savings time, the exact hour of "yesterday" corresponding
1N/Ato the current hour is not clearly defined.  Note also that if used
1N/Abetween 2am and 3am the day after the change to daylight savings time,
1N/Athe result will be between 3am and 4am of the previous day; it's
1N/Aarguable whether this is correct.
1N/A
1N/AThis sub does not attempt to deal with leap seconds (most things don't).
1N/A
1N/A
1N/A
1N/A=head2 Does Perl have a Year 2000 problem?  Is Perl Y2K compliant?
1N/A
1N/AShort answer: No, Perl does not have a Year 2000 problem.  Yes, Perl is
1N/AY2K compliant (whatever that means).  The programmers you've hired to
1N/Ause it, however, probably are not.
1N/A
1N/ALong answer: The question belies a true understanding of the issue.
1N/APerl is just as Y2K compliant as your pencil--no more, and no less.
1N/ACan you use your pencil to write a non-Y2K-compliant memo?  Of course
1N/Ayou can.  Is that the pencil's fault?  Of course it isn't.
1N/A
1N/AThe date and time functions supplied with Perl (gmtime and localtime)
1N/Asupply adequate information to determine the year well beyond 2000
1N/A(2038 is when trouble strikes for 32-bit machines).  The year returned
1N/Aby these functions when used in a list context is the year minus 1900.
1N/AFor years between 1910 and 1999 this I<happens> to be a 2-digit decimal
1N/Anumber. To avoid the year 2000 problem simply do not treat the year as
1N/Aa 2-digit number.  It isn't.
1N/A
1N/AWhen gmtime() and localtime() are used in scalar context they return
1N/Aa timestamp string that contains a fully-expanded year.  For example,
1N/AC<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
1N/A2001".  There's no year 2000 problem here.
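
To make the list-context point concrete, here is a short sketch using the
same timestamp as the gmtime example above.  It contrasts the classic
mistake (gluing "19" onto the year field) with the correct arithmetic.

```perl
use strict;
use warnings;

my $epoch = 1005613200;            # 13 November 2001, UTC
my $raw   = (gmtime $epoch)[5];    # years since 1900, not a 2-digit year

my $wrong = "19$raw";              # string concatenation
my $right = $raw + 1900;           # arithmetic

print "wrong: $wrong, right: $right\n";
```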
1N/A
1N/AThat doesn't mean that Perl can't be used to create non-Y2K compliant
1N/Aprograms.  It can.  But so can your pencil.  It's the fault of the user,
1N/Anot the language.  At the risk of inflaming the NRA: ``Perl doesn't
1N/Abreak Y2K, people do.''  See http://language.perl.com/news/y2k.html for
1N/Aa longer exposition.
1N/A
1N/A=head1 Data: Strings
1N/A
1N/A=head2 How do I validate input?
1N/A
1N/AThe answer to this question is usually a regular expression, perhaps
1N/Awith auxiliary logic.  See the more specific questions (numbers, mail
1N/Aaddresses, etc.) for details.
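
As one small instance of the regular-expression approach, this sketch
checks that a string is an optionally signed integer.  The name
C<is_integer> is made up for the example; the anchors C<\A> and C<\z> are
used so that a trailing newline does not slip through.

```perl
use strict;
use warnings;

# True if $input is an optionally signed run of decimal digits.
sub is_integer {
    my ($input) = @_;
    return (defined($input) && $input =~ /\A[+-]?\d+\z/) ? 1 : 0;
}

for my $candidate ("42", "-7", "4.2", "12\n", "") {
    (my $shown = $candidate) =~ s/\n/\\n/g;
    printf "%-6s %s\n", "'$shown'", is_integer($candidate) ? "ok" : "rejected";
}
```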
1N/A
1N/A=head2 How do I unescape a string?
1N/A
1N/AIt depends just what you mean by ``escape''.  URL escapes are dealt
1N/Awith in L<perlfaq9>.  Shell escapes with the backslash (C<\>)
1N/Acharacter are removed with
1N/A
1N/A    s/\\(.)/$1/g;
1N/A
1N/AThis won't expand C<"\n"> or C<"\t"> or any other special escapes.
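
If you do want a few of the special escapes expanded, a lookup table is
one way to go about it.  This is a sketch under the assumption that only
the escapes listed in C<%escape> matter to you; both that hash and the
C<unescape> name are invented for the example.

```perl
use strict;
use warnings;

# Escapes this sketch understands; extend the table as needed.
my %escape = (n => "\n", t => "\t", r => "\r", 0 => "\0");

sub unescape {
    my ($text) = @_;
    # A known letter becomes its control character; anything else
    # after a backslash is kept literally (so \\ becomes \).
    $text =~ s/\\(.)/exists $escape{$1} ? $escape{$1} : $1/ges;
    return $text;
}

print unescape('a\tb\\\\c\n');
```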
1N/A
1N/A=head2 How do I remove consecutive pairs of characters?
1N/A
1N/ATo turn C<"abbcccd"> into C<"abccd">:
1N/A
1N/A    s/(.)\1/$1/g;   # add /s to include newlines
1N/A
1N/AHere's a solution that turns "abbcccd" to "abcd":
1N/A
1N/A    y///cs; # y == tr, but shorter :-)
1N/A
1N/A=head2 How do I expand function calls in a string?
1N/A
1N/AThis is documented in L<perlref>.  In general, this is fraught with
1N/Aquoting and readability problems, but it is possible.  To interpolate
1N/Aa subroutine call (in list context) into a string:
1N/A
1N/A    print "My sub returned @{[mysub(1,2,3)]} that time.\n";
1N/A
1N/ASee also ``How can I expand variables in text strings?'' in this
1N/Asection of the FAQ.
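
The same trick works for scalar context by interpolating a dereferenced
scalar reference instead of an array.  In this sketch C<items> is a
hypothetical sub that returns different things depending on context:

```perl
use strict;
use warnings;

# A hypothetical sub that behaves differently by context.
sub items {
    return wantarray ? ("a", "b", "c") : 3;
}

my $list_form   = "Items: @{[ items() ]}";           # list context
my $scalar_form = "Count: ${\ scalar(items()) }";    # scalar context

print "$list_form\n$scalar_form\n";
```

Plain concatenation or sprintf() is usually easier to read than either
form.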
1N/A
1N/A=head2 How do I find matching/nesting anything?
1N/A
1N/AThis isn't something that can be done in one regular expression, no
1N/Amatter how complicated.  To find something between two single
1N/Acharacters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multi-character delimiters, something more like
C</alpha(.*?)omega/> would be needed.  But none of these deals with
1N/Anested patterns.  For balanced expressions using C<(>, C<{>, C<[>
1N/Aor C<< < >> as delimiters, use the CPAN module Regexp::Common, or see
1N/AL<perlre/(??{ code })>.  For other cases, you'll have to write a parser.
1N/A
1N/AIf you are serious about writing a parser, there are a number of
1N/Amodules or oddities that will make your life a lot easier.  There are
1N/Athe CPAN modules Parse::RecDescent, Parse::Yapp, and Text::Balanced;
and the byacc program.  Starting with perl 5.8, Text::Balanced
is part of the standard distribution.
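
For a taste of what Text::Balanced offers, here is a minimal sketch that
pulls one balanced parenthesized group off the front of a string.  The
sample text is made up for the example.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = "(a (nested) group) and the rest";

# In list context the first two return values are the balanced
# substring and whatever follows it.
my ($extracted, $remainder) = extract_bracketed($text, "()");

print "extracted: $extracted\n";
print "remainder:$remainder\n";
```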
1N/A
1N/AOne simple destructive, inside-out approach that you might try is to
1N/Apull out the smallest nesting parts one at a time:
1N/A
1N/A    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
1N/A    # do something with $1
1N/A    }
1N/A
1N/AA more complicated and sneaky approach is to make Perl's regular
1N/Aexpression engine do it for you.  This is courtesy Dean Inada, and
1N/Arather has the nature of an Obfuscated Perl Contest entry, but it
1N/Areally does work:
1N/A
1N/A    # $_ contains the string to parse
1N/A    # BEGIN and END are the opening and closing markers for the
1N/A    # nested text.
1N/A
1N/A    @( = ('(','');
1N/A    @) = (')','');
1N/A    ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
1N/A    @$ = (eval{/$re/},$@!~/unmatched/i);
1N/A    print join("\n",@$[0..$#$]) if( $$[-1] );
1N/A
1N/A=head2 How do I reverse a string?
1N/A
1N/AUse reverse() in scalar context, as documented in
1N/AL<perlfunc/reverse>.
1N/A
1N/A    $reversed = reverse $string;
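
The context matters: in list context reverse() reverses the order of its
arguments instead, so a single string comes back unchanged.  A short
sketch of the difference:

```perl
use strict;
use warnings;

my $string = "Just another Perl Hacker";

my $reversed = reverse $string;          # scalar context: characters reversed
my @list     = reverse $string;          # list context: one-element list, unchanged
my $forced   = scalar reverse $string;   # force scalar context explicitly

print "$reversed\n$list[0]\n";
```

The explicit C<scalar> is handy in places that supply list context, such
as the argument list of print().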
1N/A
1N/A=head2 How do I expand tabs in a string?
1N/A
1N/AYou can do it yourself:
1N/A
1N/A    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
1N/A
1N/AOr you can just use the Text::Tabs module (part of the standard Perl
1N/Adistribution).
1N/A
1N/A    use Text::Tabs;
1N/A    @expanded_lines = expand(@lines_with_tabs);
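
Text::Tabs assumes tab stops every 8 columns unless you change its
exported C<$tabstop> variable.  A small sketch with 4-column stops; the
sample string is arbitrary:

```perl
use strict;
use warnings;
use Text::Tabs;

$tabstop = 4;   # exported by Text::Tabs; the default is 8

# expand() returns a list, so capture the first element.
my ($expanded) = expand("a\tbc\td");

print "[$expanded]\n";
```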
1N/A
1N/A=head2 How do I reformat a paragraph?
1N/A
1N/AUse Text::Wrap (part of the standard Perl distribution):
1N/A
1N/A    use Text::Wrap;
1N/A    print wrap("\t", '  ', @paragraphs);
1N/A
1N/AThe paragraphs you give to Text::Wrap should not contain embedded
1N/Anewlines.  Text::Wrap doesn't justify the lines (flush-right).
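
The line width is controlled by the C<$Text::Wrap::columns> variable,
which can be imported on request.  In this sketch the width of 30 and
the sample sentence are arbitrary choices:

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap $columns);

$columns = 30;   # resulting lines are at most $columns - 1 characters

my $paragraph = "The quick brown fox jumps over the lazy dog and keeps on running.";
my $wrapped   = wrap("", "", $paragraph);

print $wrapped, "\n";
```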
1N/A
1N/AOr use the CPAN module Text::Autoformat.  Formatting files can be easily
1N/Adone by making a shell alias, like so:
1N/A
1N/A    alias fmt="perl -i -MText::Autoformat -n0777 \
1N/A        -e 'print autoformat $_, {all=>1}' $*"
1N/A
1N/ASee the documentation for Text::Autoformat to appreciate its many
1N/Acapabilities.
1N/A
1N/A=head2 How can I access or change N characters of a string?
1N/A
1N/AYou can access the first characters of a string with substr().
1N/ATo get the first character, for example, start at position 0
1N/Aand grab the string of length 1.
1N/A
1N/A
1N/A    $string = "Just another Perl Hacker";
1N/A    $first_char = substr( $string, 0, 1 );  #  'J'
1N/A
1N/ATo change part of a string, you can use the optional fourth
1N/Aargument which is the replacement string.
1N/A
1N/A    substr( $string, 13, 4, "Perl 5.8.0" );
1N/A
1N/AYou can also use substr() as an lvalue.
1N/A
1N/A    substr( $string, 13, 4 ) =  "Perl 5.8.0";
1N/A
1N/A=head2 How do I change the Nth occurrence of something?
1N/A
1N/AYou have to keep track of N yourself.  For example, let's say you want
1N/Ato change the fifth occurrence of C<"whoever"> or C<"whomever"> into
1N/AC<"whosoever"> or C<"whomsoever">, case insensitively.  These
1N/Aall assume that $_ contains the string to be altered.
1N/A
1N/A    $count = 0;
1N/A    s{((whom?)ever)}{
1N/A    ++$count == 5       # is it the 5th?
1N/A        ? "${2}soever"  # yes, swap
1N/A        : $1        # renege and leave it there
1N/A    }ige;
1N/A
1N/AIn the more general case, you can use the C</g> modifier in a C<while>
1N/Aloop, keeping count of matches.
1N/A
1N/A    $WANT = 3;
1N/A    $count = 0;
1N/A    $_ = "One fish two fish red fish blue fish";
1N/A    while (/(\w+)\s+fish\b/gi) {
1N/A        if (++$count == $WANT) {
1N/A            print "The third fish is a $1 one.\n";
1N/A        }
1N/A    }
1N/A
1N/AThat prints out: C<"The third fish is a red one.">  You can also use a
1N/Arepetition count and repeated pattern like this:
1N/A
1N/A    /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
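
The counting substitution can be wrapped up for reuse.  The following is
a sketch only; C<replace_nth> and its argument order are invented for
this example, and it returns a modified copy rather than changing its
argument.

```perl
use strict;
use warnings;

# Replace the $n-th match of $pattern in $text with $replacement,
# leaving every other match as it was.
sub replace_nth {
    my ($text, $pattern, $n, $replacement) = @_;
    my $count = 0;
    $text =~ s{($pattern)}{ ++$count == $n ? $replacement : $1 }ge;
    return $text;
}

print replace_nth("One fish two fish red fish blue fish",
                  qr/fish/, 3, "herring"), "\n";
```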
1N/A
1N/A=head2 How can I count the number of occurrences of a substring within a string?
1N/A
1N/AThere are a number of ways, with varying efficiency.  If you want a
1N/Acount of a certain single character (X) within a string, you can use the
1N/AC<tr///> function like so:
1N/A
1N/A    $string = "ThisXlineXhasXsomeXx'sXinXit";
1N/A    $count = ($string =~ tr/X//);
1N/A    print "There are $count X characters in the string";
1N/A
1N/AThis is fine if you are just looking for a single character.  However,
1N/Aif you are trying to count multiple character substrings within a
1N/Alarger string, C<tr///> won't work.  What you can do is wrap a while()
1N/Aloop around a global pattern match.  For example, let's count negative
1N/Aintegers:
1N/A
    $string = "-9 55 48 -2 23 -76 4 14 -44";
    $count = 0;
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";
1N/A
1N/AAnother version uses a global match in list context, then assigns the
1N/Aresult to a scalar, producing a count of the number of matches.
1N/A
1N/A    $count = () = $string =~ /-\d+/g;
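
For a fixed substring (no pattern needed), index() in a loop avoids the
regex engine altogether.  This is a sketch with a made-up name,
C<count_substr>, and it counts non-overlapping occurrences:

```perl
use strict;
use warnings;

# Count non-overlapping occurrences of $needle in $haystack.
sub count_substr {
    my ($haystack, $needle) = @_;
    return 0 if $needle eq '';

    my ($count, $pos) = (0, 0);
    while (($pos = index($haystack, $needle, $pos)) != -1) {
        $count++;
        $pos += length $needle;   # skip past this occurrence
    }
    return $count;
}

print count_substr("-9 55 48 -2 23 -76 4 14 -44", "-"), "\n";
```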
1N/A
1N/A=head2 How do I capitalize all the words on one line?
1N/A
1N/ATo make the first letter of each word upper case:
1N/A
1N/A        $line =~ s/\b(\w)/\U$1/g;
1N/A
1N/AThis has the strange effect of turning "C<don't do it>" into "C<Don'T
1N/ADo It>".  Sometimes you might want this.  Other times you might need a
1N/Amore thorough solution (Suggested by brian d foy):
1N/A
1N/A    $string =~ s/ (
1N/A                 (^\w)    #at the beginning of the line
1N/A                   |      # or
1N/A                 (\s\w)   #preceded by whitespace
1N/A                   )
1N/A                /\U$1/xg;
    $string =~ s/([\w']+)/\u\L$1/g;
1N/A
1N/ATo make the whole line upper case:
1N/A
1N/A        $line = uc($line);
1N/A
1N/ATo force each word to be lower case, with the first letter upper case:
1N/A
1N/A        $line =~ s/(\w+)/\u\L$1/g;
1N/A
1N/AYou can (and probably should) enable locale awareness of those
1N/Acharacters by placing a C<use locale> pragma in your program.
1N/ASee L<perllocale> for endless details on locales.
1N/A
1N/AThis is sometimes referred to as putting something into "title
1N/Acase", but that's not quite accurate.  Consider the proper
1N/Acapitalization of the movie I<Dr. Strangelove or: How I Learned to
1N/AStop Worrying and Love the Bomb>, for example.
1N/A
1N/ADamian Conway's L<Text::Autoformat> module provides some smart
1N/Acase transformations:
1N/A
1N/A    use Text::Autoformat;
1N/A    my $x = "Dr. Strangelove or: How I Learned to Stop ".
1N/A      "Worrying and Love the Bomb";
1N/A
1N/A    print $x, "\n";
1N/A    for my $style (qw( sentence title highlight ))
1N/A    {
1N/A        print autoformat($x, { case => $style }), "\n";
1N/A    }
1N/A
1N/A=head2 How can I split a [character] delimited string except when inside [character]?
1N/A
Several modules can handle this sort of parsing---Text::Balanced,
Text::CSV, Text::CSV_XS, and Text::ParseWords, among others.
1N/A
1N/ATake the example case of trying to split a string that is
1N/Acomma-separated into its different fields. You can't use C<split(/,/)>
1N/Abecause you shouldn't split if the comma is inside quotes.  For
1N/Aexample, take a data line like this:
1N/A
1N/A    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
1N/A
1N/ADue to the restriction of the quotes, this is a fairly complex
1N/Aproblem.  Thankfully, we have Jeffrey Friedl, author of
1N/AI<Mastering Regular Expressions>, to handle these for us.  He
1N/Asuggests (assuming your string is contained in $text):
1N/A
1N/A     @new = ();
1N/A     push(@new, $+) while $text =~ m{
1N/A         "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
1N/A       | ([^,]+),?
1N/A       | ,
1N/A     }gx;
1N/A     push(@new, undef) if substr($text,-1,1) eq ',';
1N/A
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">).
1N/A
1N/AAlternatively, the Text::ParseWords module (part of the standard Perl
1N/Adistribution) lets you say:
1N/A
1N/A    use Text::ParseWords;
1N/A    @new = quotewords(",", 0, $text);
1N/A
1N/AThere's also a Text::CSV (Comma-Separated Values) module on CPAN.
1N/A
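As a quick check of the Text::ParseWords approach, here is a minimal
sketch that runs quotewords() over the sample record shown earlier.
The variable names are only illustrative, and the field count assumes
the record contains no stray apostrophes, which quotewords() would also
treat as quote characters.

```perl
use strict;
use warnings;
use Text::ParseWords;    # part of the standard distribution

# The sample record from this answer.
my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# A false second argument asks quotewords() to strip the quotes
# from the fields it returns.
my @fields = quotewords(',', 0, $text);

printf "%d fields\n", scalar @fields;
print "third field: $fields[2]\n";
```

The commas inside the quoted fields survive, so "Cimetrix, Inc" comes
back as a single field.
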
1N/A=head2 How do I strip blank space from the beginning/end of a string?
1N/A
1N/AAlthough the simplest approach would seem to be
1N/A
1N/A    $string =~ s/^\s*(.*?)\s*$/$1/;
1N/A
1N/Anot only is this unnecessarily slow and destructive, it also fails with
1N/Aembedded newlines.  It is much faster to do this operation in two steps:
1N/A
1N/A    $string =~ s/^\s+//;
1N/A    $string =~ s/\s+$//;
1N/A
1N/AOr more nicely written as:
1N/A
    for ($string) {
        s/^\s+//;
        s/\s+$//;
    }
1N/A
1N/AThis idiom takes advantage of the C<foreach> loop's aliasing
1N/Abehavior to factor out common code.  You can do this
1N/Aon several strings at once, or arrays, or even the
1N/Avalues of a hash if you use a slice:
1N/A
1N/A    # trim whitespace in the scalar, the array,
1N/A    # and all the values in the hash
1N/A    foreach ($scalar, @array, @hash{keys %hash}) {
1N/A        s/^\s+//;
1N/A        s/\s+$//;
1N/A    }
1N/A
1N/A=head2 How do I pad a string with blanks or pad a number with zeroes?
1N/A
1N/AIn the following examples, C<$pad_len> is the length to which you wish
1N/Ato pad the string, C<$text> or C<$num> contains the string to be padded,
1N/Aand C<$pad_char> contains the padding character. You can use a single
1N/Acharacter string constant instead of the C<$pad_char> variable if you
1N/Aknow what it is in advance. And in the same way you can use an integer in
1N/Aplace of C<$pad_len> if you know the pad length in advance.
1N/A
1N/AThe simplest method uses the C<sprintf> function. It can pad on the left
1N/Aor right with blanks and on the left with zeroes and it will not
1N/Atruncate the result. The C<pack> function can only pad strings on the
1N/Aright with blanks and it will truncate the result to a maximum length of
1N/AC<$pad_len>.
1N/A
1N/A    # Left padding a string with blanks (no truncation):
1N/A    $padded = sprintf("%${pad_len}s", $text);
1N/A    $padded = sprintf("%*s", $pad_len, $text);  # same thing
1N/A
1N/A    # Right padding a string with blanks (no truncation):
1N/A    $padded = sprintf("%-${pad_len}s", $text);
1N/A    $padded = sprintf("%-*s", $pad_len, $text); # same thing
1N/A
1N/A    # Left padding a number with 0 (no truncation):
1N/A    $padded = sprintf("%0${pad_len}d", $num);
1N/A    $padded = sprintf("%0*d", $pad_len, $num); # same thing
1N/A
1N/A    # Right padding a string with blanks using pack (will truncate):
1N/A    $padded = pack("A$pad_len",$text);
1N/A
1N/AIf you need to pad with a character other than blank or zero you can use
1N/Aone of the following methods.  They all generate a pad string with the
1N/AC<x> operator and combine that with C<$text>. These methods do
1N/Anot truncate C<$text>.
1N/A
1N/ALeft and right padding with any character, creating a new string:
1N/A
1N/A    $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
1N/A    $padded = $text . $pad_char x ( $pad_len - length( $text ) );
1N/A
1N/ALeft and right padding with any character, modifying C<$text> directly:
1N/A
1N/A    substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
1N/A    $text .= $pad_char x ( $pad_len - length( $text ) );
1N/A
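To make the difference between these methods concrete, here is a small
sketch with made-up values (a pad length of 6 and C<*> as the pad
character are assumptions chosen only for the demonstration):

```perl
use strict;
use warnings;

my ($text, $pad_char, $pad_len) = ('42', '*', 6);

# Padding with an arbitrary character via the x operator.
my $left  = $pad_char x ( $pad_len - length($text) ) . $text;
my $right = $text . $pad_char x ( $pad_len - length($text) );

# sprintf pads with zeroes or blanks and never truncates.
my $zeroes = sprintf("%0*d", $pad_len, 42);
my $long   = 'overflowing';
my $intact = sprintf("%${pad_len}s", $long);

# pack with an "A" template pads with blanks but does truncate.
my $packed = pack("A$pad_len", $long);

print "$left $right $zeroes $intact $packed\n";
```
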
1N/A=head2 How do I extract selected columns from a string?
1N/A
1N/AUse substr() or unpack(), both documented in L<perlfunc>.
1N/AIf you prefer thinking in terms of columns instead of widths,
1N/Ayou can use this kind of thing:
1N/A
1N/A    # determine the unpack format needed to split Linux ps output
1N/A    # arguments are cut columns
1N/A    my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
1N/A
    sub cut2fmt {
        my(@positions) = @_;
        my $template  = '';
        my $lastpos   = 1;
        for my $place (@positions) {
            $template .= "A" . ($place - $lastpos) . " ";
            $lastpos   = $place;
        }
        $template .= "A*";
        return $template;
    }
1N/A
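The template that cut2fmt() returns is meant to be handed to unpack().
The following sketch repeats the function so that it runs on its own
and applies it to a hypothetical three-column record; the record and
the variable names are invented for illustration.

```perl
use strict;
use warnings;

sub cut2fmt {
    my (@positions) = @_;
    my $template = '';
    my $lastpos  = 1;
    for my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos   = $place;
    }
    $template .= "A*";
    return $template;
}

# Hypothetical record whose fields start at columns 1, 8, and 14.
my $line = "alice  1234  sleep 10";

my $fmt = cut2fmt(8, 14);
my ($user, $pid, $command) = unpack($fmt, $line);

print "template: $fmt\n";
print "user=$user pid=$pid command=$command\n";
```

The "A" format strips trailing blanks from each field, which is usually
what you want for column-aligned text.
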
1N/A=head2 How do I find the soundex value of a string?
1N/A
1N/AUse the standard Text::Soundex module distributed with Perl.
1N/ABefore you do so, you may want to determine whether `soundex' is in
1N/Afact what you think it is.  Knuth's soundex algorithm compresses words
1N/Ainto a small space, and so it does not necessarily distinguish between
1N/Atwo words which you might want to appear separately.  For example, the
1N/Alast names `Knuth' and `Kant' are both mapped to the soundex code K530.
1N/AIf Text::Soundex does not do what you are looking for, you might want
1N/Ato consider the String::Approx module available at CPAN.
1N/A
1N/A=head2 How can I expand variables in text strings?
1N/A
1N/ALet's assume that you have a string like:
1N/A
1N/A    $text = 'this has a $foo in it and a $bar';
1N/A
1N/AIf those were both global variables, then this would
1N/Asuffice:
1N/A
1N/A    $text =~ s/\$(\w+)/${$1}/g;  # no /e needed
1N/A
1N/ABut since they are probably lexicals, or at least, they could
1N/Abe, you'd have to do this:
1N/A
1N/A    $text =~ s/(\$\w+)/$1/eeg;
1N/A    die if $@;          # needed /ee, not /e
1N/A
1N/AIt's probably better in the general case to treat those
1N/Avariables as entries in some special hash.  For example:
1N/A
    %user_defs = (
        foo  => 23,
        bar  => 19,
    );
1N/A    $text =~ s/\$(\w+)/$user_defs{$1}/g;
1N/A
1N/ASee also ``How do I expand function calls in a string?'' in this section
1N/Aof the FAQ.
1N/A
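One possible refinement of the hash approach, offered as a sketch
rather than the canonical answer, is to leave unknown names untouched
instead of replacing them with an empty string. The C<$baz> placeholder
below is an invented example of a name with no entry in the hash.

```perl
use strict;
use warnings;

my %user_defs = ( foo => 23, bar => 19 );
my $text = 'this has a $foo in it and a $bar, but no $baz';

# /e evaluates the replacement as code, so names that are missing
# from the hash can be put back exactly as they were found.
(my $expanded = $text) =~
    s/\$(\w+)/exists $user_defs{$1} ? $user_defs{$1} : "\$$1"/ge;

print "$expanded\n";
```
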
1N/A=head2 What's wrong with always quoting "$vars"?
1N/A
1N/AThe problem is that those double-quotes force stringification--
1N/Acoercing numbers and references into strings--even when you
1N/Adon't want them to be strings.  Think of it this way: double-quote
1N/Aexpansion is used to produce new strings.  If you already
1N/Ahave a string, why do you need more?
1N/A
1N/AIf you get used to writing odd things like these:
1N/A
1N/A    print "$var";       # BAD
1N/A    $new = "$old";      # BAD
1N/A    somefunc("$var");   # BAD
1N/A
1N/AYou'll be in trouble.  Those should (in 99.8% of the cases) be
1N/Athe simpler and more direct:
1N/A
1N/A    print $var;
1N/A    $new = $old;
1N/A    somefunc($var);
1N/A
1N/AOtherwise, besides slowing you down, you're going to break code when
1N/Athe thing in the scalar is actually neither a string nor a number, but
1N/Aa reference:
1N/A
1N/A    func(\@array);
    sub func {
        my $aref = shift;
        my $oref = "$aref";  # WRONG
    }
1N/A
1N/AYou can also get into subtle problems on those few operations in Perl
1N/Athat actually do care about the difference between a string and a
1N/Anumber, such as the magical C<++> autoincrement operator or the
1N/Asyscall() function.
1N/A
1N/AStringification also destroys arrays.
1N/A
1N/A    @lines = `command`;
1N/A    print "@lines";     # WRONG - extra blanks
1N/A    print @lines;       # right
1N/A
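The following sketch shows both effects in a form you can run. The
variable names are illustrative only.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;
my $copy  = "$aref";    # now plain text such as "ARRAY(0x55d0c8)"

print "original is a reference: ", ( ref($aref) ? "yes" : "no" ), "\n";
print "quoted copy is a reference: ", ( ref($copy) ? "yes" : "no" ), "\n";

# Interpolating an array joins its elements with $" (a blank by
# default), which is where the extra blanks come from.
my @lines  = ("one\n", "two\n");
my $joined = "@lines";
```

Once the reference has been turned into text there is no way to get
the array back from it.
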
1N/A=head2 Why don't my E<lt>E<lt>HERE documents work?
1N/A
1N/ACheck for these three things:
1N/A
1N/A=over 4
1N/A
1N/A=item There must be no space after the E<lt>E<lt> part.
1N/A
1N/A=item There (probably) should be a semicolon at the end.
1N/A
1N/A=item You can't (easily) have any space in front of the tag.
1N/A
1N/A=back
1N/A
1N/AIf you want to indent the text in the here document, you
1N/Acan do this:
1N/A
1N/A    # all in one
1N/A    ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1N/A        your text
1N/A        goes here
1N/A    HERE_TARGET
1N/A
1N/ABut the HERE_TARGET must still be flush against the margin.
1N/AIf you want that indented also, you'll have to quote
1N/Ain the indentation.
1N/A
1N/A    ($quote = <<'    FINIS') =~ s/^\s+//gm;
1N/A            ...we will have peace, when you and all your works have
1N/A            perished--and the works of your dark master to whom you
1N/A            would deliver us. You are a liar, Saruman, and a corrupter
1N/A            of men's hearts.  --Theoden in /usr/src/perl/taint.c
1N/A        FINIS
1N/A    $quote =~ s/\s+--/\n--/;
1N/A
1N/AA nice general-purpose fixer-upper function for indented here documents
1N/Afollows.  It expects to be called with a here document as its argument.
1N/AIt looks to see whether each line begins with a common substring, and
1N/Aif so, strips that substring off.  Otherwise, it takes the amount of leading
1N/Awhitespace found on the first line and removes that much off each
1N/Asubsequent line.
1N/A
1N/A    sub fix {
1N/A        local $_ = shift;
1N/A        my ($white, $leader);  # common whitespace and common leading string
1N/A        if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
1N/A            ($white, $leader) = ($2, quotemeta($1));
1N/A        } else {
1N/A            ($white, $leader) = (/^(\s+)/, '');
1N/A        }
1N/A        s/^\s*?$leader(?:$white)?//gm;
1N/A        return $_;
1N/A    }
1N/A
1N/AThis works with leading special strings, dynamically determined:
1N/A
1N/A    $remember_the_main = fix<<'    MAIN_INTERPRETER_LOOP';
1N/A    @@@ int
1N/A    @@@ runops() {
1N/A    @@@     SAVEI32(runlevel);
1N/A    @@@     runlevel++;
1N/A    @@@     while ( op = (*op->op_ppaddr)() );
1N/A    @@@     TAINT_NOT;
1N/A    @@@     return 0;
1N/A    @@@ }
1N/A    MAIN_INTERPRETER_LOOP
1N/A
1N/AOr with a fixed amount of leading whitespace, with remaining
1N/Aindentation correctly preserved:
1N/A
    $poem = fix<<EVER_ON_AND_ON;
       Now far ahead the Road has gone,
          And I must follow, if I can,
       Pursuing it with eager feet,
          Until it joins some larger way
       Where many paths and errands meet.
          And whither then? I cannot say.
                --Bilbo in /usr/src/perl/pp_ctl.c
    EVER_ON_AND_ON
1N/A
1N/A=head1 Data: Arrays
1N/A
1N/A=head2 What is the difference between a list and an array?
1N/A
1N/AAn array has a changeable length.  A list does not.  An array is something
1N/Ayou can push or pop, while a list is a set of values.  Some people make
1N/Athe distinction that a list is a value while an array is a variable.
1N/ASubroutines are passed and return lists, you put things into list
1N/Acontext, you initialize arrays with lists, and you foreach() across
1N/Aa list.  C<@> variables are arrays, anonymous arrays are arrays, arrays
1N/Ain scalar context behave like the number of elements in them, subroutines
1N/Aaccess their arguments through the array C<@_>, and push/pop/shift only work
1N/Aon arrays.
1N/A
1N/AAs a side note, there's no such thing as a list in scalar context.
1N/AWhen you say
1N/A
1N/A    $scalar = (2, 5, 7, 9);
1N/A
1N/Ayou're using the comma operator in scalar context, so it uses the scalar
1N/Acomma operator.  There never was a list there at all!  This causes the
1N/Alast value to be returned: 9.
1N/A
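A short sketch may make the distinction easier to see. It contrasts an
array in scalar context with the comma operator in scalar context; the
C<no warnings 'void'> line is there only because Perl would otherwise
warn about the discarded constants.

```perl
use strict;
use warnings;

my @array = (2, 5, 7, 9);

my $count   = @array;    # array in scalar context: element count
my ($first) = @array;    # list assignment: first element

my $scalar;
{
    no warnings 'void';
    $scalar = (2, 5, 7, 9);    # comma operator: last value
}

print "count=$count first=$first scalar=$scalar\n";
```
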
1N/A=head2 What is the difference between $array[1] and @array[1]?
1N/A
1N/AThe former is a scalar value; the latter an array slice, making
1N/Ait a list with one (scalar) value.  You should use $ when you want a
1N/Ascalar value (most of the time) and @ when you want a list with one
1N/Ascalar value in it (very, very rarely; nearly never, in fact).
1N/A
1N/ASometimes it doesn't make a difference, but sometimes it does.
1N/AFor example, compare:
1N/A
1N/A    $good[0] = `some program that outputs several lines`;
1N/A
1N/Awith
1N/A
1N/A    @bad[0]  = `same program that outputs several lines`;
1N/A
1N/AThe C<use warnings> pragma and the B<-w> flag will warn you about these
1N/Amatters.
1N/A
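The difference is one of context, which the following sketch makes
visible. The context() helper is hypothetical and exists only to report
how it was called; the warning is disabled here on purpose so the
example runs quietly.

```perl
use strict;
use warnings;

# Hypothetical helper: reports the context it was called in.
sub context { return wantarray ? "list" : "scalar" }

my (@good, @bad);
{
    no warnings 'syntax';    # silence the "better written as" warning
    $good[0] = context();    # scalar assignment
    @bad[0]  = context();    # slice assignment
}

print "good: $good[0], bad: $bad[0]\n";
```

Backticks behave the same way: in scalar context they return all the
output as one string, while in list context they return one line per
element, and the one-element slice keeps only the first.
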
1N/A=head2 How can I remove duplicate elements from a list or array?
1N/A
1N/AThere are several possible ways, depending on whether the array is
1N/Aordered and whether you wish to preserve the ordering.
1N/A
1N/A=over 4
1N/A
1N/A=item a)
1N/A
1N/AIf @in is sorted, and you want @out to be sorted:
1N/A(this assumes all true values in the array)
1N/A
1N/A    $prev = "not equal to $in[0]";
1N/A    @out = grep($_ ne $prev && ($prev = $_, 1), @in);
1N/A
1N/AThis is nice in that it doesn't use much extra memory, simulating
1N/Auniq(1)'s behavior of removing only adjacent duplicates.  The ", 1"
1N/Aguarantees that the expression is true (so that grep picks it up)
1N/Aeven if the $_ is 0, "", or undef.
1N/A
1N/A=item b)
1N/A
1N/AIf you don't know whether @in is sorted:
1N/A
1N/A    undef %saw;
1N/A    @out = grep(!$saw{$_}++, @in);
1N/A
1N/A=item c)
1N/A
1N/ALike (b), but @in contains only small integers:
1N/A
1N/A    @out = grep(!$saw[$_]++, @in);
1N/A
1N/A=item d)
1N/A
1N/AA way to do (b) without any loops or greps:
1N/A
1N/A    undef %saw;
1N/A    @saw{@in} = ();
1N/A    @out = sort keys %saw;  # remove sort if undesired
1N/A
1N/A=item e)
1N/A
1N/ALike (d), but @in contains only small positive integers:
1N/A
1N/A    undef @ary;
1N/A    @ary[@in] = @in;
1N/A    @out = grep {defined} @ary;
1N/A
1N/A=back
1N/A
1N/ABut perhaps you should have been using a hash all along, eh?
1N/A
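For reference, here is approach (b) as a self-contained sketch with
lexical variables and sample data (the data is invented). It keeps the
first occurrence of each element and preserves the original order.

```perl
use strict;
use warnings;

my @in = qw(b a b c a 0 0);

my %saw;
my @out = grep { !$saw{$_}++ } @in;

print "@out\n";
```
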
1N/A=head2 How can I tell whether a certain element is contained in a list or array?
1N/A
1N/AHearing the word "in" is an I<in>dication that you probably should have
1N/Aused a hash, not a list or array, to store your data.  Hashes are
1N/Adesigned to answer this question quickly and efficiently.  Arrays aren't.
1N/A
1N/AThat being said, there are several ways to approach this.  If you
1N/Aare going to make this query many times over arbitrary string values,
1N/Athe fastest way is probably to invert the original array and maintain a
1N/Ahash whose keys are the first array's values.
1N/A
1N/A    @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1N/A    %is_blue = ();
1N/A    for (@blues) { $is_blue{$_} = 1 }
1N/A
1N/ANow you can check whether $is_blue{$some_color}.  It might have been a
1N/Agood idea to keep the blues all in a hash in the first place.
1N/A
1N/AIf the values are all small integers, you could use a simple indexed
1N/Aarray.  This kind of an array will take up less space:
1N/A
1N/A    @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1N/A    @is_tiny_prime = ();
1N/A    for (@primes) { $is_tiny_prime[$_] = 1 }
    # or simply  @is_tiny_prime[@primes] = (1) x @primes;
1N/A
1N/ANow you check whether $is_tiny_prime[$some_number].
1N/A
1N/AIf the values in question are integers instead of strings, you can save
1N/Aquite a lot of space by using bit strings instead:
1N/A
1N/A    @articles = ( 1..10, 150..2000, 2017 );
1N/A    undef $read;
1N/A    for (@articles) { vec($read,$_,1) = 1 }
1N/A
1N/ANow check whether C<vec($read,$n,1)> is true for some C<$n>.
1N/A
1N/APlease do not use
1N/A
1N/A    ($is_there) = grep $_ eq $whatever, @array;
1N/A
1N/Aor worse yet
1N/A
1N/A    ($is_there) = grep /$whatever/, @array;
1N/A
1N/AThese are slow (checks every element even if the first matches),
1N/Ainefficient (same reason), and potentially buggy (what if there are
1N/Aregex characters in $whatever?).  If you're only testing once, then
1N/Ause:
1N/A
1N/A    $is_there = 0;
    foreach $elt (@array) {
        if ($elt eq $elt_to_find) {
            $is_there = 1;
            last;
        }
    }
1N/A    if ($is_there) { ... }
1N/A
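The two recommended styles can be sketched side by side. This assumes
List::Util is available (it ships with Perl 5.8 and later); the
variable C<$some_color> is just an example value.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @blues = qw/azure cerulean teal turquoise lapis-lazuli/;

# Many lookups: build the hash once, here with a hash slice.
my %is_blue;
@is_blue{@blues} = (1) x @blues;

# A single lookup: first() stops at the first matching element.
my $some_color = 'teal';
my $match      = first { $_ eq $some_color } @blues;
my $is_there   = defined $match;

print "teal is blue\n" if $is_blue{teal};
print "found it\n"     if $is_there;
```
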
1N/A=head2 How do I compute the difference of two arrays?  How do I compute the intersection of two arrays?
1N/A
1N/AUse a hash.  Here's code to do both and more.  It assumes that
1N/Aeach element is unique in a given array:
1N/A
1N/A    @union = @intersection = @difference = ();
1N/A    %count = ();
1N/A    foreach $element (@array1, @array2) { $count{$element}++ }
    foreach $element (keys %count) {
        push @union, $element;
        push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
    }
1N/A
1N/ANote that this is the I<symmetric difference>, that is, all elements in
1N/Aeither A or in B but not in both.  Think of it as an xor operation.
1N/A
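If you want the one-sided difference instead (elements of the first
array that are not in the second), a lookup hash built from the second
array is enough. This is a sketch with invented sample data.

```perl
use strict;
use warnings;

my @array1 = qw(a b c d);
my @array2 = qw(c d e);

# Remember everything in the second array, then keep only the
# elements of the first array that were not seen.
my %in_second  = map { $_ => 1 } @array2;
my @only_first = grep { !$in_second{$_} } @array1;

print "@only_first\n";
```
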
1N/A=head2 How do I test whether two arrays or hashes are equal?
1N/A
1N/AThe following code works for single-level arrays.  It uses a stringwise
1N/Acomparison, and does not distinguish defined versus undefined empty
1N/Astrings.  Modify if you have other needs.
1N/A
1N/A    $are_equal = compare_arrays(\@frogs, \@toads);
1N/A
    sub compare_arrays {
        my ($first, $second) = @_;
        no warnings;  # silence spurious -w undef complaints
        return 0 unless @$first == @$second;
        for (my $i = 0; $i < @$first; $i++) {
            return 0 if $first->[$i] ne $second->[$i];
        }
        return 1;
    }
1N/A
1N/AFor multilevel structures, you may wish to use an approach more
1N/Alike this one.  It uses the CPAN module FreezeThaw:
1N/A
1N/A    use FreezeThaw qw(cmpStr);
1N/A    @a = @b = ( "this", "that", [ "more", "stuff" ] );
1N/A
1N/A    printf "a and b contain %s arrays\n",
1N/A        cmpStr(\@a, \@b) == 0
1N/A        ? "the same"
1N/A        : "different";
1N/A
1N/AThis approach also works for comparing hashes.  Here
1N/Awe'll demonstrate two different answers:
1N/A
1N/A    use FreezeThaw qw(cmpStr cmpStrHard);
1N/A
1N/A    %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1N/A    $a{EXTRA} = \%b;
1N/A    $b{EXTRA} = \%a;
1N/A
    printf "a and b contain %s hashes\n",
        cmpStr(\%a, \%b) == 0 ? "the same" : "different";

    printf "a and b contain %s hashes\n",
        cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1N/A
1N/A
The first reports that both hashes contain the same data,
while the second reports that they do not.  Which you prefer is left as
an exercise to the reader.
1N/A
1N/A=head2 How do I find the first array element for which a condition is true?
1N/A
1N/ATo find the first array element which satisfies a condition, you can
1N/Ause the first() function in the List::Util module, which comes with
1N/APerl 5.8.  This example finds the first element that contains "Perl".
1N/A
1N/A    use List::Util qw(first);
1N/A
1N/A    my $element = first { /Perl/ } @array;
1N/A
1N/AIf you cannot use List::Util, you can make your own loop to do the
1N/Asame thing.  Once you find the element, you stop the loop with last.
1N/A
    my $found;
    foreach my $element ( @array )
        {
        if( $element =~ /Perl/ ) { $found = $element; last }
        }
1N/A
1N/AIf you want the array index, you can iterate through the indices
1N/Aand check the array element at each index until you find one
1N/Athat satisfies the condition.
1N/A
1N/A    my( $found, $index ) = ( undef, -1 );
    for( my $i = 0; $i < @array; $i++ )
1N/A        {
1N/A        if( $array[$i] =~ /Perl/ )
1N/A            {
1N/A            $found = $array[$i];
1N/A            $index = $i;
1N/A            last;
1N/A            }
1N/A        }
1N/A
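If List::Util is available, the index search can also be written with
first() by running it over the indices rather than the elements. This
is a sketch with sample data; note that a match at index 0 is false
but defined, so test the result with defined().

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ( 'awk', 'sed', 'Perl 5', 'Perl 6' );

# Iterate over the indices and return the first one whose
# element matches.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

print "found at index $index\n" if defined $index;
```
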
1N/A=head2 How do I handle linked lists?
1N/A
In general, you usually don't need a linked list in Perl, since with
regular arrays, you can push and pop or shift and unshift at either end,
or you can use splice to add and/or remove an arbitrary number of elements at
arbitrary points.  Both pop and shift are O(1) operations on Perl's
dynamic arrays.  In the absence of shifts and pops, push in general
needs to reallocate on the order of every log(N) times, and unshift will
need to copy pointers each time.
1N/A
1N/AIf you really, really wanted, you could use structures as described in
1N/AL<perldsc> or L<perltoot> and do just what the algorithm book tells you
1N/Ato do.  For example, imagine a list node like this:
1N/A
1N/A    $node = {
1N/A        VALUE => 42,
1N/A        LINK  => undef,
1N/A    };
1N/A
1N/AYou could walk the list this way:
1N/A
1N/A    print "List: ";
1N/A    for ($node = $head;  $node; $node = $node->{LINK}) {
1N/A        print $node->{VALUE}, " ";
1N/A    }
1N/A    print "\n";
1N/A
1N/AYou could add to the list this way:
1N/A
1N/A    my ($head, $tail);
1N/A    $tail = append($head, 1);       # grow a new head
1N/A    for $value ( 2 .. 10 ) {
1N/A        $tail = append($tail, $value);
1N/A    }
1N/A
1N/A    sub append {
1N/A        my($list, $value) = @_;
1N/A        my $node = { VALUE => $value };
1N/A        if ($list) {
1N/A            $node->{LINK} = $list->{LINK};
1N/A            $list->{LINK} = $node;
1N/A        } else {
1N/A            $_[0] = $node;      # replace caller's version
1N/A        }
1N/A        return $node;
1N/A    }
1N/A
But again, Perl's built-in arrays are virtually always good enough.
1N/A
1N/A=head2 How do I handle circular lists?
1N/A
1N/ACircular lists could be handled in the traditional fashion with linked
1N/Alists, or you could just do something like this with an array:
1N/A
1N/A    unshift(@array, pop(@array));  # the last shall be first
1N/A    push(@array, shift(@array));   # and vice versa
1N/A
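If you only need to step around the array without rearranging it,
another option (not from the answer above, just a common idiom) is to
wrap the index with the modulus operator. The sample data and index
values are invented.

```perl
use strict;
use warnings;

my @array = qw(north east south west);

# Stepping forward from the last element wraps to the first.
my $i    = 3;
my $next = $array[ ($i + 1) % @array ];

# Stepping backward from the first element wraps to the last,
# because % with a positive right operand never returns a
# negative result (as long as "use integer" is not in effect).
my $j    = 0;
my $prev = $array[ ($j - 1) % @array ];

print "next=$next prev=$prev\n";
```
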
1N/A=head2 How do I shuffle an array randomly?
1N/A
1N/AIf you either have Perl 5.8.0 or later installed, or if you have
1N/AScalar-List-Utils 1.03 or later installed, you can say:
1N/A
1N/A    use List::Util 'shuffle';
1N/A
1N/A    @shuffled = shuffle(@list);
1N/A
1N/AIf not, you can use a Fisher-Yates shuffle.
1N/A
1N/A    sub fisher_yates_shuffle {
1N/A        my $deck = shift;  # $deck is a reference to an array
1N/A        my $i = @$deck;
1N/A        while ($i--) {
1N/A            my $j = int rand ($i+1);
1N/A            @$deck[$i,$j] = @$deck[$j,$i];
1N/A        }
1N/A    }
1N/A
1N/A    # shuffle my mpeg collection
1N/A    #
1N/A    my @mpeg = <audio/*/*.mp3>;
1N/A    fisher_yates_shuffle( \@mpeg );    # randomize @mpeg in place
1N/A    print @mpeg;
1N/A
1N/ANote that the above implementation shuffles an array in place,
1N/Aunlike the List::Util::shuffle() which takes a list and returns
1N/Aa new shuffled list.
1N/A
You've probably seen shuffling algorithms that work using splice,
randomly picking another element to swap the current element with:
1N/A
1N/A    srand;
1N/A    @new = ();
1N/A    @old = 1 .. 10;  # just a demo
    while (@old) {
        push(@new, splice(@old, rand @old, 1));
    }
1N/A
1N/AThis is bad because splice is already O(N), and since you do it N times,
1N/Ayou just invented a quadratic algorithm; that is, O(N**2).  This does
1N/Anot scale, although Perl is so efficient that you probably won't notice
1N/Athis until you have rather largish arrays.
1N/A
1N/A=head2 How do I process/modify each element of an array?
1N/A
1N/AUse C<for>/C<foreach>:
1N/A
    for (@lines) {
        s/foo/bar/; # change that word
        y/XZ/ZX/;   # swap those letters
    }
1N/A
1N/AHere's another; let's compute spherical volumes:
1N/A
    for (@volumes = @radii) {   # @volumes has changed parts
        $_ **= 3;
        $_ *= (4/3) * 3.14159;  # this will be constant folded
    }
1N/A
1N/Awhich can also be done with map() which is made to transform
1N/Aone list into another:
1N/A
1N/A    @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
1N/A
1N/AIf you want to do the same thing to modify the values of the
1N/Ahash, you can use the C<values> function.  As of Perl 5.6
1N/Athe values are not copied, so if you modify $orbit (in this
1N/Acase), you modify the value.
1N/A
    for $orbit ( values %orbits ) {
        ($orbit **= 3) *= (4/3) * 3.14159;
    }
1N/A
1N/APrior to perl 5.6 C<values> returned copies of the values,
1N/Aso older perl code often contains constructions such as
1N/AC<@orbits{keys %orbits}> instead of C<values %orbits> where
1N/Athe hash is to be modified.
1N/A
1N/A=head2 How do I select a random element from an array?
1N/A
1N/AUse the rand() function (see L<perlfunc/rand>):
1N/A
1N/A    $index   = rand @array;
1N/A    $element = $array[$index];
1N/A
Or, simply:

    my $element = $array[ rand @array ];
1N/A
1N/A=head2 How do I permute N elements of a list?
1N/A
1N/AUse the List::Permutor module on CPAN.  If the list is
1N/Aactually an array, try the Algorithm::Permute module (also
1N/Aon CPAN).  It's written in XS code and is very efficient.
1N/A
1N/A    use Algorithm::Permute;
1N/A    my @array = 'a'..'d';
1N/A    my $p_iterator = Algorithm::Permute->new ( \@array );
1N/A    while (my @perm = $p_iterator->next) {
1N/A       print "next permutation: (@perm)\n";
1N/A    }
1N/A
1N/AFor even faster execution, you could do:
1N/A
1N/A   use Algorithm::Permute;
1N/A   my @array = 'a'..'d';
1N/A   Algorithm::Permute::permute {
1N/A      print "next permutation: (@array)\n";
1N/A   } @array;
1N/A
1N/AHere's a little program that generates all permutations of
1N/Aall the words on each line of input. The algorithm embodied
1N/Ain the permute() function is discussed in Volume 4 (still
1N/Aunpublished) of Knuth's I<The Art of Computer Programming>
1N/Aand will work on any list:
1N/A
1N/A    #!/usr/bin/perl -n
    # Fischer-Krause ordered permutation generator
1N/A
1N/A    sub permute (&@) {
1N/A        my $code = shift;
1N/A        my @idx = 0..$#_;
1N/A        while ( $code->(@_[@idx]) ) {
1N/A            my $p = $#idx;
1N/A            --$p while $idx[$p-1] > $idx[$p];
1N/A            my $q = $p or return;
1N/A            push @idx, reverse splice @idx, $p;
1N/A            ++$q while $idx[$p-1] > $idx[$q];
1N/A            @idx[$p-1,$q]=@idx[$q,$p-1];
1N/A        }
1N/A    }
1N/A
1N/A    permute {print"@_\n"} split;
1N/A
1N/A=head2 How do I sort an array by (anything)?
1N/A
1N/ASupply a comparison function to sort() (described in L<perlfunc/sort>):
1N/A
1N/A    @list = sort { $a <=> $b } @list;
1N/A
1N/AThe default sort function is cmp, string comparison, which would
1N/Asort C<(1, 2, 10)> into C<(1, 10, 2)>.  C<< <=> >>, used above, is
1N/Athe numerical comparison operator.
1N/A
1N/AIf you have a complicated function needed to pull out the part you
1N/Awant to sort on, then don't do it inside the sort function.  Pull it
1N/Aout first, because the sort BLOCK can be called many times for the
1N/Asame element.  Here's an example of how to pull out the first word
1N/Aafter the first number on each item, and then sort those words
1N/Acase-insensitively.
1N/A
1N/A    @idx = ();
    for (@data) {
        ($item) = /\d+\s*(\S+)/;
        push @idx, uc($item);
    }
1N/A    @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1N/A
1N/Awhich could also be written this way, using a trick
1N/Athat's come to be known as the Schwartzian Transform:
1N/A
    @sorted = map  { $_->[0] }
              sort { $a->[1] cmp $b->[1] }
              map  { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
1N/A
1N/AIf you need to sort on several fields, the following paradigm is useful.
1N/A
1N/A    @sorted = sort { field1($a) <=> field1($b) ||
1N/A                     field2($a) cmp field2($b) ||
1N/A                     field3($a) cmp field3($b)
1N/A                   }     @data;
1N/A
1N/AThis can be conveniently combined with precalculation of keys as given
1N/Aabove.
1N/A
1N/ASee the F<sort> article in the "Far More Than You Ever Wanted
1N/ATo Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
1N/Amore about this approach.
1N/A
1N/ASee also the question below on sorting hashes.
1N/A
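Here is the Schwartzian Transform from above applied to a few invented
records, as a sketch you can run to see the precalculated keys at
work.

```perl
use strict;
use warnings;

my @data = ( '3 pear', '10 Apple', '7 banana' );

# Read bottom to top: pair each item with its sort key (the first
# word after the first number, uppercased), sort on the key, then
# throw the key away.
my @sorted = map  { $_->[0] }
             sort { $a->[1] cmp $b->[1] }
             map  { [ $_, uc( (/\d+\s*(\S+)/)[0] ) ] } @data;

print "$_\n" for @sorted;
```

The key extraction runs once per item, not once per comparison.
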
1N/A=head2 How do I manipulate arrays of bits?
1N/A
1N/AUse pack() and unpack(), or else vec() and the bitwise operations.
1N/A
For example, this sets bit N of $vec for each integer N listed in @ints:
1N/A
1N/A    $vec = '';
1N/A    foreach(@ints) { vec($vec,$_,1) = 1 }
1N/A
1N/AHere's how, given a vector in $vec, you can
1N/Aget those bits into your @ints array:
1N/A
    sub bitvec_to_list {
        my $vec = shift;
        my @ints;
        # Find null-byte density then select best algorithm
        if ($vec =~ tr/\0// / length $vec > 0.95) {
            use integer;
            my $i;
            # This method is faster with mostly null-bytes
            while($vec =~ /[^\0]/g ) {
                $i = -9 + 8 * pos $vec;
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
            }
        } else {
            # This method is a fast general algorithm
            use integer;
            my $bits = unpack "b*", $vec;
            push @ints, 0 if $bits =~ s/^(\d)// && $1;
            push @ints, pos $bits while($bits =~ /1/g);
        }
        return \@ints;
    }
1N/A
1N/AThis method gets faster the more sparse the bit vector is.
1N/A(Courtesy of Tim Bunce and Winfried Koenig.)
1N/A
1N/AYou can make the while loop a lot shorter with this suggestion
1N/Afrom Benjamin Goldberg:
1N/A
1N/A    while($vec =~ /[^\0]+/g ) {
1N/A       push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1N/A    }
1N/A
1N/AOr use the CPAN module Bit::Vector:
1N/A
1N/A    $vector = Bit::Vector->new($num_of_bits);
1N/A    $vector->Index_List_Store(@ints);
1N/A    @ints = $vector->Index_List_Read();
1N/A
Bit::Vector provides efficient methods for bit vectors, sets of small
integers, and "big int" math.
1N/A
1N/AHere's a more extensive illustration using vec():
1N/A
1N/A    # vec demo
1N/A    $vector = "\xff\x0f\xef\xfe";
    print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
        unpack("N", $vector), "\n";
1N/A    $is_set = vec($vector, 23, 1);
1N/A    print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1N/A    pvec($vector);
1N/A
1N/A    set_vec(1,1,1);
1N/A    set_vec(3,1,1);
1N/A    set_vec(23,1,1);
1N/A
1N/A    set_vec(3,1,3);
1N/A    set_vec(3,2,3);
1N/A    set_vec(3,4,3);
1N/A    set_vec(3,4,7);
1N/A    set_vec(3,8,3);
1N/A    set_vec(3,8,7);
1N/A
1N/A    set_vec(0,32,17);
1N/A    set_vec(1,32,17);
1N/A
    sub set_vec {
        my ($offset, $width, $value) = @_;
        my $vector = '';
        vec($vector, $offset, $width) = $value;
        print "offset=$offset width=$width value=$value\n";
        pvec($vector);
    }

    sub pvec {
        my $vector = shift;
        my $bits = unpack("b*", $vector);
        my $i = 0;
        my $BASE = 8;

        print "vector length in bytes: ", length($vector), "\n";
        @bytes = unpack("A8" x length($vector), $bits);
        print "bits are: @bytes\n\n";
    }
1N/A
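For a smaller round trip than the demo above, this sketch stores a few
integers as bits with vec() and reads them back with a plain grep. It
is the simple, unoptimized counterpart of bitvec_to_list(); the sample
integers are arbitrary.

```perl
use strict;
use warnings;

my @ints = (1, 3, 23);

# Set bit N for every N in @ints.
my $vec = '';
vec($vec, $_, 1) = 1 for @ints;

# Read the set bits back by testing every bit position.
my @back = grep { vec($vec, $_, 1) } 0 .. 8 * length($vec) - 1;

# "b*" shows the bits in the same order that vec() numbers them.
my $bits = unpack("b*", $vec);

print "bytes: ", length($vec), "\n";
print "bits:  $bits\n";
print "ints:  @back\n";
```
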
1N/A=head2 Why does defined() return true on empty arrays and hashes?
1N/A
1N/AThe short story is that you should probably only use defined on scalars or
1N/Afunctions, not on aggregates (arrays and hashes).  See L<perlfunc/defined>
1N/Ain the 5.004 release or later of Perl for more detail.
1N/A
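If what you really want to know is whether an aggregate is empty, test
it in boolean or scalar context instead. The following is a minimal
sketch of that alternative, not a quotation from perlfunc.

```perl
use strict;
use warnings;

my (@array, %hash);

# An empty array or hash is false in boolean context.
my $array_empty = !@array;
my $hash_empty  = !%hash;

# An array holding a single undef is not empty: it has one element.
push @array, undef;
my $count = @array;

print "array was empty\n" if $array_empty;
print "hash was empty\n"  if $hash_empty;
print "array now has $count element\n";
```
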
1N/A=head1 Data: Hashes (Associative Arrays)
1N/A
1N/A=head2 How do I process an entire hash?
1N/A
1N/AUse the each() function (see L<perlfunc/each>) if you don't care
1N/Awhether it's sorted:
1N/A
    while ( ($key, $value) = each %hash) {
        print "$key = $value\n";
    }
1N/A
1N/AIf you want it sorted, you'll have to use foreach() on the result of
1N/Asorting the keys as shown in an earlier question.
1N/A
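The sorted variant might look like this sketch, which uses invented
data and collects the lines before printing them:

```perl
use strict;
use warnings;

my %hash = ( pear => 3, apple => 10, banana => 7 );

# sort keys gives a stable, predictable order; each() does not.
my @report;
for my $key ( sort keys %hash ) {
    push @report, "$key = $hash{$key}";
}

print "$_\n" for @report;
```
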
1N/A=head2 What happens if I add or remove keys from a hash while iterating over it?
1N/A
1N/ADon't do that. :-)
1N/A
1N/A[lwall] In Perl 4, you were not allowed to modify a hash at all while
1N/Aiterating over it.  In Perl 5 you can delete from it, but you still
1N/Acan't add to it, because that might cause a doubling of the hash table,
1N/Ain which half the entries get copied up to the new top half of the
1N/Atable, at which point you've totally bamboozled the iterator code.
1N/AEven if the table doesn't double, there's no telling whether your new
1N/Aentry will be inserted before or after the current iterator position.
1N/A
1N/AEither treasure up your changes and make them after the iterator finishes
1N/Aor use keys to fetch all the old keys at once, and iterate over the list
1N/Aof keys.
1N/A
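The second suggestion can be sketched as follows. Because keys()
returns the complete list before the loop body runs, adding entries
inside the loop does not disturb the iteration. The data and the
uppercase-copy operation are invented for the example.

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => 2, c => 3 );

# Iterate over a snapshot of the keys, not over the live hash.
for my $key ( keys %hash ) {
    $hash{ uc $key } = $hash{$key};    # safe: adds A, B, C
}

print join( " ", sort keys %hash ), "\n";
```
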
1N/A=head2 How do I look up a hash element by value?
1N/A
1N/ACreate a reverse hash:
1N/A
1N/A    %by_value = reverse %by_key;
1N/A    $key = $by_value{$value};
1N/A
1N/AThat's not particularly efficient, because C<reverse> first flattens
1N/Athe whole hash into a temporary list.  It would be more space-efficient
1N/Ato use:
1N/A
1N/A    while (($key, $value) = each %by_key) {
1N/A        $by_value{$value} = $key;
1N/A    }
1N/A
1N/AIf your hash could have repeated values, the methods above will only find
1N/Aone of the associated keys.   This may or may not worry you.  If it does
1N/Aworry you, you can always reverse the hash into a hash of arrays instead:
1N/A
1N/A    while (($key, $value) = each %by_key) {
1N/A        push @{$key_list_by_value{$value}}, $key;
1N/A    }
1N/A
1N/A=head2 How can I know how many entries are in a hash?
1N/A
1N/AIf you mean how many keys, then all you have to do is
1N/Ause the keys() function in a scalar context:
1N/A
1N/A    $num_keys = keys %hash;
1N/A
1N/AThe keys() function also resets the iterator, which means that you may
1N/Asee strange results if you use this between uses of other hash operators
1N/Asuch as each().
1N/A
1N/A=head2 How do I sort a hash (optionally by value instead of key)?
1N/A
1N/AInternally, hashes are stored in a way that prevents you from imposing
1N/Aan order on key-value pairs.  Instead, you have to sort a list of the
1N/Akeys or values:
1N/A
1N/A    @keys = sort keys %hash;    # sorted by key
1N/A    @keys = sort {
1N/A            $hash{$a} cmp $hash{$b}
1N/A        } keys %hash;   # and by value
1N/A
1N/AHere we'll do a reverse numeric sort by value, and if two values are
1N/Aidentical, sort by length of key (longest first), or if that fails, by
1N/Astraight ASCII comparison of the keys (well, possibly modified by your
1N/Alocale--see L<perllocale>).
1N/A
1N/A    @keys = sort {
1N/A        $hash{$b} <=> $hash{$a}
1N/A                  ||
1N/A        length($b) <=> length($a)
1N/A                  ||
1N/A               $a cmp $b
1N/A    } keys %hash;
1N/A
1N/A=head2 How can I always keep my hash sorted?
1N/A
1N/AYou can look into using the DB_File module and tie() using the
1N/A$DB_BTREE hash bindings as documented in L<DB_File/"In Memory Databases">.
1N/AThe Tie::IxHash module from CPAN might also be instructive.
1N/A
1N/A=head2 What's the difference between "delete" and "undef" with hashes?
1N/A
1N/AHashes contain pairs of scalars: the first is the key, the
1N/Asecond is the value.  The key will be coerced to a string,
1N/Aalthough the value can be any kind of scalar: string,
1N/Anumber, or reference.  If a key $key is present in
1N/A%hash, C<exists($hash{$key})> will return true.  The value
1N/Afor a given key can be C<undef>, in which case
1N/AC<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
1N/Awill return true.  This corresponds to (C<$key>, C<undef>)
1N/Abeing in the hash.
1N/A
1N/APictures help...  here's the %hash table:
1N/A
1N/A      keys  values
1N/A    +------+------+
1N/A    |  a   |  3   |
1N/A    |  x   |  7   |
1N/A    |  d   |  0   |
1N/A    |  e   |  2   |
1N/A    +------+------+
1N/A
1N/AAnd these conditions hold
1N/A
1N/A    $hash{'a'}                       is true
1N/A    $hash{'d'}                       is false
1N/A    defined $hash{'d'}               is true
1N/A    defined $hash{'a'}               is true
1N/A    exists $hash{'a'}                is true (Perl5 only)
1N/A    grep ($_ eq 'a', keys %hash)     is true
1N/A
1N/AIf you now say
1N/A
1N/A    undef $hash{'a'}
1N/A
1N/Ayour table now reads:
1N/A
1N/A
1N/A      keys  values
1N/A    +------+------+
1N/A    |  a   | undef|
1N/A    |  x   |  7   |
1N/A    |  d   |  0   |
1N/A    |  e   |  2   |
1N/A    +------+------+
1N/A
1N/Aand these conditions now hold; changes in caps:
1N/A
1N/A    $hash{'a'}                       is FALSE
1N/A    $hash{'d'}                       is false
1N/A    defined $hash{'d'}               is true
1N/A    defined $hash{'a'}               is FALSE
1N/A    exists $hash{'a'}                is true (Perl5 only)
1N/A    grep ($_ eq 'a', keys %hash)     is true
1N/A
1N/ANotice the last two: you have an undef value, but a defined key!
1N/A
1N/ANow, consider this:
1N/A
1N/A    delete $hash{'a'}
1N/A
1N/Ayour table now reads:
1N/A
1N/A      keys  values
1N/A    +------+------+
1N/A    |  x   |  7   |
1N/A    |  d   |  0   |
1N/A    |  e   |  2   |
1N/A    +------+------+
1N/A
1N/Aand these conditions now hold; changes in caps:
1N/A
1N/A    $hash{'a'}                       is false
1N/A    $hash{'d'}                       is false
1N/A    defined $hash{'d'}               is true
1N/A    defined $hash{'a'}               is false
1N/A    exists $hash{'a'}                is FALSE (Perl5 only)
1N/A    grep ($_ eq 'a', keys %hash)     is FALSE
1N/A
1N/ASee, the whole entry is gone!
1N/A
1N/A=head2 Why don't my tied hashes make the defined/exists distinction?
1N/A
1N/AThis depends on the tied hash's implementation of EXISTS().
1N/AFor example, there isn't the concept of undef with hashes
1N/Athat are tied to DBM* files. It also means that exists() and
1N/Adefined() do the same thing with a DBM* file, and what they
1N/Aend up doing is not what they do with ordinary hashes.
1N/A
1N/A=head2 How do I reset an each() operation part-way through?
1N/A
1N/AUsing C<keys %hash> in scalar context returns the number of keys in
1N/Athe hash I<and> resets the iterator associated with the hash.  You may
1N/Aneed to do this if you use C<last> to exit a loop early so that when you
1N/Are-enter it, the hash iterator has been reset.
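
Here's a small sketch of that situation, with made-up sample data.  It
relies on the documented behavior that calling keys() in void context
resets the iterator and does nothing else.

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 2, c => 3);   # sample data

# Leave the loop after the first pair; the iterator stays mid-hash.
my $first_seen;
while (my ($key, $value) = each %hash) {
    $first_seen = $key;
    last;
}

# Reset the iterator before looping again.
keys %hash;

# This loop now starts from the beginning and visits every pair.
my $visited = 0;
while (my ($key, $value) = each %hash) {
    $visited++;
}
print "visited $visited pairs after the reset\n";
```

Without the reset, the second loop would pick up where the first one
left off and miss the pair already returned.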
1N/A
1N/A=head2 How can I get the unique keys from two hashes?
1N/A
1N/AFirst you extract the keys from the hashes into lists, then solve
1N/Athe "removing duplicates" problem described above.  For example:
1N/A
1N/A    %seen = ();
1N/A    for $element (keys(%foo), keys(%bar)) {
1N/A        $seen{$element}++;
1N/A    }
1N/A    @uniq = keys %seen;
1N/A
1N/AOr more succinctly:
1N/A
1N/A    @uniq = keys %{{%foo,%bar}};
1N/A
1N/AOr if you really want to save space:
1N/A
1N/A    %seen = ();
1N/A    while (defined ($key = each %foo)) {
1N/A        $seen{$key}++;
1N/A    }
1N/A    while (defined ($key = each %bar)) {
1N/A        $seen{$key}++;
1N/A    }
1N/A    @uniq = keys %seen;
1N/A
1N/A=head2 How can I store a multidimensional array in a DBM file?
1N/A
1N/AEither stringify the structure yourself (no fun), or else
1N/Aget the MLDBM module (which uses Data::Dumper) from CPAN and layer
1N/Ait on top of either DB_File or GDBM_File.
1N/A
1N/A=head2 How can I make my hash remember the order I put elements into it?
1N/A
1N/AUse the Tie::IxHash module from CPAN.
1N/A
1N/A    use Tie::IxHash;
1N/A    tie my %myhash, 'Tie::IxHash';
1N/A    for (my $i=0; $i<20; $i++) {
1N/A        $myhash{$i} = 2*$i;
1N/A    }
1N/A    my @keys = keys %myhash;
1N/A    # @keys = (0,1,2,3,...)
1N/A
1N/A=head2 Why does passing a subroutine an undefined element in a hash create it?
1N/A
1N/AIf you say something like:
1N/A
1N/A    somefunc($hash{"nonesuch key here"});
1N/A
1N/AThen that element "autovivifies"; that is, it springs into existence
1N/Awhether you store something there or not.  That's because functions
1N/Aget scalars passed in by reference.  If somefunc() modifies C<$_[0]>,
1N/Ait has to be ready to write it back into the caller's version.
1N/A
1N/AThis has been fixed as of Perl5.004.
1N/A
1N/ANormally, merely accessing a key's value for a nonexistent key does
1N/AI<not> cause that key to be forever there.  This is different from
1N/Aawk's behavior.
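
You can check this for yourself with exists().  The sketch below
assumes a perl new enough to include the fix; the subroutine names
C<reader> and C<writer> are invented for the example.

```perl
use strict;
use warnings;

my %hash;

sub reader { my $copy = $_[0]; return }   # only reads its argument
sub writer { $_[0] = "stored" }           # assigns through the alias

# A plain read of a missing key does not create it.
my $value = $hash{plain_read};
my $after_read = exists $hash{plain_read} ? 1 : 0;

# Passing a missing element to a sub that only reads it does not either.
reader($hash{passed_to_reader});
my $after_reader = exists $hash{passed_to_reader} ? 1 : 0;

# The element appears only once the sub assigns to $_[0].
writer($hash{passed_to_writer});
my $after_writer = exists $hash{passed_to_writer} ? 1 : 0;

print "read: $after_read, reader: $after_reader, writer: $after_writer\n";
```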
1N/A
1N/A=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
1N/A
1N/AUsually a hash ref, perhaps like this:
1N/A
1N/A    $record = {
1N/A        NAME   => "Jason",
1N/A        EMPNO  => 132,
1N/A        TITLE  => "deputy peon",
1N/A        AGE    => 23,
1N/A        SALARY => 37_000,
1N/A        PALS   => [ "Norbert", "Rhys", "Phineas"],
1N/A    };
1N/A
1N/AReferences are documented in L<perlref> and L<perlreftut>.
1N/AExamples of complex data structures are given in L<perldsc> and
1N/AL<perllol>.  Examples of structures and object-oriented classes are
1N/Ain L<perltoot>.
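
Getting at the fields of such a record uses the arrow operator.  A
brief sketch, reusing the same sample record (the extra pal pushed at
the end is made up):

```perl
use strict;
use warnings;

my $record = {
    NAME   => "Jason",
    EMPNO  => 132,
    TITLE  => "deputy peon",
    AGE    => 23,
    SALARY => 37_000,
    PALS   => [ "Norbert", "Rhys", "Phineas" ],
};

my $name      = $record->{NAME};          # a scalar field
my $first_pal = $record->{PALS}[0];       # an element of the nested array
my $pal_count = @{ $record->{PALS} };     # the nested array in scalar context

$record->{AGE}++;                         # fields can be updated in place
push @{ $record->{PALS} }, "Gwen";        # hypothetical new pal

print "$name has ", scalar @{ $record->{PALS} }, " pals\n";
```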
1N/A
1N/A=head2 How can I use a reference as a hash key?
1N/A
1N/AYou can't do this directly, but you could use the standard Tie::RefHash
1N/Amodule distributed with Perl.
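
The difference shows up when you ask for the keys back.  In this
sketch (sample data only), an ordinary hash hands back a plain string,
while the tied hash hands back the original reference:

```perl
use strict;
use warnings;
use Tie::RefHash;

my $point = [ 3, 4 ];   # sample reference to use as a key

# Ordinary hash: the key is stringified to something like "ARRAY(0x...)".
my %plain = ( $point => "some value" );
my ($plain_key) = keys %plain;
my $plain_is_ref = ref($plain_key) ? 1 : 0;

# Tied hash: keys() returns the reference itself.
tie my %by_ref, 'Tie::RefHash';
$by_ref{$point} = "some value";
my ($ref_key) = keys %by_ref;
my $x = $ref_key->[0];   # the key can still be dereferenced

print "plain key is a ref? $plain_is_ref; tied key type: ", ref($ref_key), "\n";
```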
1N/A
1N/A=head1 Data: Misc
1N/A
1N/A=head2 How do I handle binary data correctly?
1N/A
1N/APerl is binary clean, so this shouldn't be a problem.  For example,
1N/Athis works fine (assuming the files are found):
1N/A
1N/A    if (`cat /vmunix` =~ /gzip/) {
1N/A        print "Your kernel is GNU-zip enabled!\n";
1N/A    }
1N/A
1N/AOn less elegant (read: Byzantine) systems, however, you have
1N/Ato play tedious games with "text" versus "binary" files.  See
1N/AL<perlfunc/"binmode"> or L<perlopentut>.
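
A portable habit is to call binmode() on any filehandle that carries
binary data; it is harmless on systems that don't need it.  Here's a
minimal round-trip sketch using a temporary file from the standard
File::Temp module (the payload is just sample data):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sample payload containing every byte value, including ones that
# text-mode translation could alter on some systems.
my $payload = join '', map { chr } 0 .. 255;

my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                             # write the bytes untranslated
print {$out} $payload;
close $out or die "close failed: $!";

open my $in, '<', $path or die "cannot open $path: $!";
binmode $in;                              # read the bytes untranslated
my $read_back = do { local $/; <$in> };   # slurp the whole file
close $in;

print "round trip ", ($read_back eq $payload ? "ok" : "FAILED"), "\n";
```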
1N/A
1N/AIf you're concerned about 8-bit textual data, then see L<perllocale>.
1N/A
1N/AIf you want to deal with multibyte characters, however, there are
1N/Asome gotchas.  See the section on Regular Expressions.
1N/A
1N/A=head2 How do I determine whether a scalar is a number/whole/integer/float?
1N/A
1N/AAssuming that you don't care about IEEE notations like "NaN" or
1N/A"Infinity", you probably just want to use a regular expression.
1N/A
1N/A   if (/\D/)            { print "has nondigits\n" }
1N/A   if (/^\d+$/)         { print "is a whole number\n" }
1N/A   if (/^-?\d+$/)       { print "is an integer\n" }
1N/A   if (/^[+-]?\d+$/)    { print "is a +/- integer\n" }
1N/A   if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
1N/A   if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
1N/A   if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
1N/A            { print "a C float\n" }
1N/A
1N/AThere are also some commonly used modules for the task.
1N/AL<Scalar::Util> (distributed with 5.8) provides access to perl's
1N/Ainternal function C<looks_like_number> for determining
1N/Awhether a variable looks like a number.  L<Data::Types>
1N/Aexports functions that validate data types using both the
1N/Aabove and other regular expressions. Thirdly, there is
1N/AC<Regexp::Common> which has regular expressions to match
1N/Avarious types of numbers. Those three modules are available
1N/Afrom the CPAN.
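
As a quick illustration of C<looks_like_number>, here's a sketch that
runs it over a handful of sample strings.  The candidates were picked
to avoid edge cases such as surrounding whitespace.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my @candidates = ("42", "-3.14", "1e5", "abc", "12abc", "");   # sample inputs
my %verdict;
for my $candidate (@candidates) {
    # Normalize the result to 1 or 0 so it is easy to print and compare.
    $verdict{$candidate} = looks_like_number($candidate) ? 1 : 0;
    printf "%-8s %s\n", "'$candidate'",
        $verdict{$candidate} ? "numeric" : "not numeric";
}
```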
1N/A
1N/AIf you're on a POSIX system, Perl supports the C<POSIX::strtod>
1N/Afunction.  Its semantics are somewhat cumbersome, so here's a C<getnum>
1N/Awrapper function for more convenient access.  This function takes
1N/Aa string and returns the number it found, or C<undef> for input that
1N/Aisn't a C float.  The C<is_numeric> function is a front end to C<getnum>
1N/Aif you just want to say, ``Is this a float?''
1N/A
1N/A    sub getnum {
1N/A        use POSIX qw(strtod);
1N/A        my $str = shift;
1N/A        $str =~ s/^\s+//;
1N/A        $str =~ s/\s+$//;
1N/A        $! = 0;
1N/A        my($num, $unparsed) = strtod($str);
1N/A        if (($str eq '') || ($unparsed != 0) || $!) {
1N/A            return undef;
1N/A        } else {
1N/A            return $num;
1N/A        }
1N/A    }
1N/A
1N/A    sub is_numeric { defined getnum($_[0]) }
1N/A
1N/AOr you could check out the L<String::Scanf> module on the CPAN
1N/Ainstead. The POSIX module (part of the standard Perl distribution) provides
1N/Athe C<strtod> and C<strtol> functions for converting strings to doubles
1N/Aand longs, respectively.
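
C<strtol> can be wrapped the same way as C<strtod>.  The C<getint>
function below is a hypothetical helper written for this example, not
part of POSIX; it assumes you want C<undef> whenever any part of the
string fails to parse.

```perl
use strict;
use warnings;
use POSIX qw(strtol);

# Return the integer value of $str in the given base (default 10),
# or undef if any part of the string could not be parsed.
sub getint {
    my ($str, $base) = @_;
    $base = 10 unless defined $base;
    return undef if !defined $str || $str eq '';
    # In list context strtol returns the value and the count of
    # characters it could not parse.
    my ($num, $unparsed) = strtol($str, $base);
    return $unparsed == 0 ? $num : undef;
}

for my $input ("42", "12abc") {
    my $n = getint($input);
    print "'$input' => ", (defined $n ? $n : "undef"), "\n";
}
print "'ff' in base 16 => ", getint("ff", 16), "\n";
```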
1N/A
1N/A=head2 How do I keep persistent data across program calls?
1N/A
1N/AFor some specific applications, you can use one of the DBM modules.
1N/ASee L<AnyDBM_File>.  More generally, you should consult the FreezeThaw
1N/Aor Storable modules from CPAN.  Starting with Perl 5.8, Storable is part
1N/Aof the standard distribution.  Here's one example using Storable's C<store>
1N/Aand C<retrieve> functions:
1N/A
1N/A    use Storable;
1N/A    store(\%hash, "filename");
1N/A
1N/A    # later on...
1N/A    $href = retrieve("filename");        # by ref
1N/A    %hash = %{ retrieve("filename") };   # direct to hash
1N/A
1N/A=head2 How do I print out or copy a recursive data structure?
1N/A
1N/AThe Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
1N/Afor printing out data structures.  The Storable module on CPAN (or the
1N/A5.8 release of Perl) provides a function called C<dclone> that recursively
1N/Acopies its argument.
1N/A
1N/A    use Storable qw(dclone);
1N/A    $r2 = dclone($r1);
1N/A
1N/AHere $r1 can be a reference to any kind of data structure you'd like.
1N/AIt will be deeply copied.  Because C<dclone> takes and returns references,
1N/Ayou'd have to add extra punctuation if you had a hash of arrays that
1N/Ayou wanted to copy.
1N/A
1N/A    %newhash = %{ dclone(\%oldhash) };
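
To see that the copy really is independent, and that a structure which
refers to itself survives the trip, you might try something like this
sketch (all of the data is made up):

```perl
use strict;
use warnings;
use Storable qw(dclone);

# A hash of arrays: changing the copy must not touch the original.
my %oldhash = ( fruits => [ "apple", "fig" ], nuts => [ "pecan" ] );
my %newhash = %{ dclone(\%oldhash) };
push @{ $newhash{fruits} }, "pear";

# A structure that refers to itself: the copy's loop should point
# at the copy, not back at the original.
my $r1 = { name => "loop" };
$r1->{self} = $r1;
my $r2 = dclone($r1);

print "original fruits: @{ $oldhash{fruits} }\n";
print "copied fruits:   @{ $newhash{fruits} }\n";
print "cycle preserved\n" if $r2->{self} == $r2;
```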
1N/A
1N/A=head2 How do I define methods for every class/object?
1N/A
1N/AUse the UNIVERSAL class (see L<UNIVERSAL>).
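
A minimal sketch of the idea follows.  The method name C<class_name>
and the C<Animal> class are invented for the example.  Since anything
placed in UNIVERSAL becomes visible to every class in the program,
it's probably best done sparingly.

```perl
use strict;
use warnings;

# Hypothetical method; every class inherits it through UNIVERSAL.
sub UNIVERSAL::class_name {
    my $self = shift;
    return ref($self) || $self;   # works for objects and class names alike
}

package Animal;
sub new { return bless {}, shift }

package main;

my $by_object = Animal->new->class_name;   # called on an object
my $by_class  = Animal->class_name;        # called on the class itself

print "object says $by_object, class says $by_class\n";
```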
1N/A
1N/A=head2 How do I verify a credit card checksum?
1N/A
1N/AGet the Business::CreditCard module from CPAN.
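
If you only want to see what the check involves, card numbers use the
Luhn (mod 10) checksum.  The C<luhn_ok> function below is a
hand-rolled sketch written for this example, not the module's
interface, and it checks only the checksum, not whether the card
exists.

```perl
use strict;
use warnings;

# Return 1 if the digits of $number pass the Luhn checksum, else 0.
sub luhn_ok {
    my ($number) = @_;
    (my $digits = $number) =~ s/[\s-]//g;   # allow spaces and dashes
    return 0 unless $digits =~ /^\d+$/;

    my $sum    = 0;
    my $double = 0;                          # double every second digit
    for my $digit (reverse split //, $digits) {   # walk right to left
        my $d = $digit;
        if ($double) {
            $d *= 2;
            $d -= 9 if $d > 9;               # same as adding the two digits
        }
        $sum += $d;
        $double = !$double;
    }
    return $sum % 10 == 0 ? 1 : 0;
}

# 4111111111111111 is a widely used test number, not a real card.
for my $candidate ("4111 1111 1111 1111", "4111111111111112") {
    print "$candidate: ", (luhn_ok($candidate) ? "passes" : "fails"), "\n";
}
```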
1N/A
1N/A=head2 How do I pack arrays of doubles or floats for XS code?
1N/A
1N/AThe kgbpack.c code in the PGPLOT module on CPAN does just this.
1N/AIf you're doing a lot of float or double processing, consider using
1N/Athe PDL module from CPAN instead--it makes number-crunching easy.
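
On the Perl side, the built-in pack() function can lay out such a
buffer.  This sketch shows only that half, with sample data; how your
XS code receives and reads the string is up to you.  Use the C<f>
template instead of C<d> for single-precision floats.

```perl
use strict;
use warnings;

my @values = (1.5, -2.25, 3.0);   # sample data, exactly representable

# "d*" packs each element as a double in the machine's native format,
# giving one contiguous string of bytes.
my $packed = pack("d*", @values);

# Measure the size of one double rather than assuming it.
my $bytes_per_double = length pack("d", 0);
my $count = length($packed) / $bytes_per_double;

# unpack() with the same template recovers the list.
my @round_trip = unpack("d*", $packed);

print "packed $count doubles into ", length($packed), " bytes\n";
print "round trip: @round_trip\n";
```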
1N/A
1N/A=head1 AUTHOR AND COPYRIGHT
1N/A
1N/ACopyright (c) 1997-2002 Tom Christiansen and Nathan Torkington.
1N/AAll rights reserved.
1N/A
1N/AThis documentation is free; you can redistribute it and/or modify it
1N/Aunder the same terms as Perl itself.
1N/A
1N/AIrrespective of its distribution, all code examples in this file
1N/Aare hereby placed into the public domain.  You are permitted and
1N/Aencouraged to use this code in your own programs for fun
1N/Aor for profit as you see fit.  A simple comment in the code giving
1N/Acredit would be courteous but is not required.