distrib/pod/perlhack.pod

1N/A=head1 NAME
1N/A
1N/Aperlhack - How to hack at the Perl internals
1N/A
1N/A=head1 DESCRIPTION
1N/A
1N/AThis document attempts to explain how Perl development takes place,
1N/Aand ends with some suggestions for people wanting to become bona fide
1N/Aporters.
1N/A
1N/AThe perl5-porters mailing list is where the Perl standard distribution
1N/Ais maintained and developed.  The list can get anywhere from 10 to 150
1N/Amessages a day, depending on the heatedness of the debate.  Most days
1N/Athere are two or three patches, extensions, features, or bugs being
1N/Adiscussed at a time.
1N/A
1N/AA searchable archive of the list is at either:
1N/A
1N/A    http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/
1N/A
1N/Aor
1N/A
1N/A    http://archive.develooper.com/perl5-porters@perl.org/
1N/A
1N/AList subscribers (the porters themselves) come in several flavours.
1N/ASome are quiet curious lurkers, who rarely pitch in and instead watch
1N/Athe ongoing development to ensure they're forewarned of new changes or
1N/Afeatures in Perl.  Some are representatives of vendors, who are there
1N/Ato make sure that Perl continues to compile and work on their
1N/Aplatforms.  Some patch any reported bug that they know how to fix,
1N/Asome are actively patching their pet area (threads, Win32, the regexp
1N/Aengine), while others seem to do nothing but complain.  In other
1N/Awords, it's your usual mix of technical people.
1N/A
1N/AOver this group of porters presides Larry Wall.  He has the final word
1N/Ain what does and does not change in the Perl language.  Various
releases of Perl are shepherded by a ``pumpking'', a porter
responsible for gathering patches and deciding, on a patch-by-patch,
feature-by-feature basis, what will and will not go into the release.
For instance, Gurusamy Sarathy was the pumpking for the 5.6 release of
Perl, Jarkko Hietaniemi is the pumpking for the 5.8 release, and
Hugo van der Sanden will be the pumpking for the 5.10 release.
1N/A
1N/AIn addition, various people are pumpkings for different things.  For
1N/Ainstance, Andy Dougherty and Jarkko Hietaniemi share the I<Configure>
1N/Apumpkin.
1N/A
1N/ALarry sees Perl development along the lines of the US government:
1N/Athere's the Legislature (the porters), the Executive branch (the
1N/Apumpkings), and the Supreme Court (Larry).  The legislature can
1N/Adiscuss and submit patches to the executive branch all they like, but
1N/Athe executive branch is free to veto them.  Rarely, the Supreme Court
1N/Awill side with the executive branch over the legislature, or the
1N/Alegislature over the executive branch.  Mostly, however, the
1N/Alegislature and the executive branch are supposed to get along and
1N/Awork out their differences without impeachment or court cases.
1N/A
1N/AYou might sometimes see reference to Rule 1 and Rule 2.  Larry's power
1N/Aas Supreme Court is expressed in The Rules:
1N/A
1N/A=over 4
1N/A
1N/A=item 1
1N/A
1N/ALarry is always by definition right about how Perl should behave.
1N/AThis means he has final veto power on the core functionality.
1N/A
1N/A=item 2
1N/A
1N/ALarry is allowed to change his mind about any matter at a later date,
1N/Aregardless of whether he previously invoked Rule 1.
1N/A
1N/A=back
1N/A
1N/AGot that?  Larry is always right, even when he was wrong.  It's rare
1N/Ato see either Rule exercised, but they are often alluded to.
1N/A
1N/ANew features and extensions to the language are contentious, because
1N/Athe criteria used by the pumpkings, Larry, and other porters to decide
1N/Awhich features should be implemented and incorporated are not codified
1N/Ain a few small design goals as with some other languages.  Instead,
1N/Athe heuristics are flexible and often difficult to fathom.  Here is
1N/Aone person's list, roughly in decreasing order of importance, of
1N/Aheuristics that new features have to be weighed against:
1N/A
1N/A=over 4
1N/A
1N/A=item Does concept match the general goals of Perl?
1N/A
1N/AThese haven't been written anywhere in stone, but one approximation
1N/Ais:
1N/A
1N/A 1. Keep it fast, simple, and useful.
1N/A 2. Keep features/concepts as orthogonal as possible.
1N/A 3. No arbitrary limits (platforms, data sizes, cultures).
1N/A 4. Keep it open and exciting to use/patch/advocate Perl everywhere.
1N/A 5. Either assimilate new technologies, or build bridges to them.
1N/A
1N/A=item Where is the implementation?
1N/A
1N/AAll the talk in the world is useless without an implementation.  In
1N/Aalmost every case, the person or people who argue for a new feature
1N/Awill be expected to be the ones who implement it.  Porters capable
1N/Aof coding new features have their own agendas, and are not available
1N/Ato implement your (possibly good) idea.
1N/A
1N/A=item Backwards compatibility
1N/A
1N/AIt's a cardinal sin to break existing Perl programs.  New warnings are
1N/Acontentious--some say that a program that emits warnings is not
broken, while others say it is.  Adding keywords has the potential to
break programs, and changing the meaning of existing token sequences or
functions might break them too.
1N/A
1N/A=item Could it be a module instead?
1N/A
1N/APerl 5 has extension mechanisms, modules and XS, specifically to avoid
1N/Athe need to keep changing the Perl interpreter.  You can write modules
1N/Athat export functions, you can give those functions prototypes so they
1N/Acan be called like built-in functions, you can even write XS code to
1N/Amess with the runtime data structures of the Perl interpreter if you
1N/Awant to implement really complicated things.  If it can be done in a
1N/Amodule instead of in the core, it's highly unlikely to be added.
1N/A
1N/A=item Is the feature generic enough?
1N/A
1N/AIs this something that only the submitter wants added to the language,
1N/Aor would it be broadly useful?  Sometimes, instead of adding a feature
1N/Awith a tight focus, the porters might decide to wait until someone
1N/Aimplements the more generalized feature.  For instance, instead of
1N/Aimplementing a ``delayed evaluation'' feature, the porters are waiting
1N/Afor a macro system that would permit delayed evaluation and much more.
1N/A
1N/A=item Does it potentially introduce new bugs?
1N/A
1N/ARadical rewrites of large chunks of the Perl interpreter have the
1N/Apotential to introduce new bugs.  The smaller and more localized the
1N/Achange, the better.
1N/A
1N/A=item Does it preclude other desirable features?
1N/A
1N/AA patch is likely to be rejected if it closes off future avenues of
1N/Adevelopment.  For instance, a patch that placed a true and final
1N/Ainterpretation on prototypes is likely to be rejected because there
1N/Aare still options for the future of prototypes that haven't been
1N/Aaddressed.
1N/A
1N/A=item Is the implementation robust?
1N/A
1N/AGood patches (tight code, complete, correct) stand more chance of
going in.  Sloppy or incorrect patches might be placed on the back
burner until the pumpking has time to fix them, or might be discarded
altogether without further notice.
1N/A
1N/A=item Is the implementation generic enough to be portable?
1N/A
The worst patches make use of system-specific features.  It's highly
unlikely that nonportable additions to the Perl language will be
accepted.
1N/A
1N/A=item Is the implementation tested?
1N/A
1N/APatches which change behaviour (fixing bugs or introducing new features)
1N/Amust include regression tests to verify that everything works as expected.
1N/AWithout tests provided by the original author, how can anyone else changing
1N/Aperl in the future be sure that they haven't unwittingly broken the behaviour
1N/Athe patch implements? And without tests, how can the patch's author be
1N/Aconfident that his/her hard work put into the patch won't be accidentally
1N/Athrown away by someone in the future?
1N/A
1N/A=item Is there enough documentation?
1N/A
1N/APatches without documentation are probably ill-thought out or
1N/Aincomplete.  Nothing can be added without documentation, so submitting
1N/Aa patch for the appropriate manpages as well as the source code is
1N/Aalways a good idea.
1N/A
1N/A=item Is there another way to do it?
1N/A
1N/ALarry said ``Although the Perl Slogan is I<There's More Than One Way
1N/Ato Do It>, I hesitate to make 10 ways to do something''.  This is a
1N/Atricky heuristic to navigate, though--one man's essential addition is
1N/Aanother man's pointless cruft.
1N/A
1N/A=item Does it create too much work?
1N/A
1N/AWork for the pumpking, work for Perl programmers, work for module
1N/Aauthors, ...  Perl is supposed to be easy.
1N/A
1N/A=item Patches speak louder than words
1N/A
1N/AWorking code is always preferred to pie-in-the-sky ideas.  A patch to
1N/Aadd a feature stands a much higher chance of making it to the language
1N/Athan does a random feature request, no matter how fervently argued the
1N/Arequest might be.  This ties into ``Will it be useful?'', as the fact
1N/Athat someone took the time to make the patch demonstrates a strong
1N/Adesire for the feature.
1N/A
1N/A=back
1N/A
1N/AIf you're on the list, you might hear the word ``core'' bandied
1N/Aaround.  It refers to the standard distribution.  ``Hacking on the
1N/Acore'' means you're changing the C source code to the Perl
1N/Ainterpreter.  ``A core module'' is one that ships with Perl.
1N/A
1N/A=head2 Keeping in sync
1N/A
The source code to the Perl interpreter, in its different versions, is
kept in a repository managed by a revision control system (currently
the Perforce program, see http://perforce.com/ ).  The
pumpkings and a few others have access to the repository to check in
changes.  Periodically the pumpking for the development version of Perl
will release a new version, so the rest of the porters can see what's
changed.  The current state of the main trunk of the repository, and patches
that describe the individual changes that have happened since the last
public release, are available at this location:
1N/A
1N/A    http://public.activestate.com/gsar/APC/
1N/A    ftp://ftp.linux.activestate.com/pub/staff/gsar/APC/
1N/A
1N/AIf you're looking for a particular change, or a change that affected
1N/Aa particular set of files, you may find the B<Perl Repository Browser>
1N/Auseful:
1N/A
1N/A    http://public.activestate.com/cgi-bin/perlbrowse
1N/A
1N/AYou may also want to subscribe to the perl5-changes mailing list to
1N/Areceive a copy of each patch that gets submitted to the maintenance
1N/Aand development "branches" of the perl repository.  See
1N/Ahttp://lists.perl.org/ for subscription information.
1N/A
If you are a member of the perl5-porters mailing list, it is a good
idea to keep in touch with the most recent changes, if only to
verify that what you would have posted as a bug report isn't already
solved in the most recent available perl development branch, also
known as perl-current, bleading edge perl, bleedperl or bleadperl.
1N/A
1N/ANeedless to say, the source code in perl-current is usually in a perpetual
1N/Astate of evolution.  You should expect it to be very buggy.  Do B<not> use
1N/Ait for any purpose other than testing and development.
1N/A
1N/AKeeping in sync with the most recent branch can be done in several ways,
1N/Abut the most convenient and reliable way is using B<rsync>, available at
1N/Aftp://rsync.samba.org/pub/rsync/ .  (You can also get the most recent
1N/Abranch by FTP.)
1N/A
1N/AIf you choose to keep in sync using rsync, there are two approaches
1N/Ato doing so:
1N/A
1N/A=over 4
1N/A
1N/A=item rsync'ing the source tree
1N/A
1N/APresuming you are in the directory where your perl source resides
1N/Aand you have rsync installed and available, you can `upgrade' to
1N/Athe bleadperl using:
1N/A
1N/A # rsync -avz rsync://ftp.linux.activestate.com/perl-current/ .
1N/A
1N/AThis takes care of updating every single item in the source tree to
1N/Athe latest applied patch level, creating files that are new (to your
1N/Adistribution) and setting date/time stamps of existing files to
1N/Areflect the bleadperl status.
1N/A
1N/ANote that this will not delete any files that were in '.' before
1N/Athe rsync. Once you are sure that the rsync is running correctly,
1N/Arun it with the --delete and the --dry-run options like this:
1N/A
1N/A # rsync -avz --delete --dry-run rsync://ftp.linux.activestate.com/perl-current/ .
1N/A
1N/AThis will I<simulate> an rsync run that also deletes files not
1N/Apresent in the bleadperl master copy. Observe the results from
1N/Athis run closely. If you are sure that the actual run would delete
1N/Ano files precious to you, you could remove the '--dry-run' option.
1N/A
You can then check what patch was the latest that was applied by
looking in the file B<.patch>, which will show the number of the
latest patch.
1N/A
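The preview-then-apply routine described above can be wrapped in a small
shell helper.  This is only a sketch: C<sync_tree>, C<RSYNC> and C<SRC>
are names invented here for illustration, and the server address is
simply the one quoted above, which may not be reachable from your site.
Run it from the top of your perl source directory.

```shell
#!/bin/sh
# Hypothetical wrapper around the two rsync invocations shown above.
# RSYNC and SRC are illustrative names; override them as needed.
RSYNC=${RSYNC:-rsync}
SRC=${SRC:-rsync://ftp.linux.activestate.com/perl-current/}

sync_tree() {
    case "$1" in
        preview)
            # Report what would be transferred or deleted; change nothing.
            "$RSYNC" -avz --delete --dry-run "$SRC" . ;;
        apply)
            # Update the tree and delete files absent from the master copy.
            "$RSYNC" -avz --delete "$SRC" . ;;
        *)
            echo "usage: sync_tree preview|apply" >&2
            return 2 ;;
    esac
}

# Example use: inspect the output of "sync_tree preview" first, and only
# then run "sync_tree apply".
```
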
1N/AIf you have more than one machine to keep in sync, and not all of
1N/Athem have access to the WAN (so you are not able to rsync all the
1N/Asource trees to the real source), there are some ways to get around
1N/Athis problem.
1N/A
1N/A=over 4
1N/A
1N/A=item Using rsync over the LAN
1N/A
1N/ASet up a local rsync server which makes the rsynced source tree
1N/Aavailable to the LAN and sync the other machines against this
1N/Adirectory.
1N/A
1N/AFrom http://rsync.samba.org/README.html :
1N/A
1N/A   "Rsync uses rsh or ssh for communication. It does not need to be
1N/A    setuid and requires no special privileges for installation.  It
1N/A    does not require an inetd entry or a daemon.  You must, however,
1N/A    have a working rsh or ssh system.  Using ssh is recommended for
1N/A    its security features."
1N/A
1N/A=item Using pushing over the NFS
1N/A
With the other systems mounted over NFS, you can actively push
changes by checking the just-updated tree against the other,
not-yet-synced trees.  An example would be
1N/A
  #!/usr/bin/perl -w

  use strict;
  use File::Copy;

  # Map each file listed in MANIFEST to its local [mode, size, mtime].
  my %MF = map {
      m/(\S+)/;
      $1 => [ (stat $1)[2, 7, 9] ];
      } `cat MANIFEST`;

  # NFS mount point of the perl source tree for each remote host.
  my %remote = map { $_ => "/$_/pro/3gl/CPAN/perl-5.7.1" } qw(host1 host2);

  foreach my $host (keys %remote) {
      unless (-d $remote{$host}) {
          print STDERR "Cannot Xsync for host $host\n";
          next;
      }
      foreach my $file (keys %MF) {
          my $rfile = "$remote{$host}/$file";
          my ($mode, $size, $mtime) = (stat $rfile)[2, 7, 9];
          defined $size or ($mode, $size, $mtime) = (0, 0, 0);
          # Skip files whose size and mtime already match the local copy.
          $size == $MF{$file}[1] && $mtime == $MF{$file}[2] and next;
          printf "%4s %-34s %8d %9d  %8d %9d\n",
              $host, $file, $MF{$file}[1], $MF{$file}[2], $size, $mtime;
          # Replace the remote copy, then restore its mtime and mode.
          unlink $rfile;
          copy ($file, $rfile);
          utime time, $MF{$file}[2], $rfile;
          chmod $MF{$file}[0], $rfile;
      }
  }
1N/A
This is not perfect, though.  It could be improved by checking
file checksums before updating, and not all NFS systems provide
reliable utime support.
1N/A
1N/A=back
1N/A
1N/A=item rsync'ing the patches
1N/A
The source tree is maintained by the pumpking who applies patches to
the files in the tree. These patches are either created by the
pumpking himself using C<diff -c> after updating the file manually, or
sent in by posters on the perl5-porters list.
These patches are also saved and rsync'able, so you can apply them
yourself to the source files.
1N/A
1N/APresuming you are in a directory where your patches reside, you can
1N/Aget them in sync with
1N/A
1N/A # rsync -avz rsync://ftp.linux.activestate.com/perl-current-diffs/ .
1N/A
1N/AThis makes sure the latest available patch is downloaded to your
1N/Apatch directory.
1N/A
1N/AIt's then up to you to apply these patches, using something like
1N/A
1N/A # last=`ls -t *.gz | sed q`
1N/A # rsync -avz rsync://ftp.linux.activestate.com/perl-current-diffs/ .
1N/A # find . -name '*.gz' -newer $last -exec gzcat {} \; >blead.patch
1N/A # cd ../perl-current
1N/A # patch -p1 -N <../perl-current-diffs/blead.patch
1N/A
or, since this is only a hint towards how it works, use CPAN-patchaperl
from Andreas KE<ouml>nig to have better control over the patching process.
1N/A
1N/A=back
1N/A
1N/A=head2 Why rsync the source tree
1N/A
1N/A=over 4
1N/A
1N/A=item It's easier to rsync the source tree
1N/A
1N/ASince you don't have to apply the patches yourself, you are sure all
1N/Afiles in the source tree are in the right state.
1N/A
1N/A=item It's more reliable
1N/A
1N/AWhile both the rsync-able source and patch areas are automatically
1N/Aupdated every few minutes, keep in mind that applying patches may
1N/Asometimes mean careful hand-holding, especially if your version of
1N/Athe C<patch> program does not understand how to deal with new files,
1N/Afiles with 8-bit characters, or files without trailing newlines.
1N/A
1N/A=back
1N/A
1N/A=head2 Why rsync the patches
1N/A
1N/A=over 4
1N/A
1N/A=item It's easier to rsync the patches
1N/A
If you have more than one machine that you want to keep in step with
bleadperl, it's easier to rsync the patches only once and then apply
them to all the source trees on the different machines.

If you try to keep pace on 5 different machines, of which
only one has access to the WAN, rsync'ing all the source
trees would then have to be done 5 times over NFS.  Having
rsync'ed the patches only once, you can apply them to all the source
trees automatically.  Need I say more ;-)
1N/A
1N/A=item It's a good reference
1N/A
If you not only want the most recent development branch,
but also want to B<fix> bugs or extend features, you will want to dive
into the sources. If you are a seasoned perl core diver, you don't
need no manuals, tips, roadmaps, perlguts.pod or other aids to find
your way around. But if you are a starter, the patches may help you
find where you should start and how to change the bits that
bug you.
1N/A
The file B<Changes> is updated on occasions the pumpking sees as his
own little sync points. On those occasions, he releases a tar-ball of
the current source tree (e.g. perl@7582.tar.gz), which is an
excellent starting point when choosing to use the 'rsync the
patches' scheme. Starting with perl@7582, which means a set of source
files on which the latest applied patch is number 7582, you apply all
succeeding patches available from then on (7583, 7584, ...).
1N/A
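Working out which saved patches still need to be applied to a tree can
be sketched in shell.  Two things are assumed here rather than taken
from the text: that each patch in the patch directory is stored under
its number, as in F<7583.gz>, and that F<.patch> contains nothing but
the current patch number.  C<pending_patches> is a made-up helper name.

```shell
#!/bin/sh
# List the numbers of patches newer than the tree's current patch level,
# in ascending numeric order.
# Assumption: patches are stored as <number>.gz in the patch directory.
pending_patches() {
    tree=$1 patchdir=$2
    level=$(cat "$tree/.patch") || return 1
    for f in "$patchdir"/*.gz; do
        [ -e "$f" ] || continue
        n=$(basename "$f" .gz)
        # Ignore anything whose name is not purely numeric.
        case "$n" in *[!0-9]*|'') continue ;; esac
        if [ "$n" -gt "$level" ]; then
            echo "$n"
        fi
    done | sort -n
}

# Example use, run from the patch directory:
#   for n in $(pending_patches ../perl-current .); do
#       gzip -dc "$n.gz" | (cd ../perl-current && patch -p1 -N) || break
#   done
```
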
1N/AYou can use the patches later as a kind of search archive.
1N/A
1N/A=over 4
1N/A
1N/A=item Finding a start point
1N/A
If you want to fix/change the behaviour of function/feature Foo, just
scan the patches for ones that mention Foo in the subject,
the comments, or the body of the fix. There is a good chance that such
a patch shows you the files it affects, which are very likely
to be the starting point of your journey into the guts of perl.
1N/A
1N/A=item Finding how to fix a bug
1N/A
1N/AIf you've found I<where> the function/feature Foo misbehaves, but you
1N/Adon't know how to fix it (but you do know the change you want to
make), you can, again, peruse the patches for similar changes and
see how others applied the fix.
1N/A
1N/A=item Finding the source of misbehaviour
1N/A
When you keep in sync with bleadperl, the pumpking would love to
I<see> that the community efforts really work. So after each of his
sync points, you should run 'make test' to check if everything is still
in working order. If it is, run 'make ok', which will send an OK
report to perlbug@perl.org. (If you do not have access to a mailer
on the system where you just ran 'make test' successfully, you can
run 'make okfile', which creates the file C<perl.ok>, which you can
then take to your favourite mailer and mail yourself.)
1N/A
But of course, things will not always go well, and one or more
tests may fail during 'make test'. Before
sending in a bug report (using 'make nok' or 'make nokfile'), check
the mailing list to see whether someone else has already reported the
bug and, if so, confirm it by replying to that message. If not, you
might want to trace the source of that misbehaviour B<before> sending
in the bug report, which will help all the other porters in finding
the solution.
1N/A
Here the saved patches come in very handy. You can check the list of
patches to see which patch changed which file and which change caused
the misbehaviour. If you note that in the bug report, it saves
whoever tries to solve it from having to look for that point.
1N/A
1N/A=back
1N/A
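Scanning the saved patches for a keyword, as suggested under "Finding a
start point", might look like the following sketch.  It assumes the
patches are kept as gzip-compressed files in one directory;
C<patches_mentioning> is a hypothetical helper, not a tool shipped with
Perl.

```shell
#!/bin/sh
# Print the name of every compressed patch whose text matches a pattern.
# $1: basic regular expression to look for; $2: patch directory (default ".").
patches_mentioning() {
    pattern=$1 dir=${2:-.}
    for f in "$dir"/*.gz; do
        [ -e "$f" ] || continue
        if gzip -dc "$f" | grep -q -e "$pattern"; then
            echo "$f"
        fi
    done
}

# Example use, run from the patch directory:
#   patches_mentioning 'sprintf'
```
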
1N/AIf searching the patches is too bothersome, you might consider using
1N/Aperl's bugtron to find more information about discussions and
1N/Aramblings on posted bugs.
1N/A
1N/AIf you want to get the best of both worlds, rsync both the source
1N/Atree for convenience, reliability and ease and rsync the patches
1N/Afor reference.
1N/A
1N/A=back
1N/A
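The test-and-report routine described under "Finding the source of
misbehaviour" could be scripted roughly as follows.  The make targets
are the ones named above; C<smoke_report> and the C<MAKE> override are
invented for this sketch.

```shell
#!/bin/sh
# Run the test suite and send an OK report only when everything passes.
MAKE=${MAKE:-make}

smoke_report() {
    if "$MAKE" test; then
        # All tests passed: mail an OK report.  On a system without a
        # mailer, "okfile" would be the target to use instead.
        "$MAKE" ok
    else
        # Do not send a nok report blindly; search the list archives first.
        echo "tests failed: check perl5-porters before running '$MAKE nok'" >&2
        return 1
    fi
}
```
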
1N/A
1N/A=head2 Perlbug administration
1N/A
1N/AThere is a single remote administrative interface for modifying bug status,
1N/Acategory, open issues etc. using the B<RT> I<bugtracker> system, maintained
1N/Aby I<Robert Spier>.  Become an administrator, and close any bugs you can get
1N/Ayour sticky mitts on:
1N/A
1N/A    http://rt.perl.org
1N/A
1N/AThe bugtracker mechanism for B<perl5> bugs in particular is at:
1N/A
1N/A    http://bugs6.perl.org/perlbug
1N/A
1N/ATo email the bug system administrators:
1N/A
1N/A    "perlbug-admin" <perlbug-admin@perl.org>
1N/A
1N/A
1N/A=head2 Submitting patches
1N/A
1N/AAlways submit patches to I<perl5-porters@perl.org>.  If you're
1N/Apatching a core module and there's an author listed, send the author a
1N/Acopy (see L<Patching a core module>).  This lets other porters review
1N/Ayour patch, which catches a surprising number of errors in patches.
Either use the diff program (available in source code form from
ftp://ftp.gnu.org/pub/gnu/ ), or use Johan Vromans' I<makepatch>
(available from I<CPAN/authors/id/JV/>).  Unified diffs are preferred,
1N/Abut context diffs are accepted.  Do not send RCS-style diffs or diffs
1N/Awithout context lines.  More information is given in the
1N/AI<Porting/patching.pod> file in the Perl source distribution.  Please
1N/Apatch against the latest B<development> version (e.g., if you're
1N/Afixing a bug in the 5.005 track, patch against the latest 5.005_5x
1N/Aversion).  Only patches that survive the heat of the development
1N/Abranch get applied to maintenance versions.
1N/A
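As a rough illustration of producing a unified diff for submission, the
sketch below assumes GNU C<diff> and two sibling directories holding the
pristine and the modified source.  The directory names and
C<make_patch> are invented for the example; F<Porting/patching.pod>
remains the authoritative guide.

```shell
#!/bin/sh
# Write a recursive unified diff between two source trees to stdout.
make_patch() {
    # -r recurses into subdirectories, -u selects the unified format, and
    # -N treats absent files as empty so that new files appear in the patch.
    # diff exits 1 when it finds differences; only 2 or more is an error.
    diff -ruN "$1" "$2" || [ "$?" -eq 1 ]
}

# Example use, run from the directory containing both trees:
#   make_patch perl-current.orig perl-current > my-fix.patch
```
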
1N/AYour patch should update the documentation and test suite.  See
1N/AL<Writing a test>.
1N/A
1N/ATo report a bug in Perl, use the program I<perlbug> which comes with
1N/APerl (if you can't get Perl to work, send mail to the address
1N/AI<perlbug@perl.org> or I<perlbug@perl.com>).  Reporting bugs through
1N/AI<perlbug> feeds into the automated bug-tracking system, access to
1N/Awhich is provided through the web at http://bugs.perl.org/ .  It
1N/Aoften pays to check the archives of the perl5-porters mailing list to
1N/Asee whether the bug you're reporting has been reported before, and if
1N/Aso whether it was considered a bug.  See above for the location of
1N/Athe searchable archives.
1N/A
1N/AThe CPAN testers ( http://testers.cpan.org/ ) are a group of
1N/Avolunteers who test CPAN modules on a variety of platforms.  Perl
1N/ASmokers ( http://archives.develooper.com/daily-build@perl.org/ )
automatically test Perl source releases on platforms with various
1N/Aconfigurations.  Both efforts welcome volunteers.
1N/A
1N/AIt's a good idea to read and lurk for a while before chipping in.
1N/AThat way you'll get to see the dynamic of the conversations, learn the
1N/Apersonalities of the players, and hopefully be better prepared to make
a useful contribution when you do speak up.
1N/A
1N/AIf after all this you still think you want to join the perl5-porters
1N/Amailing list, send mail to I<perl5-porters-subscribe@perl.org>.  To
1N/Aunsubscribe, send mail to I<perl5-porters-unsubscribe@perl.org>.
1N/A
1N/ATo hack on the Perl guts, you'll need to read the following things:
1N/A
1N/A=over 3
1N/A
1N/A=item L<perlguts>
1N/A
1N/AThis is of paramount importance, since it's the documentation of what
1N/Agoes where in the Perl source. Read it over a couple of times and it
1N/Amight start to make sense - don't worry if it doesn't yet, because the
1N/Abest way to study it is to read it in conjunction with poking at Perl
1N/Asource, and we'll do that later on.
1N/A
1N/AYou might also want to look at Gisle Aas's illustrated perlguts -
1N/Athere's no guarantee that this will be absolutely up-to-date with the
1N/Alatest documentation in the Perl core, but the fundamentals will be
1N/Aright. ( http://gisle.aas.no/perl/illguts/ )
1N/A
1N/A=item L<perlxstut> and L<perlxs>
1N/A
1N/AA working knowledge of XSUB programming is incredibly useful for core
1N/Ahacking; XSUBs use techniques drawn from the PP code, the portion of the
1N/Aguts that actually executes a Perl program. It's a lot gentler to learn
1N/Athose techniques from simple examples and explanation than from the core
1N/Aitself.
1N/A
1N/A=item L<perlapi>
1N/A
1N/AThe documentation for the Perl API explains what some of the internal
1N/Afunctions do, as well as the many macros used in the source.
1N/A
1N/A=item F<Porting/pumpkin.pod>
1N/A
1N/AThis is a collection of words of wisdom for a Perl porter; some of it is
1N/Aonly useful to the pumpkin holder, but most of it applies to anyone
1N/Awanting to go about Perl development.
1N/A
1N/A=item The perl5-porters FAQ
1N/A
1N/AThis should be available from http://simon-cozens.org/writings/p5p-faq ;
1N/Aalternatively, you can get the FAQ emailed to you by sending mail to
1N/AC<perl5-porters-faq@perl.org>. It contains hints on reading perl5-porters,
1N/Ainformation on how perl5-porters works and how Perl development in general
1N/Aworks.
1N/A
1N/A=back
1N/A
1N/A=head2 Finding Your Way Around
1N/A
1N/APerl maintenance can be split into a number of areas, and certain people
1N/A(pumpkins) will have responsibility for each area. These areas sometimes
1N/Acorrespond to files or directories in the source kit. Among the areas are:
1N/A
1N/A=over 3
1N/A
1N/A=item Core modules
1N/A
1N/AModules shipped as part of the Perl core live in the F<lib/> and F<ext/>
1N/Asubdirectories: F<lib/> is for the pure-Perl modules, and F<ext/>
1N/Acontains the core XS modules.
1N/A
1N/A=item Tests
1N/A
1N/AThere are tests for nearly all the modules, built-ins and major bits
1N/Aof functionality.  Test files all have a .t suffix.  Module tests live
1N/Ain the F<lib/> and F<ext/> directories next to the module being
tested.  Others live in F<t/>.  See L<Writing a test>.
1N/A
1N/A=item Documentation
1N/A
Documentation maintenance includes looking after everything in the
F<pod/> directory (as well as contributing new documentation) and
the documentation to the modules in core.
1N/A
1N/A=item Configure
1N/A
1N/AThe configure process is the way we make Perl portable across the
1N/Amyriad of operating systems it supports. Responsibility for the
1N/Aconfigure, build and installation process, as well as the overall
1N/Aportability of the core code rests with the configure pumpkin - others
1N/Ahelp out with individual operating systems.
1N/A
The files involved are the operating system directories (F<win32/>,
F<os2/>, F<vms/> and so on), the shell scripts which generate F<config.h>
1N/Aand F<Makefile>, as well as the metaconfig files which generate
1N/AF<Configure>. (metaconfig isn't included in the core distribution.)
1N/A
1N/A=item Interpreter
1N/A
1N/AAnd of course, there's the core of the Perl interpreter itself. Let's
1N/Ahave a look at that in a little more detail.
1N/A
1N/A=back
1N/A
1N/ABefore we leave looking at the layout, though, don't forget that
1N/AF<MANIFEST> contains not only the file names in the Perl distribution,
1N/Abut short descriptions of what's in them, too. For an overview of the
1N/Aimportant files, try this:
1N/A
1N/A    perl -lne 'print if /^[^\/]+\.[ch]\s+/' MANIFEST
1N/A
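To look up the description of a single file rather than list them all,
something along these lines may help.  It assumes each F<MANIFEST> line
holds a file name followed by whitespace and a description;
C<manifest_desc> is a hypothetical helper name.

```shell
#!/bin/sh
# Print the description MANIFEST records for one file.
# $1: file name as listed in MANIFEST; $2: MANIFEST path (default "MANIFEST").
# Returns non-zero when the file is not listed.
manifest_desc() {
    awk -v f="$1" '
        $1 == f {
            $1 = ""                  # drop the file name field
            sub(/^[ \t]+/, "")       # trim the separator left behind
            print
            found = 1
        }
        END { exit !found }
    ' "${2:-MANIFEST}"
}

# Example use, run from the top of the source tree:
#   manifest_desc op.c
```
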
1N/A=head2 Elements of the interpreter
1N/A
1N/AThe work of the interpreter has two main stages: compiling the code
1N/Ainto the internal representation, or bytecode, and then executing it.
1N/AL<perlguts/Compiled code> explains exactly how the compilation stage
1N/Ahappens.
1N/A
1N/AHere is a short breakdown of perl's operation:
1N/A
1N/A=over 3
1N/A
1N/A=item Startup
1N/A
The action begins in F<perlmain.c> (or F<miniperlmain.c> for miniperl).
This is very high-level code, enough to fit on a single screen, and it
resembles the code found in L<perlembed>; most of the real action takes
place in F<perl.c>.
1N/A
1N/AFirst, F<perlmain.c> allocates some memory and constructs a Perl
1N/Ainterpreter:
1N/A
1N/A    1 PERL_SYS_INIT3(&argc,&argv,&env);
1N/A    2
1N/A    3 if (!PL_do_undump) {
1N/A    4     my_perl = perl_alloc();
1N/A    5     if (!my_perl)
1N/A    6         exit(1);
1N/A    7     perl_construct(my_perl);
1N/A    8     PL_perl_destruct_level = 0;
1N/A    9 }
1N/A
Line 1 is a macro, and its definition is dependent on your operating
system. Line 3 references C<PL_do_undump>, a global variable - all
global variables in Perl start with C<PL_>. C<PL_do_undump> tells you
whether the currently running program was created with the C<-u> flag
to perl and then F<undump>, which means it's going to be false in any
sane context.
1N/A
1N/ALine 4 calls a function in F<perl.c> to allocate memory for a Perl
1N/Ainterpreter. It's quite a simple function, and the guts of it looks like
1N/Athis:
1N/A
1N/A    my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));
1N/A
1N/AHere you see an example of Perl's system abstraction, which we'll see
1N/Alater: C<PerlMem_malloc> is either your system's C<malloc>, or Perl's
1N/Aown C<malloc> as defined in F<malloc.c> if you selected that option at
1N/Aconfigure time.
1N/A
1N/ANext, in line 7, we construct the interpreter; this sets up all the
1N/Aspecial variables that Perl needs, the stacks, and so on.
1N/A
1N/ANow we pass Perl the command line options, and tell it to go:
1N/A
1N/A    exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
1N/A    if (!exitstatus) {
1N/A        exitstatus = perl_run(my_perl);
1N/A    }
1N/A
1N/A
1N/AC<perl_parse> is actually a wrapper around C<S_parse_body>, as defined
1N/Ain F<perl.c>, which processes the command line options, sets up any
1N/Astatically linked XS modules, opens the program and calls C<yyparse> to
1N/Aparse it.
1N/A
1N/A=item Parsing
1N/A
1N/AThe aim of this stage is to take the Perl source, and turn it into an op
1N/Atree. We'll see what one of those looks like later. Strictly speaking,
there are three things going on here.
1N/A
1N/AC<yyparse>, the parser, lives in F<perly.c>, although you're better off
1N/Areading the original YACC input in F<perly.y>. (Yes, Virginia, there
1N/AB<is> a YACC grammar for Perl!) The job of the parser is to take your
1N/Acode and `understand' it, splitting it into sentences, deciding which
1N/Aoperands go with which operators and so on.
1N/A
1N/AThe parser is nobly assisted by the lexer, which chunks up your input
1N/Ainto tokens, and decides what type of thing each token is: a variable
1N/Aname, an operator, a bareword, a subroutine, a core function, and so on.
1N/AThe main point of entry to the lexer is C<yylex>, and that and its
1N/Aassociated routines can be found in F<toke.c>. Perl isn't much like
other computer languages; it's highly context sensitive at times, and it
can be tricky to work out what sort of token something is, or where a token
1N/Aends. As such, there's a lot of interplay between the tokeniser and the
1N/Aparser, which can get pretty frightening if you're not used to it.
1N/A
1N/AAs the parser understands a Perl program, it builds up a tree of
1N/Aoperations for the interpreter to perform during execution. The routines
1N/Awhich construct and link together the various operations are to be found
1N/Ain F<op.c>, and will be examined later.
1N/A
1N/A=item Optimization
1N/A
1N/ANow the parsing stage is complete, and the finished tree represents
1N/Athe operations that the Perl interpreter needs to perform to execute our
1N/Aprogram. Next, Perl does a dry run over the tree looking for
1N/Aoptimisations: constant expressions such as C<3 + 4> will be computed
1N/Anow, and the optimizer will also see if any multiple operations can be
1N/Areplaced with a single one. For instance, to fetch the variable C<$foo>,
1N/Ainstead of grabbing the glob C<*foo> and looking at the scalar
1N/Acomponent, the optimizer fiddles the op tree to use a function which
1N/Adirectly looks up the scalar in question. The main optimizer is C<peep>
1N/Ain F<op.c>, and many ops have their own optimizing functions.
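
As a rough illustration of the constant-folding idea, here is a minimal
sketch in plain C. It assumes a toy expression tree of our own invention:
C<struct toy_op>, C<toy_fold> and C<toy_fold_demo> are hypothetical
names, and none of this is the code in F<op.c>, which handles far more
than addition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy op types; Perl's real op tree is far richer. */
enum toy_type { TOY_CONST, TOY_VAR, TOY_ADD };

struct toy_op {
    enum toy_type  type;
    double         value;   /* meaningful only for TOY_CONST */
    struct toy_op *left;    /* operands, used only by TOY_ADD */
    struct toy_op *right;
};

/* Fold constant subtrees bottom-up: an add whose operands are both
 * constants is rewritten in place as a single constant.  Returns the
 * number of folds performed. */
int toy_fold(struct toy_op *op)
{
    int folds;

    if (op == NULL || op->type != TOY_ADD)
        return 0;
    assert(op->left != NULL && op->right != NULL);

    folds = toy_fold(op->left) + toy_fold(op->right);
    if (op->left->type == TOY_CONST && op->right->type == TOY_CONST) {
        op->value = op->left->value + op->right->value;
        op->type  = TOY_CONST;
        op->left  = op->right = NULL;
        folds++;
    }
    return folds;
}

/* Build ($x + (3 + 4)) and fold it.  Returns the value of the folded
 * inner node, or -1.0 if the tree did not end up as ($x + 7). */
double toy_fold_demo(void)
{
    struct toy_op three = { TOY_CONST, 3.0, NULL, NULL };
    struct toy_op four  = { TOY_CONST, 4.0, NULL, NULL };
    struct toy_op inner = { TOY_ADD,   0.0, &three, &four };
    struct toy_op x     = { TOY_VAR,   0.0, NULL, NULL };
    struct toy_op outer = { TOY_ADD,   0.0, &x, &inner };
    int folds = toy_fold(&outer);

    if (folds != 1 || outer.type != TOY_ADD || inner.type != TOY_CONST)
        return -1.0;
    return inner.value;
}
```

Only the inner addition has two constant operands, so it is the only
node rewritten; the outer addition still has to wait for C<$x> at run
time.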
1N/A
1N/A=item Running
1N/A
1N/ANow we're finally ready to go: we have compiled Perl byte code, and all
1N/Athat's left to do is run it. The actual execution is done by the
1N/AC<runops_standard> function in F<run.c>; more specifically, it's done by
1N/Athese three innocent looking lines:
1N/A
1N/A    while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
1N/A        PERL_ASYNC_CHECK();
1N/A    }
1N/A
1N/AYou may be more comfortable with the Perl version of that:
1N/A
1N/A    PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};
1N/A
1N/AWell, maybe not. Anyway, each op contains a function pointer, which
1N/Astipulates the function which will actually carry out the operation.
1N/AThis function will return the next op in the sequence - this allows for
1N/Athings like C<if> which choose the next op dynamically at run time.
1N/AThe C<PERL_ASYNC_CHECK> makes sure that things like signals interrupt
1N/Aexecution if required.
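
The shape of that loop is easier to see without the macros. The
following is a minimal sketch, not Perl's code: it assumes a toy op
structure (C<struct toy_op>) and two made-up PP functions, and it
threads an integer accumulator through the ops where Perl would use its
stacks.

```c
#include <assert.h>
#include <stddef.h>

struct toy_op;
typedef struct toy_op *(*toy_ppaddr)(struct toy_op *self, int *acc);

struct toy_op {
    toy_ppaddr     ppaddr;  /* function that carries out this op */
    struct toy_op *next;    /* default successor */
    struct toy_op *other;   /* alternative successor, for branches */
    int            arg;
};

/* Add this op's argument to the accumulator, then fall through. */
struct toy_op *toy_pp_add(struct toy_op *self, int *acc)
{
    *acc += self->arg;
    return self->next;
}

/* Choose the successor at run time: carry on if the accumulator is
 * non-zero, otherwise jump to `other`. */
struct toy_op *toy_pp_cond(struct toy_op *self, int *acc)
{
    return *acc != 0 ? self->next : self->other;
}

/* The run loop: call the current op's function and make whatever it
 * returns the new current op, until an op returns NULL. */
int toy_runops(struct toy_op *start)
{
    struct toy_op *op = start;
    int acc = 0;

    assert(start != NULL);
    while ((op = op->ppaddr(op, &acc)) != NULL) {
        /* Perl would check for pending signals here. */
    }
    return acc;
}

/* add(first); if non-zero then add(10); add(100) */
int toy_run_demo(int first)
{
    struct toy_op tail   = { toy_pp_add,  NULL,    NULL,  100 };
    struct toy_op middle = { toy_pp_add,  &tail,   NULL,  10 };
    struct toy_op branch = { toy_pp_cond, &middle, &tail, 0 };
    struct toy_op head   = { toy_pp_add,  &branch, NULL,  first };

    return toy_runops(&head);
}
```

With a non-zero first argument the branch op returns its C<next> pointer
and all three additions run; with zero it returns C<other> and the
middle op is skipped. The loop itself never needs to know which kind of
op it just ran.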
1N/A
1N/AThe actual functions called are known as PP code, and they're spread
1N/Abetween four files: F<pp_hot.c> contains the `hot' code, which is most
1N/Aoften used and highly optimized, F<pp_sys.c> contains all the
1N/Asystem-specific functions, F<pp_ctl.c> contains the functions which
1N/Aimplement control structures (C<if>, C<while> and the like) and F<pp.c>
1N/Acontains everything else. These are, if you like, the C code for Perl's
1N/Abuilt-in functions and operators.
1N/A
1N/A=back
1N/A
1N/A=head2 Internal Variable Types
1N/A
1N/AYou should by now have had a look at L<perlguts>, which tells you about
1N/APerl's internal variable types: SVs, HVs, AVs and the rest. If not, do
1N/Athat now.
1N/A
1N/AThese variables are used not only to represent Perl-space variables, but
1N/Aalso any constants in the code, as well as some structures completely
1N/Ainternal to Perl. The symbol table, for instance, is an ordinary Perl
1N/Ahash. Your code is represented by an SV as it's read into the parser;
1N/Aany program files you call are opened via ordinary Perl filehandles, and
1N/Aso on.
1N/A
1N/AThe core L<Devel::Peek|Devel::Peek> module lets us examine SVs from a
1N/APerl program. Let's see, for instance, how Perl treats the constant
1N/AC<"hello">.
1N/A
1N/A      % perl -MDevel::Peek -e 'Dump("hello")'
1N/A    1 SV = PV(0xa041450) at 0xa04ecbc
1N/A    2   REFCNT = 1
1N/A    3   FLAGS = (POK,READONLY,pPOK)
1N/A    4   PV = 0xa0484e0 "hello"\0
1N/A    5   CUR = 5
1N/A    6   LEN = 6
1N/A
1N/AReading C<Devel::Peek> output takes a bit of practice, so let's go
1N/Athrough it line by line.
1N/A
1N/ALine 1 tells us we're looking at an SV which lives at C<0xa04ecbc> in
1N/Amemory. SVs themselves are very simple structures, but they contain a
1N/Apointer to a more complex structure. In this case, it's a PV, a
1N/Astructure which holds a string value, at location C<0xa041450>.  Line 2
1N/Ais the reference count; there are no other references to this data, so
1N/Ait's 1.
1N/A
1N/ALine 3 shows the flags for this SV - it's OK to use it as a PV, it's a
1N/Aread-only SV (because it's a constant) and the data is a PV internally.
1N/ANext, on line 4, we've got the contents of the string, starting at
1N/Alocation C<0xa0484e0>.
1N/A
1N/ALine 5 gives us the current length of the string - note that this does
1N/AB<not> include the null terminator. Line 6 is not the length of the
1N/Astring, but the length of the currently allocated buffer; as the string
1N/Agrows, Perl automatically extends the available storage via a routine
1N/Acalled C<SvGROW>.
1N/A
1N/AYou can get at any of these quantities from C very easily; just add
1N/AC<Sv> to the name of the field shown in the snippet, and you've got a
1N/Amacro which will return the value: C<SvCUR(sv)> returns the current
1N/Alength of the string, C<SvREFCNT(sv)> returns the reference count,
1N/AC<SvPV(sv, len)> returns the string itself with its length, and so on.
1N/AMore macros to manipulate these properties can be found in L<perlguts>.
1N/A
1N/ALet's take an example of manipulating a PV, from C<sv_catpvn>, in F<sv.c>:
1N/A
1N/A     1  void
1N/A     2  Perl_sv_catpvn(pTHX_ register SV *sv, register const char *ptr, register STRLEN len)
1N/A     3  {
1N/A     4      STRLEN tlen;
1N/A     5      char *junk;
1N/A
1N/A     6      junk = SvPV_force(sv, tlen);
1N/A     7      SvGROW(sv, tlen + len + 1);
1N/A     8      if (ptr == junk)
1N/A     9          ptr = SvPVX(sv);
1N/A    10      Move(ptr,SvPVX(sv)+tlen,len,char);
1N/A    11      SvCUR(sv) += len;
1N/A    12      *SvEND(sv) = '\0';
1N/A    13      (void)SvPOK_only_UTF8(sv);          /* validate pointer */
1N/A    14      SvTAINT(sv);
1N/A    15  }
1N/A
1N/AThis is a function which adds a string, C<ptr>, of length C<len> onto
1N/Athe end of the PV stored in C<sv>. The first thing we do in line 6 is
1N/Amake sure that the SV B<has> a valid PV, by calling the C<SvPV_force>
1N/Amacro to force a PV. As a side effect, C<tlen> gets set to the current
1N/Alength of the PV, and the PV itself is returned to C<junk>.
1N/A
1N/AIn line 7, we make sure that the SV will have enough room to accommodate
1N/Athe old string, the new string and the null terminator. If C<LEN> isn't
1N/Abig enough, C<SvGROW> will reallocate space for us.
1N/A
1N/ANow, if C<junk> is the same as the string we're trying to add, we can
1N/Agrab the string directly from the SV; C<SvPVX> is the address of the PV
1N/Ain the SV.
1N/A
1N/ALine 10 does the actual catenation: the C<Move> macro moves a chunk of
1N/Amemory around: we move the string C<ptr> to the end of the PV - that's
1N/Athe start of the PV plus its current length. We're moving C<len> bytes
1N/Aof type C<char>. After doing so, we need to tell Perl we've extended the
1N/Astring, by altering C<CUR> to reflect the new length. C<SvEND> is a
1N/Amacro which gives us the end of the string, so that needs to be a
1N/AC<"\0">.
1N/A
1N/ALine 13 manipulates the flags; since we've changed the PV, any IV or NV
1N/Avalues will no longer be valid: if we have C<$a=10; $a.="6";> we don't
1N/Awant to use the old IV of 10. C<SvPOK_only_UTF8> is a special UTF-8-aware
1N/Aversion of C<SvPOK_only>, a macro which turns off the IOK and NOK flags
1N/Aand turns on POK. The final C<SvTAINT> is a macro which marks the SV as
1N/Atainted if taint mode is turned on and tainted data is involved.
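
The same bookkeeping can be sketched outside Perl. The code below is an
illustration only, assuming a hypothetical C<struct toy_pv> that keeps
just the buffer, C<CUR> and C<LEN>; C<toy_grow> and C<toy_catpvn> are
made-up stand-ins for C<SvGROW> and C<sv_catpvn>, and flags, UTF-8 and
tainting are ignored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a PV: buffer, used length, allocated size. */
struct toy_pv {
    char   *pv;    /* start of the buffer */
    size_t  cur;   /* bytes in use, excluding the terminator */
    size_t  len;   /* bytes allocated */
};

/* Ensure at least `want` bytes are allocated.  Returns 0 on success. */
int toy_grow(struct toy_pv *s, size_t want)
{
    char *bigger;

    if (want <= s->len)
        return 0;
    bigger = realloc(s->pv, want);
    if (bigger == NULL)
        return -1;
    s->pv  = bigger;
    s->len = want;
    return 0;
}

/* Append `len` bytes from `ptr`.  Returns 0 on success. */
int toy_catpvn(struct toy_pv *s, const char *ptr, size_t len)
{
    int self;

    assert(ptr != NULL);
    self = (ptr == s->pv);               /* appending the string to itself? */

    /* Room for the old string, the new string and the terminator. */
    if (toy_grow(s, s->cur + len + 1) != 0)
        return -1;
    if (self)
        ptr = s->pv;                     /* the buffer may have moved */
    memmove(s->pv + s->cur, ptr, len);   /* copy to the current end */
    s->cur += len;
    s->pv[s->cur] = '\0';
    return 0;
}

/* Returns 1 if "foo" . "bar" . itself comes out as "foobarfoobar". */
int toy_catpvn_demo(void)
{
    struct toy_pv s = { NULL, 0, 0 };
    int ok;

    ok = toy_catpvn(&s, "foo", 3) == 0
      && toy_catpvn(&s, "bar", 3) == 0
      && toy_catpvn(&s, s.pv, s.cur) == 0     /* self-append */
      && s.cur == 12
      && strcmp(s.pv, "foobarfoobar") == 0;
    free(s.pv);
    return ok;
}
```

The self-append in C<toy_catpvn_demo> is the case that lines 8 and 9 of
the real function guard against: growing the buffer may move it, so a
source pointer taken before the grow has to be refreshed afterwards.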
1N/A
1N/AAVs and HVs are more complicated, but SVs are by far the most common
1N/Avariable type being thrown around. Having seen something of how we
1N/Amanipulate these, let's go on and look at how the op tree is
1N/Aconstructed.
1N/A
1N/A=head2 Op Trees
1N/A
1N/AFirst, what is the op tree, anyway? The op tree is the parsed
1N/Arepresentation of your program, as we saw in our section on parsing, and
1N/Ait's the sequence of operations that Perl goes through to execute your
1N/Aprogram, as we saw in L</Running>.
1N/A
1N/AAn op is a fundamental operation that Perl can perform: all the built-in
1N/Afunctions and operators are ops, and there is a series of ops which
1N/Adeal with concepts the interpreter needs internally - entering and
1N/Aleaving a block, ending a statement, fetching a variable, and so on.
1N/A
1N/AThe op tree is connected in two ways: you can imagine that there are two
1N/A"routes" through it, two orders in which you can traverse the tree.
1N/AFirst, parse order reflects how the parser understood the code, and
1N/Asecondly, execution order tells perl what order to perform the
1N/Aoperations in.
1N/A
1N/AThe easiest way to examine the op tree is to stop Perl after it has
1N/Afinished parsing, and get it to dump out the tree. This is exactly what
1N/Athe compiler backends L<B::Terse|B::Terse>, L<B::Concise|B::Concise>
1N/Aand L<B::Debug|B::Debug> do.
1N/A
1N/ALet's have a look at how Perl sees C<$a = $b + $c>:
1N/A
1N/A     % perl -MO=Terse -e '$a=$b+$c'
1N/A     1  LISTOP (0x8179888) leave
1N/A     2      OP (0x81798b0) enter
1N/A     3      COP (0x8179850) nextstate
1N/A     4      BINOP (0x8179828) sassign
1N/A     5          BINOP (0x8179800) add [1]
1N/A     6              UNOP (0x81796e0) null [15]
1N/A     7                  SVOP (0x80fafe0) gvsv  GV (0x80fa4cc) *b
1N/A     8              UNOP (0x81797e0) null [15]
1N/A     9                  SVOP (0x8179700) gvsv  GV (0x80efeb0) *c
1N/A    10          UNOP (0x816b4f0) null [15]
1N/A    11              SVOP (0x816dcf0) gvsv  GV (0x80fa460) *a
1N/A
1N/ALet's start in the middle, at line 4. This is a BINOP, a binary
1N/Aoperator, which is at location C<0x8179828>. The specific operator in
1N/Aquestion is C<sassign> - scalar assignment - and you can find the code
1N/Awhich implements it in the function C<pp_sassign> in F<pp_hot.c>. As a
1N/Abinary operator, it has two children: the add operator, providing the
1N/Aresult of C<$b+$c>, is uppermost on line 5, and the left hand side is on
1N/Aline 10.
1N/A
1N/ALine 10 is the null op: this does exactly nothing. What is that doing
1N/Athere? If you see the null op, it's a sign that something has been
1N/Aoptimized away after parsing. As we mentioned in L</Optimization>,
1N/Athe optimization stage sometimes converts two operations into one, for
1N/Aexample when fetching a scalar variable. When this happens, instead of
1N/Arewriting the op tree and cleaning up the dangling pointers, it's easier
1N/Ajust to replace the redundant operation with the null op. Originally,
1N/Athe tree would have looked like this:
1N/A
1N/A    10          UNOP (0x816b4f0) rv2sv [15]
1N/A    11              SVOP (0x816dcf0) gv  GV (0x80fa460) *a
1N/A
1N/AThat is, fetch the C<a> entry from the main symbol table, and then look
1N/Aat the scalar component of it: C<gvsv> (C<pp_gvsv> in F<pp_hot.c>)
1N/Ahappens to do both these things.
1N/A
1N/AThe right hand side, starting at line 5, is similar to what we've just
1N/Aseen: we have the C<add> op (C<pp_add>, also in F<pp_hot.c>) adding
1N/Atogether two C<gvsv>s.
1N/A
1N/ANow, what's this about?
1N/A
1N/A     1  LISTOP (0x8179888) leave
1N/A     2      OP (0x81798b0) enter
1N/A     3      COP (0x8179850) nextstate
1N/A
1N/AC<enter> and C<leave> are scoping ops, and their job is to perform any
1N/Ahousekeeping every time you enter and leave a block: lexical variables
1N/Aare tidied up, unreferenced variables are destroyed, and so on. Every
1N/Aprogram will have those first three lines: C<leave> is a list, and its
1N/Achildren are all the statements in the block. Statements are delimited
1N/Aby C<nextstate>, so a block is a collection of C<nextstate> ops, each
1N/Afollowed by the ops to be performed for that statement (in the full dump
1N/Aearlier, C<sassign> sits beside C<nextstate>, not beneath it). C<enter>
1N/Ais a single op which functions as a marker.
1N/A
1N/AThat's how Perl parsed the program, from top to bottom:
1N/A
1N/A                        Program
1N/A                           |
1N/A                       Statement
1N/A                           |
1N/A                           =
1N/A                          / \
1N/A                         /   \
1N/A                        $a   +
1N/A                            / \
1N/A                          $b   $c
1N/A
1N/AHowever, it's impossible to B<perform> the operations in this order:
1N/Ayou have to find the values of C<$b> and C<$c> before you add them
1N/Atogether, for instance. So, the other thread that runs through the op
1N/Atree is the execution order: each op has a field C<op_next> which points
1N/Ato the next op to be run, so following these pointers tells us how perl
1N/Aexecutes the code. We can traverse the tree in this order using
1N/Athe C<exec> option to C<B::Terse>:
1N/A
1N/A     % perl -MO=Terse,exec -e '$a=$b+$c'
1N/A     1  OP (0x8179928) enter
1N/A     2  COP (0x81798c8) nextstate
1N/A     3  SVOP (0x81796c8) gvsv  GV (0x80fa4d4) *b
1N/A     4  SVOP (0x8179798) gvsv  GV (0x80efeb0) *c
1N/A     5  BINOP (0x8179878) add [1]
1N/A     6  SVOP (0x816dd38) gvsv  GV (0x80fa468) *a
1N/A     7  BINOP (0x81798a0) sassign
1N/A     8  LISTOP (0x8179900) leave
1N/A
1N/AThis probably makes more sense for a human: enter a block, start a
1N/Astatement. Get the values of C<$b> and C<$c>, and add them together.
1N/AFind C<$a>, and assign one to the other. Then leave.
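
To see how one tree can carry both orders, here is a minimal sketch
under some loud assumptions: C<struct toy_op> and its C<first>,
C<sibling> and C<next> fields are a simplified model invented for this
example, and C<toy_link> is a hypothetical routine that threads C<next>
pointers by visiting children before their parent. The scoping ops
(C<enter>, C<nextstate>, C<leave>) are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_op {
    const char    *name;
    struct toy_op *first;     /* first child (parse order) */
    struct toy_op *sibling;   /* next child of the same parent */
    struct toy_op *next;      /* next op to execute */
};

/* Thread `next` pointers through the subtree rooted at `op`: children
 * first, in order, then the op itself.  `tail` is the op that runs
 * just before this subtree, or NULL.  Returns the subtree's last op. */
struct toy_op *toy_link(struct toy_op *op, struct toy_op *tail)
{
    struct toy_op *kid;

    assert(op != NULL);
    for (kid = op->first; kid != NULL; kid = kid->sibling)
        tail = toy_link(kid, tail);
    if (tail != NULL)
        tail->next = op;
    op->next = NULL;
    return op;
}

/* The op that runs first is the leftmost leaf. */
struct toy_op *toy_start(struct toy_op *op)
{
    while (op->first != NULL)
        op = op->first;
    return op;
}

/* Build the tree for $a = $b + $c and return its execution order as a
 * space-separated string (held in a static buffer). */
const char *toy_exec_order_demo(void)
{
    static char out[64];
    struct toy_op a       = { "gvsv(a)", NULL, NULL, NULL };
    struct toy_op c       = { "gvsv(c)", NULL, NULL, NULL };
    struct toy_op b       = { "gvsv(b)", NULL, &c,   NULL };
    struct toy_op add     = { "add",     &b,   &a,   NULL };
    struct toy_op sassign = { "sassign", &add, NULL, NULL };
    struct toy_op *op;

    toy_link(&sassign, NULL);
    out[0] = '\0';
    for (op = toy_start(&sassign); op != NULL; op = op->next) {
        strcat(out, op->name);
        if (op->next != NULL)
            strcat(out, " ");
    }
    return out;
}
```

The parse-order shape is still there in the C<first> and C<sibling>
pointers, while following C<next> from the leftmost leaf gives
C<gvsv(b) gvsv(c) add gvsv(a) sassign>, the same order as lines 3 to 7
of the C<exec> dump.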
1N/A
1N/AThe way Perl builds up these op trees in the parsing process can be
1N/Aunravelled by examining F<perly.y>, the YACC grammar. Let's take the
1N/Apiece we need to construct the tree for C<$a = $b + $c>:
1N/A
1N/A    1 term    :   term ASSIGNOP term
1N/A    2                { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); }
1N/A    3         |   term ADDOP term
1N/A    4                { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }
1N/A
1N/AIf you're not used to reading BNF grammars, this is how it works: You're
1N/Afed certain things by the tokeniser, which generally end up in upper
1N/Acase. Here, C<ADDOP> is provided when the tokeniser sees C<+> in your
1N/Acode. C<ASSIGNOP> is provided when C<=> is used for assigning. These are
1N/A`terminal symbols', because you can't get any simpler than them.
1N/A
1N/AThe grammar, lines one and three of the snippet above, tells you how to
1N/Abuild up more complex forms. These complex forms, `non-terminal symbols'
1N/Aare generally placed in lower case. C<term> here is a non-terminal
1N/Asymbol, representing a single expression.
1N/A
1N/AThe grammar gives you the following rule: you can make the thing on the
1N/Aleft of the colon if you see all the things on the right in sequence.
1N/AThis is called a "reduction", and the aim of parsing is to completely
1N/Areduce the input. There are several different ways you can perform a
1N/Areduction, separated by vertical bars: so, C<term> followed by C<=>
1N/Afollowed by C<term> makes a C<term>, and C<term> followed by C<+>
1N/Afollowed by C<term> can also make a C<term>.
1N/A
1N/ASo, if you see two terms with an C<=> or C<+> between them, you can
1N/Aturn them into a single expression. When you do this, you execute the
1N/Acode in the block on the next line: if you see C<=>, you'll do the code
1N/Ain line 2. If you see C<+>, you'll do the code in line 4. It's this code
1N/Awhich contributes to the op tree.
1N/A
1N/A            |   term ADDOP term
1N/A            { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }
1N/A
1N/AWhat this does is create a new binary op, and feed it a number of
1N/Avariables. The variables refer to the tokens: C<$1> is the first token in
1N/Athe input, C<$2> the second, and so on - think regular expression
1N/Abackreferences. C<$$> is the op returned from this reduction. So, we
1N/Acall C<newBINOP> to create a new binary operator. The first parameter to
1N/AC<newBINOP>, a function in F<op.c>, is the op type. It's an addition
1N/Aoperator, so we want the type to be C<ADDOP>. We could specify this
1N/Adirectly, but it's right there as the second token in the input, so we
1N/Ause C<$2>. The second parameter is the op's flags: 0 means `nothing
1N/Aspecial'. Then the things to add: the left and right hand side of our
1N/Aexpression, in scalar context.
1N/A
1N/A=head2 Stacks
1N/A
1N/AWhen perl executes something like C<addop>, how does it pass on its
1N/Aresults to the next op? The answer is, through the use of stacks. Perl
1N/Ahas a number of stacks to store things it's currently working on, and
1N/Awe'll look at the three most important ones here.
1N/A
1N/A=over 3
1N/A
1N/A=item Argument stack
1N/A
1N/AArguments are passed to PP code and returned from PP code using the
1N/Aargument stack, C<ST>. The typical way to handle arguments is to pop
1N/Athem off the stack, deal with them how you wish, and then push the result
1N/Aback onto the stack. This is how, for instance, the cosine operator
1N/Aworks:
1N/A
1N/A      NV value;
1N/A      value = POPn;
1N/A      value = Perl_cos(value);
1N/A      XPUSHn(value);
1N/A
1N/AWe'll see a more tricky example of this when we consider Perl's macros
1N/Abelow. C<POPn> gives you the NV (floating point value) of the top SV on
1N/Athe stack: the C<$x> in C<cos($x)>. Then we compute the cosine, and push
1N/Athe result back as an NV. The C<X> in C<XPUSHn> means that the stack
1N/Ashould be extended if necessary - it can't be necessary here, because we
1N/Aknow there's room for one more item on the stack, since we've just
1N/Aremoved one! The C<XPUSH*> macros at least guarantee safety.
1N/A
1N/AAlternatively, you can fiddle with the stack directly: C<SP> gives you
1N/Athe first element in your portion of the stack, and C<TOP*> gives you
1N/Athe top SV/IV/NV/etc. on the stack. So, for instance, to do unary
1N/Anegation of an integer:
1N/A
1N/A     SETi(-TOPi);
1N/A
1N/AJust set the integer value of the top stack entry to its negation.
1N/A
1N/AArgument stack manipulation in the core is exactly the same as it is in
1N/AXSUBs - see L<perlxstut>, L<perlxs> and L<perlguts> for a longer
1N/Adescription of the macros used in stack manipulation.
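
If the macros get in the way, the two styles of stack handling can be
sketched with ordinary functions. This is an illustration under stated
assumptions, not core code: C<struct toy_stack> holds plain doubles
instead of SVs, every C<toy_*> name is hypothetical, and squaring stands
in for cosine so that the example needs no maths library.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_stack {
    double *base;   /* bottom of the allocated stack */
    size_t  sp;     /* number of items currently on the stack */
    size_t  max;    /* number of slots allocated */
};

/* Make room for `n` more items.  Returns 0 on success. */
int toy_extend(struct toy_stack *st, size_t n)
{
    size_t  want = st->sp + n;
    double *bigger;

    if (want <= st->max)
        return 0;
    bigger = realloc(st->base, want * sizeof *bigger);
    if (bigger == NULL)
        return -1;
    st->base = bigger;
    st->max  = want;
    return 0;
}

/* Extend if needed, then push (the "X" style of push). */
int toy_xpush(struct toy_stack *st, double v)
{
    if (toy_extend(st, 1) != 0)
        return -1;
    st->base[st->sp++] = v;
    return 0;
}

/* Remove and return the top item. */
double toy_pop(struct toy_stack *st)
{
    assert(st->sp > 0);
    return st->base[--st->sp];
}

/* Read the top item without removing it. */
double toy_top(const struct toy_stack *st)
{
    assert(st->sp > 0);
    return st->base[st->sp - 1];
}

/* Overwrite the top item in place. */
void toy_set(struct toy_stack *st, double v)
{
    assert(st->sp > 0);
    st->base[st->sp - 1] = v;
}

/* Pop, compute, push: the style used by the cosine example. */
int toy_pp_square(struct toy_stack *st)
{
    double value = toy_pop(st);

    return toy_xpush(st, value * value);
}

/* Work on the top slot directly: the style of SETi(-TOPi). */
void toy_pp_negate(struct toy_stack *st)
{
    toy_set(st, -toy_top(st));
}

/* Push 3, square it, negate it, and return what is left on top. */
double toy_stack_demo(void)
{
    struct toy_stack st = { NULL, 0, 0 };
    double result = 0.0;

    if (toy_xpush(&st, 3.0) == 0 && toy_pp_square(&st) == 0) {
        toy_pp_negate(&st);
        result = toy_pop(&st);
    }
    free(st.base);
    return result;
}
```

C<toy_pp_square> can never actually need to extend the stack, since it
has just popped an item, which is the same observation the text makes
about C<XPUSHn>.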
1N/A
1N/A=item Mark stack
1N/A
1N/AI say `your portion of the stack' above because PP code doesn't
1N/Anecessarily get the whole stack to itself: if your function calls
1N/Aanother function, you'll only want to expose the arguments aimed for the
1N/Acalled function, and not (necessarily) let it get at your own data. The
1N/Away we do this is to have a `virtual' bottom-of-stack, exposed to each
1N/Afunction. The mark stack keeps bookmarks to locations in the argument
1N/Astack usable by each function. For instance, when dealing with a tied
1N/Avariable, (internally, something with `P' magic) Perl has to call
1N/Amethods for accesses to the tied variables. However, we need to separate
1N/Athe arguments exposed to the method from the arguments exposed to the
1N/Aoriginal function - the store or fetch or whatever it may be. Here's how
1N/Athe tied C<push> is implemented; see C<av_push> in F<av.c>:
1N/A
1N/A     1  PUSHMARK(SP);
1N/A     2  EXTEND(SP,2);
1N/A     3  PUSHs(SvTIED_obj((SV*)av, mg));
1N/A     4  PUSHs(val);
1N/A     5  PUTBACK;
1N/A     6  ENTER;
1N/A     7  call_method("PUSH", G_SCALAR|G_DISCARD);
1N/A     8  LEAVE;
1N/A     9  POPSTACK;
1N/A
1N/AThe lines which concern the mark stack are the first, fifth and last
1N/Alines: they save away, restore and remove the current position of the
1N/Aargument stack.
1N/A
1N/ALet's examine the whole implementation, for practice:
1N/A
1N/A     1  PUSHMARK(SP);
1N/A
1N/APush the current state of the stack pointer onto the mark stack. This is
1N/Aso that when we've finished adding items to the argument stack, Perl
1N/Aknows how many things we've added recently.
1N/A
1N/A     2  EXTEND(SP,2);
1N/A     3  PUSHs(SvTIED_obj((SV*)av, mg));
1N/A     4  PUSHs(val);
1N/A
1N/AWe're going to add two more items onto the argument stack: when you have
1N/Aa tied array, the C<PUSH> subroutine receives the object and the value
1N/Ato be pushed, and that's exactly what we have here - the tied object,
1N/Aretrieved with C<SvTIED_obj>, and the value, the SV C<val>.
1N/A
1N/A     5  PUTBACK;
1N/A
1N/ANext we tell Perl to make the change to the global stack pointer: C<dSP>
1N/Aonly gave us a local copy, not a reference to the global.
1N/A
1N/A     6  ENTER;
1N/A     7  call_method("PUSH", G_SCALAR|G_DISCARD);
1N/A     8  LEAVE;
1N/A
1N/AC<ENTER> and C<LEAVE> localise a block of code - they make sure that all
1N/Avariables are tidied up, everything that has been localised gets
1N/Aits previous value returned, and so on. Think of them as the C<{> and
1N/AC<}> of a Perl block.
1N/A
1N/ATo actually do the magic method call, we have to call a subroutine in
1N/APerl space: C<call_method> takes care of that, and it's described in
1N/AL<perlcall>. We call the C<PUSH> method in scalar context, and we're
1N/Agoing to discard its return value.
1N/A
1N/A     9  POPSTACK;
1N/A
1N/AFinally, we remove the value we placed on the mark stack, since we
1N/Adon't need it any more.
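
The bookmark idea itself is small enough to sketch in isolation. What
follows is a simplified model rather than Perl's implementation: the
argument stack holds plain integers, all the C<toy_*> names are
hypothetical, and the callee pops its own mark when it starts.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_STACK_MAX 32
#define TOY_MARK_MAX   8

struct toy_interp {
    int    stack[TOY_STACK_MAX];  /* argument stack */
    size_t sp;                    /* items on the argument stack */
    size_t marks[TOY_MARK_MAX];   /* mark stack: saved values of sp */
    size_t mark_sp;               /* items on the mark stack */
};

void toy_push(struct toy_interp *in, int v)
{
    assert(in->sp < TOY_STACK_MAX);
    in->stack[in->sp++] = v;
}

/* Remember where the callee's arguments begin. */
void toy_pushmark(struct toy_interp *in)
{
    assert(in->mark_sp < TOY_MARK_MAX);
    in->marks[in->mark_sp++] = in->sp;
}

/* A callee that sums only its own arguments: everything above the most
 * recent mark.  It pops the mark, removes its arguments and pushes the
 * single result in their place. */
void toy_call_sum(struct toy_interp *in)
{
    size_t mark;
    int    total = 0;

    assert(in->mark_sp > 0);
    mark = in->marks[--in->mark_sp];
    while (in->sp > mark)
        total += in->stack[--in->sp];
    toy_push(in, total);
}

/* The caller keeps 1000 on the stack as its own data, then passes
 * 1, 2 and 3 to the callee.  Returns 1 if the callee saw only its
 * arguments and the caller's data survived. */
int toy_mark_demo(void)
{
    struct toy_interp in = { { 0 }, 0, { 0 }, 0 };

    toy_push(&in, 1000);          /* caller's own data */
    toy_pushmark(&in);
    toy_push(&in, 1);
    toy_push(&in, 2);
    toy_push(&in, 3);
    toy_call_sum(&in);

    return in.sp == 2 && in.stack[1] == 6 && in.stack[0] == 1000;
}
```

The callee never looks below the mark, so the caller's C<1000> is safe
even though both share one stack; the number of arguments is simply the
distance between the mark and the stack pointer.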
1N/A
1N/A=item Save stack
1N/A
1N/AC doesn't have a concept of dynamically scoped (C<local>) values, so perl
1N/Aprovides one. We've seen that C<ENTER> and C<LEAVE> are used as scoping
1N/Abraces; the save stack implements the C equivalent of, for example:
1N/A
1N/A    {
1N/A        local $foo = 42;
1N/A        ...
1N/A    }
1N/A
1N/ASee L<perlguts/Localising Changes> for how to use the save stack.
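
As a rough picture of the mechanism, here is a minimal sketch that can
only save integers. It is an assumption-laden model, not the real save
stack: C<struct toy_scopes>, C<toy_enter>, C<toy_save_int> and
C<toy_leave> are hypothetical names chosen to echo C<ENTER>, the
C<SAVE*> family and C<LEAVE>.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SAVE_MAX  32
#define TOY_SCOPE_MAX  8

struct toy_saved {
    int *where;   /* variable that was changed */
    int  old;     /* value to put back */
};

struct toy_scopes {
    struct toy_saved saves[TOY_SAVE_MAX];
    size_t           save_ix;                /* entries on the save stack */
    size_t           scopes[TOY_SCOPE_MAX];  /* save_ix at each toy_enter */
    size_t           scope_ix;
};

/* Open a scope: note how tall the save stack is right now. */
void toy_enter(struct toy_scopes *s)
{
    assert(s->scope_ix < TOY_SCOPE_MAX);
    s->scopes[s->scope_ix++] = s->save_ix;
}

/* Record the current value of *where so toy_leave can restore it. */
void toy_save_int(struct toy_scopes *s, int *where)
{
    assert(s->save_ix < TOY_SAVE_MAX);
    s->saves[s->save_ix].where = where;
    s->saves[s->save_ix].old   = *where;
    s->save_ix++;
}

/* Close a scope: undo, newest first, every save made since the
 * matching toy_enter. */
void toy_leave(struct toy_scopes *s)
{
    size_t base;

    assert(s->scope_ix > 0);
    base = s->scopes[--s->scope_ix];
    while (s->save_ix > base) {
        s->save_ix--;
        *s->saves[s->save_ix].where = s->saves[s->save_ix].old;
    }
}

int toy_foo = 1;   /* plays the role of the global $foo */

/* Equivalent of { local $foo = 42; ... }.  Returns the value seen
 * inside the block; toy_foo is back to its old value on return. */
int toy_local_demo(void)
{
    struct toy_scopes s = { { { NULL, 0 } }, 0, { 0 }, 0 };
    int inside;

    toy_enter(&s);
    toy_save_int(&s, &toy_foo);
    toy_foo = 42;
    inside = toy_foo;
    toy_leave(&s);
    return inside;
}
```

Because the restore happens in C<toy_leave> rather than at the point of
assignment, any code called between the two sees the temporary value,
which is exactly what distinguishes C<local> from a lexical variable.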
1N/A
1N/A=back
1N/A
1N/A=head2 Millions of Macros
1N/A
1N/AOne thing you'll notice about the Perl source is that it's full of
1N/Amacros. Some have called the pervasive use of macros the hardest thing
1N/Ato understand; others find it adds to clarity. Let's take an example,
1N/Athe code which implements the addition operator:
1N/A
1N/A   1  PP(pp_add)
1N/A   2  {
1N/A   3      dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
1N/A   4      {
1N/A   5        dPOPTOPnnrl_ul;
1N/A   6        SETn( left + right );
1N/A   7        RETURN;
1N/A   8      }
1N/A   9  }
1N/A
1N/AEvery line here (apart from the braces, of course) contains a macro. The
1N/Afirst line sets up the function declaration as Perl expects for PP code;
1N/Aline 3 sets up variable declarations for the argument stack and the
1N/Atarget, the return value of the operation. Finally, it tries to see if
1N/Athe addition operation is overloaded; if so, the appropriate subroutine
1N/Ais called.
1N/A
1N/ALine 5 is another variable declaration - all variable declarations start
1N/Awith C<d> - which pops from the top of the argument stack two NVs (hence
1N/AC<nn>) and puts them into the variables C<right> and C<left>, hence the
1N/AC<rl>. These are the two operands to the addition operator. Next, we
1N/Acall C<SETn> to set the NV of the return value to the result of adding
1N/Athe two values. This done, we return - the C<RETURN> macro makes sure
1N/Athat our return value is properly handled, and we pass the next operator
1N/Ato run back to the main run loop.
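
To get a feel for macros that declare variables, here is a small sketch
in the same spirit. The macros C<dTOY_POPTOPnn> and C<TOY_SETn> and the
C<struct toy_stack> they work on are invented for this example; they
imitate the naming scheme only and are much simpler than the real ones.

```c
#include <assert.h>
#include <stddef.h>

struct toy_stack {
    double slot[16];
    size_t sp;        /* number of items on the stack */
};

/* A leading "d" marks a macro that declares variables.  This one pops
 * the top value into `right` and reads (without popping) the new top
 * into `left`. */
#define dTOY_POPTOPnn(st)                     \
    double right = (st)->slot[--(st)->sp];    \
    double left  = (st)->slot[(st)->sp - 1]

/* Overwrite the top slot with a new value. */
#define TOY_SETn(st, v)  ((st)->slot[(st)->sp - 1] = (v))

void toy_pp_add(struct toy_stack *st)
{
    assert(st->sp >= 2);
    {
        /* A new block, so the declarations come first in it. */
        dTOY_POPTOPnn(st);
        TOY_SETn(st, left + right);
    }
}

/* Push a and b, add them, and return what is left on top. */
double toy_add_demo(double a, double b)
{
    struct toy_stack st = { { 0.0 }, 0 };

    st.slot[st.sp++] = a;
    st.slot[st.sp++] = b;
    toy_pp_add(&st);
    return st.slot[st.sp - 1];
}
```

Two operands go in and one result comes out, so only one value is
popped: the result is written over the slot that held the left operand.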
1N/A
1N/AMost of these macros are explained in L<perlapi>, and some of the more
1N/Aimportant ones are explained in L<perlxs> as well. Pay special attention
1N/Ato L<perlguts/Background and PERL_IMPLICIT_CONTEXT> for information on
1N/Athe C<[pad]THX_?> macros.
1N/A
1N/A=head2 The .i Targets
1N/A
1N/AYou can expand the macros in a F<foo.c> file by saying
1N/A
1N/A    make foo.i
1N/A
1N/Awhich will expand the macros using cpp.  Don't be scared by the results.
1N/A
1N/A=head2 Poking at Perl
1N/A
1N/ATo really poke around with Perl, you'll probably want to build Perl for
1N/Adebugging, like this:
1N/A
1N/A    ./Configure -d -D optimize=-g
1N/A    make
1N/A
1N/AC<-g> is a flag to the C compiler to have it produce debugging
1N/Ainformation which will allow us to step through a running program.
1N/AF<Configure> will also turn on the C<DEBUGGING> compilation symbol which
1N/Aenables all the internal debugging code in Perl. There are a whole bunch
1N/Aof things you can debug with this: L<perlrun> lists them all, and the
1N/Abest way to find out about them is to play about with them. The most
1N/Auseful options are probably
1N/A
1N/A    l  Context (loop) stack processing
1N/A    t  Trace execution
1N/A    o  Method and overloading resolution
1N/A    c  String/numeric conversions
1N/A
1N/ASome of the functionality of the debugging code can be achieved using XS
1N/Amodules.
1N/A
1N/A    -Dr => use re 'debug'
1N/A    -Dx => use O 'Debug'
1N/A
1N/A=head2 Using a source-level debugger
1N/A
1N/AIf the debugging output of C<-D> doesn't help you, it's time to step
1N/Athrough perl's execution with a source-level debugger.
1N/A
1N/A=over 3
1N/A
1N/A=item *
1N/A
1N/AWe'll use C<gdb> for our examples here; the principles will apply to any
1N/Adebugger, but check the manual of the one you're using.
1N/A
1N/A=back
1N/A
1N/ATo fire up the debugger, type
1N/A
1N/A    gdb ./perl
1N/A
1N/AYou'll want to do that in your Perl source tree so the debugger can read
1N/Athe source code. You should see the copyright message, followed by the
1N/Aprompt.
1N/A
1N/A    (gdb)
1N/A
1N/AC<help> will get you into the documentation, but here are the most
1N/Auseful commands:
1N/A
1N/A=over 3
1N/A
1N/A=item run [args]
1N/A
1N/ARun the program with the given arguments.
1N/A
1N/A=item break function_name
1N/A
1N/A=item break source.c:xxx
1N/A
1N/ATells the debugger that we'll want to pause execution when we reach
1N/Aeither the named function (but see L<perlguts/Internal Functions>!) or the given
1N/Aline in the named source file.
1N/A
1N/A=item step
1N/A
1N/ASteps through the program a line at a time.
1N/A
1N/A=item next
1N/A
1N/ASteps through the program a line at a time, without descending into
1N/Afunctions.
1N/A
1N/A=item continue
1N/A
1N/ARun until the next breakpoint.
1N/A
1N/A=item finish
1N/A
1N/ARun until the end of the current function, then stop again.
1N/A
1N/A=item 'enter'
1N/A
1N/AJust pressing Enter will do the most recent operation again - it's a
1N/Ablessing when stepping through miles of source code.
1N/A
1N/A=item print
1N/A
1N/AExecute the given C code and print its results. B<WARNING>: Perl makes
1N/Aheavy use of macros, and F<gdb> does not necessarily support macros
1N/A(see later L</"gdb macro support">).  You'll have to substitute them
1N/Ayourself, or invoke cpp on the source code files
1N/A(see L</"The .i Targets">).
1N/ASo, for instance, you can't say
1N/A
1N/A    print SvPV_nolen(sv)
1N/A
1N/Abut you have to say
1N/A
1N/A    print Perl_sv_2pv_nolen(sv)
1N/A
1N/A=back
1N/A
1N/AYou may find it helpful to have a "macro dictionary", which you can
1N/Aproduce by saying C<cpp -dM perl.c | sort>. Even then, F<cpp> won't
1N/Arecursively apply those macros for you.
1N/A
1N/A=head2 gdb macro support
1N/A
1N/ARecent versions of F<gdb> have fairly good macro support, but
1N/Ain order to use it you'll need to compile perl with macro definitions
1N/Aincluded in the debugging information.  Using F<gcc> version 3.1, this
1N/Ameans configuring with C<-Doptimize=-g3>.  Other compilers might use a
1N/Adifferent switch (if they support debugging macros at all).
1N/A
1N/A=head2 Dumping Perl Data Structures
1N/A
1N/AOne way to get around this macro hell is to use the dumping functions in
1N/AF<dump.c>; these work a little like an internal
1N/AL<Devel::Peek|Devel::Peek>, but they also cover OPs and other structures
1N/Athat you can't get at from Perl. Let's take an example. We'll use the
1N/AC<$a = $b + $c> we used before, but give it a bit of context:
1N/AC<$b = "6XXXX"; $c = 2.3;>. Where's a good place to stop and poke around?
1N/A
1N/AWhat about C<pp_add>, the function we examined earlier to implement the
1N/AC<+> operator:
1N/A
1N/A    (gdb) break Perl_pp_add
1N/A    Breakpoint 1 at 0x46249f: file pp_hot.c, line 309.
1N/A
1N/ANotice we use C<Perl_pp_add> and not C<pp_add> - see L<perlguts/Internal Functions>.
1N/AWith the breakpoint in place, we can run our program:
1N/A
1N/A    (gdb) run -e '$b = "6XXXX"; $c = 2.3; $a = $b + $c'
1N/A
1N/ALots of junk will go past as gdb reads in the relevant source files and
1N/Alibraries, and then:
1N/A
1N/A    Breakpoint 1, Perl_pp_add () at pp_hot.c:309
1N/A    309         dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
1N/A    (gdb) step
1N/A    311           dPOPTOPnnrl_ul;
1N/A    (gdb)
1N/A
1N/AWe looked at this bit of code before, and we said that C<dPOPTOPnnrl_ul>
1N/Aarranges for two C<NV>s to be placed into C<left> and C<right> - let's
1N/Aslightly expand it:
1N/A
1N/A    #define dPOPTOPnnrl_ul  NV right = POPn; \
1N/A                            SV *leftsv = TOPs; \
1N/A                            NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0
1N/A
1N/AC<POPn> takes the SV from the top of the stack and obtains its NV either
1N/Adirectly (if C<SvNOK> is set) or by calling the C<sv_2nv> function.
1N/AC<TOPs> takes the next SV from the top of the stack - yes, C<POPn> uses
1N/AC<TOPs> - but doesn't remove it. We then use C<SvNV> to get the NV from
1N/AC<leftsv> in the same way as before - yes, C<POPn> uses C<SvNV>.
1N/A
1N/ASince we don't have an NV for C<$b>, we'll have to use C<sv_2nv> to
1N/Aconvert it. If we step again, we'll find ourselves there:
1N/A
1N/A    Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669
1N/A    1669        if (!sv)
1N/A    (gdb)
1N/A
1N/AWe can now use C<Perl_sv_dump> to investigate the SV:
1N/A
1N/A    SV = PV(0xa057cc0) at 0xa0675d0
1N/A    REFCNT = 1
1N/A    FLAGS = (POK,pPOK)
1N/A    PV = 0xa06a510 "6XXXX"\0
1N/A    CUR = 5
1N/A    LEN = 6
1N/A    $1 = void
1N/A
1N/AWe know we're going to get C<6> from this, so let's finish the
1N/Asubroutine:
1N/A
1N/A    (gdb) finish
1N/A    Run till exit from #0  Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671
1N/A    0x462669 in Perl_pp_add () at pp_hot.c:311
1N/A    311           dPOPTOPnnrl_ul;
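
The C<6> we expected from C<"6XXXX"> comes from reading the leading
numeric part of the string and ignoring the rest. The sketch below shows
that idea using the C library's C<strtod> as a stand-in; this is an
assumption made for illustration, since Perl's own conversion routine is
not C<strtod> and differs in detail. C<toy_numify> and
C<toy_add_strings> are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Numeric value of the leading part of `s`, or 0.0 if `s` does not
 * start with a number. */
double toy_numify(const char *s)
{
    char  *end;
    double v;

    assert(s != NULL);
    v = strtod(s, &end);
    return end == s ? 0.0 : v;
}

/* What the addition sees once both operands have been numified. */
double toy_add_strings(const char *left, const char *right)
{
    return toy_numify(left) + toy_numify(right);
}
```

Calling C<toy_add_strings("6XXXX", "2.3")> adds 6 and 2.3, which is the
sum the session above is working towards.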
1N/A
1N/AWe can also dump out this op: the current op is always stored in
1N/AC<PL_op>, and we can dump it with C<Perl_op_dump>. This'll give us
1N/Asimilar output to L<B::Debug|B::Debug>.
1N/A
1N/A    {
1N/A    13  TYPE = add  ===> 14
1N/A        TARG = 1
1N/A        FLAGS = (SCALAR,KIDS)
1N/A        {
1N/A            TYPE = null  ===> (12)
1N/A              (was rv2sv)
1N/A            FLAGS = (SCALAR,KIDS)
1N/A            {
1N/A    11          TYPE = gvsv  ===> 12
1N/A                FLAGS = (SCALAR)
1N/A                GV = main::b
1N/A            }
1N/A        }
1N/A
1N/A# finish this later #
1N/A
1N/A=head2 Patching
1N/A
1N/AAll right, we've now had a look at how to navigate the Perl sources and
1N/Asome things you'll need to know when fiddling with them. Let's now get
1N/Aon and create a simple patch. Here's something Larry suggested: if a
1N/AC<U> is the first active format during a C<pack>, (for example,
1N/AC<pack "U3C8", @stuff>) then the resulting string should be treated as
1N/AUTF-8 encoded.
1N/A
1N/AHow do we prepare to fix this up? First we locate the code in question -
1N/Athe C<pack> happens at runtime, so it's going to be in one of the F<pp>
1N/Afiles. Sure enough, C<pp_pack> is in F<pp.c>. Since we're going to be
1N/Aaltering this file, let's copy it to F<pp.c~>.
1N/A
1N/A[Well, it was in F<pp.c> when this tutorial was written. It has now been
1N/Asplit off with C<pp_unpack> to its own file, F<pp_pack.c>]
1N/A
1N/ANow let's look over C<pp_pack>: we take a pattern into C<pat>, and then
1N/Aloop over the pattern, taking each format character in turn into
1N/AC<datumtype>. Then for each possible format character, we swallow up
1N/Athe other arguments in the pattern (a field width, an asterisk, and so
1N/Aon) and convert the next chunk of input into the specified format, adding
1N/Ait onto the output SV C<cat>.
1N/A
1N/AHow do we know if the C<U> is the first format in the C<pat>? Well, if
1N/Awe have a pointer to the start of C<pat> then, if we see a C<U> we can
1N/Atest whether we're still at the start of the string. So, here's where
1N/AC<pat> is set up:
1N/A
1N/A    STRLEN fromlen;
1N/A    register char *pat = SvPVx(*++MARK, fromlen);
1N/A    register char *patend = pat + fromlen;
1N/A    register I32 len;
1N/A    I32 datumtype;
1N/A    SV *fromstr;
1N/A
1N/AWe'll have another string pointer in there:
1N/A
1N/A    STRLEN fromlen;
1N/A    register char *pat = SvPVx(*++MARK, fromlen);
1N/A    register char *patend = pat + fromlen;
1N/A +  char *patcopy;
1N/A    register I32 len;
1N/A    I32 datumtype;
1N/A    SV *fromstr;
1N/A
1N/AAnd just before we start the loop, we'll set C<patcopy> to be the start
1N/Aof C<pat>:
1N/A
1N/A    items = SP - MARK;
1N/A    MARK++;
1N/A    sv_setpvn(cat, "", 0);
1N/A +  patcopy = pat;
1N/A    while (pat < patend) {
1N/A
1N/ANow if we see a C<U> which was at the start of the string, we turn on
1N/Athe C<UTF8> flag for the output SV, C<cat>:
1N/A
1N/A +  if (datumtype == 'U' && pat==patcopy+1)
1N/A +      SvUTF8_on(cat);
1N/A    if (datumtype == '#') {
1N/A        while (pat < patend && *pat != '\n')
1N/A            pat++;
1N/A
Remember that it has to be C<patcopy+1> because the first character of
the string is the C<U> which has been swallowed into C<datumtype>!
1N/A
1N/AOops, we forgot one thing: what if there are spaces at the start of the
1N/Apattern? C<pack("  U*", @stuff)> will have C<U> as the first active
1N/Acharacter, even though it's not the first thing in the pattern. In this
1N/Acase, we have to advance C<patcopy> along with C<pat> when we see spaces:
1N/A
1N/A    if (isSPACE(datumtype))
1N/A        continue;
1N/A
1N/Aneeds to become
1N/A
1N/A    if (isSPACE(datumtype)) {
1N/A        patcopy++;
1N/A        continue;
1N/A    }
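
The C<patcopy> bookkeeping can be tried out in isolation.  What follows is
a minimal standalone sketch, not the code from F<pp.c>: the function name
C<first_active_format_is_U> is made up for illustration, C<isspace> stands
in for perl's C<isSPACE>, and the count after each format is skipped
rather than interpreted.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, for illustration only: returns 1 if the first
 * non-space format character in pat is 'U', and 0 otherwise. */
int first_active_format_is_U(const char *pat)
{
    const char *patend;
    const char *patcopy;
    int seen = 0;

    assert(pat != NULL);
    patend = pat + strlen(pat);
    patcopy = pat;                  /* remembers where the pattern starts */

    while (pat < patend) {
        int datumtype = (unsigned char)*pat++;

        if (isspace(datumtype)) {
            patcopy++;              /* a space advances patcopy in step with pat */
            continue;
        }
        /* pat has already stepped past datumtype, hence the +1 */
        if (datumtype == 'U' && pat == patcopy + 1)
            seen = 1;

        /* skip a repeat count such as "3" or "*" without interpreting it */
        while (pat < patend && (isdigit((unsigned char)*pat) || *pat == '*'))
            pat++;
    }
    return seen;
}
```

With this sketch, C<"U3C8"> and C<"  U*"> both yield 1, while C<"C0U*">
yields 0, which is the behaviour the patch is after.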
1N/A
1N/AOK. That's the C part done. Now we must do two additional things before
1N/Athis patch is ready to go: we've changed the behaviour of Perl, and so
1N/Awe must document that change. We must also provide some more regression
1N/Atests to make sure our patch works and doesn't create a bug somewhere
1N/Aelse along the line.
1N/A
1N/AThe regression tests for each operator live in F<t/op/>, and so we
1N/Amake a copy of F<t/op/pack.t> to F<t/op/pack.t~>. Now we can add our
1N/Atests to the end. First, we'll test that the C<U> does indeed create
1N/AUnicode strings.
1N/A
1N/At/op/pack.t has a sensible ok() function, but if it didn't we could
1N/Ause the one from t/test.pl.
1N/A
1N/A require './test.pl';
1N/A plan( tests => 159 );
1N/A
1N/Aso instead of this:
1N/A
1N/A print 'not ' unless "1.20.300.4000" eq sprintf "%vd", pack("U*",1,20,300,4000);
1N/A print "ok $test\n"; $test++;
1N/A
we can write the more sensible form (see L<Test::More> for a full
explanation of is() and other testing functions):

 is( "1.20.300.4000", sprintf("%vd", pack("U*",1,20,300,4000)),
                                       "U* produces unicode" );
1N/A
1N/ANow we'll test that we got that space-at-the-beginning business right:
1N/A
 is( "1.20.300.4000", sprintf("%vd", pack("  U*",1,20,300,4000)),
                                       "  with spaces at the beginning" );
1N/A
1N/AAnd finally we'll test that we don't make Unicode strings if C<U> is B<not>
1N/Athe first active format:
1N/A
 isnt( v1.20.300.4000, sprintf("%vd", pack("C0U*",1,20,300,4000)),
                                       "U* not first isn't unicode" );
1N/A
1N/AMustn't forget to change the number of tests which appears at the top,
1N/Aor else the automated tester will get confused.  This will either look
1N/Alike this:
1N/A
1N/A print "1..156\n";
1N/A
1N/Aor this:
1N/A
1N/A plan( tests => 156 );
1N/A
1N/AWe now compile up Perl, and run it through the test suite. Our new
1N/Atests pass, hooray!
1N/A
1N/AFinally, the documentation. The job is never done until the paperwork is
1N/Aover, so let's describe the change we've just made. The relevant place
1N/Ais F<pod/perlfunc.pod>; again, we make a copy, and then we'll insert
1N/Athis text in the description of C<pack>:
1N/A
1N/A =item *
1N/A
1N/A If the pattern begins with a C<U>, the resulting string will be treated
1N/A as UTF-8-encoded Unicode. You can force UTF-8 encoding on in a string
1N/A with an initial C<U0>, and the bytes that follow will be interpreted as
1N/A Unicode characters. If you don't want this to happen, you can begin your
1N/A pattern with C<C0> (or anything else) to force Perl not to UTF-8 encode your
1N/A string, and then follow this with a C<U*> somewhere in your pattern.
1N/A
1N/AAll done. Now let's create the patch. F<Porting/patching.pod> tells us
1N/Athat if we're making major changes, we should copy the entire directory
1N/Ato somewhere safe before we begin fiddling, and then do
1N/A
1N/A    diff -ruN old new > patch
1N/A
1N/AHowever, we know which files we've changed, and we can simply do this:
1N/A
1N/A    diff -u pp.c~             pp.c             >  patch
1N/A    diff -u t/op/pack.t~      t/op/pack.t      >> patch
1N/A    diff -u pod/perlfunc.pod~ pod/perlfunc.pod >> patch
1N/A
1N/AWe end up with a patch looking a little like this:
1N/A
1N/A    --- pp.c~       Fri Jun 02 04:34:10 2000
1N/A    +++ pp.c        Fri Jun 16 11:37:25 2000
1N/A    @@ -4375,6 +4375,7 @@
1N/A         register I32 items;
1N/A         STRLEN fromlen;
1N/A         register char *pat = SvPVx(*++MARK, fromlen);
1N/A    +    char *patcopy;
1N/A         register char *patend = pat + fromlen;
1N/A         register I32 len;
1N/A         I32 datumtype;
1N/A    @@ -4405,6 +4406,7 @@
1N/A    ...
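
The C<@@> lines in a unified diff are hunk headers: the old file's
starting line and line count after the C<->, then the new file's after
the C<+>.  As a rough illustration, the sketch below parses such a header
and reports how many lines the hunk adds on balance.  The struct and
function names are invented for this example, and it handles only the
full C<-start,count +start,count> form shown in the patch above.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative container for the four numbers in a hunk header. */
struct hunk_range {
    long old_start, old_count;
    long new_start, new_count;
};

/* Parses "@@ -a,b +c,d @@".  Returns 1 on success and 0 otherwise.
 * Headers that omit a count are not handled by this sketch. */
int parse_hunk_header(const char *line, struct hunk_range *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "@@ -%ld,%ld +%ld,%ld @@",
                  &out->old_start, &out->old_count,
                  &out->new_start, &out->new_count) == 4;
}

/* Net number of lines the hunk adds (negative if it removes lines). */
long hunk_net_added(const struct hunk_range *r)
{
    assert(r != NULL);
    return r->new_count - r->old_count;
}
```

For the first hunk above, C<@@ -4375,6 +4375,7 @@>, the net change is one
added line, which matches the single C<+> line declaring C<patcopy>.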
1N/A
1N/AAnd finally, we submit it, with our rationale, to perl5-porters. Job
1N/Adone!
1N/A
1N/A=head2 Patching a core module
1N/A
1N/AThis works just like patching anything else, with an extra
1N/Aconsideration.  Many core modules also live on CPAN.  If this is so,
1N/Apatch the CPAN version instead of the core and send the patch off to
1N/Athe module maintainer (with a copy to p5p).  This will help the module
1N/Amaintainer keep the CPAN version in sync with the core version without
1N/Aconstantly scanning p5p.
1N/A
1N/A=head2 Adding a new function to the core
1N/A
1N/AIf, as part of a patch to fix a bug, or just because you have an
1N/Aespecially good idea, you decide to add a new function to the core,
1N/Adiscuss your ideas on p5p well before you start work.  It may be that
1N/Asomeone else has already attempted to do what you are considering and
1N/Acan give lots of good advice or even provide you with bits of code
1N/Athat they already started (but never finished).
1N/A
1N/AYou have to follow all of the advice given above for patching.  It is
1N/Aextremely important to test any addition thoroughly and add new tests
1N/Ato explore all boundary conditions that your new function is expected
to handle.  If your new function is used only by one module (e.g. toke),
then it should probably be named S_your_function (for static); on the
other hand, if you expect it to be accessible from other functions in
Perl, you should name it Perl_your_function.  See L<perlguts/Internal Functions>
for more details.
1N/A
1N/AThe location of any new code is also an important consideration.  Don't
1N/Ajust create a new top level .c file and put your code there; you would
1N/Ahave to make changes to Configure (so the Makefile is created properly),
1N/Aas well as possibly lots of include files.  This is strictly pumpking
1N/Abusiness.
1N/A
1N/AIt is better to add your function to one of the existing top level
1N/Asource code files, but your choice is complicated by the nature of
1N/Athe Perl distribution.  Only the files that are marked as compiled
1N/Astatic are located in the perl executable.  Everything else is located
1N/Ain the shared library (or DLL if you are running under WIN32).  So,
1N/Afor example, if a function was only used by functions located in
1N/Atoke.c, then your code can go in toke.c.  If, however, you want to call
1N/Athe function from universal.c, then you should put your code in another
1N/Alocation, for example util.c.
1N/A
In addition to writing your C code, you will need to create an
1N/Aappropriate entry in embed.pl describing your function, then run
1N/A'make regen_headers' to create the entries in the numerous header
1N/Afiles that perl needs to compile correctly.  See L<perlguts/Internal Functions>
1N/Afor information on the various options that you can set in embed.pl.
1N/AYou will forget to do this a few (or many) times and you will get
1N/Awarnings during the compilation phase.  Make sure that you mention
1N/Athis when you post your patch to P5P; the pumpking needs to know this.
1N/A
1N/AWhen you write your new code, please be conscious of existing code
1N/Aconventions used in the perl source files.  See L<perlstyle> for
details.  Although most of the guidelines discussed seem to focus on
Perl code, rather than C, they all apply (except when they don't ;).
See also the F<Porting/patching.pod> file in the Perl source distribution
for lots of details about both formatting and submitting patches of
your changes.
1N/A
1N/ALastly, TEST TEST TEST TEST TEST any code before posting to p5p.
1N/ATest on as many platforms as you can find.  Test as many perl
1N/AConfigure options as you can (e.g. MULTIPLICITY).  If you have
1N/Aprofiling or memory tools, see L<EXTERNAL TOOLS FOR DEBUGGING PERL>
1N/Abelow for how to use them to further test your code.  Remember that
1N/Amost of the people on P5P are doing this on their own time and
1N/Adon't have the time to debug your code.
1N/A
1N/A=head2 Writing a test
1N/A
1N/AEvery module and built-in function has an associated test file (or
1N/Ashould...).  If you add or change functionality, you have to write a
1N/Atest.  If you fix a bug, you have to write a test so that bug never
1N/Acomes back.  If you alter the docs, it would be nice to test what the
1N/Anew documentation says.
1N/A
1N/AIn short, if you submit a patch you probably also have to patch the
1N/Atests.
1N/A
1N/AFor modules, the test file is right next to the module itself.
1N/AF<lib/strict.t> tests F<lib/strict.pm>.  This is a recent innovation,
1N/Aso there are some snags (and it would be wonderful for you to brush
1N/Athem out), but it basically works that way.  Everything else lives in
1N/AF<t/>.
1N/A
1N/A=over 3
1N/A
1N/A=item F<t/base/>
1N/A
1N/ATesting of the absolute basic functionality of Perl.  Things like
1N/AC<if>, basic file reads and writes, simple regexes, etc.  These are
1N/Arun first in the test suite and if any of them fail, something is
1N/AI<really> broken.
1N/A
1N/A=item F<t/cmd/>
1N/A
1N/AThese test the basic control structures, C<if/else>, C<while>,
1N/Asubroutines, etc.
1N/A
1N/A=item F<t/comp/>
1N/A
1N/ATests basic issues of how Perl parses and compiles itself.
1N/A
1N/A=item F<t/io/>
1N/A
1N/ATests for built-in IO functions, including command line arguments.
1N/A
1N/A=item F<t/lib/>
1N/A
The old home for the module tests; you shouldn't put anything new in
1N/Ahere.  There are still some bits and pieces hanging around in here
1N/Athat need to be moved.  Perhaps you could move them?  Thanks!
1N/A
1N/A=item F<t/op/>
1N/A
Tests for perl's built-in functions that don't fit into any of the
1N/Aother directories.
1N/A
1N/A=item F<t/pod/>
1N/A
1N/ATests for POD directives.  There are still some tests for the Pod
1N/Amodules hanging around in here that need to be moved out into F<lib/>.
1N/A
1N/A=item F<t/run/>
1N/A
1N/ATesting features of how perl actually runs, including exit codes and
1N/Ahandling of PERL* environment variables.
1N/A
1N/A=item F<t/uni/>
1N/A
1N/ATests for the core support of Unicode.
1N/A
1N/A=item F<t/win32/>
1N/A
1N/AWindows-specific tests.
1N/A
=item F<t/x2p/>
1N/A
1N/AA test suite for the s2p converter.
1N/A
1N/A=back
1N/A
1N/AThe core uses the same testing style as the rest of Perl, a simple
1N/A"ok/not ok" run through Test::Harness, but there are a few special
1N/Aconsiderations.
1N/A
There are three ways to write a test in the core: Test::More,
t/test.pl, and ad hoc C<print $test ? "ok 42\n" : "not ok 42\n">.  The
1N/Adecision of which to use depends on what part of the test suite you're
1N/Aworking on.  This is a measure to prevent a high-level failure (such
1N/Aas Config.pm breaking) from causing basic functionality tests to fail.
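
The "ok/not ok" lines are simple enough to sketch outside Perl.  The
fragment below is written in C purely to show the line format; real core
tests are written in Perl, and both function names here are invented for
the illustration.  One function produces a result line, the other shows
how a reader of that line could tell a failure from a pass.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "ok N\n" or "not ok N\n" into buf and returns what snprintf
 * returns (the length of the full line).  Illustrative name. */
int format_test_line(char *buf, size_t size, int passed, int number)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%s %d\n", passed ? "ok" : "not ok", number);
}

/* Returns 1 if the line reports a failed test, 0 otherwise. */
int line_reports_failure(const char *line)
{
    assert(line != NULL);
    return strncmp(line, "not ok", 6) == 0;
}
```

A failing test number 42 is rendered as C<not ok 42>, the same text the ad
hoc C<print> above would emit.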
1N/A
1N/A=over 4
1N/A
1N/A=item t/base t/comp
1N/A
1N/ASince we don't know if require works, or even subroutines, use ad hoc
1N/Atests for these two.  Step carefully to avoid using the feature being
1N/Atested.
1N/A
1N/A=item t/cmd t/run t/io t/op
1N/A
1N/ANow that basic require() and subroutines are tested, you can use the
1N/At/test.pl library which emulates the important features of Test::More
1N/Awhile using a minimum of core features.
1N/A
1N/AYou can also conditionally use certain libraries like Config, but be
1N/Asure to skip the test gracefully if it's not there.
1N/A
1N/A=item t/lib ext lib
1N/A
1N/ANow that the core of Perl is tested, Test::More can be used.  You can
1N/Aalso use the full suite of core modules in the tests.
1N/A
1N/A=back
1N/A
1N/AWhen you say "make test" Perl uses the F<t/TEST> program to run the
1N/Atest suite.  All tests are run from the F<t/> directory, B<not> the
1N/Adirectory which contains the test.  This causes some problems with the
1N/Atests in F<lib/>, so here's some opportunity for some patching.
1N/A
1N/AYou must be triply conscious of cross-platform concerns.  This usually
1N/Aboils down to using File::Spec and avoiding things like C<fork()> and
1N/AC<system()> unless absolutely necessary.
1N/A
1N/A=head2 Special Make Test Targets
1N/A
1N/AThere are various special make targets that can be used to test Perl
slightly differently than the standard "test" target.  Not all of them
1N/Aare expected to give a 100% success rate.  Many of them have several
1N/Aaliases.
1N/A
1N/A=over 4
1N/A
1N/A=item coretest
1N/A
1N/ARun F<perl> on all core tests (F<t/*> and F<lib/[a-z]*> pragma tests).
1N/A
1N/A=item test.deparse
1N/A
1N/ARun all the tests through B::Deparse.  Not all tests will succeed.
1N/A
1N/A=item test.taintwarn
1N/A
1N/ARun all tests with the B<-t> command-line switch.  Not all tests
1N/Aare expected to succeed (until they're specifically fixed, of course).
1N/A
1N/A=item minitest
1N/A
1N/ARun F<miniperl> on F<t/base>, F<t/comp>, F<t/cmd>, F<t/run>, F<t/io>,
1N/AF<t/op>, and F<t/uni> tests.
1N/A
1N/A=item test.valgrind check.valgrind utest.valgrind ucheck.valgrind
1N/A
1N/A(Only in Linux) Run all the tests using the memory leak + naughty
1N/Amemory access tool "valgrind".  The log files will be named
1N/AF<testname.valgrind>.
1N/A
1N/A=item test.third check.third utest.third ucheck.third
1N/A
1N/A(Only in Tru64)  Run all the tests using the memory leak + naughty
1N/Amemory access tool "Third Degree".  The log files will be named
1N/AF<perl3.log.testname>.
1N/A
1N/A=item test.torture torturetest
1N/A
1N/ARun all the usual tests and some extra tests.  As of Perl 5.8.0 the
1N/Aonly extra tests are Abigail's JAPHs, F<t/japh/abigail.t>.
1N/A
1N/AYou can also run the torture test with F<t/harness> by giving
1N/AC<-torture> argument to F<t/harness>.
1N/A
1N/A=item utest ucheck test.utf8 check.utf8
1N/A
1N/ARun all the tests with -Mutf8.  Not all tests will succeed.
1N/A
1N/A=item test_harness
1N/A
1N/ARun the test suite with the F<t/harness> controlling program, instead of
1N/AF<t/TEST>. F<t/harness> is more sophisticated, and uses the
1N/AL<Test::Harness> module, thus using this test target supposes that perl
1N/Amostly works. The main advantage for our purposes is that it prints a
1N/Adetailed summary of failed tests at the end. Also, unlike F<t/TEST>, it
1N/Adoesn't redirect stderr to stdout.
1N/A
1N/A=back
1N/A
1N/A=head2 Running tests by hand
1N/A
You can run part of the test suite by hand by using one of the following
commands from the F<t/> directory:
1N/A
1N/A    ./perl -I../lib TEST list-of-.t-files
1N/A
1N/Aor
1N/A
1N/A    ./perl -I../lib harness list-of-.t-files
1N/A
(If you don't specify test scripts, the whole test suite will be run.)
1N/A
1N/AYou can run an individual test by a command similar to
1N/A
    ./perl -I../lib path/to/foo.t

except that the harnesses set up some environment variables that may
affect the execution of the test:
1N/A
1N/A=over 4
1N/A
1N/A=item PERL_CORE=1
1N/A
indicates that we're running this test as part of the perl core test suite.
1N/AThis is useful for modules that have a dual life on CPAN.
1N/A
1N/A=item PERL_DESTRUCT_LEVEL=2
1N/A
1N/Ais set to 2 if it isn't set already (see L</PERL_DESTRUCT_LEVEL>)
1N/A
1N/A=item PERL
1N/A
1N/A(used only by F<t/TEST>) if set, overrides the path to the perl executable
1N/Athat should be used to run the tests (the default being F<./perl>).
1N/A
1N/A=item PERL_SKIP_TTY_TEST
1N/A
if set, causes the tests that need a terminal to be skipped. It's actually set
1N/Aautomatically by the Makefile, but can also be forced artificially by
1N/Arunning 'make test_notty'.
1N/A
1N/A=back
1N/A
1N/A=head1 EXTERNAL TOOLS FOR DEBUGGING PERL
1N/A
1N/ASometimes it helps to use external tools while debugging and
1N/Atesting Perl.  This section tries to guide you through using
1N/Asome common testing and debugging tools with Perl.  This is
1N/Ameant as a guide to interfacing these tools with Perl, not
1N/Aas any kind of guide to the use of the tools themselves.
1N/A
1N/AB<NOTE 1>: Running under memory debuggers such as Purify, valgrind, or
1N/AThird Degree greatly slows down the execution: seconds become minutes,
minutes become hours.  For example, as of Perl 5.8.1,
F<ext/Encode/t/Unicode.t> takes extraordinarily long to complete under
e.g. Purify, Third Degree, and valgrind.  Under valgrind it takes more
than six hours, even on a snappy computer; the said test must be
doing something that is quite unfriendly for memory debuggers.  If you
don't feel like waiting, you can simply kill the perl process.
1N/A
B<NOTE 2>: To minimize the number of memory leak false alarms (see
L</PERL_DESTRUCT_LEVEL> for more information), you have to have the
environment variable PERL_DESTRUCT_LEVEL set to 2.  The F<TEST>
and harness scripts do that automatically.  But if you are running
some of the tests manually, set it yourself.  For csh-like shells:
1N/A
1N/A    setenv PERL_DESTRUCT_LEVEL 2
1N/A
1N/Aand for Bourne-type shells:
1N/A
1N/A    PERL_DESTRUCT_LEVEL=2
1N/A    export PERL_DESTRUCT_LEVEL
1N/A
1N/Aor in UNIXy environments you can also use the C<env> command:
1N/A
1N/A    env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib ...
1N/A
B<NOTE 3>: There are known memory leaks when there are compile-time
errors within eval or require; seeing C<S_doeval> in the call stack
is a good sign of these.  Fixing these leaks is non-trivial,
unfortunately, but they must be fixed eventually.
1N/A
1N/A=head2 Rational Software's Purify
1N/A
1N/APurify is a commercial tool that is helpful in identifying
1N/Amemory overruns, wild pointers, memory leaks and other such
1N/Abadness.  Perl must be compiled in a specific way for
1N/Aoptimal testing with Purify.  Purify is available under
1N/AWindows NT, Solaris, HP-UX, SGI, and Siemens Unix.
1N/A
1N/A=head2 Purify on Unix
1N/A
1N/AOn Unix, Purify creates a new Perl binary.  To get the most
1N/Abenefit out of Purify, you should create the perl to Purify
1N/Ausing:
1N/A
1N/A    sh Configure -Accflags=-DPURIFY -Doptimize='-g' \
1N/A     -Uusemymalloc -Dusemultiplicity
1N/A
1N/Awhere these arguments mean:
1N/A
1N/A=over 4
1N/A
1N/A=item -Accflags=-DPURIFY
1N/A
1N/ADisables Perl's arena memory allocation functions, as well as
1N/Aforcing use of memory allocation functions derived from the
1N/Asystem malloc.
1N/A
1N/A=item -Doptimize='-g'
1N/A
1N/AAdds debugging information so that you see the exact source
1N/Astatements where the problem occurs.  Without this flag, all
1N/Ayou will see is the source filename of where the error occurred.
1N/A
1N/A=item -Uusemymalloc
1N/A
1N/ADisable Perl's malloc so that Purify can more closely monitor
1N/Aallocations and leaks.  Using Perl's malloc will make Purify
1N/Areport most leaks in the "potential" leaks category.
1N/A
1N/A=item -Dusemultiplicity
1N/A
1N/AEnabling the multiplicity option allows perl to clean up
1N/Athoroughly when the interpreter shuts down, which reduces the
1N/Anumber of bogus leak reports from Purify.
1N/A
1N/A=back
1N/A
1N/AOnce you've compiled a perl suitable for Purify'ing, then you
1N/Acan just:
1N/A
1N/A    make pureperl
1N/A
1N/Awhich creates a binary named 'pureperl' that has been Purify'ed.
1N/AThis binary is used in place of the standard 'perl' binary
1N/Awhen you want to debug Perl memory problems.
1N/A
1N/AAs an example, to show any memory leaks produced during the
1N/Astandard Perl testset you would create and run the Purify'ed
1N/Aperl as:
1N/A
1N/A    make pureperl
1N/A    cd t
1N/A    ../pureperl -I../lib harness
1N/A
which would run Perl on the F<harness> script and report any memory problems.
1N/A
1N/APurify outputs messages in "Viewer" windows by default.  If
1N/Ayou don't have a windowing environment or if you simply
1N/Awant the Purify output to unobtrusively go to a log file
instead of to the interactive window, use the following
1N/Aoptions to output to the log file "perl.log":
1N/A
1N/A    setenv PURIFYOPTIONS "-chain-length=25 -windows=no \
1N/A     -log-file=perl.log -append-logfile=yes"
1N/A
1N/AIf you plan to use the "Viewer" windows, then you only need this option:
1N/A
1N/A    setenv PURIFYOPTIONS "-chain-length=25"
1N/A
1N/AIn Bourne-type shells:
1N/A
1N/A    PURIFYOPTIONS="..."
1N/A    export PURIFYOPTIONS
1N/A
1N/Aor if you have the "env" utility:
1N/A
1N/A    env PURIFYOPTIONS="..." ../pureperl ...
1N/A
1N/A=head2 Purify on NT
1N/A
1N/APurify on Windows NT instruments the Perl binary 'perl.exe'
1N/Aon the fly.  There are several options in the makefile you
1N/Ashould change to get the most use out of Purify:
1N/A
1N/A=over 4
1N/A
1N/A=item DEFINES
1N/A
1N/AYou should add -DPURIFY to the DEFINES line so the DEFINES
1N/Aline looks something like:
1N/A
1N/A    DEFINES = -DWIN32 -D_CONSOLE -DNO_STRICT $(CRYPT_FLAG) -DPURIFY=1
1N/A
1N/Ato disable Perl's arena memory allocation functions, as
1N/Awell as to force use of memory allocation functions derived
1N/Afrom the system malloc.
1N/A
1N/A=item USE_MULTI = define
1N/A
1N/AEnabling the multiplicity option allows perl to clean up
1N/Athoroughly when the interpreter shuts down, which reduces the
1N/Anumber of bogus leak reports from Purify.
1N/A
1N/A=item #PERL_MALLOC = define
1N/A
1N/ADisable Perl's malloc so that Purify can more closely monitor
1N/Aallocations and leaks.  Using Perl's malloc will make Purify
1N/Areport most leaks in the "potential" leaks category.
1N/A
1N/A=item CFG = Debug
1N/A
1N/AAdds debugging information so that you see the exact source
1N/Astatements where the problem occurs.  Without this flag, all
1N/Ayou will see is the source filename of where the error occurred.
1N/A
1N/A=back
1N/A
1N/AAs an example, to show any memory leaks produced during the
1N/Astandard Perl testset you would create and run Purify as:
1N/A
1N/A    cd win32
1N/A    make
1N/A    cd ../t
1N/A    purify ../perl -I../lib harness
1N/A
which would instrument Perl in memory, run Perl on the F<harness> script,
then finally report any memory problems.
1N/A
1N/A=head2 valgrind
1N/A
The excellent valgrind tool can be used to find both memory leaks
and illegal memory accesses.  As of August 2003 it unfortunately works
only on x86 (ELF) Linux.  The special "test.valgrind" target can be used
to run the tests under valgrind.  Found errors and memory leaks are
logged in files named F<testname.valgrind>.

As system libraries (most notably glibc) also trigger errors,
valgrind allows such errors to be suppressed using suppression files. The
default suppression file that comes with valgrind already catches a lot
of them. Some additional suppressions are defined in F<t/perl.supp>.
1N/A
1N/ATo get valgrind and for more information see
1N/A
1N/A    http://developer.kde.org/~sewardj/
1N/A
1N/A=head2 Compaq's/Digital's/HP's Third Degree
1N/A
1N/AThird Degree is a tool for memory leak detection and memory access checks.
1N/AIt is one of the many tools in the ATOM toolkit.  The toolkit is only
1N/Aavailable on Tru64 (formerly known as Digital UNIX formerly known as
1N/ADEC OSF/1).
1N/A
When building Perl, you must first run Configure with the -Doptimize=-g
and -Uusemymalloc flags; after that you can use the make targets
"perl.third" and "test.third".  (What is required is that Perl must be
compiled using the C<-g> flag; you may need to re-Configure.)
1N/A
1N/AThe short story is that with "atom" you can instrument the Perl
1N/Aexecutable to create a new executable called F<perl.third>.  When the
1N/Ainstrumented executable is run, it creates a log of dubious memory
traffic in a file called F<perl.3log>.  See the manual pages of atom and
1N/Athird for more information.  The most extensive Third Degree
1N/Adocumentation is available in the Compaq "Tru64 UNIX Programmer's
1N/AGuide", chapter "Debugging Programs with Third Degree".
1N/A
The "test.third" target leaves a lot of files named F<foo_bar.3log> in the t/
subdirectory.  There is a problem with these files: Third Degree is so
effective that it also finds problems in the system libraries.
Therefore you should use the Porting/thirdclean script to clean up
the F<*.3log> files.

There are also leaks that, given a certain definition of a leak,
aren't.  See L</PERL_DESTRUCT_LEVEL> for more information.
1N/A
1N/A=head2 PERL_DESTRUCT_LEVEL
1N/A
1N/AIf you want to run any of the tests yourself manually using e.g.
1N/Avalgrind, or the pureperl or perl.third executables, please note that
by default perl B<does not> explicitly clean up all the memory it has
1N/Aallocated (such as global memory arenas) but instead lets the exit()
1N/Aof the whole program "take care" of such allocations, also known as
1N/A"global destruction of objects".
1N/A
There is a way to tell perl to do complete cleanup: set the
environment variable PERL_DESTRUCT_LEVEL to a non-zero value.
The t/TEST wrapper does set this to 2, and this is what you
need to do too, if you don't want to see the "global leaks".
For example, for "third-degreed" Perl:
1N/A
1N/A    env PERL_DESTRUCT_LEVEL=2 ./perl.third -Ilib t/foo/bar.t
1N/A
(Note: the mod_perl apache module also uses this environment variable
for its own purposes and has extended its semantics. Refer to the mod_perl
documentation for more information. Also, spawned threads do the
equivalent of setting this variable to the value 1.)
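
The idea of letting an environment variable switch on full cleanup can be
sketched in a few lines of C.  This is an illustration under assumptions,
not perl's own implementation: the function names are invented, and the
only rule taken from the text above is that a non-zero value asks for
complete cleanup.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 if text parses as a non-zero integer; returns 0 if text is
 * NULL (variable unset), is not a number, or is zero. */
int level_requests_cleanup(const char *text)
{
    char *end;
    long level;

    if (text == NULL)
        return 0;
    level = strtol(text, &end, 10);
    return end != text && level != 0;
}

/* Looks up the named environment variable and applies the rule above. */
int env_requests_cleanup(const char *name)
{
    assert(name != NULL);
    return level_requests_cleanup(getenv(name));
}
```

A program built this way would call
C<env_requests_cleanup("PERL_DESTRUCT_LEVEL")> just before exiting and
free its global allocations only when the answer is 1, so that a leak
checker sees no "global leaks" in that mode.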
1N/A
1N/AIf, at the end of a run you get the message I<N scalars leaked>, you can
1N/Arecompile with C<-DDEBUG_LEAKING_SCALARS>, which will cause
1N/Athe addresses of all those leaked SVs to be dumped; it also converts
1N/AC<new_SV()> from a macro into a real function, so you can use your
1N/Afavourite debugger to discover where those pesky SVs were allocated.
1N/A
1N/A=head2 Profiling
1N/A
Depending on your platform there are various ways of profiling Perl.
1N/A
1N/AThere are two commonly used techniques of profiling executables:
1N/AI<statistical time-sampling> and I<basic-block counting>.
1N/A
The first method periodically takes samples of the CPU program
counter, and since the program counter can be correlated with the code
generated for functions, we get a statistical view of which
functions the program is spending its time in.  The caveats are that very
small/fast functions have a lower probability of showing up in the
profile, and that periodically interrupting the program (this is
usually done rather frequently, on the scale of milliseconds) imposes
an additional overhead that may skew the results.  The first problem
can be alleviated by running the code for longer (in general this is a
good idea for profiling); the second problem is usually kept in check
by the profiling tools themselves.
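
The first caveat can be seen in a toy model.  The sketch below assumes a
made-up representation: an array recording which function (by a small
integer id) is executing at each clock tick, sampled once every
C<period> ticks.  It is not how any real profiler is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* trace[t] is the id of the function executing at tick t.  Every
 * 'period' ticks, starting at tick 0, one sample is credited to the
 * function found there.  Counts accumulate into samples[0..nfuncs-1]. */
void sample_trace(const int *trace, size_t ticks, size_t period,
                  unsigned *samples, size_t nfuncs)
{
    size_t t;

    assert(trace != NULL && samples != NULL && period > 0);
    for (t = 0; t < ticks; t += period) {
        assert(trace[t] >= 0 && (size_t)trace[t] < nfuncs);
        samples[trace[t]]++;
    }
}
```

Take a twelve-tick trace in which function 1 runs only at tick 5 and
function 0 runs the rest of the time.  Sampling every four ticks looks at
ticks 0, 4 and 8 and credits function 1 with nothing, while sampling
every tick credits it with one sample out of twelve.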
1N/A
The second method divides up the generated code into I<basic blocks>.
Basic blocks are sections of code that are entered only at the
beginning and exited only at the end.  For example, a conditional jump
ends one basic block, and its target starts another.  Basic block
profiling usually works by I<instrumenting> the code by adding
I<enter basic block #nnnn> book-keeping code to the generated code.
During the execution of the code the basic block counters are then
updated appropriately.  The caveat is that the added extra code can
skew the results: again, the profiling tools usually try to factor
their own effects out of the results.
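
Instrumentation of this kind can be imitated by hand.  In the sketch
below the counter updates are written out explicitly where a profiling
tool would insert them automatically; the block names, the counter array
and the function being profiled are all invented for the illustration.

```c
#include <assert.h>

/* One counter per basic block of clamp_to_zero(). */
enum { BB_ENTRY, BB_NEGATIVE, BB_NONNEGATIVE, BB_COUNT };
static unsigned long bb_counter[BB_COUNT];

/* Returns n, or 0 if n is negative.  Each "enter basic block"
 * book-keeping line is what an instrumenting tool would add. */
int clamp_to_zero(int n)
{
    bb_counter[BB_ENTRY]++;
    if (n < 0) {
        bb_counter[BB_NEGATIVE]++;
        return 0;
    }
    bb_counter[BB_NONNEGATIVE]++;
    return n;
}

/* Reads back how often a block was entered. */
unsigned long bb_hits(int block)
{
    assert(block >= 0 && block < BB_COUNT);
    return bb_counter[block];
}

/* Clears all counters before a new measurement. */
void bb_reset(void)
{
    int i;

    for (i = 0; i < BB_COUNT; i++)
        bb_counter[i] = 0;
}
```

After calls with -5, 3 and 7, the entry block has been counted three
times, the negative branch once and the fall-through block twice.  The
extra increments are also the "added extra code" that can skew timing.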
1N/A
1N/A=head2 Gprof Profiling
1N/A
gprof is a profiling tool available on many UNIX platforms;
it uses I<statistical time-sampling>.

You can build a profiled version of perl called "perl.gprof" by
invoking the make target "perl.gprof".  (What is required is that Perl
must be compiled using the C<-pg> flag; you may need to re-Configure.)
Running the profiled version of Perl will create an output file called
F<gmon.out>, which contains the profiling data collected
during the execution.
1N/A
1N/AThe gprof tool can then display the collected data in various ways.
1N/AUsually gprof understands the following options:
1N/A
1N/A=over 4
1N/A
1N/A=item -a
1N/A
1N/ASuppress statically defined functions from the profile.
1N/A
1N/A=item -b
1N/A
1N/ASuppress the verbose descriptions in the profile.
1N/A
1N/A=item -e routine
1N/A
1N/AExclude the given routine and its descendants from the profile.
1N/A
1N/A=item -f routine
1N/A
1N/ADisplay only the given routine and its descendants in the profile.
1N/A
1N/A=item -s
1N/A
1N/AGenerate a summary file called F<gmon.sum> which then may be given
1N/Ato subsequent gprof runs to accumulate data over several runs.
1N/A
1N/A=item -z
1N/A
1N/ADisplay routines that have zero usage.
1N/A
1N/A=back
1N/A
For a more detailed explanation of the available commands and output
1N/Aformats, see your own local documentation of gprof.
1N/A
1N/A=head2 GCC gcov Profiling
1N/A
1N/AStarting from GCC 3.0 I<basic block profiling> is officially available
1N/Afor the GNU CC.
1N/A
You can build a profiled version of perl called F<perl.gcov> by
invoking the make target "perl.gcov" (what is required is that Perl must
be compiled using gcc with the flags C<-fprofile-arcs
-ftest-coverage>; you may need to re-Configure).
1N/A
1N/ARunning the profiled version of Perl will cause profile output to be
1N/Agenerated.  For each source file an accompanying ".da" file will be
1N/Acreated.
1N/A
1N/ATo display the results you use the "gcov" utility (which should
1N/Abe installed if you have gcc 3.0 or newer installed).  F<gcov> is
run on source code files, like this:
1N/A
1N/A    gcov sv.c
1N/A
1N/Awhich will cause F<sv.c.gcov> to be created.  The F<.gcov> files
1N/Acontain the source code annotated with relative frequencies of
1N/Aexecution indicated by "#" markers.
1N/A
1N/AUseful options of F<gcov> include C<-b> which will summarise the
1N/Abasic block, branch, and function call coverage, and C<-c> which
1N/Ainstead of relative frequencies will use the actual counts.  For
1N/Amore information on the use of F<gcov> and basic block profiling
1N/Awith gcc, see the latest GNU CC manual, as of GCC 3.0 see
1N/A
1N/A    http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc.html
1N/A
1N/Aand its section titled "8. gcov: a Test Coverage Program"
1N/A
1N/A    http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc_8.html#SEC132
1N/A
1N/A=head2 Pixie Profiling
1N/A
1N/APixie is a profiling tool available on IRIX and Tru64 (aka Digital
1N/AUNIX aka DEC OSF/1) platforms.  Pixie does its profiling using
1N/AI<basic-block counting>.
1N/A
You can build a profiled version of perl called F<perl.pixie> by
invoking the make target "perl.pixie" (what is required is that Perl
must be compiled using the C<-g> flag; you may need to re-Configure).

In Tru64 a file called F<perl.Addrs> will also be silently created;
this file contains the addresses of the basic blocks.  Running the
profiled version of Perl will create a new file called "perl.Counts"
which contains the counts for the basic blocks for that particular
program execution.
1N/A
1N/ATo display the results you use the F<prof> utility.  The exact
1N/Aincantation depends on your operating system, "prof perl.Counts" in
1N/AIRIX, and "prof -pixie -all -L. perl" in Tru64.
1N/A
In IRIX the following prof options are available:

=over 4

=item -h

Reports the most heavily used lines in descending order of use.
Useful for finding the hotspot lines.

=item -l

Groups lines by procedure, with procedures sorted in descending order
of use.  Within a procedure, lines are listed in source order.
Useful for finding the hotspots of procedures.

=back

In Tru64 the following options are available:

=over 4

=item -p[rocedures]

Procedures sorted in descending order by the number of cycles executed
in each procedure.  Useful for finding the hotspot procedures.
(This is the default option.)

=item -h[eavy]

Lines sorted in descending order by the number of cycles executed in
each line.  Useful for finding the hotspot lines.

=item -i[nvocations]

The called procedures are sorted in descending order by number of calls
made to the procedures.  Useful for finding the most used procedures.

=item -l[ines]

Lines grouped by procedure, with procedures sorted by cycles executed
per procedure.  Useful for finding the hotspots of procedures.

=item -testcoverage

Lines for which the compiler emitted code but which were never
executed.

=item -z[ero]

Procedures that were never executed.

=back

For further information, see your system's manual pages for pixie and prof.

=head2 Miscellaneous tricks

=over 4

=item *

Those debugging perl with the DDD frontend over gdb may find the
following useful:

You can extend the data conversion shortcuts menu, so for example you
can display an SV's IV value with one click, without doing any typing.
To do that, simply edit the F<~/.ddd/init> file and add after:

  ! Display shortcuts.
  Ddd*gdbDisplayShortcuts: \
  /t ()   // Convert to Bin\n\
  /d ()   // Convert to Dec\n\
  /x ()   // Convert to Hex\n\
  /o ()   // Convert to Oct\n\

the following two lines:

  ((XPV*) (())->sv_any )->xpv_pv  // 2pvx\n\
  ((XPVIV*) (())->sv_any )->xiv_iv // 2ivx

so now you can do ivx and pvx lookups, or you can plug in the
sv_peek "conversion" there:

  Perl_sv_peek(my_perl, (SV*)()) // sv_peek

(The my_perl is for threaded builds.)
Just remember that every line except the last one should end with C<\n\>.

Alternatively, edit the init file interactively via:
3rd mouse button -> New Display -> Edit Menu

Note: you can define up to 20 conversion shortcuts in the gdb
section.

=item *

If you see in a debugger a memory area mysteriously full of 0xabababab,
you may be seeing the effect of the Poison() macro; see L<perlclib>.

=back

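To see why poisoned memory shows up as 0xabababab, consider the
following sketch.  It does not use perl's headers: C<DEMO_POISON> is a
stand-in written for this example on the assumption that poisoning
means filling the area with the byte 0xAB, and C<looks_poisoned> and
C<poisoned_word> are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for perl's Poison(): assumed here to fill `nitems` objects
 * of `type` starting at `dest` with the byte 0xAB. */
#define DEMO_POISON(dest, nitems, type) \
    ((void)memset((char *)(dest), 0xAB, (nitems) * sizeof(type)))

/* Return 1 if every byte of buf[0..len-1] is 0xAB, else 0. */
int looks_poisoned(const void *buf, size_t len)
{
    const unsigned char *p = (const unsigned char *)buf;
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        if (p[i] != 0xAB)
            return 0;
    }
    return 1;
}

/* Poison a small array and return its first 32-bit word, which is
 * the value a debugger would display for that location. */
uint32_t poisoned_word(void)
{
    uint32_t words[4] = { 1, 2, 3, 4 };

    DEMO_POISON(words, 4, uint32_t);
    return words[0];
}
```

Every byte of the area is 0xAB, so any 32-bit word read from it comes
out as 0xabababab regardless of byte order.  That makes reads from
poisoned memory easy to recognise in a debugger.
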
=head2 CONCLUSION

We've had a brief look around the Perl source, an overview of the stages
F<perl> goes through when it's running your code, and how to use a
debugger to poke at the Perl guts. We took a very simple problem and
demonstrated how to solve it fully - with documentation, regression
tests, and finally a patch for submission to p5p.  Finally, we talked
about how to use external tools to debug and test Perl.

I'd now suggest you read over those references again, and then, as soon
as possible, get your hands dirty. The best way to learn is by doing,
so:

=over 3

=item *

Subscribe to perl5-porters, follow the patches and try and understand
them; don't be afraid to ask if there's a portion you're not clear on -
who knows, you may unearth a bug in the patch...

=item *

Keep up to date with the bleeding edge Perl distributions and get
familiar with the changes. Try and get an idea of what areas people are
working on and the changes they're making.

=item *

Do read the README associated with your operating system, e.g. README.aix
on the IBM AIX OS. Don't hesitate to supply patches to that README if
you find anything missing or changed over a new OS release.

=item *

Find an area of Perl that seems interesting to you, and see if you can
work out how it works. Scan through the source, and step over it in the
debugger. Play, poke, investigate, fiddle! You'll probably get to
understand not just your chosen area but a much wider range of F<perl>'s
activity as well, and probably sooner than you'd think.

=back

=over 3

=item I<The Road goes ever on and on, down from the door where it began.>

=back

If you can do these things, you've started on the long road to Perl porting.
Thanks for wanting to help make Perl better - and happy hacking!

=head1 AUTHOR

This document was written by Nathan Torkington, and is maintained by
the perl5-porters mailing list.
