=head1 NAME

perlcompile - Introduction to the Perl Compiler-Translator

=head1 DESCRIPTION

Perl has always had a compiler: your source is compiled into an
internal form (a parse tree) which is then optimized before being
run. Since version 5.005, Perl has shipped with a module
capable of inspecting the optimized parse tree (C<B>), and this has
been used to write many useful utilities, including a module that lets
you turn your Perl into C source code that can be compiled into a
native executable.

The C<B> module provides access to the parse tree, and other modules
("back ends") do things with the tree. Some write it out as
bytecode, C source code, or a semi-human-readable text. Another
traverses the parse tree to build a cross-reference of which
subroutines, formats, and variables are used where. Another checks
your code for dubious constructs. Yet another back end dumps the
parse tree back out as Perl source, acting as a source code beautifier
or deobfuscator.

Because its original purpose was to be a way to produce C code
corresponding to a Perl program, and in turn a native executable, the
C<B> module and its associated back ends are known as "the
compiler", even though they don't really compile anything.
Different parts of the compiler are more accurately a "translator",
or an "inspector", but people want Perl to have a "compiler
option" not an "inspector gadget". What can you do?

This document covers the use of the Perl compiler: which modules
it comprises, how to use the most important of the back end modules,
what problems there are, and how to work around them.

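To make "access to the parse tree" concrete, here is a minimal sketch
that uses C<B> directly rather than through a back end. The
subroutine name C<greet> is made up for illustration;
C<svref_2object>, C<START>, C<name>, and C<next> come from the C<B>
interface. The one-liner prints the name of each operation in the
subroutine in execution order. The exact list can differ between
Perl versions, so no output is shown:

    $ perl -MB -e '
        sub greet { print "hi\n" }     # hypothetical sub to inspect
        my $op = B::svref_2object(\&greet)->START;  # first op executed
        while ($$op) {                 # $$op is 0 past the last op
            print $op->name, "\n";
            $op = $op->next;
        }'

Most users never need to do this by hand, since the back ends
described in the rest of this document wrap this kind of traversal.
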
=head2 Layout

The compiler back ends are in the C<B::> hierarchy, and the front-end
(the module that you, the user of the compiler, will sometimes
interact with) is the O module. Some back ends (e.g., C<B::C>) have
programs (e.g., I<perlcc>) to hide the modules' complexity.

Here are the important back ends to know about, with their status
expressed as a number from 0 (outline for later implementation) to
10 (if there's a bug in it, we're very surprised):

=over 4

=item B::Bytecode

Stores the parse tree in a machine-independent format, suitable
for later reloading through the ByteLoader module. Status: 5 (some
things work, some things don't, some things are untested).

=item B::C

Creates a C source file containing code to rebuild the parse tree
and resume the interpreter. Status: 6 (many things work adequately,
including programs using Tk).

=item B::CC

Creates a C source file corresponding to the run time code path in
the parse tree. This is the closest to a Perl-to-C translator there
is, but the code it generates is almost incomprehensible because it
translates the parse tree into a giant switch structure that
manipulates Perl structures. The eventual goal is to reduce (given
sufficient type information in the Perl program) some of the
Perl data structure manipulations to manipulations of C-level
ints, floats, etc. Status: 5 (some things work, including
uncomplicated Tk examples).

=item B::Lint

Complains if it finds dubious constructs in your source code. Status:
6 (it works adequately, but only has a very limited number of areas
that it checks).

=item B::Deparse

Recreates the Perl source, making an attempt to format it coherently.
Status: 8 (it works nicely, but a few obscure things are missing).

=item B::Xref

Reports on the declaration and use of subroutines and variables.
Status: 8 (it works nicely, but still has a few lingering bugs).

=back

=head1 Using The Back Ends

The following sections describe how to use the various compiler back
ends. They're presented roughly in order of maturity, so that the
most stable and proven back ends are described first, and the most
experimental and incomplete back ends are described last.

The O module automatically enables the B<-c> flag to Perl, which
prevents Perl from executing your code once it has been compiled.
This is why all the back ends print:

    myperlprogram syntax OK

before producing any other output.

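A quick way to see the effect of B<-c>, sketched here with the
Deparse back end and a throwaway one-liner: the program's own output
never appears, because the code is compiled but not run. The
C<syntax OK> line is written to standard error, so it can be
discarded separately from the back end's report:

    $ perl -MO=Deparse -e 'print "side effect\n"' 2>/dev/null
    print "side effect\n";

What is printed is the regenerated source, not the string
C<side effect> on a line by itself.
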
=head2 The Cross Referencing Back End

The cross referencing back end (B::Xref) produces a report on your program,
breaking down declarations and uses of subroutines and variables (and
formats) by file and subroutine. For instance, here's part of the
report from the I<pod2man> program that comes with Perl:

    Subroutine clear_noremap
      Package (lexical)
        $ready_to_print   i1069, 1079
      Package main
        $&                1086
        $.                1086
        $0                1086
        $1                1087
        $2                1085, 1085
        $3                1085, 1085
        $ARGV             1086
        %HTML_Escapes     1085, 1085

This shows the variables used in the subroutine C<clear_noremap>. The
variable C<$ready_to_print> is a my() (lexical) variable,
B<i>ntroduced (first declared with my()) on line 1069, and used on
line 1079. The variable C<$&> from the main package is used on 1086,
and so on.

A line number may be prefixed by a single letter:

=over 4

=item i

Lexical variable introduced (declared with my()) for the first time.

=item &

Subroutine or method call.

=item s

Subroutine defined.

=item r

Format defined.

=back

The most useful option the cross referencer has is to save the report
to a separate file. For instance, to save the report on
I<myperlprogram> to the file I<report>:

    $ perl -MO=Xref,-oreport myperlprogram

=head2 The Decompiling Back End

The Deparse back end turns your Perl source back into Perl source. It
can reformat along the way, making it useful as a de-obfuscator. The
most basic way to use it is:

    $ perl -MO=Deparse myperlprogram

You'll notice immediately that Perl has no idea of how to paragraph
your code. You'll have to separate chunks of code from each other
with newlines by hand. However, watch what it will do with
one-liners:

    $ perl -MO=Deparse -e '$op=shift||die "usage: $0
    code [...]";chomp(@ARGV=<>)unless@ARGV; for(@ARGV){$was=$_;eval$op;
    die$@ if$@; rename$was,$_ unless$was eq $_}'
    -e syntax OK
    $op = shift @ARGV || die("usage: $0 code [...]");
    chomp(@ARGV = <ARGV>) unless @ARGV;
    foreach $_ (@ARGV) {
        $was = $_;
        eval $op;
        die $@ if $@;
        rename $was, $_ unless $was eq $_;
    }

The decompiler has several options for the code it generates. For
instance, you can set the size of each indent from 4 (as above) to
2 with:

    $ perl -MO=Deparse,-si2 myperlprogram

The B<-p> option adds parentheses where normally they are omitted:

    $ perl -MO=Deparse -e 'print "Hello, world\n"'
    -e syntax OK
    print "Hello, world\n";
    $ perl -MO=Deparse,-p -e 'print "Hello, world\n"'
    -e syntax OK
    print("Hello, world\n");

See L<B::Deparse> for more information on the formatting options.

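Because Deparse works from the optimized parse tree rather than from
your source text, its output can also show what the compiler did to
your code. A small illustration, using an arbitrary one-liner: the
constant expression C<1+2> is folded at compile time, so the
regenerated source no longer contains the addition. Standard error
is discarded here to hide the C<syntax OK> line:

    $ perl -MO=Deparse -e 'print 1+2' 2>/dev/null
    print 3;
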
=head2 The Lint Back End

The lint back end (B::Lint) inspects programs for poor style. One
programmer's bad style is another programmer's useful tool, so options
let you select what is complained about.

To run the style checker across your source code:

    $ perl -MO=Lint myperlprogram

To disable context checks and undefined subroutines (a check is
turned off by prefixing its name with C<no->):

    $ perl -MO=Lint,no-context,no-undefined-subs myperlprogram

See L<B::Lint> for information on the options.

=head2 The Simple C Back End

This module saves the internal compiled state of your Perl program
to a C source file, which can be turned into a native executable
for that particular platform using a C compiler. The resulting
program links against the Perl interpreter library, so it
will not save you disk space (unless you build Perl with a shared
library) or program size. It may, however, save you startup time.

The C<perlcc> tool generates such executables by default.

    perlcc myperlprogram.pl

=head2 The Bytecode Back End

This back end is only useful if you also have a way to load and
execute the bytecode that it produces. The ByteLoader module provides
this functionality.

To turn a Perl program into executable bytecode, you can use C<perlcc>
with the C<-B> switch:

    perlcc -B myperlprogram.pl

The bytecode is machine independent, so once you have a compiled
module or program, it is as portable as Perl source (assuming that
the user of the module or program has a modern-enough Perl interpreter
to decode the bytecode).

See L<B::Bytecode> for information on options to control the
optimization and nature of the code generated by the Bytecode module.

=head2 The Optimized C Back End

The optimized C back end will turn your Perl program's run time
code-path into an equivalent (but optimized) C program that manipulates
the Perl data structures directly. The program will still link against
the Perl interpreter library, to allow for eval(), C<s///e>,
C<require>, etc.

The C<perlcc> tool generates such executables when using the C<-O>
switch. To compile a Perl program (ending in C<.pl>
or C<.p>):

    perlcc -O myperlprogram.pl

To produce a shared library from a Perl module (ending in C<.pm>):

    perlcc -O Myperlmodule.pm

For more information, see L<perlcc> and L<B::CC>.

=head1 Module List for the Compiler Suite

=over 4

=item B

This module is the introspective ("reflective" in Java terms)
module, which allows a Perl program to inspect its innards. The
back end modules all use this module to gain access to the compiled
parse tree. You, the user of a back end module, will not need to
interact with B.

=item O

This module is the front-end to the compiler's back ends. Normally
called something like this:

    $ perl -MO=Deparse myperlprogram

This is like saying C<use O 'Deparse'> in your Perl program.

=item B::Asmdata

This module is used by the B::Assembler module, which is in turn used
by the B::Bytecode module, which stores a parse-tree as
bytecode for later loading. It's not a back end itself, but rather a
component of a back end.

=item B::Assembler

This module turns a parse-tree into data suitable for storing
and later decoding back into a parse-tree. It's not a back end
itself, but rather a component of a back end. It's used by the
I<assemble> program that produces bytecode.

=item B::Bblock

This module is used by the B::CC back end. It walks "basic blocks".
A basic block is a series of operations which is known to execute from
start to finish, with no possibility of branching or halting.

=item B::Bytecode

This module is a back end that generates bytecode from a
program's parse tree. This bytecode is written to a file, from where
it can later be reconstructed back into a parse tree. The goal is to
do the expensive program compilation once, save the interpreter's
state into a file, and then restore the state from the file when the
program is to be executed. See L</"The Bytecode Back End">
for details about usage.

=item B::C

This module writes out C code corresponding to the parse tree and
other interpreter internal structures. You compile the corresponding
C file, and get an executable file that will restore the internal
structures and the Perl interpreter will begin running the
program. See L</"The Simple C Back End"> for details about usage.

=item B::CC

This module writes out C code corresponding to your program's
operations. Unlike the B::C module, which merely stores the
interpreter and its state in a C program, the B::CC module makes a
C program that carries out your program's operations itself instead
of having the interpreter walk the parse tree (the result still links
against the Perl interpreter library). As a consequence,
programs translated into C by B::CC can execute faster than normal
interpreted programs. See L</"The Optimized C Back End"> for
details about usage.

=item B::Concise

This module prints a concise (but complete) version of the Perl parse
tree. Its output is more customizable than that of B::Terse or
B::Debug (and it can emulate them). This module is useful for people who
are writing their own back end, or who are learning about the Perl
internals. It's not useful to the average programmer.

=item B::Debug

This module dumps the Perl parse tree in verbose detail to STDOUT.
It's useful for people who are writing their own back end, or who
are learning about the Perl internals. It's not useful to the
average programmer.

=item B::Deparse

This module produces Perl source code from the compiled parse tree.
It is useful in debugging and deconstructing other people's code,
and also as a pretty-printer for your own source. See
L</"The Decompiling Back End"> for details about usage.

=item B::Disassembler

This module turns bytecode back into a parse tree. It's not a back
end itself, but rather a component of a back end. It's used by the
I<disassemble> program that comes with the bytecode.

=item B::Lint

This module inspects the compiled form of your source code for things
which, while some people frown on them, aren't necessarily bad enough
to justify a warning. For instance, use of an array in scalar context
without explicitly saying C<scalar(@array)> is something that Lint
can identify. See L</"The Lint Back End"> for details about usage.

=item B::Showlex

This module prints out the my() variables used in a function or a
file. To get a list of the my() variables used in the subroutine
mysub() defined in the file myperlprogram:

    $ perl -MO=Showlex,mysub myperlprogram

To get a list of the my() variables used in the file myperlprogram:

    $ perl -MO=Showlex myperlprogram

[BROKEN]

=item B::Stackobj

This module is used by the B::CC module. It's not a back end itself,
but rather a component of a back end.

=item B::Stash

This module is used by the L<perlcc> program, which compiles a module
into an executable. B::Stash prints the symbol tables in use by a
program, and is used to prevent B::CC from producing C code for the
B::* and O modules. It's not a back end itself, but rather a
component of a back end.

=item B::Terse

This module prints the contents of the parse tree, but without as much
information as B::Debug. For comparison, C<print "Hello, world.">
produced 96 lines of output from B::Debug, but only 6 from B::Terse.

This module is useful for people who are writing their own back end,
or who are learning about the Perl internals. It's not useful to the
average programmer.

=item B::Xref

This module prints a report on where the variables, subroutines, and
formats are defined and used within a program and the modules it
loads. See L</"The Cross Referencing Back End"> for details about
usage.

=back

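The tree-dumping modules in this list are run through C<O> like any
other back end. As a rough sketch, assuming B::Concise and B::Terse
are installed with your Perl, counting output lines gives a feel for
how verbose each one is on the one-line program used in the
comparison above. The counts depend on the Perl version, so none are
shown here:

    $ perl -MO=Concise -e 'print "Hello, world."' 2>/dev/null | wc -l
    $ perl -MO=Terse -e 'print "Hello, world."' 2>/dev/null | wc -l

Dropping the C<| wc -l> shows the dump itself.
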
=head1 KNOWN PROBLEMS

The simple C back end currently only saves typeglobs with alphanumeric
names.

The optimized C back end outputs code for more modules than it should
(e.g., DirHandle). It also has little hope of properly handling
C<goto LABEL> outside the running subroutine (C<goto &sub> is okay).
C<goto LABEL> currently does not work at all in this back end.
It also creates a huge initialization function that gives
C compilers headaches. Splitting the initialization function gives
better results. Other problems include: unsigned math does not
work correctly; some opcodes are handled incorrectly by the default
opcode handling mechanism.

BEGIN{} blocks are executed while compiling your code. Any external
state that is initialized in BEGIN{}, such as open files or database
connections, does not behave properly, because it is set up before
the back end saves the program's state. To work around
this, Perl has an INIT{} block that corresponds to code being executed
before your program begins running but after your program has finished
being compiled. Execution order: BEGIN{}, (possible save of state
through compiler back end), INIT{}, program runs, END{}.

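The ordering can be seen without any back end. In this sketch the
blocks are deliberately written out of order, and the printed strings
are arbitrary labels. The second command adds B<-c>, which is what
the O module turns on: the BEGIN{} block still runs at compile time,
but INIT{}, the main program, and END{} do not. Standard error is
discarded there to hide the C<syntax OK> line:

    $ perl -e 'print "main\n"; END { print "END\n" }
        INIT { print "INIT\n" } BEGIN { print "BEGIN\n" }'
    BEGIN
    INIT
    main
    END
    $ perl -c -e 'print "main\n"; END { print "END\n" }
        INIT { print "INIT\n" } BEGIN { print "BEGIN\n" }' 2>/dev/null
    BEGIN

State that must exist when the compiled program runs therefore
belongs in INIT{} rather than BEGIN{}.
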
=head1 AUTHOR

This document was originally written by Nathan Torkington, and is now
maintained by the perl5-porters mailing list
I<perl5-porters@perl.org>.

=cut