=head1 NAME

perlguts - Introduction to the Perl API

=head1 DESCRIPTION

This document attempts to describe how to use the Perl API, as well as
to provide some info on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.

=head1 Variables

=head2 Datatypes

Perl has three typedefs that handle Perl's three main data types:

    SV  Scalar Value
    AV  Array Value
    HV  Hash Value

Each typedef has specific routines that manipulate the various data types.

=head2 What is an "IV"?

Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.

Perl also uses two special typedefs, I32 and I16, which will always be at
least 32 bits and 16 bits long, respectively. (Again, there are U32 and U16,
as well.) They will usually be exactly 32 and 16 bits long, but on Crays
they will both be 64 bits.

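The pointer-sized guarantee is what allows an address to be kept in an
integer slot, as C<sv_setref_pv> (described later) does. The plain C sketch
below illustrates the idea with C99's C<intptr_t> standing in for IV; this is
an assumption of the sketch, not Perl's own typedef, and the helpers
C<ptr_to_iv> and C<iv_to_ptr> are hypothetical. Real Perl code would use the
C<PTR2IV> and C<INT2PTR> macros instead.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for Perl's IV (an assumption of this sketch): intptr_t is a
 * signed integer type able to hold any object pointer, where provided. */
typedef intptr_t model_iv;

/* Compile-time check of the property the text describes (C11). */
static_assert(sizeof(model_iv) >= sizeof(void *), "IV must hold a pointer");

/* Hypothetical helper: store an address in an integer slot. */
static model_iv ptr_to_iv(void *p)
{
    return (model_iv) p;
}

/* Hypothetical helper: recover the address from the integer slot. */
static void *iv_to_ptr(model_iv iv)
{
    return (void *) iv;
}
```

The round trip C<iv_to_ptr(ptr_to_iv(p)) == p> holds for any object pointer
C<p>, which is exactly what storing an address in an IV relies on.
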
=head2 Working with SVs

An SV can be created and loaded with one command. There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).

The seven routines are:

    SV*  newSViv(IV);
    SV*  newSVuv(UV);
    SV*  newSVnv(double);
    SV*  newSVpv(const char*, STRLEN);
    SV*  newSVpvn(const char*, STRLEN);
    SV*  newSVpvf(const char*, ...);
    SV*  newSVsv(SV*);

C<STRLEN> is an integer type (Size_t, usually defined as size_t in
F<config.h>) guaranteed to be large enough to represent the size of
any string that perl can handle.

In the unlikely case of an SV requiring more complex initialisation, you
can create an empty SV with C<newSV(len)>. If C<len> is 0, an empty SV of
type NULL is returned; otherwise an SV of type PV is returned with
C<len + 1> (for the NUL) bytes of storage allocated, accessible via
C<SvPVX>. In both cases the SV has value undef.

    SV *sv1 = newSV(0);   /* no storage allocated */
    SV *sv2 = newSV(10);  /* 10 (+1) bytes of uninitialised storage allocated */

To change the value of an I<already-existing> SV, there are eight routines:

    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, STRLEN);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>. Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a NUL character.

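To see why the explicit-length routines matter, the plain C sketch below
models a PV buffer (no Perl headers are involved; C<struct model_pv>,
C<model_setpvn> and C<model_setpv> are hypothetical names). The first
function copies exactly C<len> bytes and appends a NUL, in the spirit of
C<sv_setpvn>; the second lets C<strlen> pick the length, in the spirit of
C<sv_setpv>, and therefore stops at the first embedded NUL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a PV buffer: bytes plus a length, NUL-terminated. */
struct model_pv {
    char   *pv;
    size_t  cur;   /* number of bytes in use, like SvCUR */
};

/* Copy exactly len bytes (embedded NULs allowed), then add a trailing NUL. */
static int model_setpvn(struct model_pv *s, const char *src, size_t len)
{
    char *buf;

    assert(s != NULL && (src != NULL || len == 0));
    buf = malloc(len + 1);
    if (buf == NULL)
        return -1;
    memcpy(buf, src, len);
    buf[len] = '\0';
    s->pv  = buf;
    s->cur = len;
    return 0;
}

/* Let strlen choose the length: copying stops at the first NUL byte. */
static int model_setpv(struct model_pv *s, const char *src)
{
    return model_setpvn(s, src, strlen(src));
}
```

Given the five bytes C<'a', 'b', '\0', 'c', 'd'>, the explicit-length copy
keeps all five, while the C<strlen> version keeps only C<"ab">.
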
The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.

C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs. The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>). This pointer may be NULL if that information is not
important. Note that this function requires you to specify the length of
the format.

The C<sv_set*()> functions are not generic enough to operate on values
that have "magic". See L<Magic Virtual Tables> later in this document.

All SVs that contain strings should be terminated with a NUL character.
If a string is not NUL-terminated, there is a risk of core dumps and
data corruption from code that passes the string to C functions or
system calls which expect a NUL-terminated string.
Perl's own functions typically add a trailing NUL for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.

To access the actual value that an SV points to, you can use the macros:

    SvIV(SV*)
    SvUV(SV*)
    SvNV(SV*)
    SvPV(SV*, STRLEN len)
    SvPV_nolen(SV*)

which will automatically coerce the actual scalar type into an IV, UV, double,
or string.

In the C<SvPV> macro, the length of the string returned is placed into the
variable C<len> (this is a macro, so you do I<not> use C<&len>). If you do
not care what the length of the data is, use the C<SvPV_nolen> macro.
Historically the C<SvPV> macro with the global variable C<PL_na> has been
used in this case. But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl. In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a NUL.

Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len),
len);>. It might work with your compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:

    SV *s;
    STRLEN len;
    char *ptr;
    ptr = SvPV(s, len);
    foo(ptr, len);

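The reason is that C<SvPV> both yields a pointer and assigns to C<len> as a
side effect, while C does not sequence the evaluation of separate function
arguments, so C<foo> may be handed C<len> before the macro has set it
(strictly, the unsequenced write and read of C<len> is undefined behaviour).
The plain C sketch below mimics that shape with a hypothetical C<MODEL_PV>
macro (not Perl's definition of C<SvPV>) and uses the safe two-step pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an SV holding a string. */
struct model_sv {
    const char *pv;
    size_t      cur;
};

/* Like SvPV, this evaluates to the buffer pointer and, as a side effect,
 * stores the length into its second argument. */
#define MODEL_PV(sv, len) ((len) = (sv)->cur, (sv)->pv)

/* Stand-in for the caller's foo(ptr, len): report how many bytes it saw. */
static size_t consume(const char *ptr, size_t len)
{
    (void) ptr;
    return len;
}

static size_t safe_call(struct model_sv *sv)
{
    size_t len = 0;
    const char *ptr;

    assert(sv != NULL);
    /* Unsafe: consume(MODEL_PV(sv, len), len) -- the second argument may
     * be evaluated before the macro assigns to len. */
    ptr = MODEL_PV(sv, len);   /* len is definitely set after this statement */
    return consume(ptr, len);
}
```
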
If you want to know if the scalar value is TRUE, you can use:

    SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro

    SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated. If so, it will
call the function C<sv_grow>. Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add a byte for the trailing NUL (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).

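The grow-only rule and the extra byte for the NUL can be modelled in plain
C. In the sketch below, the hypothetical C<struct model_buf> fields C<cur>
and C<len> play the roles of C<SvCUR> and C<SvLEN>; C<model_grow> only ever
enlarges the allocation, and C<model_cat> asks for C<cur + n + 1> bytes,
just as perl's own functions call C<SvGROW(sv, len + 1)>. This is not
Perl's C<sv_grow>.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string buffer; cur/len mirror SvCUR/SvLEN. */
struct model_buf {
    char   *pv;
    size_t  cur;   /* bytes in use, not counting the NUL */
    size_t  len;   /* bytes allocated */
};

/* Grow-only, like SvGROW: reallocate only if newlen exceeds the current
 * allocation; never shrink. Returns the (possibly moved) buffer or NULL. */
static char *model_grow(struct model_buf *b, size_t newlen)
{
    if (newlen > b->len) {
        char *p = realloc(b->pv, newlen);
        if (p == NULL)
            return NULL;
        b->pv  = p;
        b->len = newlen;
    }
    return b->pv;
}

/* Append n bytes, reserving room for the trailing NUL ourselves. */
static int model_cat(struct model_buf *b, const char *src, size_t n)
{
    assert(b != NULL);
    if (model_grow(b, b->cur + n + 1) == NULL)
        return -1;
    memcpy(b->pv + b->cur, src, n);
    b->cur += n;
    b->pv[b->cur] = '\0';
    return 0;
}
```

Requesting a smaller size (for example C<model_grow(&b, 2)> on a 4-byte
buffer) leaves the allocation untouched, mirroring C<SvGROW>.
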
If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

    SvIOK(SV*)
    SvNOK(SV*)
    SvPOK(SV*)

You can get and set the current length of the string stored in an SV with
the following macros:

    SvCUR(SV*)
    SvCUR_set(SV*, I32 val)

You can also get a pointer to the end of the string stored in the SV
with the macro:

    SvEND(SV*)

But note that these last three macros are valid only if C<SvPOK()> is true.

If you want to append something to the end of the string stored in an
C<SV*>, you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by
using C<strlen>. In the second, you specify the length of the string
yourself. The third function processes its arguments like C<sprintf> and
appends the formatted output. The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
C<va_list> argument. The fifth function extends the string stored in the
first SV with the string stored in the second SV. It also forces the second
SV to be interpreted as a string.

The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic". See L<Magic Virtual Tables> later in this document.

If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:

    SV*  get_sv("package::varname", FALSE);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:

    SvOK(SV*)

The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.
Its address can be used whenever an C<SV*> is needed.
However, you have to be careful when using C<&PL_sv_undef> as a value in AVs
or HVs (see L<AVs, HVs and undefined values>).

There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their
addresses can be used whenever an C<SV*> is needed.

Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:

    SV* sv = (SV*) 0;
    if (I-am-to-return-a-real-value) {
            sv = sv_2mortal(newSViv(42));
    }
    sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise. Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the
first line and all will be well.

To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
call is not necessary (see L<Reference Counts and Mortality>).

=head2 Offsets

Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it puts the number of bytes chopped off into the IV field
of the SV. It then moves the PV pointer (called C<SvPVX>) forward that
many bytes, and adjusts C<SvCUR> and C<SvLEN>.

Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.

This is best demonstrated by example:

    % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
    SV = PVIV(0x8128450) at 0x81340f0
      REFCNT = 1
      FLAGS = (POK,OOK,pPOK)
      IV = 1 (OFFSET)
      PV = 0x8135781 ( "1" . ) "2345"\0
      CUR = 4
      LEN = 5

Here the number of bytes chopped off (1) is put into IV, and
C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one.

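The same bookkeeping can be mirrored in a few lines of plain C. In the
sketch below, the hypothetical C<struct model_sv> fields C<pvx>, C<cur>,
C<len> and C<offset> stand in for C<SvPVX>, C<SvCUR>, C<SvLEN> and the IV
slot. Assuming the original buffer for C<"12345"> was 6 bytes, chopping one
byte reproduces the numbers in the C<Dump> output above (offset 1, C<CUR> 4,
C<LEN> 5). It is a model of the idea, not Perl's C<sv_chop>.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical SV-like string with an offset slot (the IV field under OOK). */
struct model_sv {
    char   *pvx;     /* visible start of the string, like SvPVX */
    size_t  cur;     /* visible length, like SvCUR */
    size_t  len;     /* visible allocation, like SvLEN */
    size_t  offset;  /* bytes chopped off the front, like SvIV under OOK */
};

/* Discard everything before ptr without moving any bytes. */
static void model_chop(struct model_sv *sv, char *ptr)
{
    size_t delta = (size_t)(ptr - sv->pvx);

    assert(ptr >= sv->pvx && delta <= sv->cur);
    sv->pvx    += delta;
    sv->cur    -= delta;
    sv->len    -= delta;
    sv->offset += delta;
}

/* The real start of the allocation, i.e. SvPVX(sv) - SvIV(sv);
 * this is the pointer that must eventually be passed to free(). */
static char *model_real_start(const struct model_sv *sv)
{
    return sv->pvx - sv->offset;
}
```
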
Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvMAX>.
Again, the location of the real start of the C array only comes into
play when freeing the array. See C<av_shift> in F<av.c>.

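A plain C sketch of the same trick follows. The hypothetical fields
C<alloc>, C<array>, C<fill> and C<max> stand in for C<AvALLOC>, C<AvARRAY>,
C<AvFILL> and C<AvMAX>, and the elements are C<int>s rather than C<SV*>s.
Shifting only moves the C<array> pointer; the real start is needed only when
the memory is freed. This is not Perl's C<av_shift>.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical array header; elements are ints instead of SV pointers. */
struct model_av {
    int  *alloc;  /* real start of the C array, like AvALLOC */
    int  *array;  /* first visible element, like AvARRAY */
    long  fill;   /* highest visible index, like AvFILL (-1 if empty) */
    long  max;    /* highest usable index counted from array, like AvMAX */
};

/* Shift without moving memory: advance array, shrink fill and max.
 * Returns 0 and stores the element in *out, or -1 if the array is empty. */
static int model_shift(struct model_av *av, int *out)
{
    assert(av != NULL && out != NULL);
    if (av->fill < 0)
        return -1;
    *out = av->array[0];
    av->array++;
    av->fill--;
    av->max--;
    return 0;
}

/* Only freeing needs the real start of the allocation. */
static void model_free(struct model_av *av)
{
    free(av->alloc);
    av->alloc = av->array = NULL;
    av->fill = av->max = -1;
}
```
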
=head2 What's Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros. Because a scalar can be both a number and a string,
several of these macros may return TRUE for the same SV, and calling the
C<Sv*V> macros will do the appropriate conversion of string to
integer/double or integer/double to string.

If you I<really> need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:

    SvIOKp(SV*)
    SvNOKp(SV*)
    SvPOKp(SV*)

These will tell you if you truly have an integer, double, or string pointer
stored in your SV. The "p" stands for private.

There are various ways in which the private and public flags may differ.
For example, a tied SV may have a valid underlying value in the IV slot
(so SvIOKp is true), but the data should be accessed via the FETCH
routine rather than directly, so SvIOK is false. Another is when
numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK will not be.

In general, though, it's best to use the C<Sv*V> macros.

=head2 Working with AVs

There are two ways to create and load an AV. The first method creates an
empty AV:

    AV*  newAV();

The second method both creates the AV and initially populates it with SVs:

    AV*  av_make(I32 num, SV **ptr);

The second argument points to an array containing C<num> C<SV*>'s. Once the
AV has been created, the SVs can be destroyed, if so desired.

Once the AV has been created, the following operations are possible on AVs:

    void  av_push(AV*, SV*);
    SV*   av_pop(AV*);
    SV*   av_shift(AV*);
    void  av_unshift(AV*, I32 num);

These should be familiar operations, with the exception of C<av_unshift>.
This routine adds C<num> elements at the front of the array with the C<undef>
value. You must then use C<av_store> (described below) to assign values
to these new elements.

Here are some other functions:

    I32   av_len(AV*);
    SV**  av_fetch(AV*, I32 key, I32 lval);
    SV**  av_store(AV*, I32 key, SV* val);

The C<av_len> function returns the highest index value in the array (just
like C<$#array> in Perl). If the array is empty, -1 is returned. The
C<av_fetch> function returns the value at index C<key>; if there is no
element at that index and C<lval> is non-zero, C<av_fetch> will first
store an undef value there.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>. Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak. Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.

    void  av_clear(AV*);
    void  av_undef(AV*);
    void  av_extend(AV*, I32 key);

The C<av_clear> function deletes all the elements in the C<AV*> array, but
does not actually delete the array itself. The C<av_undef> function will
delete all the elements in the array plus the array itself. The
C<av_extend> function extends the array so that it contains at least
C<key+1> elements. If C<key+1> does not exceed the currently allocated
length of the array, then nothing is done.

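The index conventions above (the highest index rather than the element
count, -1 for an empty array, and C<av_extend> guaranteeing room for at
least C<key+1> elements) can be illustrated with a small plain C model. The
struct and function names are hypothetical, the elements are C<int>s rather
than C<SV*>s, and the doubling growth policy is an assumption of this sketch,
not necessarily what Perl does.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of ints. */
struct model_array {
    int  *items;
    long  fill;   /* highest index in use, -1 when empty (like av_len) */
    long  max;    /* highest index that fits in the allocation */
};

/* Like av_len: the highest index, not the number of elements. */
static long model_len(const struct model_array *a)
{
    return a->fill;
}

/* Like av_extend: make sure index key fits, i.e. room for key+1 elements.
 * Does nothing if the allocation is already large enough. */
static int model_extend(struct model_array *a, long key)
{
    long newmax;
    int *p;

    assert(key >= 0);
    if (key <= a->max)
        return 0;
    newmax = (a->max + 1) * 2 - 1;    /* grow by doubling (assumed policy) */
    if (newmax < key)
        newmax = key;
    p = realloc(a->items, (size_t)(newmax + 1) * sizeof *p);
    if (p == NULL)
        return -1;
    a->items = p;
    a->max = newmax;
    return 0;
}
```
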
If you know the name of an array variable, you can get a pointer to its AV
by using the following:

    AV*  get_av("package::varname", FALSE);

This returns NULL if the variable does not exist.

See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the array access functions on tied arrays.

=head2 Working with HVs

To create an HV, you use the following routine:

    HV*  newHV();

Once the HV has been created, the following operations are possible on HVs:

    SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
    SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

The C<klen> parameter is the length of the key being passed in (note that
you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
length of the key). The C<val> argument contains the SV pointer to the
scalar being stored, and C<hash> is the precomputed hash value (zero if
you want C<hv_store> to calculate it for you). The C<lval> parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and C<hv_fetch> will return as if the value had already existed.

Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
C<SV*>. To access the scalar value, you must first dereference the return
value. However, you should check to make sure that the return value is
not NULL before dereferencing it.

The following two functions check whether a hash table entry exists and
delete an entry, respectively:

    bool  hv_exists(HV*, const char* key, U32 klen);
    SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
create and return a mortal copy of the deleted value.

And more miscellaneous functions:

    void   hv_clear(HV*);
    void   hv_undef(HV*);

Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
table but does not actually delete the hash table. C<hv_undef> deletes
both the entries and the hash table itself.

Perl keeps the actual data in a linked list of structures with a typedef
of HE. These contain the actual key and value pointers (plus extra
administrative overhead). The key is a string pointer; the value is an
C<SV*>. However, once you have an C<HE*>, to get the actual key and value,
use the routines specified below.

    I32    hv_iterinit(HV*);
            /* Prepares starting point to traverse hash table */
    HE*    hv_iternext(HV*);
            /* Get the next entry, and return a pointer to a
               structure that has both the key and value */
    char*  hv_iterkey(HE* entry, I32* retlen);
            /* Get the key from an HE structure and also return
               the length of the key string */
    SV*    hv_iterval(HV*, HE* entry);
            /* Return an SV pointer to the value of the HE
               structure */
    SV*    hv_iternextsv(HV*, char** key, I32* retlen);
            /* This convenience routine combines hv_iternext,
               hv_iterkey, and hv_iterval. The key and retlen
               arguments are return values for the key and its
               length. The value is the function's return value */

If you know the name of a hash variable, you can get a pointer to its HV
by using the following:

    HV*  get_hv("package::varname", FALSE);

This returns NULL if the variable does not exist.

The hash algorithm is defined in the C<PERL_HASH(hash, key, klen)> macro:

    hash = 0;
    while (klen--)
        hash = (hash * 33) + *key++;
    hash = hash + (hash >> 5);          /* after 5.6 */

The last step was added in version 5.6 to improve distribution of
lower bits in the resulting hash value.

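For experimentation, the loop above can be written as a standalone C
function. This is a sketch of the algorithm exactly as this section
describes it, using C<uint32_t> in place of Perl's C<U32> and the
hypothetical name C<model_perl_hash>; later Perl releases use different hash
functions, so treat it as an illustration of the text rather than of any
current C<PERL_HASH>. Bytes are read as C<unsigned char> here, which is an
assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The times-33 hash described above, plus the post-5.6 final mix step.
 * Unsigned 32-bit arithmetic wraps modulo 2^32, as a U32 would. */
static uint32_t model_perl_hash(const char *key, size_t klen)
{
    const unsigned char *p = (const unsigned char *) key;
    uint32_t hash = 0;

    assert(key != NULL || klen == 0);
    while (klen--)
        hash = (hash * 33) + *p++;
    hash = hash + (hash >> 5);    /* the step added in 5.6 */
    return hash;
}
```

For example, C<"a"> hashes to 97 before the final step and 100 after it.
As with C<hv_store>, the key length is passed explicitly, so the key need
not be NUL-terminated.
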
See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.

=head2 Hash API Extensions

Beginning with version 5.004, the following functions are also supported:

    HE*     hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
    HE*     hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

    bool    hv_exists_ent (HV* tb, SV* key, U32 hash);
    SV*     hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

    SV*     hv_iterkeysv  (HE* entry);

Note that these functions take C<SV*> keys, which simplifies writing
of extension code that deals with hash structures. These functions
also allow passing of C<SV*> keys to C<tie> functions without forcing
you to stringify the keys (unlike the previous set of functions).

They also return and accept whole hash entries (C<HE*>), making their
use more efficient (since the hash value for a particular string
doesn't have to be recomputed every time). See L<perlapi> for detailed
descriptions.

The following macros must always be used to access the contents of hash
entries. Note that the arguments to these macros must be simple
variables, since they may get evaluated more than once. See
L<perlapi> for detailed descriptions of these macros.

    HePV(HE* he, STRLEN len)
    HeVAL(HE* he)
    HeHASH(HE* he)
    HeSVKEY(HE* he)
    HeSVKEY_force(HE* he)
    HeSVKEY_set(HE* he, SV* sv)

These two lower level macros are defined, but must only be used when
dealing with keys that are not C<SV*>s:

    HeKEY(HE* he)
    HeKLEN(HE* he)

Note that neither C<hv_store> nor C<hv_store_ent> increments the
reference count of the stored C<val>, which is the caller's responsibility.
If these functions return a NULL value, the caller will usually have to
decrement the reference count of C<val> to avoid a memory leak.

=head2 AVs, HVs and undefined values

Sometimes you have to store undefined values in AVs or HVs. Although
this may be a rare case, it can be tricky. That's because you're
used to using C<&PL_sv_undef> if you need an undefined SV.

For example, intuition tells you that this XS code:

    AV *av = newAV();
    av_store( av, 0, &PL_sv_undef );

is equivalent to this Perl code:

    my @av;
    $av[0] = undef;

Unfortunately, this isn't true. AVs use C<&PL_sv_undef> as a marker
for indicating that an array element has not yet been initialized.
Thus, C<exists $av[0]> would be true for the above Perl code, but
false for the array generated by the XS code.

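The pitfall comes from one shared object doubling as an "empty slot"
marker. The plain C sketch below models it with a hypothetical
C<model_undef> sentinel and an C<exists> check that treats the sentinel like
a missing element, mirroring the behaviour described above rather than
Perl's actual C<av_exists> code. Storing a separate, freshly created
undefined value (as this section goes on to recommend with C<newSV(0)>)
makes the element exist.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar; defined is 0 for undef. */
struct model_sv {
    int defined;
};

/* One shared undef object, playing the role of PL_sv_undef. */
static struct model_sv model_undef = { 0 };

#define SLOTS 4

/* Hypothetical array of scalar pointers; NULL means never assigned. */
struct model_av {
    struct model_sv *slot[SLOTS];
};

/* An element "exists" only if it holds a real scalar: both NULL and the
 * shared undef marker count as "not yet initialized". */
static int model_exists(const struct model_av *av, size_t i)
{
    assert(av != NULL);
    return i < SLOTS && av->slot[i] != NULL && av->slot[i] != &model_undef;
}
```
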
Other problems can occur when storing C<&PL_sv_undef> in HVs:

    hv_store( hv, "key", 3, &PL_sv_undef, 0 );

This will indeed make the value C<undef>, but if you try to modify
the value of C<key>, you'll get the following error:

    Modification of non-creatable hash value attempted

In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders
in restricted hashes. This caused such hash entries not to appear
when iterating over the hash or when checking for the keys
with the C<hv_exists> function.

You can run into similar problems when you store C<&PL_sv_yes> or
C<&PL_sv_no> into AVs or HVs. Trying to modify such elements
will give you the following error:

    Modification of a read-only value attempted

To make a long story short, you can use the special variables
C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and
HVs, but you have to make sure you know what you're doing.

Generally, if you want to store an undefined value in an AV
or HV, you should not use C<&PL_sv_undef>, but rather create a
new undefined value using the C<newSV> function, for example:

    av_store( av, 42, newSV(0) );
    hv_store( hv, "foo", 3, newSV(0), 0 );

=head2 References

References are a special type of scalar that point to other data types
(including references).

To create a reference, use either of the following functions:

    SV* newRV_inc((SV*) thing);
    SV* newRV_noinc((SV*) thing);

The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The
functions are identical except that C<newRV_inc> increments the reference
count of the C<thing>, while C<newRV_noinc> does not. For historical
reasons, C<newRV> is a synonym for C<newRV_inc>.

Once you have a reference, you can use the following macro to dereference
the reference:

    SvRV(SV*)

then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.

To determine if an SV is a reference, you can use the following macro:

    SvROK(SV*)

To discover what type of value the reference refers to, use the following
macro and then check the return value.

    SvTYPE(SvRV(SV*))

The most useful types that will be returned are:

    SVt_IV    Scalar
    SVt_NV    Scalar
    SVt_PV    Scalar
    SVt_RV    Scalar
    SVt_PVAV  Array
    SVt_PVHV  Hash
    SVt_PVCV  Code
    SVt_PVGV  Glob (possibly a file handle)
    SVt_PVMG  Blessed or Magical Scalar

See the F<sv.h> header file for more details.

=head2 Blessed References and Class Objects

References are also used to support object-oriented programming. In perl's
OO lexicon, an object is simply a reference that has been blessed into a
package (or class). Once blessed, the programmer may use the reference
to access the various methods in the class.

A reference can be blessed into a package with the following function:

    SV* sv_bless(SV* sv, HV* stash);

The C<sv> argument must be a reference value. The C<stash> argument
specifies which class the reference will belong to. See
L<Stashes and Globs> for information on converting class names into stashes.

/* Still under construction */

Upgrades C<rv> to a reference if it is not already one. Creates a new SV
for C<rv> to point to. If C<classname> is non-null, the SV is blessed into
the specified class. The new SV is returned.

    SV* newSVrv(SV* rv, const char* classname);

Copies an integer, unsigned integer or double into an SV whose reference is
C<rv>. The SV is blessed if C<classname> is non-null.

    SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
    SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
    SV* sv_setref_nv(SV* rv, const char* classname, NV nv);

Copies the pointer value (I<the address, not the string!>) into an SV whose
reference is C<rv>. The SV is blessed if C<classname> is non-null.

    SV* sv_setref_pv(SV* rv, const char* classname, void* pv);

Copies a string into an SV whose reference is C<rv>. The length of the
string must be given explicitly in C<length>. The SV is blessed if
C<classname> is non-null.

    SV* sv_setref_pvn(SV* rv, const char* classname, const char* pv, STRLEN length);

Tests whether the SV is blessed into the specified class. It does not
check inheritance relationships.

    int  sv_isa(SV* sv, const char* name);

Tests whether the SV is a reference to a blessed object.

    int  sv_isobject(SV* sv);

Tests whether the SV is derived from the specified class. SV can be either
a reference to a blessed object or a string containing a class name. This
is the function implementing the C<UNIVERSAL::isa> functionality.

    bool sv_derived_from(SV* sv, const char* name);

To check if you've got an object derived from a specific class you have
to write:

    if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }

=head2 Creating New Variables

To create a new Perl variable with an undef value which can be accessed from
your Perl script, use the following routines, depending on the variable type.

    SV*  get_sv("package::varname", TRUE);
    AV*  get_av("package::varname", TRUE);
    HV*  get_hv("package::varname", TRUE);

Notice the use of TRUE as the second parameter. The new variable can now
be set, using the routines appropriate to the data type.

There are additional macros whose values may be bitwise OR'ed with the
C<TRUE> argument to enable certain extra features. Those bits are:

=over

=item GV_ADDMULTI

Marks the variable as multiply defined, thus preventing the:

    Name <varname> used only once: possible typo

warning.

=item GV_ADDWARN

Issues the warning:

    Had to create <varname> unexpectedly

if the variable did not exist before the function was called.

=back

If you do not specify a package name, the variable is created in the current
package.

=head2 Reference Counts and Mortality

Perl uses a reference count-driven garbage collection mechanism. SVs,
AVs, or HVs (xV for short in the following) start their life with a
reference count of 1. If the reference count of an xV ever drops to 0,
then it will be destroyed and its memory made available for reuse.

This normally doesn't happen at the Perl level unless a variable is
undef'ed or the last variable holding a reference to it is changed or
overwritten. At the internal level, however, reference counts can be
manipulated with the following macros:

    int SvREFCNT(SV* sv);
    SV* SvREFCNT_inc(SV* sv);
    void SvREFCNT_dec(SV* sv);

However, there is one other function which manipulates the reference
count of its argument. The C<newRV_inc> function, you will recall,
creates a reference to the specified argument. As a side effect,
it increments the argument's reference count. If this is not what
you want, use C<newRV_noinc> instead.

For example, imagine you want to return a reference from an XSUB function.
Inside the XSUB routine, you create an SV which initially has a reference
count of one. Then you call C<newRV_inc>, passing it the just-created SV.
This returns the reference as a new SV, but the reference count of the
SV you passed to C<newRV_inc> has been incremented to two. Now you
return the reference from the XSUB routine and forget about the SV.
But Perl hasn't! Whenever the returned reference is destroyed, the
reference count of the original SV is decreased to one and nothing happens.
The SV will hang around without any way to access it until Perl itself
terminates. This is a memory leak.

The correct procedure, then, is to use C<newRV_noinc> instead of
C<newRV_inc>. Then, if and when the last reference is destroyed,
the reference count of the SV will go to zero and it will be destroyed,
stopping any memory leak.

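The leak scenario above can be traced with a toy reference-counting model in
plain C. Everything here is hypothetical (C<struct model_xv>, C<model_new>,
C<model_dec>, C<model_newrv_inc>, C<model_newrv_noinc>) and far simpler than
Perl's SVs, but the counts follow the same steps: create the target (count
1), wrap it in a reference, then destroy the reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted object; referent is set when the object is a reference. */
struct model_xv {
    int              refcnt;
    struct model_xv *referent;
};

static int freed_count = 0;   /* how many objects have been destroyed */

static struct model_xv *model_new(void)
{
    struct model_xv *xv = calloc(1, sizeof *xv);
    if (xv != NULL)
        xv->refcnt = 1;       /* every new xV starts with a count of 1 */
    return xv;
}

/* Like SvREFCNT_dec: free at zero, releasing whatever it points to. */
static void model_dec(struct model_xv *xv)
{
    if (xv == NULL || --xv->refcnt > 0)
        return;
    model_dec(xv->referent);
    free(xv);
    freed_count++;
}

/* Like newRV_inc: the new reference bumps the target's count. */
static struct model_xv *model_newrv_inc(struct model_xv *thing)
{
    struct model_xv *rv = model_new();
    assert(thing != NULL);
    if (rv != NULL) {
        thing->refcnt++;
        rv->referent = thing;
    }
    return rv;
}

/* Like newRV_noinc: the reference takes over the caller's count. */
static struct model_xv *model_newrv_noinc(struct model_xv *thing)
{
    struct model_xv *rv = model_new();
    assert(thing != NULL);
    if (rv != NULL)
        rv->referent = thing;
    return rv;
}
```

With the C<_inc> variant, destroying the reference leaves the target at a
count of 1 with nobody holding it; with the C<_noinc> variant, destroying
the reference frees both objects.
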
There are some convenience functions available that can help with the
destruction of xVs. These functions introduce the concept of "mortality".
An xV that is mortal has had its reference count marked to be decremented,
but not actually decremented, until "a short time later". Generally the
term "short time later" means a single Perl statement, such as a call to
an XSUB function. The actual determinant for when mortal xVs have their
reference count decremented depends on two macros, C<SAVETMPS> and
C<FREETMPS>. See L<perlcall> and L<perlxs> for more details on these macros.

"Mortalization" then is at its simplest a deferred C<SvREFCNT_dec>.
However, if you mortalize a variable twice, the reference count will
later be decremented twice.

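Since mortalization is just a deferred decrement, it can be modelled as a
list of pending decrements that is processed later. The plain C sketch below
is hypothetical throughout: C<model_2mortal> and C<model_freetmps> only
loosely mirror C<sv_2mortal> and C<FREETMPS>, and the fixed-size temps stack
is an assumption of the sketch. It shows that mortalizing the same object
twice queues two decrements.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object with just a reference count. */
struct model_xv {
    int refcnt;
};

#define TMPS_MAX 16

/* Hypothetical temps stack: each entry is one pending decrement. */
static struct model_xv *tmps_stack[TMPS_MAX];
static size_t tmps_ix = 0;

/* Like sv_2mortal: schedule a decrement instead of doing it now. */
static struct model_xv *model_2mortal(struct model_xv *xv)
{
    assert(tmps_ix < TMPS_MAX);   /* fixed-size stack in this sketch */
    if (xv != NULL)
        tmps_stack[tmps_ix++] = xv;
    return xv;
}

/* Like FREETMPS: perform every pending decrement, newest first. */
static void model_freetmps(void)
{
    while (tmps_ix > 0)
        tmps_stack[--tmps_ix]->refcnt--;
}
```

An object with a count of 1 that is mortalized twice ends at -1 after
C<model_freetmps>: in Perl the second decrement would hit an SV that had
already been freed, which is one of the "strange things" this section warns
about.
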
1N/A"Mortal" SVs are mainly used for SVs that are placed on perl's stack.
1N/AFor example an SV which is created just to pass a number to a called sub
1N/Ais made mortal to have it cleaned up automatically when it's popped off
1N/Athe stack. Similarly, results returned by XSUBs (which are pushed on the
1N/Astack) are often made mortal.
1N/A
1N/ATo create a mortal variable, use the functions:
1N/A
1N/A SV* sv_newmortal()
1N/A SV* sv_2mortal(SV*)
1N/A SV* sv_mortalcopy(SV*)
1N/A
1N/AThe first call creates a mortal SV (with no value), the second converts an existing
1N/ASV to a mortal SV (and thus defers a call to C<SvREFCNT_dec>), and the
1N/Athird creates a mortal copy of an existing SV.
1N/ABecause C<sv_newmortal> gives the new SV no value,it must normally be given one
1N/Avia C<sv_setpv>, C<sv_setiv>, etc. :
1N/A
1N/A SV *tmp = sv_newmortal();
1N/A sv_setiv(tmp, an_integer);
1N/A
1N/AAs that is multiple C statements it is quite common so see this idiom instead:
1N/A
1N/A SV *tmp = sv_2mortal(newSViv(an_integer));
1N/A
1N/A
You should be careful about creating mortal variables. Strange things
can happen if you make the same value mortal within multiple contexts,
or if you make a variable mortal multiple times. Thinking of "Mortalization"
as deferred C<SvREFCNT_dec> should help to minimize such problems.
For example, if you are passing an SV which you I<know> has a high enough REFCNT
to survive its use on the stack, you need not do any mortalization.
If you are not sure, then doing an C<SvREFCNT_inc> and C<sv_2mortal>, or
making a C<sv_mortalcopy> is safer.

The mortal routines are not just for SVs -- AVs and HVs can be
made mortal by passing their address (type-casted to C<SV*>) to the
C<sv_2mortal> or C<sv_mortalcopy> routines.

=head2 Stashes and Globs

A B<stash> is a hash that contains all variables that are defined
within a package. Each key of the stash is a symbol
name (shared by all the different types of objects that have the same
name), and each value in the hash table is a GV (Glob Value). This GV
in turn contains references to the various objects of that name,
including (but not limited to) the following:

 Scalar Value
 Array Value
 Hash Value
 I/O Handle
 Format
 Subroutine

There is a single stash called C<PL_defstash> that holds the items that exist
in the C<main> package. To get at the items in other packages, append the
string "::" to the package name. The items in the C<Foo> package are in
the stash C<Foo::> in PL_defstash. The items in the C<Bar::Baz> package are
in the stash C<Baz::> in C<Bar::>'s stash.

To get the stash pointer for a particular package, use the function:

 HV*  gv_stashpv(const char* name, I32 create)
 HV*  gv_stashsv(SV*, I32 create)

The first function takes a literal string, the second uses the string stored
in the SV. Remember that a stash is just a hash table, so you get back an
C<HV*>. The C<create> flag will create a new package if it is set.

The name that C<gv_stash*v> wants is the name of the package whose symbol table
you want. The default package is called C<main>. If you have multiply nested
packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl
language itself.

Alternately, if you have an SV that is a blessed reference, you can find
out the stash pointer by using:

 HV*  SvSTASH(SvRV(SV*));

then use the following to get the package name itself:

 char*  HvNAME(HV* stash);

If you need to bless or re-bless an object you can use the following
function:

 SV*  sv_bless(SV*, HV* stash)

where the first argument, an C<SV*>, must be a reference, and the second
argument is a stash. The returned C<SV*> can now be used in the same way
as any other SV.

For more information on references and blessings, consult L<perlref>.

=head2 Double-Typed SVs

Scalar variables normally contain only one type of value, an integer,
double, pointer, or reference. Perl will automatically convert the
actual scalar data from the stored type into the requested type.

Some scalar variables contain more than one type of scalar data. For
example, the variable C<$!> contains either the numeric value of C<errno>
or its string equivalent from either C<strerror> or C<sys_errlist[]>.

To force multiple data values into an SV, you must do two things: use the
C<sv_set*v> routines to add the additional scalar type, then set a flag
so that Perl will believe it contains more than one type of data. The
four macros to set the flags are:

 SvIOK_on
 SvNOK_on
 SvPOK_on
 SvROK_on

The particular macro you must use depends on which C<sv_set*v> routine
you called first. This is because every C<sv_set*v> routine turns on
only the bit for the particular type of data being set, and turns off
all the rest.

For example, to create a new Perl variable called "dberror" that contains
both the numeric and descriptive string error values, you could use the
following code:

 extern int dberror;
 extern char *dberror_list[];

 SV* sv = get_sv("dberror", TRUE);
 sv_setiv(sv, (IV) dberror);
 sv_setpv(sv, dberror_list[dberror]);
 SvIOK_on(sv);

If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.
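
The flag behaviour described above can be modelled in a few lines. This is
a hypothetical toy in plain C, not the Perl API: C<ToyScalar>,
C<toy_setiv>, C<toy_setpv> and C<TOY_IOK_on> only imitate C<sv_setiv>,
C<sv_setpv> and C<SvIOK_on>, and the fixed-size string buffer is an
assumption of the toy. As with the real setters, each toy setter leaves
only its own flag on, so the flag for the value stored first has to be
turned back on afterwards.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy model (not the Perl API) of the flag behaviour of
   sv_setiv, sv_setpv and SvIOK_on described above. */
enum { TOY_IOK = 1u << 0, TOY_POK = 1u << 1 };

typedef struct {
    unsigned flags;   /* which slots hold a valid value */
    long     iv;      /* integer slot */
    char     pv[64];  /* string slot (fixed size: an assumption of the toy) */
} ToyScalar;

/* Like sv_setiv: store the integer, leave ONLY the integer flag on. */
static void toy_setiv(ToyScalar *sv, long iv) {
    sv->iv = iv;
    sv->flags = TOY_IOK;
}

/* Like sv_setpv: store the string, leave ONLY the string flag on.
   The integer slot keeps its bits but is no longer flagged as valid. */
static void toy_setpv(ToyScalar *sv, const char *pv) {
    size_t len = strlen(pv);
    assert(len < sizeof sv->pv);
    memcpy(sv->pv, pv, len + 1);
    sv->flags = TOY_POK;
}

/* Like SvIOK_on: re-flag the integer slot without touching the string. */
#define TOY_IOK_on(sv) ((sv)->flags |= TOY_IOK)
```

Swapping the two setter calls would leave only C<TOY_IOK> set, and the
string flag would then be the one to turn back on.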

=head2 Magic Variables

[This section still under construction. Ignore everything here. Post no
bills. Everything not permitted is forbidden.]

Any SV may be magical, that is, it has special features that a normal
SV does not have. These features are stored in the SV structure in a
linked list of C<struct magic>'s, typedef'ed to C<MAGIC>.

 struct magic {
     MAGIC*      mg_moremagic;
     MGVTBL*     mg_virtual;
     U16         mg_private;
     char        mg_type;
     U8          mg_flags;
     SV*         mg_obj;
     char*       mg_ptr;
     I32         mg_len;
 };

Note this is current as of patchlevel 0, and could change at any time.

=head2 Assigning Magic

Perl adds magic to an SV using the sv_magic function:

 void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);

The C<sv> argument is a pointer to the SV that is to acquire a new magical
feature.

If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to
convert C<sv> to type C<SVt_PVMG>. Perl then continues by adding new magic
to the beginning of the linked list of magical features. Any prior entry
of the same type of magic is deleted. Note that this can be overridden,
and multiple instances of the same type of magic can be associated with an
SV.

The C<name> and C<namlen> arguments are used to associate a string with
the magic, typically the name of a variable. C<namlen> is stored in the
C<mg_len> field, and if C<name> is non-null and C<namlen> E<gt>= 0, a malloc'd
copy of the name is stored in the C<mg_ptr> field.

The sv_magic function uses C<how> to determine which, if any, predefined
"Magic Virtual Table" should be assigned to the C<mg_virtual> field.
See the L<Magic Virtual Tables> section below. The C<how> argument is also
stored in the C<mg_type> field. The value of C<how> should be chosen
from the set of macros C<PERL_MAGIC_foo> found in F<perl.h>. Note that before
these macros were added, Perl internals used to directly use character
literals, so you may occasionally come across old code or documentation
referring to 'U' magic rather than C<PERL_MAGIC_uvar> for example.

The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC>
structure. If it is not the same as the C<sv> argument, the reference
count of the C<obj> object is incremented. If it is the same, or if
the C<how> argument is C<PERL_MAGIC_arylen>, or if it is a NULL pointer,
then C<obj> is merely stored, without the reference count being incremented.

There is also a function to add magic to an C<HV>:

 void hv_magic(HV *hv, GV *gv, int how);

This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>.

To remove the magic from an SV, call the function sv_unmagic:

 void sv_unmagic(SV *sv, int type);

The C<type> argument should be equal to the C<how> value when the C<SV>
was initially made magical.

=head2 Magic Virtual Tables

The C<mg_virtual> field in the C<MAGIC> structure is a pointer to an
C<MGVTBL> ("Magic Virtual Table"), a structure of function pointers that
handle the various operations that might be applied to that variable.

The C<MGVTBL> has five pointers to the following routine types:

 int  (*svt_get)(SV* sv, MAGIC* mg);
 int  (*svt_set)(SV* sv, MAGIC* mg);
 U32  (*svt_len)(SV* sv, MAGIC* mg);
 int  (*svt_clear)(SV* sv, MAGIC* mg);
 int  (*svt_free)(SV* sv, MAGIC* mg);

These MGVTBL structures are set at compile-time in F<perl.h>; the table of
magic types below lists which one, if any, each type of magic uses. These
different structures contain pointers to various routines that perform
additional actions depending on which function is being called.

 Function pointer    Action taken
 ----------------    ------------
 svt_get             Do something before the value of the SV is retrieved.
 svt_set             Do something after the SV is assigned a value.
 svt_len             Report on the SV's length.
 svt_clear           Clear something the SV represents.
 svt_free            Free any extra storage associated with the SV.

For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds
to an C<mg_type> of C<PERL_MAGIC_sv>) contains:

 { magic_get, magic_set, magic_len, 0, 0 }

Thus, when an SV is determined to be magical and of type C<PERL_MAGIC_sv>,
if a get operation is being performed, the routine C<magic_get> is
called. All the various routines for the various magical types begin
with C<magic_>. NOTE: the magic routines are not considered part of
the Perl API, and may not be exported by the Perl library.
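
The get/set dispatch can be pictured as a small table of function pointers
consulted around every read and write. The following is a hypothetical toy
in plain C, not perl's implementation: C<ToyVar>, C<ToyVtbl>, C<toy_read>,
C<toy_write> and the C<mirror_*> callbacks are illustrative names, and the
real C<MGVTBL> entries also receive the C<MAGIC*> being processed. Here the
callbacks keep the variable in step with some external state, roughly as
the special-variable magic does for a variable like C<$!>.

```c
#include <stddef.h>

/* Hypothetical toy model (not perl's MAGIC/MGVTBL) of dispatching
   get/set actions through a table of function pointers. */
struct ToyVar;

typedef struct {
    int (*get)(struct ToyVar *var);   /* called before the value is read  */
    int (*set)(struct ToyVar *var);   /* called after a value is assigned */
} ToyVtbl;

typedef struct ToyVar {
    long           value;
    const ToyVtbl *vtbl;     /* NULL means "not magical" */
    long           backing;  /* external state the callbacks mirror */
} ToyVar;

static long toy_read(ToyVar *var) {
    if (var->vtbl != NULL && var->vtbl->get != NULL)
        var->vtbl->get(var);          /* like svt_get */
    return var->value;
}

static void toy_write(ToyVar *var, long v) {
    var->value = v;
    if (var->vtbl != NULL && var->vtbl->set != NULL)
        var->vtbl->set(var);          /* like svt_set */
}

/* Example callbacks that keep 'value' and 'backing' in step. */
static int mirror_get(ToyVar *var) { var->value = var->backing; return 0; }
static int mirror_set(ToyVar *var) { var->backing = var->value; return 0; }

static const ToyVtbl mirror_vtbl = { mirror_get, mirror_set };
```

A variable whose C<vtbl> is NULL behaves like a plain, non-magical value:
reads and writes touch only C<value>.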

The current kinds of Magic Virtual Tables are:

           mg_type
 (old-style char and macro)   MGVTBL          Type of magic
 --------------------------   ------          ----------------------------
 \0 PERL_MAGIC_sv             vtbl_sv         Special scalar variable
 A  PERL_MAGIC_overload       vtbl_amagic     %OVERLOAD hash
 a  PERL_MAGIC_overload_elem  vtbl_amagicelem %OVERLOAD hash element
 c  PERL_MAGIC_overload_table (none)          Holds overload table (AMT)
                                              on stash
 B  PERL_MAGIC_bm             vtbl_bm         Boyer-Moore (fast string search)
 D  PERL_MAGIC_regdata        vtbl_regdata    Regex match position data
                                              (@+ and @- vars)
 d  PERL_MAGIC_regdatum       vtbl_regdatum   Regex match position data
                                              element
 E  PERL_MAGIC_env            vtbl_env        %ENV hash
 e  PERL_MAGIC_envelem        vtbl_envelem    %ENV hash element
 f  PERL_MAGIC_fm             vtbl_fm         Formline ('compiled' format)
 g  PERL_MAGIC_regex_global   vtbl_mglob      m//g target / study()ed string
 I  PERL_MAGIC_isa            vtbl_isa        @ISA array
 i  PERL_MAGIC_isaelem        vtbl_isaelem    @ISA array element
 k  PERL_MAGIC_nkeys          vtbl_nkeys      scalar(keys()) lvalue
 L  PERL_MAGIC_dbfile         (none)          Debugger %_<filename
 l  PERL_MAGIC_dbline         vtbl_dbline     Debugger %_<filename element
 m  PERL_MAGIC_mutex          vtbl_mutex      ???
 o  PERL_MAGIC_collxfrm       vtbl_collxfrm   Locale collate transformation
 P  PERL_MAGIC_tied           vtbl_pack       Tied array or hash
 p  PERL_MAGIC_tiedelem       vtbl_packelem   Tied array or hash element
 q  PERL_MAGIC_tiedscalar     vtbl_packelem   Tied scalar or handle
 r  PERL_MAGIC_qr             vtbl_qr         precompiled qr// regex
 S  PERL_MAGIC_sig            vtbl_sig        %SIG hash
 s  PERL_MAGIC_sigelem        vtbl_sigelem    %SIG hash element
 t  PERL_MAGIC_taint          vtbl_taint      Taintedness
 U  PERL_MAGIC_uvar           vtbl_uvar       Available for use by extensions
 v  PERL_MAGIC_vec            vtbl_vec        vec() lvalue
 V  PERL_MAGIC_vstring        (none)          v-string scalars
 w  PERL_MAGIC_utf8           vtbl_utf8       UTF-8 length+offset cache
 x  PERL_MAGIC_substr         vtbl_substr     substr() lvalue
 y  PERL_MAGIC_defelem        vtbl_defelem    Shadow "foreach" iterator
                                              variable / smart parameter
                                              vivification
 *  PERL_MAGIC_glob           vtbl_glob       GV (typeglob)
 #  PERL_MAGIC_arylen         vtbl_arylen     Array length ($#ary)
 .  PERL_MAGIC_pos            vtbl_pos        pos() lvalue
 <  PERL_MAGIC_backref        vtbl_backref    ???
 ~  PERL_MAGIC_ext            (none)          Available for use by extensions

When an uppercase and lowercase letter both exist in the table, then the
uppercase letter is typically used to represent some kind of composite type
(a list or a hash), and the lowercase letter is used to represent an element
of that composite type. Some internals code makes use of this case
relationship. However, 'v' and 'V' (vec and v-string) are in no way related.

The C<PERL_MAGIC_ext> and C<PERL_MAGIC_uvar> magic types are defined
specifically for use by extensions and will not be used by perl itself.
Extensions can use C<PERL_MAGIC_ext> magic to 'attach' private information
to variables (typically objects). This is especially useful because
there is no way for normal perl code to corrupt this private information
(unlike using extra elements of a hash object).

Similarly, C<PERL_MAGIC_uvar> magic can be used much like tie() to call a
C function any time a scalar's value is used or changed. The C<MAGIC>'s
C<mg_ptr> field points to a C<ufuncs> structure:

 struct ufuncs {
     I32 (*uf_val)(pTHX_ IV, SV*);
     I32 (*uf_set)(pTHX_ IV, SV*);
     IV uf_index;
 };

When the SV is read from or written to, the C<uf_val> or C<uf_set>
function will be called with C<uf_index> as the first arg and a pointer to
the SV as the second. A simple example of how to add C<PERL_MAGIC_uvar>
magic is shown below. Note that the ufuncs structure is copied by
sv_magic, so you can safely allocate it on the stack.

 void
 Umagic(sv)
     SV *sv;
 PREINIT:
     struct ufuncs uf;
 CODE:
     uf.uf_val   = &my_get_fn;
     uf.uf_set   = &my_set_fn;
     uf.uf_index = 0;
     sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));

Note that because multiple extensions may be using C<PERL_MAGIC_ext>
or C<PERL_MAGIC_uvar> magic, it is important for extensions to take
extra care to avoid conflict. Typically only using the magic on
objects blessed into the same class as the extension is sufficient.
For C<PERL_MAGIC_ext> magic, it may also be appropriate to add an I32
'signature' at the top of the private data area and check that.

Also note that the C<sv_set*()> and C<sv_cat*()> functions described
earlier do B<not> invoke 'set' magic on their targets. This must
be done by the user either by calling the C<SvSETMAGIC()> macro after
calling these functions, or by using one of the C<sv_set*_mg()> or
C<sv_cat*_mg()> functions. Similarly, generic C code must call the
C<SvGETMAGIC()> macro to invoke any 'get' magic if it uses an SV
obtained from external sources in functions that don't handle magic.
See L<perlapi> for a description of these functions.
For example, calls to the C<sv_cat*()> functions typically need to be
followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()>
since their implementation handles 'get' magic.

=head2 Finding Magic

 MAGIC* mg_find(SV*, int type); /* Finds the magic pointer of that type */

This routine returns a pointer to the C<MAGIC> structure stored in the SV.
If the SV does not have that magical feature, C<NULL> is returned. Also,
if the SV is not of type SVt_PVMG, Perl may core dump.

 int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);

This routine checks to see what types of magic C<sv> has. If the mg_type
field is an uppercase letter, then the mg_obj is copied to C<nsv>, but
the mg_type field is changed to be the lowercase letter.
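
Both routines can be pictured with a small model of the magic chain. The
sketch below is a hypothetical plain-C toy, not perl's code:
C<ToyMagicEntry> stands in for C<MAGIC>, C<toy_mg_find> walks the chain
through a C<more> pointer the way entries are linked through
C<mg_moremagic>, and C<toy_copied_type> captures only the
uppercase-to-lowercase rule that C<mg_copy> applies as described above
(returning 0 for types that are not copied).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical toy model (not perl's MAGIC) of a magic chain. */
typedef struct ToyMagicEntry {
    struct ToyMagicEntry *more;   /* next entry, like mg_moremagic */
    char                  type;   /* like mg_type, e.g. 'P' or 'U' */
} ToyMagicEntry;

/* In the spirit of mg_find: first entry of the given type, or NULL. */
static ToyMagicEntry *toy_mg_find(ToyMagicEntry *chain, char type) {
    for (ToyMagicEntry *mg = chain; mg != NULL; mg = mg->more)
        if (mg->type == type)
            return mg;
    return NULL;
}

/* The mg_copy type rule described above: an uppercase type is copied
   with its lowercase counterpart; other types are not copied (0). */
static char toy_copied_type(char type) {
    unsigned char c = (unsigned char)type;
    return isupper(c) ? (char)tolower(c) : 0;
}
```

So tied-hash magic (C<'P'>) on a hash turns into tied-element magic
(C<'p'>) on the value that C<mg_copy> is given.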

=head2 Understanding the Magic of Tied Hashes and Arrays

Tied hashes and arrays are magical beasts of the C<PERL_MAGIC_tied>
magic type.

WARNING: As of the 5.004 release, proper usage of the array and hash
access functions requires understanding a few caveats. Some
of these caveats are actually considered bugs in the API, to be fixed
in later releases, and are bracketed with [MAYCHANGE] below. If
you find yourself actually applying such information in this section, be
aware that the behavior may change in the future, umm, without warning.

The perl tie function associates a variable with an object that implements
the various GET, SET, etc. methods. To perform the equivalent of the perl
tie function from an XSUB, you must mimic this behaviour. The code below
carries out the necessary steps: first it creates a new hash, and then
creates a second hash which it blesses into the class which will implement
the tie methods. Lastly it ties the two hashes together, and returns a
reference to the new tied hash. Note that the code below does NOT call the
TIEHASH method in the MyTie class;
see L<Calling Perl Routines from within C Programs> for details on how
to do this.

 SV*
 mytie()
 PREINIT:
     HV *hash;
     HV *stash;
     SV *tie;
 CODE:
     hash = newHV();
     tie = newRV_noinc((SV*)newHV());
     stash = gv_stashpv("MyTie", TRUE);
     sv_bless(tie, stash);
     hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
     RETVAL = newRV_noinc((SV*)hash);
 OUTPUT:
     RETVAL

The C<av_store> function, when given a tied array argument, merely
copies the magic of the array onto the value to be "stored", using
C<mg_copy>. It may also return NULL, indicating that the value did not
actually need to be stored in the array. [MAYCHANGE] After a call to
C<av_store> on a tied array, the caller will usually need to call
C<mg_set(val)> to actually invoke the perl level "STORE" method on the
TIEARRAY object. If C<av_store> did return NULL, a call to
C<SvREFCNT_dec(val)> will also be usually necessary to avoid a memory
leak. [/MAYCHANGE]

The previous paragraph is applicable verbatim to tied hash access using the
C<hv_store> and C<hv_store_ent> functions as well.

C<av_fetch> and the corresponding hash functions C<hv_fetch> and
C<hv_fetch_ent> actually return an undefined mortal value whose magic
has been initialized using C<mg_copy>. Note the value so returned does not
need to be deallocated, as it is already mortal. [MAYCHANGE] But you will
need to call C<mg_get()> on the returned value in order to actually invoke
the perl level "FETCH" method on the underlying TIE object. Similarly,
you may also call C<mg_set()> on the return value after possibly assigning
a suitable value to it using C<sv_setsv>, which will invoke the "STORE"
method on the TIE object. [/MAYCHANGE]

[MAYCHANGE]
In other words, the array or hash fetch/store functions don't really
fetch and store actual values in the case of tied arrays and hashes. They
merely call C<mg_copy> to attach magic to the values that were meant to be
"stored" or "fetched". Later calls to C<mg_get> and C<mg_set> actually
do the job of invoking the TIE methods on the underlying objects. Thus
the magic mechanism currently implements a kind of lazy access to arrays
and hashes.

Currently (as of perl version 5.004), use of the hash and array access
functions requires the user to be aware of whether they are operating on
"normal" hashes and arrays, or on their tied variants. The API may be
changed to provide more transparent access to both tied and normal data
types in future versions.
[/MAYCHANGE]

You would do well to understand that the TIEARRAY and TIEHASH interfaces
are mere sugar to invoke some perl method calls while using the uniform hash
and array syntax. The use of this sugar imposes some overhead (typically
about two to four extra opcodes per FETCH/STORE operation, in addition to
the creation of all the mortal variables required to invoke the methods).
This overhead will be comparatively small if the TIE methods are themselves
substantial, but if they are only a few statements long, the overhead
will not be insignificant.

=head2 Localizing changes

Perl has a very handy construction

 {
     local $var = 2;
     ...
 }

This construction is I<approximately> equivalent to

 {
     my $oldvar = $var;
     $var = 2;
     ...
     $var = $oldvar;
 }

The biggest difference is that the first construction would
reinstate the initial value of $var, irrespective of how control exits
the block: C<goto>, C<return>, C<die>/C<eval>, etc. It is a little bit
more efficient as well.

There is a way to achieve a similar task from C via the Perl API: create a
I<pseudo-block>, and arrange for some changes to be automatically
undone at the end of it, whether it is left normally or via a non-local exit (via
die()). A I<block>-like construct is created by a pair of
C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">).
Such a construct may be created specially for some important localized
task, or an existing one (like boundaries of enclosing Perl
subroutine/block, or an existing pair for freeing TMPs) may be
used. (In the second case the overhead of additional localization must
be almost negligible.) Note that any XSUB is automatically enclosed in
an C<ENTER>/C<LEAVE> pair.

Inside such a I<pseudo-block> the following service is available:

=over 4

=item C<SAVEINT(int i)>

=item C<SAVEIV(IV i)>

=item C<SAVEI32(I32 i)>

=item C<SAVELONG(long i)>

These macros arrange things to restore the value of integer variable
C<i> at the end of the enclosing I<pseudo-block>.

=item C<SAVESPTR(s)>

=item C<SAVEPPTR(p)>

These macros arrange things to restore the value of pointers C<s> and
C<p>. C<s> must be a pointer of a type which survives conversion to
C<SV*> and back, C<p> should be able to survive conversion to C<char*>
and back.

=item C<SAVEFREESV(SV *sv)>

The refcount of C<sv> is decremented at the end of
I<pseudo-block>. This is similar to C<sv_2mortal> in that it is also a
mechanism for doing a delayed C<SvREFCNT_dec>. However, while C<sv_2mortal>
extends the lifetime of C<sv> until the beginning of the next statement,
C<SAVEFREESV> extends it until the end of the enclosing scope. These
lifetimes can be wildly different.

Also compare C<SAVEMORTALIZESV>.

=item C<SAVEMORTALIZESV(SV *sv)>

Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current
scope instead of decrementing its reference count. This usually has the
effect of keeping C<sv> alive until the statement that called the currently
live scope has finished executing.

=item C<SAVEFREEOP(OP *op)>

The C<OP *> is op_free()ed at the end of I<pseudo-block>.

=item C<SAVEFREEPV(p)>

The chunk of memory which is pointed to by C<p> is Safefree()ed at the
end of I<pseudo-block>.

=item C<SAVECLEARSV(SV *sv)>

Clears a slot in the current scratchpad which corresponds to C<sv> at
the end of I<pseudo-block>.

=item C<SAVEDELETE(HV *hv, char *key, I32 length)>

The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The
string pointed to by C<key> is Safefree()ed. If one has a I<key> in
short-lived storage, the corresponding string may be reallocated like
this:

 SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));

=item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
only argument C<p>.

=item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
implicit context argument (if any), and C<p>.

=item C<SAVESTACK_POS()>

The current offset on the Perl internal stack (cf. C<SP>) is restored
at the end of I<pseudo-block>.

=back
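
The save-and-restore mechanism behind these macros can be sketched with a
toy save stack. This is a hypothetical plain-C model, not perl's savestack:
C<toy_enter>, C<toy_leave> and C<toy_saveint> only imitate C<ENTER>,
C<LEAVE> and C<SAVEINT>, and they take a pointer and handle plain C<int>s
only. Each save records an address and its old value; leaving a
pseudo-block restores them newest first, so nested pseudo-blocks unwind
correctly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model (not perl's savestack) of ENTER/LEAVE/SAVEINT. */
typedef struct {
    int *addr;   /* where the saved value lives */
    int  old;    /* value to put back at LEAVE  */
} ToySaved;

#define TOY_SAVE_MAX  32
#define TOY_SCOPE_MAX 8

static ToySaved toy_savestack[TOY_SAVE_MAX];
static size_t   toy_save_ix = 0;               /* entries in use */
static size_t   toy_scopes[TOY_SCOPE_MAX];     /* save_ix at each ENTER */
static size_t   toy_scope_ix = 0;

static void toy_enter(void) {                  /* like ENTER */
    assert(toy_scope_ix < TOY_SCOPE_MAX);
    toy_scopes[toy_scope_ix++] = toy_save_ix;
}

static void toy_saveint(int *i) {              /* like SAVEINT, via a pointer */
    assert(toy_save_ix < TOY_SAVE_MAX);
    toy_savestack[toy_save_ix].addr = i;
    toy_savestack[toy_save_ix].old  = *i;
    toy_save_ix++;
}

static void toy_leave(void) {                  /* like LEAVE */
    assert(toy_scope_ix > 0);
    size_t base = toy_scopes[--toy_scope_ix];
    while (toy_save_ix > base) {               /* newest saves first */
        ToySaved *s = &toy_savestack[--toy_save_ix];
        *s->addr = s->old;
    }
}
```

Calling C<toy_enter()>, then C<toy_saveint(&x)>, changing C<x>, and finally
C<toy_leave()> gives C<x> back the value it had before C<toy_enter>, much
as C<local> does at the Perl level.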

The following API list contains functions, thus one needs to
provide pointers to the modifiable data explicitly (either C pointers,
or Perlish C<GV *>s). Where the above macros take C<int>, a similar
function takes C<int *>.

=over 4

=item C<SV* save_scalar(GV *gv)>

Equivalent to Perl code C<local $gv>.

=item C<AV* save_ary(GV *gv)>

=item C<HV* save_hash(GV *gv)>

Similar to C<save_scalar>, but localize C<@gv> and C<%gv>.

=item C<void save_item(SV *item)>

Duplicates the current value of C<item>; on exit from the current
C<ENTER>/C<LEAVE> I<pseudo-block>, the value of C<item> is restored
from the stored copy.

=item C<void save_list(SV **sarg, I32 maxsarg)>

A variant of C<save_item> which takes multiple arguments via an array
C<sarg> of C<SV*> of length C<maxsarg>.

=item C<SV* save_svref(SV **sptr)>

Similar to C<save_scalar>, but will reinstate an C<SV *>.

=item C<void save_aptr(AV **aptr)>

=item C<void save_hptr(HV **hptr)>

Similar to C<save_svref>, but localize C<AV *> and C<HV *>.

=back

The C<Alias> module implements localization of the basic types within the
I<caller's scope>. People who are interested in how to localize things in
the containing scope should take a look there too.

=head1 Subroutines

=head2 XSUBs and the Argument Stack

The XSUB mechanism is a simple way for Perl programs to access C subroutines.
An XSUB routine will have a stack that contains the arguments from the Perl
program, and a way to map from the Perl data structures to a C equivalent.

The stack arguments are accessible through the C<ST(n)> macro, which returns
the C<n>'th stack argument. Argument 0 is the first argument passed in the
Perl subroutine call. These arguments are C<SV*>, and can be used anywhere
an C<SV*> is used.

Most of the time, output from the C routine can be handled through use of
the RETVAL and OUTPUT directives. However, there are some cases where the
argument stack is not already long enough to handle all the return values.
An example is the POSIX tzname() call, which takes no arguments, but returns
two, the local time zone's standard and summer time abbreviations.

To handle this situation, the PPCODE directive is used and the stack is
extended using the macro:

 EXTEND(SP, num);

where C<SP> is the macro that represents the local copy of the stack pointer,
and C<num> is the number of elements the stack should be extended by.

Now that there is room on the stack, values can be pushed on it using the
C<PUSHs> macro. The pushed values will often need to be "mortal" (see
L</Reference Counts and Mortality>).

 PUSHs(sv_2mortal(newSViv(an_integer)));
 PUSHs(sv_2mortal(newSVpv("Some String",0)));
 PUSHs(sv_2mortal(newSVnv(3.141592)));

Now the Perl program calling C<tzname> can assign the two values
as in:

 ($standard_abbrev, $summer_abbrev) = POSIX::tzname;

An alternate (and possibly simpler) method to pushing values on the stack is
to use the macro:

 XPUSHs(SV*)

This macro automatically adjusts the stack for you, if needed. Thus, you
do not need to call C<EXTEND> to extend the stack.

Despite the suggestions in earlier versions of this document, the macros
C<PUSHi>, C<PUSHn> and C<PUSHp> are I<not> suited to XSUBs which return
multiple results; see L</Putting a C value on Perl stack>.

For more information, consult L<perlxs> and L<perlxstut>.

=head2 Calling Perl Routines from within C Programs

There are four routines that can be used to call a Perl subroutine from
within a C program. These four are:

 I32  call_sv(SV*, I32);
 I32  call_pv(const char*, I32);
 I32  call_method(const char*, I32);
 I32  call_argv(const char*, I32, register char**);

The routine most often used is C<call_sv>. The C<SV*> argument
contains either the name of the Perl subroutine to be called, or a
reference to the subroutine. The second argument consists of flags
that control the context in which the subroutine is called, whether
or not the subroutine is being passed arguments, how errors should be
trapped, and how to treat return values.

All four routines return the number of arguments that the subroutine returned
on the Perl stack.

These routines used to be called C<perl_call_sv>, etc., before Perl v5.6.0,
but those names are now deprecated; macros of the same name are provided for
compatibility.

When using any of these routines (except C<call_argv>), the programmer
must manipulate the Perl stack. These include the following macros and
functions:

 dSP
 SP
 PUSHMARK()
 PUTBACK
 SPAGAIN
 ENTER
 SAVETMPS
 FREETMPS
 LEAVE
 XPUSH*()
 POP*()

For a detailed description of calling conventions from C to Perl,
consult L<perlcall>.

=head2 Memory Allocation

=head3 Allocation

All memory meant to be used with the Perl API functions should be manipulated
using the macros described in this section. The macros hide any differences
in the actual malloc implementation that is used within perl.

It is suggested that you enable the version of malloc that is distributed
with Perl. It keeps pools of various sizes of unallocated memory in
order to satisfy allocation requests more quickly. However, on some
platforms, it may cause spurious malloc or free errors.

The following three macros are used to initially allocate memory:

 New(x, pointer, number, type);
 Newc(x, pointer, number, type, cast);
 Newz(x, pointer, number, type);

The first argument C<x> was a "magic cookie" that was used to keep track
of who called the macro, to help when debugging memory problems. However,
the current code makes no use of this feature (most Perl developers now
use run-time memory checkers), so this argument can be any number.

The second argument C<pointer> should be the name of a variable that will
point to the newly allocated memory.

The third and fourth arguments C<number> and C<type> specify how many of
the specified type of data structure should be allocated. The argument
C<type> is passed to C<sizeof>. The final argument to C<Newc>, C<cast>,
should be used if the C<pointer> argument is different from the C<type>
argument.

Unlike the C<New> and C<Newc> macros, the C<Newz> macro calls C<memzero>
to zero out all the newly allocated memory.
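
To make the argument conventions concrete, here are hypothetical
stand-ins built on the C library. C<TOY_New>, C<TOY_Newc>, C<TOY_Newz>,
C<TOY_Renew> and C<TOY_Safefree> are not perl's definitions: perl routes
these through its own allocator wrappers and handles allocation failure,
while these sketches call C<malloc>, C<calloc>, C<realloc> and C<free>
directly and leave error checking to the caller. They do follow the same
argument order, including the ignored "magic cookie" C<x>.

```c
#include <stdlib.h>

/* Hypothetical stand-ins (not perl's definitions) for the allocation
   macros. They use the C library directly and do no error handling:
   callers must check the pointer for NULL. */
#define TOY_New(x, p, n, t) \
    ((void)(x), (p) = (t *)malloc((size_t)(n) * sizeof(t)))
#define TOY_Newc(x, p, n, t, c) \
    ((void)(x), (p) = (c *)malloc((size_t)(n) * sizeof(t)))
/* Zero-filled, like Newz. */
#define TOY_Newz(x, p, n, t) \
    ((void)(x), (p) = (t *)calloc((size_t)(n), sizeof(t)))
/* On failure p becomes NULL and the old block leaks: a sketch only. */
#define TOY_Renew(p, n, t) \
    ((p) = (t *)realloc((p), (size_t)(n) * sizeof(t)))
#define TOY_Safefree(p) free(p)
```

As with C<Newc>, the C<cast> argument of C<TOY_Newc> only changes the
pointer type; the size is still computed from C<type>.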

=head3 Reallocation

 Renew(pointer, number, type);
 Renewc(pointer, number, type, cast);
 Safefree(pointer);

These three macros are used to change a memory buffer size or to free a
piece of memory no longer needed. The arguments to C<Renew> and C<Renewc>
match those of C<New> and C<Newc> with the exception of not needing the
"magic cookie" argument.

=head3 Moving

 Move(source, dest, number, type);
 Copy(source, dest, number, type);
 Zero(dest, number, type);

These three macros are used to move, copy, or zero out previously allocated
memory. The C<source> and C<dest> arguments point to the source and
destination starting points. Perl will move, copy, or zero out C<number>
instances of the size of the C<type> data structure (using the C<sizeof>
operator).
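
The three moving macros map naturally onto the C library's C<memmove>,
C<memcpy> and C<memset>. The definitions below are hypothetical
stand-ins, not copied from perl's headers. They keep the argument order
shown above (source before destination), which is the reverse of the C
library's, and of the three only C<TOY_Move> is safe when the source and
destination regions overlap.

```c
#include <string.h>

/* Hypothetical stand-ins (not perl's definitions) for Move, Copy and
   Zero. Note the argument order: source first, destination second,
   the reverse of memmove/memcpy. */
#define TOY_Move(s, d, n, t) \
    ((void)memmove((d), (s), (size_t)(n) * sizeof(t)))   /* overlap-safe */
#define TOY_Copy(s, d, n, t) \
    ((void)memcpy((d), (s), (size_t)(n) * sizeof(t)))    /* no overlap */
#define TOY_Zero(d, n, t) \
    ((void)memset((d), 0, (size_t)(n) * sizeof(t)))
```

For example, shifting the first four elements of an C<int> array up by
one slot with C<TOY_Move(a, a + 1, 4, int)> works even though the two
ranges overlap.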

=head2 PerlIO

The most recent development releases of Perl have been experimenting with
removing Perl's dependency on the "normal" standard I/O suite and allowing
other stdio implementations to be used. This involves creating a new
abstraction layer that then calls whichever implementation of stdio Perl
was compiled with. All XSUBs should now use the functions in the PerlIO
abstraction layer and not make any assumptions about what kind of stdio
is being used.

For a complete description of the PerlIO abstraction, consult L<perlapio>.

=head2 Putting a C value on Perl stack

A lot of opcodes (this is an elementary operation in the internal perl
stack machine) put an SV* on the stack. However, as an optimization
the corresponding SV is (usually) not recreated each time. The opcodes
reuse specially assigned SVs (I<target>s) which are (as a corollary)
not constantly freed/created.

Each of the targets is created only once (but see
L<Scratchpads and recursion> below), and when an opcode needs to put
an integer, a double, or a string on stack, it just sets the
corresponding parts of its I<target> and puts the I<target> on stack.

The macro to put this target on stack is C<PUSHTARG>, and it is
directly used in some opcodes, as well as indirectly in zillions of
others, which use it via C<(X)PUSH[pni]>.

Because the target is reused, you must be careful when pushing multiple
values on the stack. The following code will not do what you think:

 XPUSHi(10);
 XPUSHi(20);

This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto
the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack".
At the end of the operation, the stack does not contain the values 10
and 20, but actually contains two pointers to C<TARG>, which we have set
to 20. If you need to push multiple different values, use C<XPUSHs>,
which bypasses C<TARG>.
1N/A
1N/AOn a related note, if you do use C<(X)PUSH[npi]>, then you're going to
1N/Aneed a C<dTARG> in your variable declarations so that the C<*PUSH*>
1N/Amacros can make use of the local variable C<TARG>.
1N/A
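The pitfall can be reproduced without Perl at all. The following toy model is a sketch with hypothetical names (C<ToySV>, C<toy_pushi>, C<toy_pushs_new>): the stack holds pointers, one reusable scalar plays the role of C<TARG>, and a second push routine gives each value its own scalar, as C<XPUSHs> with a new SV does. There are no bounds checks.

```c
/* Toy model: the stack holds pointers to scalars, and a single reused
   scalar stands in for TARG. */
typedef struct { long iv; } ToySV;

ToySV  toy_targ;            /* the one reused target */
ToySV  toy_fresh[8];        /* distinct scalars, like new mortal SVs */
int    toy_nfresh = 0;
ToySV *toy_stack[8];        /* no bounds checks in this sketch */
int    toy_sp = 0;

/* Like XPUSHi: set the target's value, then push a pointer to the target. */
void toy_pushi(long v)
{
    toy_targ.iv = v;
    toy_stack[toy_sp++] = &toy_targ;
}

/* Like XPUSHs with a new SV: push a pointer to a scalar of its own. */
void toy_pushs_new(long v)
{
    toy_fresh[toy_nfresh].iv = v;
    toy_stack[toy_sp++] = &toy_fresh[toy_nfresh++];
}
```

After C<toy_pushi(10); toy_pushi(20);> both stack slots point at the same scalar holding 20, while two calls to C<toy_pushs_new> leave 10 and 20 on the stack.
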
=head2 Scratchpads

The question remains of when the SVs which are I<target>s for opcodes
are created. The answer is that they are created when the current unit
(a subroutine, or a file for opcodes for statements outside of
subroutines) is compiled. During this time a special anonymous Perl
array is created, which is called a scratchpad for the current
unit.

A scratchpad keeps the SVs which are lexicals for the current unit as
well as the SVs which are targets for opcodes. One can deduce that an SV
lives on a scratchpad by looking at its flags: lexicals have
C<SVs_PADMY> set, and I<target>s have C<SVs_PADTMP> set.

The correspondence between OPs and I<target>s is not 1-to-1. Different
OPs in the compile tree of the unit can use the same target, if this
would not conflict with the expected life of the temporary.

=head2 Scratchpads and recursion

Strictly speaking, a compiled unit does not contain a pointer to the
scratchpad AV. Instead, it contains a pointer to an AV of (initially)
one element, and this element is the scratchpad AV. Why do we need an
extra level of indirection?

The answer is B<recursion>, and maybe B<threads>. Both
of these can create several execution pointers going into the same
subroutine. For the child invocation not to overwrite the temporaries
of the parent invocation (whose lifespan covers the call to the
child), the parent and the child should have different
scratchpads. (I<And> the lexicals should be separate anyway!)

So each subroutine is born with an array of scratchpads (of length 1).
On each entry to the subroutine, Perl checks whether the current depth
of the recursion exceeds the length of this array, and if it does, a
new scratchpad is created and pushed onto the array.

The I<target>s on this scratchpad are C<undef>s, but they are already
marked with correct flags.

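The growth rule can be sketched as a toy model in standalone C. Everything here (C<ToyCV>, C<ToyPad>, C<toy_enter>, C<toy_fact>) is hypothetical and only mirrors the idea of "one scratchpad per recursion depth, created on demand"; allocation failures are not handled.

```c
#include <stdlib.h>

/* One "pad" of temporaries per recursion depth, kept in a growable array. */
typedef struct { long slot[2]; } ToyPad;

typedef struct {
    ToyPad **pads;   /* pads[d - 1] is the scratchpad for depth d */
    int      npads;  /* length of the array of scratchpads */
    int      depth;  /* current recursion depth (0 = not running) */
} ToyCV;

/* A sub is "born" with an array holding one scratchpad. */
void toy_cv_init(ToyCV *cv)
{
    cv->pads = malloc(sizeof *cv->pads);
    cv->pads[0] = calloc(1, sizeof(ToyPad));
    cv->npads = 1;
    cv->depth = 0;
}

/* On entry: if the new depth exceeds the array length, push a fresh pad. */
ToyPad *toy_enter(ToyCV *cv)
{
    if (++cv->depth > cv->npads) {
        cv->pads = realloc(cv->pads, (size_t)(cv->npads + 1) * sizeof *cv->pads);
        cv->pads[cv->npads++] = calloc(1, sizeof(ToyPad));   /* zeroed: "undef" */
    }
    return cv->pads[cv->depth - 1];
}

void toy_leave(ToyCV *cv) { cv->depth--; }

/* Factorial that keeps its argument in a pad slot across the recursive
   call; with a single shared pad, the child would overwrite that slot. */
long toy_fact(ToyCV *cv, long n)
{
    ToyPad *pad = toy_enter(cv);
    pad->slot[0] = n;
    long sub = (n <= 1) ? 1 : toy_fact(cv, n - 1);
    long r = pad->slot[0] * sub;      /* read after the child has returned */
    toy_leave(cv);
    return r;
}
```

Computing 5! grows the array to five pads; a later, shallower call such as 3! reuses existing pads instead of creating new ones.
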
=head1 Compiled code

=head2 Code tree

Here we describe the internal form your code is converted to by
Perl. Start with a simple example:

    $a = $b + $c;

This is converted to a tree similar to this one:

             assign-to
           /           \
          +             $a
        /   \
      $b     $c

(but slightly more complicated). This tree reflects the way Perl
parsed your code, but has nothing to do with the execution order.
There is an additional "thread" going through the nodes of the tree
which shows the order of execution of the nodes. In our simplified
example above it looks like:

     $b ---> $c ---> + ---> $a ---> assign-to

But with the actual compile tree for C<$a = $b + $c> it is different:
some nodes are I<optimized away>. As a corollary, though the actual tree
contains more nodes than our simplified example, the execution order
is the same as in our example.

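To show that the tree shape and the execution "thread" are two independent sets of links, here is a standalone sketch. The C<ToyOp> struct and both functions are hypothetical; the field names only echo the idea of Perl's op links, not its real C<OP> layout.

```c
#include <stddef.h>
#include <string.h>

/* Toy op: tree links plus a separate execution-order link. */
typedef struct toy_op {
    const char    *name;
    struct toy_op *first;    /* first child in the tree */
    struct toy_op *sibling;  /* next child of the same parent */
    struct toy_op *next;     /* next op to execute (the "thread") */
} ToyOp;

/* Build the simplified tree for $a = $b + $c; return the first op to run
   and store the tree root in *root. */
ToyOp *build_assign_example(ToyOp **root)
{
    static ToyOp b = {"$b"}, c = {"$c"}, add = {"+"}, a = {"$a"},
                 asg = {"assign-to"};
    asg.first = &add;  add.sibling = &a;     /* assign-to( +(...), $a ) */
    add.first = &b;    b.sibling = &c;       /* +( $b, $c ) */
    b.next = &c;  c.next = &add;  add.next = &a;  a.next = &asg;
    asg.next = NULL;
    *root = &asg;
    return &b;
}

/* Follow the execution thread, writing op names separated by spaces. */
void exec_order(const ToyOp *start, char *buf, size_t len)
{
    buf[0] = '\0';
    for (const ToyOp *o = start; o != NULL; o = o->next) {
        if (buf[0] != '\0')
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, o->name, len - strlen(buf) - 1);
    }
}
```

Walking from the returned start op yields C<$b $c + $a assign-to>, even though the root of the tree is C<assign-to>.
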
=head2 Examining the tree

If you have your perl compiled for debugging (usually done with
C<-DDEBUGGING> on the C<Configure> command line), you may examine the
compiled tree by specifying C<-Dx> on the Perl command line. The
output takes several lines per node, and for C<$b+$c> it looks like
this:

    5           TYPE = add  ===> 6
                TARG = 1
                FLAGS = (SCALAR,KIDS)
                {
                    TYPE = null  ===> (4)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    3                   TYPE = gvsv  ===> 4
                        FLAGS = (SCALAR)
                        GV = main::b
                    }
                }
                {
                    TYPE = null  ===> (5)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    4                   TYPE = gvsv  ===> 5
                        FLAGS = (SCALAR)
                        GV = main::c
                    }
                }

This tree has 5 nodes (one per C<TYPE> specifier), of which only 3 are
not optimized away (one per number in the left column). The immediate
children of a given node correspond to C<{}> pairs on the same level
of indentation, so this listing corresponds to the tree:

                       add
                     /     \
                   null    null
                    |       |
                   gvsv    gvsv

The execution order is indicated by C<===E<gt>> marks, so it is C<3
4 5 6> (node C<6> is not included in the above listing), i.e.,
C<gvsv gvsv add whatever>.

Each of these nodes represents an op, a fundamental operation inside the
Perl core. The code which implements each operation can be found in the
F<pp*.c> files; the function which implements the op with type C<gvsv>
is C<pp_gvsv>, and so on. As the tree above shows, different ops have
different numbers of children: C<add> is a binary operator, as one would
expect, and so has two children. To accommodate the various different
numbers of children, there are various types of op data structure, and
they link together in different ways.

The simplest type of op structure is C<OP>: this has no children. Unary
operators, C<UNOP>s, have one child, and this is pointed to by the
C<op_first> field. Binary operators (C<BINOP>s) have not only an
C<op_first> field but also an C<op_last> field. The most complex type of
op is a C<LISTOP>, which has any number of children. In this case, the
first child is pointed to by C<op_first> and the last child by
C<op_last>. The children in between can be found by iteratively
following the C<op_sibling> pointer from the first child to the last.

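The sibling-chain traversal can be sketched with a hypothetical struct that keeps only the three link fields named above. We assume, for this sketch, that the last child's C<op_sibling> is NULL; the struct is not Perl's real op layout.

```c
#include <stddef.h>

/* Hypothetical op with just the child links described above. */
typedef struct toy_op {
    const char    *name;
    struct toy_op *op_first;    /* first child (UNOP, BINOP, LISTOP) */
    struct toy_op *op_last;     /* last child (BINOP, LISTOP) */
    struct toy_op *op_sibling;  /* next child of the same parent */
} ToyOp;

/* Count the children of any op by starting at op_first and following
   op_sibling; a childless OP has op_first == NULL and yields 0. */
int toy_count_kids(const ToyOp *o)
{
    int n = 0;
    for (const ToyOp *k = o->op_first; k != NULL; k = k->op_sibling)
        n++;
    return n;
}
```

The same loop works for a UNOP (one child), a BINOP (two) and a LISTOP (any number), which is why one traversal style covers all of them.
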
There are also two other op types: a C<PMOP> holds a regular expression,
and has no children, and a C<LOOP> may or may not have children. If the
C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
complicate matters, if a C<UNOP> is actually a C<null> op after
optimization (see L</Compile pass 2: context propagation>) it will still
have children in accordance with its former type.

Another way to examine the tree is to use a compiler back-end module, such
as L<B::Concise>.

=head2 Compile pass 1: check routines

The tree is created by the compiler while I<yacc> code feeds it
the constructions it recognizes. Since I<yacc> works bottom-up, so does
the first pass of perl compilation.

What makes this pass interesting for perl developers is that some
optimization may be performed on this pass. This is optimization by
so-called "check routines". The correspondence between node names
and corresponding check routines is described in F<opcode.pl> (do not
forget to run C<make regen_headers> if you modify this file).

A check routine is called when the node is fully constructed except
for the execution-order thread. Since at this time there are no
back-links to the currently constructed node, one can do almost any
operation to the top-level node, including freeing it and/or creating
new nodes above/below it.

The check routine returns the node which should be inserted into the
tree (if the top-level node was not modified, the check routine returns
its argument).

By convention, check routines have names C<ck_*>. They are usually
called from the C<new*OP> subroutines (or from C<convert>), which in turn
are called from F<perly.y>.

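The contract of a check routine (receive a freshly built node, return whatever node should go into the tree) can be sketched with a toy expression tree. All names here are hypothetical, and the particular rewrite, "X + 0" becomes "X", is purely illustrative; we are not claiming Perl performs it.

```c
#include <stdlib.h>

/* Toy expression node (hypothetical, not Perl's OP). */
typedef enum { T_CONST, T_VAR, T_ADD } ToyType;
typedef struct toy_op {
    ToyType        type;
    long           value;          /* used by T_CONST */
    struct toy_op *first, *last;   /* children of T_ADD */
} ToyOp;

/* Allocate a node (no out-of-memory handling in this sketch). */
ToyOp *toy_mk(ToyType t, long v, ToyOp *first, ToyOp *last)
{
    ToyOp *o = malloc(sizeof *o);
    o->type = t;
    o->value = v;
    o->first = first;
    o->last = last;
    return o;
}

/* A ck_*-style routine: returns the node to insert into the tree.
   Freeing o is safe only because nothing links back to it yet. */
ToyOp *ck_toy_add(ToyOp *o)
{
    if (o->last->type == T_CONST && o->last->value == 0) {
        ToyOp *kid = o->first;
        free(o->last);
        free(o);
        return kid;
    }
    return o;   /* unchanged: return the argument */
}

/* Like a new*OP constructor: build the node, then call its check routine. */
ToyOp *toy_new_add(ToyOp *a, ToyOp *b)
{
    return ck_toy_add(toy_mk(T_ADD, 0, a, b));
}
```

Because the constructor always routes its result through the check routine, callers never see the discarded node.
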
=head2 Compile pass 1a: constant folding

Immediately after the check routine is called, the returned node is
checked for being compile-time executable. If it is (the value is
judged to be constant), it is immediately executed, and a I<constant>
node with the "return value" of the corresponding subtree is
substituted instead. The subtree is deleted.

If constant folding was not performed, the execution-order thread is
created.

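A toy version of this step, with hypothetical names throughout: after each node is built (bottom-up), if all of its children are constants the subtree is evaluated at once, deleted, and replaced by a single constant node. Only addition and multiplication are modelled.

```c
#include <stdlib.h>

typedef enum { T_CONST, T_VAR, T_ADD, T_MUL } ToyType;
typedef struct toy_op {
    ToyType        type;
    long           value;          /* used by T_CONST */
    struct toy_op *first, *last;   /* children of binary ops */
} ToyOp;

/* Allocate a node (no out-of-memory handling in this sketch). */
ToyOp *toy_mk(ToyType t, long v, ToyOp *first, ToyOp *last)
{
    ToyOp *o = malloc(sizeof *o);
    o->type = t;
    o->value = v;
    o->first = first;
    o->last = last;
    return o;
}

/* If every child is constant, evaluate now, delete the subtree, and
   return one constant node carrying the result. */
ToyOp *toy_fold(ToyOp *o)
{
    if ((o->type == T_ADD || o->type == T_MUL)
        && o->first->type == T_CONST && o->last->type == T_CONST) {
        long v = (o->type == T_ADD) ? o->first->value + o->last->value
                                    : o->first->value * o->last->value;
        free(o->first);
        free(o->last);
        free(o);
        return toy_mk(T_CONST, v, NULL, NULL);
    }
    return o;   /* not constant: Perl would now create the execution thread */
}

/* Nodes are built bottom-up, so nested constant subexpressions fold first. */
ToyOp *toy_new_binop(ToyType t, ToyOp *a, ToyOp *b)
{
    return toy_fold(toy_mk(t, 0, a, b));
}
```

Building C<(2 + 3) * 4> this way produces a single constant node holding 20, while C<$x + 1> keeps its addition node.
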
=head2 Compile pass 2: context propagation

When a context for a part of the compile tree is known, it is propagated
down through the tree. At this time the context can have 5 values
(compared with only void, scalar, and list at run time): void, boolean,
scalar, list, and lvalue. In contrast with pass 1, this pass is
processed from top to bottom: a node's context determines the context
for its children.

Additional context-dependent optimizations are performed at this time.
Since at this moment the compile tree contains back-references (via
"thread" pointers), nodes cannot be free()d now. To allow
optimized-away nodes at this stage, such nodes are null()ified instead
of free()d (i.e. their type is changed to OP_NULL).

=head2 Compile pass 3: peephole optimization

After the compile tree for a subroutine (or for an C<eval> or a file)
is created, an additional pass over the code is performed. This pass
is neither top-down nor bottom-up, but in the execution order (with
additional complications for conditionals). These optimizations are
done in the subroutine peep(). Optimizations performed at this stage
are subject to the same restrictions as in pass 2.

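One optimization in this spirit, which as far as we know peep() performs in some form, is splicing null()ified ops out of the execution chain so the run loop never visits them. The sketch below is standalone and uses a hypothetical C<ToyOp> holding only a name, a null flag and the execution link; the real peep() handles far more cases, including conditionals.

```c
#include <stddef.h>

/* Toy op with only what this pass needs. */
typedef struct toy_op {
    const char    *name;
    int            is_null;   /* set when an earlier pass null()ified it */
    struct toy_op *next;      /* execution order */
} ToyOp;

/* Walk in execution order and unlink null ops from the next chain.
   Returns the (possibly new) first op to execute. */
ToyOp *toy_peep(ToyOp *start)
{
    while (start != NULL && start->is_null)
        start = start->next;
    for (ToyOp *o = start; o != NULL; o = o->next)
        while (o->next != NULL && o->next->is_null)
            o->next = o->next->next;
    return start;
}
```

The null ops stay in the tree (nothing is freed); only the execution links are rewritten to bypass them.
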
=head2 Pluggable runops

The compile tree is executed in a runops function. There are two runops
functions: C<Perl_runops_standard> in F<run.c> and C<Perl_runops_debug>
in F<dump.c>. C<Perl_runops_debug> is used with DEBUGGING and
C<Perl_runops_standard> is used otherwise. For fine control over the
execution of the compile tree it is possible to provide your own runops
function.

It's probably best to copy one of the existing runops functions and
change it to suit your needs. Then, in the BOOT section of your XS
file, add the line:

    PL_runops = my_runops;

This function should be as efficient as possible to keep your programs
running as fast as possible.

=head1 Examining internal data structures with the C<dump> functions

To aid debugging, the source file F<dump.c> contains a number of
functions which produce formatted output of internal data structures.

The most commonly used of these functions is C<Perl_sv_dump>; it's used
for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls
C<sv_dump> to produce debugging output from Perl-space, so users of that
module should already be familiar with its format.

C<Perl_op_dump> can be used to dump an C<OP> structure or any of its
derivatives, and produces output similar to C<perl -Dx>; in fact,
C<Perl_dump_eval> will dump the main root of the code being evaluated,
exactly like C<-Dx>.

Other useful functions are C<Perl_dump_sub>, which dumps the op tree of
the subroutine held in a C<GV>, and C<Perl_dump_packsubs>, which calls
C<Perl_dump_sub> on all the subroutines in a package, like so
(thankfully, these are all XSUBs, so there is no op tree to show):

    (gdb) print Perl_dump_packsubs(PL_defstash)

    SUB attributes::bootstrap = (xsub 0x811fedc 0)

    SUB UNIVERSAL::can = (xsub 0x811f50c 0)

    SUB UNIVERSAL::isa = (xsub 0x811f304 0)

    SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)

    SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)

Finally, C<Perl_dump_all> dumps all the subroutines in the stash and
the op tree of the main root.

=head1 How multiple interpreters and concurrency are supported

=head2 Background and PERL_IMPLICIT_CONTEXT

The Perl interpreter can be regarded as a closed box: it has an API
for feeding it code or otherwise making it do things, but it also has
functions for its own use. This smells a lot like an object, and
there are ways for you to build Perl so that you can have multiple
interpreters, with one interpreter represented either as a C structure,
or inside a thread-specific structure. These structures contain all
the context, the state of that interpreter.

Two macros control the major Perl build flavors: MULTIPLICITY and
USE_5005THREADS. The MULTIPLICITY build has a C structure
that packages all the interpreter state, and there is a similar thread-specific
data structure under USE_5005THREADS. In both cases,
PERL_IMPLICIT_CONTEXT is also normally defined, and enables the
support for passing in a "hidden" first argument that represents
whichever of these data structures is in use.

All this obviously requires a way for the Perl internal functions to be
either subroutines taking some kind of structure as the first
argument, or subroutines taking nothing as the first argument. To
enable these two very different ways of building the interpreter,
the Perl source (as it does in so many other situations) makes heavy
use of macros and subroutine naming conventions.

First problem: deciding which functions will be public API functions and
which will be private. All functions whose names begin C<S_> are private
(think "S" for "secret" or "static"). All other functions begin with
"Perl_", but just because a function begins with "Perl_" does not mean it is
part of the API. (See L</Internal Functions>.) The easiest way to be B<sure> a
function is part of the API is to find its entry in L<perlapi>.
If it exists in L<perlapi>, it's part of the API. If it doesn't, and you
think it should be (i.e., you need it for your extension), send mail via
L<perlbug> explaining why you think it should be.

Second problem: there must be a syntax so that the same subroutine
declarations and calls can pass a structure as their first argument,
or pass nothing. To solve this, the subroutines are named and
declared in a particular way. Here's a typical start of a static
function used within the Perl guts:

    STATIC void
    S_incline(pTHX_ char *s)

STATIC becomes "static" in C, and may be #define'd to nothing in some
configurations in the future.

A public function (i.e. part of the internal API, but not necessarily
sanctioned for use in extensions) begins like this:

    void
    Perl_sv_setiv(pTHX_ SV* dsv, IV num)

C<pTHX_> is one of a number of macros (in F<perl.h>) that hide the
details of the interpreter's context. THX stands for "thread", "this",
or "thingy", as the case may be. (And no, George Lucas is not involved. :-)
The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument,
or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and
their variants.

When Perl is built without options that set PERL_IMPLICIT_CONTEXT, there is no
first argument containing the interpreter's context. The trailing underscore
in the pTHX_ macro indicates that the macro expansion needs a comma
after the context argument because other arguments follow it. If
PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the
subroutine is not prototyped to take the extra argument. The form of the
macro without the trailing underscore is used when there are no additional
explicit arguments.

When a core function calls another, it must pass the context. This
is normally hidden via macros. Consider C<sv_setiv>. It expands into
something like this:

    #ifdef PERL_IMPLICIT_CONTEXT
      #define sv_setiv(a,b)      Perl_sv_setiv(aTHX_ a, b)
      /* can't do this for vararg functions, see below */
    #else
      #define sv_setiv           Perl_sv_setiv
    #endif

This works well, and means that XS authors can gleefully write:

    sv_setiv(foo, bar);

and still have it work under all the modes Perl could have been
compiled with.

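The mechanics can be tried outside Perl with a self-contained imitation. The C<TOY_IMPLICIT_CONTEXT> switch, C<ToyInterp>, C<Toy_add>, C<Toy_get> and C<bump_twice> are hypothetical; the macro names C<pTHX>, C<pTHX_>, C<aTHX> and C<aTHX_> are reused here only to show the pattern, and the parameter name C<my_perl> mirrors, to our understanding, the one Perl itself uses.

```c
/* Remove this line to get the single-interpreter build: the caller code
   below compiles unchanged either way. */
#define TOY_IMPLICIT_CONTEXT

typedef struct { long counter; } ToyInterp;

#ifdef TOY_IMPLICIT_CONTEXT
#  define pTHX   ToyInterp *my_perl   /* in a prototype: declare the context */
#  define pTHX_  pTHX,                /* ...when more parameters follow */
#  define aTHX   my_perl              /* in a call: pass the context on */
#  define aTHX_  aTHX,
#  define TOY_STATE (my_perl->counter)
#else
#  define pTHX   void
#  define pTHX_
#  define aTHX
#  define aTHX_
static ToyInterp toy_the_interp;      /* the one and only interpreter */
#  define TOY_STATE (toy_the_interp.counter)
#endif

/* "Core" functions take the context as a hidden first argument. */
void Toy_add(pTHX_ long n) { TOY_STATE += n; }
long Toy_get(pTHX)         { return TOY_STATE; }

/* Short names that hide the context, in the style of sv_setiv. */
#define toy_add(n) Toy_add(aTHX_ n)
#define toy_get()  Toy_get(aTHX)

/* Written once; compiles with or without TOY_IMPLICIT_CONTEXT. */
long bump_twice(pTHX)
{
    toy_add(2);
    toy_add(3);
    return toy_get();
}
```

With the context build, two separate C<ToyInterp> objects keep independent counters, which is exactly what multiple interpreters need.
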
This doesn't work so cleanly for varargs functions, though, as macros
imply that the number of arguments is known in advance. Instead, we
either need to spell them out fully, passing C<aTHX_> as the first
argument (the Perl core tends to do this with functions like
Perl_warner), or use a context-free version.

The context-free version of Perl_warner is called
Perl_warner_nocontext, and does not take the extra argument. Instead
it does C<dTHX;> to get the context from thread-local storage. We
C<#define warner Perl_warner_nocontext> so that extensions get source
compatibility at the expense of performance. (Passing an argument is
cheaper than grabbing it from thread-local storage.)

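The trade-off between the two varargs styles can be sketched in standalone C11. The names (C<ToyInterp>, C<Toy_warner>, C<Toy_warner_nocontext>, C<toy_set_context>) are hypothetical imitations; a C<_Thread_local> pointer stands in for the thread-local slot that C<dTHX> reads and C<PERL_SET_CONTEXT> writes.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical interpreter holding the text of its last warning. */
typedef struct { char last_warning[64]; } ToyInterp;

/* Stand-in for the thread-local context slot (C11 _Thread_local). */
static _Thread_local ToyInterp *toy_current;
void toy_set_context(ToyInterp *i) { toy_current = i; }  /* cf. PERL_SET_CONTEXT */

/* Explicit-context varargs function: the caller passes the interpreter. */
void Toy_warner(ToyInterp *interp, const char *pat, ...)
{
    va_list args;
    va_start(args, pat);
    vsnprintf(interp->last_warning, sizeof interp->last_warning, pat, args);
    va_end(args);
}

/* _nocontext variant: no extra argument, so it looks the context up
   itself (the analogue of dTHX), paying a thread-local read per call. */
void Toy_warner_nocontext(const char *pat, ...)
{
    ToyInterp *interp = toy_current;              /* "dTHX;" */
    va_list args;
    va_start(args, pat);
    vsnprintf(interp->last_warning, sizeof interp->last_warning, pat, args);
    va_end(args);
}

/* Source-compatible short name, like "#define warner Perl_warner_nocontext". */
#define toy_warner Toy_warner_nocontext
```

Code calling the short name never mentions an interpreter, which is the source compatibility described above; the price is the lookup inside every call.
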
You can ignore [pad]THXx when browsing the Perl headers/sources.
Those are strictly for use within the core. Extensions and embedders
need only be aware of [pad]THX.

=head2 So what happened to dTHR?

C<dTHR> was introduced in perl 5.005 to support the older thread model.
The older thread model now uses the C<THX> mechanism to pass context
pointers around, so C<dTHR> is not useful any more. Perl 5.6.0 and
later still have it for backward source compatibility, but it is defined
to be a no-op.

=head2 How do I use all this in extensions?

When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call
any functions in the Perl API will need to pass the initial context
argument somehow. The catch is that you will need to write it in
such a way that the extension still compiles when Perl hasn't been
built with PERL_IMPLICIT_CONTEXT enabled.

There are three ways to do this. First, the easy but inefficient way,
which is also the default, in order to maintain source compatibility
with extensions: whenever F<XSUB.h> is #included, it redefines the aTHX
and aTHX_ macros to call a function that will return the context.
Thus, something like:

    sv_setiv(sv, num);

in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
in effect:

    Perl_sv_setiv(Perl_get_context(), sv, num);

or to this otherwise:

    Perl_sv_setiv(sv, num);

You don't have to do anything new in your extension to get this; since
the Perl library provides Perl_get_context(), it will all just
work.

The second, more efficient way is to use the following template for
your Foo.xs:

    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    static void my_private_function(int arg1, int arg2);

    static void
    my_private_function(int arg1, int arg2)
    {
        dTHX;       /* fetch context */
        /* ... call many Perl API functions ... */
    }

    [... etc ...]

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(arg, 10);

Note that the only two changes from the normal way of writing an
extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before
including the Perl headers, followed by a C<dTHX;> declaration at
the start of every function that will call the Perl API. (You'll
know which functions need this, because the C compiler will complain
that there's an undeclared identifier in those functions.) No changes
are needed for the XSUBs themselves, because the XS() macro is
correctly defined to pass in the implicit context if needed.

The third, even more efficient way is to ape how it is done within
the Perl guts:

    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    /* pTHX_ only needed for functions that call Perl API */
    static void my_private_function(pTHX_ int arg1, int arg2);

    static void
    my_private_function(pTHX_ int arg1, int arg2)
    {
        /* dTHX; not needed here, because THX is an argument */
        /* ... call Perl API functions ... */
    }

    [... etc ...]

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(aTHX_ arg, 10);

This implementation never has to fetch the context using a function
call, since it is always passed as an extra argument. Depending on
your needs for simplicity or efficiency, you may mix the previous
two approaches freely.

Never add a comma after C<pTHX> yourself. Always use the form of the
macro with the underscore for functions that take explicit arguments,
or the form without the underscore for functions with no explicit
arguments.

=head2 Should I do anything special if I call perl from multiple threads?

If you create interpreters in one thread and then proceed to call them in
another, you need to make sure perl's own Thread Local Storage (TLS) slot is
initialized correctly in each of those threads.

The C<perl_alloc> and C<perl_clone> API functions will automatically set
the TLS slot to the interpreter they created, so there is no need to do
anything special if the interpreter is always accessed in the same thread that
created it, and that thread did not create or call any other interpreters
afterwards. If that is not the case, you have to set the TLS slot of the
thread before calling any functions in the Perl API on that particular
interpreter. This is done by calling the C<PERL_SET_CONTEXT> macro in that
thread as the first thing you do:

    /* do this before doing anything else with some_perl */
    PERL_SET_CONTEXT(some_perl);

    ... other Perl API calls on some_perl go here ...

=head2 Future Plans and PERL_IMPLICIT_SYS

Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
that the interpreter knows about itself and pass it around, so too are
there plans to allow the interpreter to bundle up everything it knows
about the environment it's running on. This is enabled with the
PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS
and USE_5005THREADS on Windows (see inside F<iperlsys.h>).

This makes it possible to provide an extra pointer (called the "host"
environment) for all the system calls, so that all the system-level
code can maintain its own state, broken down into seven C structures.
These are thin wrappers around the usual system calls (see
F<win32/perllib.c>) for the default perl executable, but for a more
ambitious host (like the one that would do fork() emulation) all the
extra work needed to pretend that different interpreters are actually
different "processes" would be done here.

The Perl engine/interpreter and the host are orthogonal entities.
There could be one or more interpreters in a process, and one or
more "hosts", with free association between them.

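The "extra pointer for system calls" idea can be sketched in standalone C. The C<ToyHostMem> table below is hypothetical and only loosely inspired by the structures in F<iperlsys.h>; its names and layout are invented for illustration. Interpreter code asks its host for memory instead of calling C<malloc> directly, so each host can keep its own state.

```c
#include <stdlib.h>

/* Hypothetical host memory interface: function pointers plus private
   state, reached through one extra pointer. */
typedef struct ToyHostMem ToyHostMem;
struct ToyHostMem {
    void *(*Malloc)(ToyHostMem *host, size_t size);
    void  (*Free)(ToyHostMem *host, void *p);
    long    live_blocks;          /* state owned by this host */
};

static void *counting_malloc(ToyHostMem *h, size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        h->live_blocks++;
    return p;
}

static void counting_free(ToyHostMem *h, void *p)
{
    if (p != NULL) {
        h->live_blocks--;
        free(p);
    }
}

void toy_host_init(ToyHostMem *h)
{
    h->Malloc = counting_malloc;
    h->Free = counting_free;
    h->live_blocks = 0;
}

/* "Interpreter" code never calls malloc() itself; it asks its host. */
char *toy_interp_strdup(ToyHostMem *host, const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    char *d = host->Malloc(host, n + 1);
    if (d != NULL)
        for (size_t i = 0; i <= n; i++)
            d[i] = s[i];
    return d;
}
```

Two hosts initialized this way keep separate allocation counts, a small-scale version of different interpreters appearing to be different "processes".
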
=head1 Internal Functions

All of Perl's internal functions which will be exposed to the outside
world are prefixed by C<Perl_> so that they will not conflict with XS
functions or functions used in a program in which Perl is embedded.
Similarly, all global variables begin with C<PL_>. (By convention,
static functions start with C<S_>.)

Inside the Perl core, you can get at the functions either with or
without the C<Perl_> prefix, thanks to a bunch of defines that live in
F<embed.h>. This header file is generated automatically from
F<embed.pl> and F<embed.fnc>. F<embed.pl> also creates the prototyping
header files for the internal functions, generates the documentation
and a lot of other bits and pieces. It's important that when you add
a new function to the core or change an existing one, you change the
data in the table in F<embed.fnc> as well. Here's a sample entry from
that table:

    Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval

The second column is the return type, the third column the name. Columns
after that are the arguments. The first column is a set of flags:

=over 3

=item A

This function is a part of the public API.

=item p

This function has a C<Perl_> prefix; i.e., it is defined as C<Perl_av_fetch>.

=item d

This function has documentation using the C<apidoc> feature which we'll
look at in a second.

=back

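To make the column layout concrete, here is a small standalone parser for one entry in this format. It is a hypothetical sketch for illustration only (the real processing is done by F<embed.pl>, in Perl): it splits on C<|>, trims surrounding blanks, and checks for flag letters.

```c
#include <string.h>

/* Trim leading and trailing blanks in place; returns the trimmed start. */
static char *trim(char *s)
{
    while (*s == ' ' || *s == '\t')
        s++;
    char *e = s + strlen(s);
    while (e > s && (e[-1] == ' ' || e[-1] == '\t' || e[-1] == '\n'))
        *--e = '\0';
    return s;
}

/* Split "flags|return|name|arg|arg|..." (modified in place) into at most
   `max` trimmed fields; returns how many fields were stored. */
int split_embed_entry(char *line, char *fields[], int max)
{
    int n = 0;
    char *start = line;
    for (char *p = line; ; p++) {
        if (*p == '|' || *p == '\0') {
            int at_end = (*p == '\0');
            *p = '\0';
            if (n < max)
                fields[n++] = trim(start);
            if (at_end)
                break;
            start = p + 1;
        }
    }
    return n;
}

/* Nonzero if flag letter f (e.g. 'A' for public API) is in the flags field. */
int has_flag(const char *flags, char f)
{
    return f != '\0' && strchr(flags, f) != NULL;
}
```

Fed the C<av_fetch> entry shown above, it yields six fields: the flags C<Apd>, the return type C<SV**>, the name, and three arguments.
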
Other available flags are:

=over 3

=item s

This is a static function and is defined as C<S_whatever>, and usually
called within the sources as C<whatever(...)>.

=item n

This does not use C<pTHX_> and C<aTHX_> to pass interpreter context. (See
L<perlguts/Background and PERL_IMPLICIT_CONTEXT>.)

=item r

This function never returns (for example C<croak>, C<exit> and friends).

=item f

This function takes a variable number of arguments, C<printf> style.
The argument list should end with C<...>, like this:

    Afprd |void |croak |const char* pat|...

=item M

This function is part of the experimental development API, and may change
or disappear without notice.

=item o

This function should not have a compatibility macro to define, say,
C<Perl_parse> to C<parse>. It must be called as C<Perl_parse>.

=item x

This function isn't exported out of the Perl core.

=item m

This is implemented as a macro.

=item X

This function is explicitly exported.

=item E

This function is visible to extensions included in the Perl core.

=item b

Binary backward compatibility; this function is a macro but also has
a C<Perl_> implementation (which is exported).

=back

If you edit F<embed.pl> or F<embed.fnc>, you will need to run
C<make regen_headers> to force a rebuild of F<embed.h> and other
auto-generated files.

=head2 Formatted Printing of IVs, UVs, and NVs

When printing IVs, UVs, or NVs, use the following macros instead of
stdio(3)-style formatting codes such as C<%d>, C<%ld>, and C<%f>, for
portability:

    IVdf  IV in decimal
    UVuf  UV in decimal
    UVof  UV in octal
    UVxf  UV in hexadecimal
    NVef  NV %e-like
    NVff  NV %f-like
    NVgf  NV %g-like

These will take care of 64-bit integers and long doubles.
For example:

    printf("IV is %"IVdf"\n", iv);

The IVdf will expand to whatever is the correct format for the IVs.

If you are printing addresses of pointers, use UVxf combined
with PTR2UV(); do not use %lx or %p.

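The same convention exists in standard C99 as the C<PRId64> family from F<inttypes.h>, which makes for a runnable analogue. In this sketch, C<ToyIV>, C<ToyUV>, C<TOY_IVdf>, C<TOY_UVxf>, C<TOY_PTR2UV> and C<format_iv_and_ptr> are hypothetical, and IV/UV are assumed to be 64 bits; Perl's real macros come from its own configuration and may expand differently.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical stand-ins assuming 64-bit IV/UV. */
typedef int64_t  ToyIV;
typedef uint64_t ToyUV;
#define TOY_IVdf      PRId64
#define TOY_UVxf      PRIx64
#define TOY_PTR2UV(p) ((ToyUV)(uintptr_t)(p))

/* As with IVdf, the macro supplies only the conversion (plus any length
   modifier); the '%' itself is written in the format string. */
int format_iv_and_ptr(char *buf, size_t len, ToyIV iv, const void *p)
{
    return snprintf(buf, len, "iv=%" TOY_IVdf " at 0x%" TOY_UVxf,
                    iv, TOY_PTR2UV(p));
}
```

Because the format is assembled by string-literal concatenation, the correct length modifier is chosen at compile time for whatever size the integer type has.
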
=head2 Pointer-To-Integer and Integer-To-Pointer

Because pointer size does not necessarily equal integer size,
use the following macros to convert between the two correctly:

    PTR2UV(pointer)
    PTR2IV(pointer)
    PTR2NV(pointer)
    INT2PTR(pointertotype, integer)

For example:

    IV  iv = ...;
    SV *sv = INT2PTR(SV*, iv);

and

    AV *av = ...;
    UV  uv = PTR2UV(av);

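A standalone analogue of the round trip, assuming the platform provides C<intptr_t> and C<uintptr_t>. The C<TOY_*> macros and the two helper functions are hypothetical; the pattern they show (keeping a C pointer in an integer slot, as is commonly done when a C object is stored in an SV's IV) is what the Perl macros exist to make portable.

```c
#include <stdint.h>

/* Hypothetical analogues of PTR2IV, PTR2UV and INT2PTR. */
#define TOY_PTR2IV(p)     ((intptr_t)(p))
#define TOY_PTR2UV(p)     ((uintptr_t)(p))
#define TOY_INT2PTR(t, i) ((t)(intptr_t)(i))

/* Keep a C pointer in an integer slot and recover it later. */
intptr_t stash_pointer(double *d)     { return TOY_PTR2IV(d); }
double  *unstash_pointer(intptr_t iv) { return TOY_INT2PTR(double *, iv); }
```

Going through a pointer-sized integer type, rather than a plain C<int> or C<long>, is what keeps the round trip lossless on platforms where those types are narrower than a pointer.
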
=head2 Source Documentation

There's an effort going on to document the internal functions and
automatically produce reference manuals from them; L<perlapi> is one
such manual which details all the functions which are available to XS
writers. L<perlintern> is the autogenerated manual for the functions
which are not part of the API and are supposedly for internal use only.

Source documentation is created by putting POD comments into the C
source, like this:

    /*
    =for apidoc sv_setiv

    Copies an integer into the given SV.  Does not handle 'set' magic.  See
    C<sv_setiv_mg>.

    =cut
    */

Please try to supply some documentation if you add functions to the
Perl core.

=head1 Unicode Support

Perl 5.6.0 introduced Unicode support. It's important for porters and XS
writers to understand this support and make sure that the code they
write does not corrupt Unicode data.

=head2 What B<is> Unicode, anyway?

In the olden, less enlightened times, we all used to use ASCII. Most of
us did, anyway. The big problem with ASCII is that it's American. Well,
no, that's not actually the problem; the problem is that it's not
particularly useful for people who don't use the Roman alphabet. What
used to happen was that particular languages would stick their own
alphabet in the upper range of the sequence, between 128 and 255. Of
course, we then ended up with plenty of variants that weren't quite
ASCII, and the whole point of it being a standard was lost.

Worse still, if you've got a language like Chinese or
Japanese that has hundreds or thousands of characters, then you really
can't fit them into a mere 256, so they had to forget about ASCII
altogether, and build their own systems using pairs of numbers to refer
to one character.

To fix this, some people formed Unicode, Inc. and
produced a new character set containing all the characters you can
possibly think of and more. There are several ways of representing these
characters, and the one Perl uses is called UTF-8. UTF-8 uses
a variable number of bytes to represent a character, instead of just
one. You can learn more about Unicode at http://www.unicode.org/.

=head2 How can I recognise a UTF-8 string?

You can't. This is because UTF-8 data is stored in bytes just like
non-UTF-8 data. The Unicode character 200 (C<0xC8> for you hex types),
capital E with a grave accent, is represented by the two bytes
C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)>
has that byte sequence as well. So you can't tell just by looking; this
is what makes Unicode input an interesting problem.

The API function C<is_utf8_string> can help; it'll tell you if a string
contains only valid UTF-8 characters. However, it can't do the work for
you: a byte string that happens to be valid UTF-8 need not have been
meant as UTF-8. On a character-by-character basis, C<is_utf8_char> will
tell you whether the current character in a string is valid UTF-8.

=head2 How does UTF-8 represent Unicode characters?

As mentioned above, UTF-8 uses a variable number of bytes to store a
character. Characters with values 0...127 are stored in one byte, just
like good ol' ASCII. Character 128 is stored as C<v194.128>, character
129 as C<v194.129>, and so on up to character 191, which is
C<v194.191>. Now we've run out of bits (191 is binary C<10111111>) so we
move on; 192 is C<v195.128>. And so it goes on, moving to three bytes at
character 2048.

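The byte values above follow from the UTF-8 bit layout, which a small standalone encoder makes explicit. C<toy_uv_to_utf8> is a hypothetical name, not Perl's API; it handles standard UTF-8 up to four bytes and performs no checks for surrogates or out-of-range values.

```c
#include <stddef.h>

/* Encode code point uv as UTF-8 into d; returns the number of bytes written. */
size_t toy_uv_to_utf8(unsigned char *d, unsigned long uv)
{
    if (uv < 0x80) {                        /* 0..127: one byte, as in ASCII */
        d[0] = (unsigned char)uv;
        return 1;
    }
    if (uv < 0x800) {                       /* 128..2047: 110xxxxx 10xxxxxx */
        d[0] = (unsigned char)(0xC0 | (uv >> 6));
        d[1] = (unsigned char)(0x80 | (uv & 0x3F));
        return 2;
    }
    if (uv < 0x10000) {                     /* 2048..65535: three bytes */
        d[0] = (unsigned char)(0xE0 | (uv >> 12));
        d[1] = (unsigned char)(0x80 | ((uv >> 6) & 0x3F));
        d[2] = (unsigned char)(0x80 | (uv & 0x3F));
        return 3;
    }
    d[0] = (unsigned char)(0xF0 | (uv >> 18));     /* up to 0x10FFFF: four */
    d[1] = (unsigned char)(0x80 | ((uv >> 12) & 0x3F));
    d[2] = (unsigned char)(0x80 | ((uv >> 6) & 0x3F));
    d[3] = (unsigned char)(0x80 | (uv & 0x3F));
    return 4;
}
```

For example, 191 encodes as bytes 194 and 191, 192 as 195 and 128, 200 as 195 and 136, and 2048 is the first value that needs three bytes.
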
1N/AAssuming you know you're dealing with a UTF-8 string, you can find out
1N/Ahow long the first character in it is with the C<UTF8SKIP> macro:
1N/A
1N/A char *utf = "\305\233\340\240\201";
1N/A I32 len;
1N/A
1N/A len = UTF8SKIP(utf); /* len is 2 here */
1N/A utf += len;
1N/A len = UTF8SKIP(utf); /* len is 3 here */
1N/A

Another way to skip over characters in a UTF-8 string is to use
C<utf8_hop>, which takes a string and a number of characters to skip
over. You're on your own about bounds checking, though, so don't use it
lightly.
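
Because C<utf8_hop> leaves bounds checking to you, one option is to wrap the hop in your own helper that knows where the buffer ends. The sketch below is hypothetical: C<hop_forward> is not part of the Perl API. It handles forward hops only and assumes the buffer is otherwise well-formed UTF-8.

```c
 #include <stddef.h>

 /* Hypothetical bounded hop: advance s by up to n UTF-8 characters but
  * never past end. A character truncated by end also stops the hop at
  * end. Forward hops only. */
 static const unsigned char *hop_forward(const unsigned char *s,
                                         const unsigned char *end,
                                         size_t n)
 {
     while (n-- > 0 && s < end) {
         unsigned char c = *s;
         size_t len = c < 0xC0 ? 1 : c < 0xE0 ? 2 : c < 0xF0 ? 3 : 4;

         if ((size_t)(end - s) < len)
             return end;            /* truncated character: stop at end */
         s += len;
     }
     return s;
 }
```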

All bytes in a multi-byte UTF-8 character have the high bit set, so you
can test whether you need to do something special with a character like
this (C<UTF8_IS_INVARIANT()> is a macro that tests whether the byte
represents the same character whether or not the string is in UTF-8):

 U8 *utf;
 UV uv;      /* Note: a UV, not a U8, not a char */
 STRLEN len; /* length of the character in bytes */

 if (!UTF8_IS_INVARIANT(*utf))
     /* Must treat this as UTF-8 */
     uv = utf8_to_uvchr(utf, &len);
 else
     /* OK to treat this character as a byte */
     uv = *utf;

You can also see in that example that we use C<utf8_to_uvchr> to get the
value of the character; the inverse function C<uvchr_to_utf8> is available
for putting a UV into UTF-8:

 if (!UTF8_IS_INVARIANT(uv))
     /* Must treat this as UTF-8 */
     utf8 = uvchr_to_utf8(utf8, uv);
 else
     /* OK to treat this character as a byte */
     *utf8++ = uv;
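
For illustration, the decoding half of that pair can be sketched in standalone C. This is not Perl's code. C<decode_utf8> is a hypothetical name, and the sketch assumes a complete, well-formed character of at most 4 bytes. It does no validation at all.

```c
 #include <stddef.h>

 /* Hypothetical decoder: returns the code point starting at s and stores
  * its length in bytes in *retlen. Assumes s holds a complete,
  * well-formed 1- to 4-byte UTF-8 character; nothing is validated. */
 static unsigned long decode_utf8(const unsigned char *s, size_t *retlen)
 {
     unsigned char c = s[0];
     size_t n;
     unsigned long uv;

     if (c < 0x80) {                 /* invariant: the byte is the char */
         *retlen = 1;
         return c;
     }
     if (c < 0xE0)      { n = 2; uv = c & 0x1F; }
     else if (c < 0xF0) { n = 3; uv = c & 0x0F; }
     else               { n = 4; uv = c & 0x07; }

     for (size_t k = 1; k < n; k++)  /* fold in 6 bits per continuation */
         uv = (uv << 6) | (s[k] & 0x3F);
     *retlen = n;
     return uv;
 }
```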

You B<must> convert characters to UVs using the above functions if
you're ever in a situation where you have to match UTF-8 and non-UTF-8
characters. You may not skip over UTF-8 characters in this case. If you
do this, you'll lose the ability to match hi-bit non-UTF-8 characters;
for instance, if your UTF-8 string contains C<v195.136>, and you skip
that character, you can never match a C<chr(200)> in a non-UTF-8 string.
So don't do that!
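
The safe approach compares code points rather than bytes. When matching a UTF-8 buffer against a non-UTF-8 (Latin-1) buffer, it might look like the sketch below. The helper C<eq_utf8_latin1> is hypothetical and assumes the UTF-8 side is well-formed.

```c
 #include <stddef.h>

 /* Hypothetical helper: compare a UTF-8 buffer u with a non-UTF-8
  * (Latin-1) buffer b character by character, by code point. */
 static int eq_utf8_latin1(const unsigned char *u, size_t ulen,
                           const unsigned char *b, size_t blen)
 {
     size_t i = 0, j = 0;

     while (i < ulen && j < blen) {
         unsigned char c = u[i];
         unsigned long uv;
         size_t n;

         if (c < 0x80)      { uv = c;        n = 1; }
         else if (c < 0xE0) { uv = c & 0x1F; n = 2; }
         else if (c < 0xF0) { uv = c & 0x0F; n = 3; }
         else               { uv = c & 0x07; n = 4; }
         if (ulen - i < n)
             return 0;                      /* truncated character */
         for (size_t k = 1; k < n; k++)
             uv = (uv << 6) | (u[i + k] & 0x3F);

         if (uv != b[j])     /* a Latin-1 byte is its own code point */
             return 0;
         i += n;
         j++;
     }
     return i == ulen && j == blen;
 }
```

With this helper, the UTF-8 bytes C<"\xC3\x88"> compare equal to the non-UTF-8 C<"\xC8">. The non-UTF-8 string C<chr(195).chr(136)> has identical bytes, yet it correctly compares unequal.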

=head2 How does Perl store UTF-8 strings?

Currently, Perl deals with Unicode strings and non-Unicode strings
slightly differently. If a string has been identified as being UTF-8
encoded, Perl will set a flag in the SV, C<SVf_UTF8>. You can check and
manipulate this flag with the following macros:

 SvUTF8(sv)
 SvUTF8_on(sv)
 SvUTF8_off(sv)

This flag has an important effect on Perl's treatment of the string: if
Unicode data is not properly distinguished, regular expressions,
C<length>, C<substr> and other string handling operations will have
undesirable results.

The problem comes when you have, for instance, a string that isn't
flagged as UTF-8 and contains a byte sequence that could be UTF-8,
especially when combining non-UTF-8 and UTF-8 strings.

Never forget that the C<SVf_UTF8> flag is separate from the PV value; you
need to be sure you don't accidentally knock it off while you're
manipulating SVs. More specifically, you cannot expect to do this:

 SV *sv;
 SV *nsv;
 STRLEN len;
 char *p;

 p = SvPV(sv, len);
 frobnicate(p);
 nsv = newSVpvn(p, len);

The C<char*> string does not tell you the whole story, and you can't
copy or reconstruct an SV just by copying the string value. Check if the
old SV has the UTF-8 flag set, and act accordingly:

 p = SvPV(sv, len);
 frobnicate(p);
 nsv = newSVpvn(p, len);
 if (SvUTF8(sv))
     SvUTF8_on(nsv);

In fact, your C<frobnicate> function should be made aware of whether or
not it's dealing with UTF-8 data, so that it can handle the string
appropriately.
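
The toy model below shows why the flag has to travel with the buffer. It is hypothetical and does not reflect Perl's actual SV layout. The same two bytes have a character length of 1 or 2 depending on the flag, and the copy routine must carry the flag over explicitly, just as C<SvUTF8_on(nsv)> does in the example above.

```c
 #include <stdlib.h>
 #include <string.h>

 /* Toy model (hypothetical, not Perl's SV): a byte buffer plus a
  * separate "is UTF-8" flag, mirroring how SVf_UTF8 sits beside the PV. */
 struct toy_sv {
     char  *pv;
     size_t len;    /* length in bytes */
     int    utf8;   /* nonzero if pv holds UTF-8 */
 };

 /* Character length: identical bytes give different answers depending
  * on the flag. For UTF-8, count every byte that is not a continuation
  * byte (assumes well-formed UTF-8). */
 static size_t toy_length(const struct toy_sv *sv)
 {
     size_t chars = 0;

     if (!sv->utf8)
         return sv->len;
     for (size_t i = 0; i < sv->len; i++)
         if (((unsigned char)sv->pv[i] & 0xC0) != 0x80)
             chars++;
     return chars;
 }

 /* Copy that preserves the flag. Returns NULL if allocation fails. */
 static struct toy_sv *toy_copy(const struct toy_sv *sv)
 {
     struct toy_sv *nsv = malloc(sizeof *nsv);

     if (nsv == NULL)
         return NULL;
     nsv->pv = malloc(sv->len + 1);
     if (nsv->pv == NULL) {
         free(nsv);
         return NULL;
     }
     memcpy(nsv->pv, sv->pv, sv->len);
     nsv->pv[sv->len] = '\0';
     nsv->len  = sv->len;
     nsv->utf8 = sv->utf8;      /* the step that is easy to forget */
     return nsv;
 }
```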

Since just passing an SV to an XS function and copying the data of
the SV is not enough to copy the UTF-8 flag, it is even less correct to
pass just a C<char *> to an XS function.

=head2 How do I convert a string to UTF-8?

If you're mixing UTF-8 and non-UTF-8 strings, you might find it necessary
to upgrade one of the strings to UTF-8. If you've got an SV, the easiest
way to do this is:

 sv_utf8_upgrade(sv);

However, you must not do this, for example:

 if (!SvUTF8(left))
     sv_utf8_upgrade(left);

If you do this in a binary operator, you will actually change one of the
strings that came into the operator, and, while it shouldn't be noticeable
by the end user, it can cause problems.

Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its
string argument. This is useful for having the data available for
comparisons and so on, without harming the original SV. There's also
C<utf8_to_bytes> to go the other way, but naturally, this will fail if
the string contains any characters above 255 that can't be represented
in a single byte.
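
Here is a standalone sketch of both directions, assuming Latin-1 as the byte encoding. The first function returns a new UTF-8 B<copy> and leaves its input alone. The second converts in place and refuses to touch strings containing characters above 255. Both names are hypothetical stand-ins, not the Perl functions themselves, and the second assumes well-formed UTF-8 input.

```c
 #include <stddef.h>
 #include <stdlib.h>

 /* Hypothetical analogue of a bytes_to_utf8-style copy: returns a new
  * malloc()ed UTF-8 copy of the len Latin-1 bytes at s and stores the
  * new length in *newlen. s is not modified. NULL on allocation failure. */
 static unsigned char *latin1_to_utf8_copy(const unsigned char *s,
                                           size_t len, size_t *newlen)
 {
     unsigned char *d = malloc(2 * len + 1);   /* worst case: 2 bytes each */
     size_t j = 0;

     if (d == NULL)
         return NULL;
     for (size_t i = 0; i < len; i++) {
         if (s[i] < 0x80) {
             d[j++] = s[i];                    /* invariant byte */
         } else {
             d[j++] = (unsigned char)(0xC0 | (s[i] >> 6));
             d[j++] = (unsigned char)(0x80 | (s[i] & 0x3F));
         }
     }
     d[j] = '\0';
     *newlen = j;
     return d;
 }

 /* Hypothetical reverse conversion, in place. Returns 0 and leaves s and
  * *len untouched if any character is above 255. */
 static int utf8_to_latin1_inplace(unsigned char *s, size_t *len)
 {
     size_t i, j = 0;

     for (i = 0; i < *len; i++)
         if (s[i] >= 0xC4)       /* lead byte of a character above 255 */
             return 0;
     for (i = 0; i < *len; i++) {
         if (s[i] < 0x80) {
             s[j++] = s[i];
         } else {                /* 0xC2/0xC3 lead plus one continuation */
             s[j++] = (unsigned char)(((s[i] & 0x03) << 6) | (s[i + 1] & 0x3F));
             i++;
         }
     }
     *len = j;
     return 1;
 }
```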

=head2 Is there anything else I need to know?

Not really. Just remember these things:

=over 3

=item *

There's no way to tell if a string is UTF-8 or not. You can tell if an SV
is UTF-8 by looking at its C<SvUTF8> flag. Don't forget to set the flag if
something should be UTF-8. Treat the flag as part of the PV, even though
it's not; if you pass on the PV to somewhere, pass on the flag too.

=item *

If a string is UTF-8, B<always> use C<utf8_to_uvchr> to get at the value,
unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>.

=item *

When writing a character C<uv> to a UTF-8 string, B<always> use
C<uvchr_to_utf8>, unless C<UTF8_IS_INVARIANT(uv)> in which case
you can use C<*s = uv>.

=item *

Mixing UTF-8 and non-UTF-8 strings is tricky. Use C<bytes_to_utf8> to get
a new string which is UTF-8 encoded. There are tricks you can use to
delay deciding whether you need to use a UTF-8 string until you get to a
high character; C<HALF_UPGRADE> is one of those.

=back

=head1 Custom Operators

Custom operator support is a new experimental feature that allows you to
define your own ops. This is primarily to allow the building of
interpreters for other languages in the Perl core, but it also allows
optimizations through the creation of "macro-ops" (ops which perform the
functions of multiple ops which are usually executed together, such as
C<gvsv, gvsv, add>).

This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl
core does not "know" anything special about this op type, and so it will
not be involved in any optimizations. This also means that you can
define your custom ops to be any op structure you like: unary, binary,
list and so on.

It's important to know what custom operators won't do for you. They
won't let you add new syntax to Perl, directly. They won't even let you
add new keywords, directly. In fact, they won't change the way Perl
compiles a program at all. You have to make those changes yourself, after
Perl has compiled the program. You do this either by manipulating the op
tree using a C<CHECK> block and the C<B::Generate> module, or by adding
a custom peephole optimizer with the C<optimize> module.

When you do this, you replace ordinary Perl ops with custom ops by
creating ops with the type C<OP_CUSTOM> and the C<op_ppaddr> of your own
PP function. This should be defined in XS code, and should look like
the PP ops in C<pp_*.c>. You are responsible for ensuring that your op
takes the appropriate number of values from the stack, and you are
responsible for adding stack marks if necessary.
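
The toy model below illustrates both the macro-op idea and the stack discipline just described. It runs either three ops (fetch, fetch, add) or a single fused op with the same net stack effect. Everything here is hypothetical. Real Perl ops are C<OP> structures dispatched through C<op_ppaddr>, and real PP functions work on the Perl stack via macros such as C<dSP>, not on this simplified C<struct vm>.

```c
 #include <stddef.h>

 /* Toy stack machine (hypothetical). Each "pp" function pops its inputs,
  * pushes its result and returns the next op to run. */
 struct vm {
     long   stack[16];
     size_t sp;          /* number of values on the stack */
     long   vars[2];     /* stand-ins for two package variables */
 };

 struct op;
 typedef const struct op *(*pp_fn)(struct vm *, const struct op *);
 struct op {
     pp_fn ppaddr;
     int   arg;          /* variable index for the fetch op */
     const struct op *next;
 };

 static const struct op *pp_fetch(struct vm *vm, const struct op *o)
 {
     vm->stack[vm->sp++] = vm->vars[o->arg];     /* pushes one value */
     return o->next;
 }

 static const struct op *pp_add(struct vm *vm, const struct op *o)
 {
     long right = vm->stack[--vm->sp];           /* pops two values */
     long left  = vm->stack[--vm->sp];
     vm->stack[vm->sp++] = left + right;         /* pushes one */
     return o->next;
 }

 /* The "macro-op": fetch, fetch, add in a single dispatch, with the same
  * net stack effect (exactly one value pushed). */
 static const struct op *pp_fetch2_add(struct vm *vm, const struct op *o)
 {
     vm->stack[vm->sp++] = vm->vars[0] + vm->vars[1];
     return o->next;
 }

 /* Run ops until the chain ends and return the top of the stack. */
 static long run(struct vm *vm, const struct op *o)
 {
     while (o != NULL)
         o = o->ppaddr(vm, o);
     return vm->stack[vm->sp - 1];
 }
```

Both versions leave exactly one value on the stack. That is the property a custom op must preserve if it replaces a sequence of ordinary ops.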

You should also "register" your op with the Perl interpreter so that it
can produce sensible error and warning messages. Since it is possible to
have multiple custom ops within the one "logical" op type C<OP_CUSTOM>,
Perl uses the value of C<< o->op_ppaddr >> as a key into the
C<PL_custom_op_descs> and C<PL_custom_op_names> hashes. This means you
need to enter a name and description for your op at the appropriate
place in the C<PL_custom_op_names> and C<PL_custom_op_descs> hashes.
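
A toy version of that lookup, keyed by the PP function's address, is sketched below. It is hypothetical: the struct, array and function names are made up for illustration. In Perl itself the data lives in the C<PL_custom_op_names> and C<PL_custom_op_descs> hashes, not in a static C array.

```c
 #include <stddef.h>

 /* Toy registry (hypothetical). Several custom ops share one op type, so
  * the PP function address is what tells them apart and serves as the
  * lookup key. */
 typedef void (*pp_fn)(void);

 struct op_info {
     pp_fn       ppaddr;
     const char *name;
     const char *desc;
 };

 static void pp_my_fused_add(void) { /* body omitted in this sketch */ }
 static void pp_my_concat3(void)   { /* body omitted in this sketch */ }

 static const struct op_info registry[] = {
     { pp_my_fused_add, "my_fused_add", "fused fetch/fetch/add" },
     { pp_my_concat3,   "my_concat3",   "three-way concatenation" },
 };

 /* Returns the entry registered for ppaddr, or NULL if there is none
  * (the caller then has to fall back to a generic message). */
 static const struct op_info *custom_op_lookup(pp_fn ppaddr)
 {
     for (size_t i = 0; i < sizeof registry / sizeof registry[0]; i++)
         if (registry[i].ppaddr == ppaddr)
             return &registry[i];
     return NULL;
 }
```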

Forthcoming versions of C<B::Generate> (version 1.0 and above) should
directly support the creation of custom ops by name; C<Opcodes::Custom>
will provide functions which make it trivial to "register" custom ops to
the Perl interpreter.

=head1 AUTHORS

Until May 1997, this document was maintained by Jeff Okamoto
E<lt>okamoto@corp.hp.comE<gt>. It is now maintained as part of Perl
itself by the Perl 5 Porters E<lt>perl5-porters@perl.orgE<gt>.

With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
Stephen McCamant, and Gurusamy Sarathy.

=head1 SEE ALSO

perlapi(1), perlintern(1), perlxs(1), perlembed(1)