=head1 NAME

perlopentut - tutorial on opening things in Perl

=head1 DESCRIPTION

Perl has two simple, built-in ways to open files: the shell way for
convenience, and the C way for precision.  The shell way also has 2- and
3-argument forms, which have different semantics for handling the filename.
The choice is yours.

=head1 Open E<agrave> la shell

Perl's C<open> function was designed to mimic the way command-line
redirection in the shell works.  Here are some basic examples
from the shell:

    $ myprogram file1 file2 file3
    $ myprogram    <  inputfile
    $ myprogram    >  outputfile
    $ myprogram    >> outputfile
    $ myprogram    |  otherprogram
    $ otherprogram |  myprogram

And here are some more advanced examples:

    $ otherprogram      | myprogram f1 - f2
    $ otherprogram 2>&1 | myprogram -
    $ myprogram     <&3
    $ myprogram     >&4

Programmers accustomed to constructs like those above can take comfort
in learning that Perl directly supports these familiar constructs using
virtually the same syntax as the shell.

=head2 Simple Opens

The C<open> function takes two arguments: the first is a filehandle,
and the second is a single string comprising both what to open and how
to open it.  C<open> returns true when it works, and when it fails,
returns a false value and sets the special variable C<$!> to reflect
the system error.  If the filehandle was previously opened, it will
be implicitly closed first.

For example:

    open(INFO,      "datafile") || die("can't open datafile: $!");
    open(INFO,   "<  datafile") || die("can't open datafile: $!");
    open(RESULTS,">  runstats") || die("can't open runstats: $!");
    open(LOG,    ">> logfile ") || die("can't open logfile:  $!");

If you prefer the low-punctuation version, you could write that this way:

    open INFO,   "<  datafile"  or die "can't open datafile: $!";
    open RESULTS,">  runstats"  or die "can't open runstats: $!";
    open LOG,    ">> logfile "  or die "can't open logfile:  $!";

A few things to notice.  First, the leading less-than is optional.
If omitted, Perl assumes that you want to open the file for reading.

Note also that the first example uses the C<||> logical operator, and the
second uses C<or>, which has lower precedence.  Using C<||> in the latter
examples would effectively mean

    open INFO, ( "<  datafile"  || die "can't open datafile: $!" );

which is definitely not what you want.

The other important thing to notice is that, just as in the shell,
any white space before or after the filename is ignored.  This is good,
because you wouldn't want these to do different things:

    open INFO,   "<datafile"
    open INFO,   "< datafile"
    open INFO,   "<  datafile"

Ignoring surrounding whitespace also helps for when you read a filename
in from a different file, and forget to trim it before opening:

    $filename = <INFO>;         # oops, \n still there
    open(EXTRA, "< $filename") || die "can't open $filename: $!";

This is not a bug, but a feature.  Because C<open> mimics the shell in
its style of using redirection arrows to specify how to open the file, it
also does so with respect to extra white space around the filename
itself.  For accessing files with naughty names, see
L<"Dispelling the Dweomer">.

There is also a 3-argument version of C<open>, which lets you put the
special redirection characters into their own argument:

    open( INFO, ">", $datafile ) || die "Can't create $datafile: $!";

In this case, the filename to open is the actual string in C<$datafile>,
so you don't have to worry about C<$datafile> containing characters
that might influence the open mode, or whitespace at the beginning of
the filename that would be absorbed in the 2-argument version.  Also,
any reduction of unnecessary string interpolation is a good thing.
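
The difference between the two forms is easiest to see with an awkward
filename.  The following is a minimal sketch, assuming a Unix-like
filesystem that allows a trailing space in a name; the scratch directory
and the name F<report.txt > are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch directory and file name, chosen for illustration.
my $dir  = tempdir(CLEANUP => 1);
my $name = "$dir/report.txt ";      # note the trailing space

# Three-argument form: the mode is separate, the name is taken literally.
open(my $out, ">", $name) or die "Can't create '$name': $!";
print {$out} "hello\n";
close($out) or die "Can't close '$name': $!";

# Two-argument form: trailing whitespace is trimmed, so this looks for
# "report.txt" (no space), which does not exist.
my $two_arg_ok = open(my $in2, "< $name") ? 1 : 0;

# Three-argument form finds the file that was actually written.
open(my $in3, "<", $name) or die "Can't read '$name': $!";
my $line = <$in3>;
close($in3);

print "two-argument open ", ($two_arg_ok ? "succeeded" : "failed"), "\n";
```

The two-argument call fails here only because the trimmed name does not
exist; had a file called F<report.txt> been present, it would have been
opened silently instead, which is arguably worse.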

=head2 Indirect Filehandles

C<open>'s first argument can be a reference to a filehandle.  As of
perl 5.6.0, if the argument is uninitialized, Perl will automatically
create a filehandle and put a reference to it in the first argument,
like so:

    open( my $in, $infile )   or die "Couldn't read $infile: $!";
    while ( <$in> ) {
        # do something with $_
    }
    close $in;

Indirect filehandles make namespace management easier.  Since filehandles
are global to the current package, two subroutines trying to open
C<INFILE> will clash.  With two functions opening indirect filehandles
like C<my $infile>, there's no clash and no need to worry about future
conflicts.

Another convenient behavior is that an indirect filehandle automatically
closes when it goes out of scope or when you undefine it:

    sub firstline {
        open( my $in, shift ) && return scalar <$in>;
        # no close() required
    }
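
As a slightly fuller sketch of the same idea, here is a helper that keeps
its handle entirely private.  The name C<count_lines> and the sample data
are hypothetical, and the 3-argument form of C<open> is used so the path
is taken literally.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return the number of lines in a file.  $fh is lexical, so it is closed
# automatically when the sub returns and cannot clash with other handles.
sub count_lines {
    my ($path) = @_;
    open(my $fh, "<", $path) or die "Couldn't read $path: $!";
    my $count = 0;
    $count++ while <$fh>;
    return $count;
}

# Hypothetical sample data written to a temporary file.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print {$tmp} "one\ntwo\nthree\n";
close($tmp) or die "Couldn't close $tmpname: $!";

my $n = count_lines($tmpname);
print "$tmpname has $n lines\n";
```

Because C<$fh> lives only inside C<count_lines>, any number of callers can
use the function, even recursively, without stepping on each other's
handles.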

=head2 Pipe Opens

In C, when you want to open a file using the standard I/O library,
you use the C<fopen> function, but when opening a pipe, you use the
C<popen> function.  But in the shell, you just use a different redirection
character.  That's also the case for Perl.  The C<open> call
remains the same--just its argument differs.

If the leading character is a pipe symbol, C<open> starts up a new
command and opens a write-only filehandle leading into that command.
This lets you write into that handle and have what you write show up on
that command's standard input.  For example:

    open(PRINTER, "| lpr -Plp1")    || die "can't run lpr: $!";
    print PRINTER "stuff\n";
    close(PRINTER)                  || die "can't close lpr: $!";

If the trailing character is a pipe, you start up a new command and open a
read-only filehandle leading out of that command.  This lets whatever that
command writes to its standard output show up on your handle for reading.
For example:

    open(NET, "netstat -i -n |")    || die "can't fork netstat: $!";
    while (<NET>) { }               # do something with input
    close(NET)                      || die "can't close netstat: $!";

What happens if you try to open a pipe to or from a non-existent
command?  If possible, Perl will detect the failure and set C<$!> as
usual.  But if the command contains special shell characters, such as
C<E<gt>> or C<*>, called 'metacharacters', Perl does not execute the
command directly.  Instead, Perl runs the shell, which then tries to
run the command.  This means that it's the shell that gets the error
indication.  In such a case, the C<open> call will only indicate
failure if Perl can't even run the shell.  See L<perlfaq8/"How can I
capture STDERR from an external command?"> to see how to cope with
this.  There's also an explanation in L<perlipc>.
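
One way to see what a piped command actually did is to check the result
of C<close> and the wait status in C<$?>.  The sketch below is an
illustration rather than the only approach: it assumes a platform that
supports the list form of a pipe open (which bypasses the shell), and the
helper name C<run_and_capture> is made up.

```perl
use strict;
use warnings;

# Run a command and return (output, exit status).  The list form of a
# pipe open bypasses the shell; $^X is the running perl, used here only
# so the sketch does not depend on any particular external program.
sub run_and_capture {
    my @cmd = @_;
    open(my $pipe, "-|", @cmd) or die "can't fork @cmd: $!";
    my @lines = <$pipe>;
    # For pipes, close() returns false if the command exited non-zero;
    # the wait status is then in $?.
    close($pipe);
    return (join("", @lines), $? >> 8);
}

my ($out, $status) = run_and_capture($^X, "-e", 'print "ok\n"; exit 3');
print "output: $out", "exit status: $status\n";
```

The same C<close> and C<$?> check applies to the string form shown
earlier; there, though, a status of 127 may simply be the shell reporting
that it could not find the command.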

If you would like to open a bidirectional pipe, the IPC::Open2
library will handle this for you.  Check out
L<perlipc/"Bidirectional Communication with Another Process">.

=head2 The Minus File

Again following the lead of the standard shell utilities, Perl's
C<open> function treats a file whose name is a single minus, "-", in a
special way.  If you open minus for reading, it really means to access
the standard input.  If you open minus for writing, it really means to
access the standard output.

If minus can be used as the default input or default output, what happens
if you open a pipe into or out of minus?  What's the default command it
would run?  The same script as you're currently running!  This is actually
a stealth C<fork> hidden inside an C<open> call.  See
L<perlipc/"Safe Pipe Opens"> for details.

=head2 Mixing Reads and Writes

It is possible to specify both read and write access.  All you do is
add a "+" symbol in front of the redirection.  But as in the shell,
using a less-than on a file never creates a new file; it only opens an
existing one.  On the other hand, using a greater-than always clobbers
(truncates to zero length) an existing file, or creates a brand-new one
if there isn't an old one.  Adding a "+" for read-write doesn't affect
whether it only works on existing files or always clobbers existing ones.

    open(WTMP, "+< /usr/adm/wtmp")
        || die "can't open /usr/adm/wtmp: $!";

    open(SCREEN, "+> lkscreen")
        || die "can't open lkscreen: $!";

    open(LOGFILE, "+>> /var/log/applog")
        || die "can't open /var/log/applog: $!";

The first one won't create a new file, and the second one will always
clobber an old one.  The third one will create a new file if necessary
and not clobber an old one, and it will allow you to read at any point
in the file, but all writes will always go to the end.  In short,
the first case is substantially more common than the second and third
cases, which are almost always wrong.  (If you know C, the plus in
Perl's C<open> is historically derived from the one in C's fopen(3S),
which it ultimately calls.)
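
Here is a small sketch of the common C<+E<lt>> case: overwriting one
fixed-width record in place.  The record layout (4-byte records) and the
temporary file are assumptions made purely for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical file of fixed-width, 4-byte records.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "AAAABBBBCCCC";
close($tmp) or die "can't close $path: $!";

# "+<" opens an existing file for update without truncating it.
open(my $fh, "+<", $path) or die "can't open $path: $!";
seek($fh, 4, 0)           or die "can't seek: $!";    # start of record 2
print {$fh} "bbbb";                                   # overwrite in place
seek($fh, 0, 0)           or die "can't rewind: $!";  # seek before reading
my $contents = <$fh>;
close($fh) or die "can't close $path: $!";

print "$contents\n";
```

Note the C<seek> between the write and the read; switching direction on a
read-write handle without one is not reliable with buffered I/O.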

In fact, when it comes to updating a file, unless you're working on
a binary file as in the WTMP case above, you probably don't want to
use this approach for updating.  Instead, Perl's B<-i> flag comes to
the rescue.  The following command takes all the C, C++, or yacc source
or header files and changes all their foo's to bar's, leaving
the old version in the original filename with a ".orig" tacked
on the end:

    $ perl -i.orig -pe 's/\bfoo\b/bar/g' *.[Cchy]

This is a shortcut for some renaming games that are really
the best way to update text files.  See the second question in
L<perlfaq5> for more details.
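
Spelled out by hand, the renaming game looks roughly like the following.
This is a sketch of the general idea rather than a description of what
B<-i> does internally; the helper name C<update_file> and the sample file
are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Replace whole-word "foo" with "bar" in $file, keeping the original
# contents in "$file.orig".
sub update_file {
    my ($file) = @_;
    my $backup = "$file.orig";
    rename($file, $backup)      or die "can't rename $file: $!";
    open(my $in,  "<", $backup) or die "can't read $backup: $!";
    open(my $out, ">", $file)   or die "can't write $file: $!";
    while (my $line = <$in>) {
        $line =~ s/\bfoo\b/bar/g;
        print {$out} $line;
    }
    close($in);
    close($out) or die "can't close $file: $!";
}

# Hypothetical sample file in a scratch directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/sample.c";
open(my $fh, ">", $file) or die "can't create $file: $!";
print {$fh} "int foo = food;\n";
close($fh) or die "can't close $file: $!";

update_file($file);
```

The original is never modified in place, so an interrupted run leaves the
F<.orig> copy intact.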

=head2 Filters

One of the most common uses for C<open> is one you never
even notice.  When you process the ARGV filehandle using
C<< <ARGV> >>, Perl actually does an implicit open
on each file in @ARGV.  Thus a program called like this:

    $ myprogram file1 file2 file3

can have all its files opened and processed one at a time
using a construct no more complex than:

    while (<>) {
        # do something with $_
    }

If @ARGV is empty when the loop first begins, Perl pretends you've opened
up minus, that is, the standard input.  In fact, $ARGV, the currently
open file during C<< <ARGV> >> processing, is even set to "-"
in these circumstances.

You are welcome to pre-process your @ARGV before starting the loop to
make sure it's to your liking.  One reason to do this might be to remove
command options beginning with a minus.  While you can always roll the
simple ones by hand, the Getopt modules are good for this:

    use Getopt::Std;

    # -v, -D, -o ARG, sets $opt_v, $opt_D, $opt_o
    getopts("vDo:");

    # -v, -D, -o ARG, sets $args{v}, $args{D}, $args{o}
    getopts("vDo:", \%args);

Or the standard Getopt::Long module to permit named arguments:

    use Getopt::Long;
    GetOptions( "verbose"  => \$verbose,        # --verbose
                "Debug"    => \$debug,          # --Debug
                "output=s" => \$output );
        # --output=somestring or --output somestring

Another reason for preprocessing arguments is to make an empty
argument list default to all files:

    @ARGV = glob("*") unless @ARGV;

You could even filter out all but plain, text files.  This is a bit
silent, of course, and you might prefer to mention them on the way.

    @ARGV = grep { -f && -T } @ARGV;

If you're using the B<-n> or B<-p> command-line options, you
should put changes to @ARGV in a C<BEGIN{}> block.

Remember that a normal C<open> has special properties, in that it might
call fopen(3S) or it might call popen(3S), depending on what its
argument looks like; that's why it's sometimes called "magic open".
Here's an example:

    $pwdinfo = `domainname` =~ /^(\(none\))?$/
                    ? '< /etc/passwd'
                    : 'ypcat passwd |';

    open(PWD, $pwdinfo)
                or die "can't open $pwdinfo: $!";

This sort of thing also comes into play in filter processing.  Because
C<< <ARGV> >> processing employs the normal, shell-style Perl C<open>,
it respects all the special things we've already seen:

    $ myprogram f1 "cmd1|" - f2 "cmd2|" f3 < tmpfile

That program will read from the file F<f1>, the process F<cmd1>, standard
input (F<tmpfile> in this case), the F<f2> file, the F<cmd2> command,
and finally the F<f3> file.

Yes, this also means that if you have files named "-" (and so on) in
your directory, they won't be processed as literal files by C<open>.
You'll need to pass them as "./-", much as you would for the I<rm> program,
or you could use C<sysopen> as described below.

One of the more interesting applications is to change files of a certain
name into pipes.  For example, to autoprocess gzipped or compressed
files by decompressing them with I<gzip>:

    @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc $_ |" : $_  } @ARGV;

Or, if you have the I<GET> program installed from LWP,
you can fetch URLs before processing them:

    @ARGV = map { m#^\w+://# ? "GET $_ |" : $_ } @ARGV;

It's not for nothing that this is called magic C<< <ARGV> >>.
Pretty nifty, eh?
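
The two rewrites can be combined into one helper that is easy to test
without running anything.  This is only a sketch: the name C<magic_args>
is made up, the suffix pattern is anchored at the end of the name only,
and it assumes I<gzip> and I<GET> are on the PATH when the resulting
strings are eventually opened.

```perl
use strict;
use warnings;

# Turn compressed file names and URLs into pipe-open strings and leave
# everything else untouched.  Nothing is executed here; the strings are
# only run later, when <ARGV> opens them.
sub magic_args {
    return map {
          /\.(?:gz|Z)$/ ? "gzip -dc $_ |"
        : m{^\w+://}    ? "GET $_ |"
        :                 $_
    } @_;
}

my @rewritten = magic_args("notes.txt", "logs.gz", "http://www.perl.com/");
print "$_\n" for @rewritten;
```

Keep in mind that these strings are handed to the shell, so names taken
from untrusted input should not be rewritten this way.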

=head1 Open E<agrave> la C

If you want the convenience of the shell, then Perl's C<open> is
definitely the way to go.  On the other hand, if you want finer precision
than C's simplistic fopen(3S) provides, you should look to Perl's
C<sysopen>, which is a direct hook into the open(2) system call.
That does mean it's a bit more involved, but that's the price of
precision.

C<sysopen> takes 3 (or 4) arguments.

    sysopen HANDLE, PATH, FLAGS, [MASK]

The HANDLE argument is a filehandle just as with C<open>.  The PATH is
a literal path, one that doesn't pay attention to any greater-thans or
less-thans or pipes or minuses, nor ignore white space.  If it's there,
it's part of the path.  The FLAGS argument contains one or more values
derived from the Fcntl module that have been or'd together using the
bitwise "|" operator.  The final argument, the MASK, is optional; if
present, it is combined with the user's current umask for the creation
mode of the file.  You should usually omit this.

Although the traditional values of read-only, write-only, and read-write
are 0, 1, and 2 respectively, this is known not to hold true on some
systems.  Instead, it's best to load in the appropriate constants first
from the Fcntl module, which supplies the following standard flags:

    O_RDONLY            Read only
    O_WRONLY            Write only
    O_RDWR              Read and write
    O_CREAT             Create the file if it doesn't exist
    O_EXCL              Fail if the file already exists
    O_APPEND            Append to the file
    O_TRUNC             Truncate the file
    O_NONBLOCK          Non-blocking access

Less common flags that are sometimes available on some operating
systems include C<O_BINARY>, C<O_TEXT>, C<O_SHLOCK>, C<O_EXLOCK>,
C<O_DEFER>, C<O_SYNC>, C<O_ASYNC>, C<O_DSYNC>, C<O_RSYNC>,
C<O_NOCTTY>, C<O_NDELAY> and C<O_LARGEFILE>.  Consult your open(2)
manpage or its local equivalent for details.  (Note: starting from
Perl release 5.6 the C<O_LARGEFILE> flag, if available, is automatically
added to the sysopen() flags because large files are the default.)

Here's how to use C<sysopen> to emulate the simple C<open> calls we had
before.  We'll omit the C<|| die $!> checks for clarity, but make sure
you always check the return values in real code.  The C<sysopen> calls
assume that the flag constants have been imported with C<use Fcntl>.
These aren't quite the same, since C<open> will trim leading and
trailing white space, but you'll get the idea.

To open a file for reading:

    open(FH, "< $path");
    sysopen(FH, $path, O_RDONLY);

To open a file for writing, creating a new file if needed or else truncating
an old file:

    open(FH, "> $path");
    sysopen(FH, $path, O_WRONLY | O_TRUNC | O_CREAT);

To open a file for appending, creating one if necessary:

    open(FH, ">> $path");
    sysopen(FH, $path, O_WRONLY | O_APPEND | O_CREAT);

To open a file for update, where the file must already exist:

    open(FH, "+< $path");
    sysopen(FH, $path, O_RDWR);

And here are things you can do with C<sysopen> that you cannot do with
a regular C<open>.  As you'll see, it's just a matter of controlling the
flags in the third argument.

To open a file for writing, creating a new file which must not previously
exist:

    sysopen(FH, $path, O_WRONLY | O_EXCL | O_CREAT);

To open a file for appending, where that file must already exist:

    sysopen(FH, $path, O_WRONLY | O_APPEND);

To open a file for update, creating a new file if necessary:

    sysopen(FH, $path, O_RDWR | O_CREAT);

To open a file for update, where that file must not already exist:

    sysopen(FH, $path, O_RDWR | O_EXCL | O_CREAT);

To open a file without blocking, creating one if necessary:

    sysopen(FH, $path, O_WRONLY | O_NONBLOCK | O_CREAT);
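
Put together with the error checks and the Fcntl import, the "must not
previously exist" case might look like the sketch below.  The scratch
directory and the file name F<once.lock> are hypothetical.

```perl
use strict;
use warnings;
use Fcntl;                       # exports O_WRONLY, O_CREAT, O_EXCL, ...
use File::Temp qw(tempdir);

# Hypothetical path inside a scratch directory.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/once.lock";

# The first attempt creates the file; O_EXCL makes a second attempt fail.
sysopen(my $fh, $path, O_WRONLY | O_EXCL | O_CREAT)
    or die "can't create $path: $!";
print {$fh} "$$\n";
close($fh) or die "can't close $path: $!";

my $second_ok    = sysopen(my $fh2, $path, O_WRONLY | O_EXCL | O_CREAT) ? 1 : 0;
my $second_error = $second_ok ? "" : "$!";
print "second attempt failed: $second_error\n" unless $second_ok;
```

Because the existence check and the creation happen in a single system
call, there is no window in which another process can slip in between
them.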

=head2 Permissions E<agrave> la mode

If you omit the MASK argument to C<sysopen>, Perl uses the octal value
0666.  The normal MASK to use for executables and directories should
be 0777, and for anything else, 0666.

Why so permissive?  Well, it isn't really.  The MASK will be modified
by your process's current C<umask>.  A umask is a number representing
I<disabled> permissions bits; that is, bits that will not be turned on
in the created files' permissions field.

For example, if your C<umask> were 027, then the 020 part would
disable the group from writing, and the 007 part would disable others
from reading, writing, or executing.  Under these conditions, passing
C<sysopen> 0666 would create a file with mode 0640, since C<0666 & ~027>
is 0640.
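
The arithmetic is easy to check for yourself.  This sketch only does the
bit manipulation on sample values; it does not read or change the real
process umask.

```perl
use strict;
use warnings;

# Permission bits requested by sysopen (default 0666) and a sample umask.
my $requested = 0666;
my $mask      = 027;

# Bits set in the umask are cleared from the requested mode.
my $effective = $requested & ~$mask;

printf "requested %04o, umask %03o, effective %04o\n",
    $requested, $mask, $effective;
```

Substituting 0777 for C<$requested> gives 0750, which is what a new
directory or executable would get under the same umask.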

You should seldom use the MASK argument to C<sysopen()>.  That takes
away the user's freedom to choose what permission new files will have.
Denying choice is almost always a bad thing.  One exception would be for
cases where sensitive or private data is being stored, such as with mail
folders, cookie files, and internal temporary files.

=head1 Obscure Open Tricks

=head2 Re-Opening Files (dups)

Sometimes you already have a filehandle open, and want to make another
handle that's a duplicate of the first one.  In the shell, we place an
ampersand in front of a file descriptor number when doing redirections.
For example, C<< 2>&1 >> makes descriptor 2 (that's STDERR in Perl)
be redirected into descriptor 1 (which is usually Perl's STDOUT).
The same is essentially true in Perl: a filename that begins with an
ampersand is treated instead as a file descriptor if a number, or as a
filehandle if a string.

    open(SAVEOUT, ">&SAVEERR") || die "couldn't dup SAVEERR: $!";
    open(MHCONTEXT, "<&4")     || die "couldn't dup fd4: $!";

That means that if a function is expecting a filename, but you don't
want to give it a filename because you already have the file open, you
can just pass the filehandle with a leading ampersand.  It's best to
use a fully qualified handle though, just in case the function happens
to be in a different package:

    somefunction("&main::LOGFILE");

This way if somefunction() is planning on opening its argument, it can
just use the already opened handle.  This differs from passing a handle,
because with a handle, you don't open the file.  Here you have something
you can pass to open.

If you have one of those tricky, newfangled I/O objects that the C++
folks are raving about, then this doesn't work because those aren't
proper filehandles in the native Perl sense.  You'll have to use fileno()
to pull out the proper descriptor number, assuming you can:

    use IO::Socket;
    $handle = IO::Socket::INET->new("www.perl.com:80");
    $fd = $handle->fileno;
    somefunction("&$fd");  # not an indirect function call

It can be easier (and certainly will be faster) just to use real
filehandles though:

    use IO::Socket;
    local *REMOTE = IO::Socket::INET->new("www.perl.com:80");
    die "can't connect" unless defined(fileno(REMOTE));
    somefunction("&main::REMOTE");

If the filehandle or descriptor number is preceded not just with a simple
"&" but rather with a "&=" combination, then Perl will not create a
completely new descriptor opened to the same place using the dup(2)
system call.  Instead, it will just make something of an alias to the
existing one using the fdopen(3S) library call.  This is slightly more
parsimonious of systems resources, although this is less a concern
these days.  Here's an example of that:

    $fd = $ENV{"MHCONTEXTFD"};
    open(MHCONTEXT, "<&=$fd")   or die "couldn't fdopen $fd: $!";

If you're using magic C<< <ARGV> >>, you could even pass in as a
command line argument in @ARGV something like C<"<&=$MHCONTEXTFD">,
but we've never seen anyone actually do this.
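
A common practical use of dups is to save STDOUT, point it somewhere else
for a while, and then put it back.  The sketch below uses the 3-argument
form of the dup mode with lexical handles; the capture file is a
hypothetical temporary file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $capture) = tempfile(UNLINK => 1);   # hypothetical capture file
close($tmp);

# Duplicate STDOUT so it can be restored later.
open(my $saved, ">&", \*STDOUT)  or die "couldn't dup STDOUT: $!";

# Point STDOUT at the file, write, then restore the original.
open(STDOUT, ">", $capture)      or die "can't redirect STDOUT: $!";
print "captured line\n";
close(STDOUT)                    or die "can't close STDOUT: $!";
open(STDOUT, ">&", $saved)       or die "couldn't restore STDOUT: $!";
close($saved);

open(my $in, "<", $capture)      or die "can't read $capture: $!";
my $text = <$in>;
close($in);
print "back on the original STDOUT; file holds: $text";
```

A true dup is what you want here: the saved copy stays valid after STDOUT
itself has been closed and reopened.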

=head2 Dispelling the Dweomer

Perl is more of a DWIMmer language than something like Java--where DWIM
is an acronym for "do what I mean".  But this principle sometimes leads
to more hidden magic than one knows what to do with.  In this way, Perl
is also filled with I<dweomer>, an obscure word meaning an enchantment.
Sometimes, Perl's DWIMmer is just too much like dweomer for comfort.

If magic C<open> is a bit too magical for you, you don't have to turn
to C<sysopen>.  To open a file with arbitrary weird characters in
it, it's necessary to protect any leading and trailing whitespace.
Leading whitespace is protected by inserting a C<"./"> in front of a
filename that starts with whitespace.  Trailing whitespace is protected
by appending an ASCII NUL byte (C<"\0">) at the end of the string.

    $file =~ s#^(\s)#./$1#;
    open(FH, "< $file\0")   || die "can't open $file: $!";

This assumes, of course, that your system considers dot the current
working directory, slash the directory separator, and disallows ASCII
NULs within a valid filename.  Most systems follow these conventions,
including all POSIX systems as well as proprietary Microsoft systems.
The only vaguely popular system that doesn't work this way is the
proprietary Macintosh system, which uses a colon where the rest of us
use a slash.  Maybe C<sysopen> isn't such a bad idea after all.

If you want to use C<< <ARGV> >> processing in a totally boring
and non-magical way, you could do this first:

    #   "Sam sat on the ground and put his head in his hands.
    #   'I wish I had never come here, and I don't want to see
    #   no more magic,' he said, and fell silent."
    for (@ARGV) {
        s#^([^./])#./$1#;
        $_ .= "\0";
    }
    while (<>) {
        # now process $_
    }

But be warned that users will not appreciate being unable to use "-"
to mean standard input, per the standard convention.

=head2 Paths as Opens

You've probably noticed how Perl's C<warn> and C<die> functions can
produce messages like:

    Some warning at scriptname line 29, <FH> line 7.

That's because you opened a filehandle FH, and had read in seven records
from it.  But what was the name of the file, rather than the handle?

If you aren't running with C<strict refs>, or if you've turned them off
temporarily, then all you have to do is this:

    open($path, "< $path") || die "can't open $path: $!";
    while (<$path>) {
        # whatever
    }

Since you're using the pathname of the file as its handle,
you'll get warnings more like

    Some warning at scriptname line 29, </etc/motd> line 7.

=head2 Single Argument Open

Remember how we said that Perl's open took two arguments?  That was a
passive prevarication.  You see, it can also take just one argument.
If and only if the variable is a global variable, not a lexical, you
can pass C<open> just one argument, the filehandle, and it will
get the path from the global scalar variable of the same name.

    $FILE = "/etc/motd";
    open FILE or die "can't open $FILE: $!";
    while (<FILE>) {
        # whatever
    }

Why is this here?  Someone has to cater to the hysterical porpoises.
It's something that's been in Perl since the very beginning, if not
before.

=head2 Playing with STDIN and STDOUT

One clever move with STDOUT is to explicitly close it when you're done
with the program.

    END { close(STDOUT) || die "can't close stdout: $!" }

If you don't do this, and your program fills up the disk partition due
to a command line redirection, it won't report the error or exit with a
failure status.

You don't have to accept the STDIN and STDOUT you were given.  You are
welcome to reopen them if you'd like.

    open(STDIN, "< datafile")
        || die "can't open datafile: $!";

    open(STDOUT, "> output")
        || die "can't open output: $!";

And then these can be accessed directly or passed on to subprocesses.
This makes it look as though the program were initially invoked
with those redirections from the command line.

It's probably more interesting to connect these to pipes.  For example:

    $pager = $ENV{PAGER} || "(less || more)";
    open(STDOUT, "| $pager")
        || die "can't fork a pager: $!";

This makes it appear as though your program were called with its stdout
already piped into your pager.  You can also use this kind of thing
in conjunction with an implicit fork to yourself.  You might do this
if you would rather handle the post processing in your own program,
just in a different process:

    head(100);
    while (<>) {
        print;
    }

    sub head {
        my $lines = shift || 20;
        return if $pid = open(STDOUT, "|-");       # return if parent
        die "cannot fork: $!" unless defined $pid;
        while (<STDIN>) {
            last if --$lines < 0;
            print;
        }
        exit;
    }

This technique can be applied to repeatedly push as many filters on your
output stream as you wish.

=head1 Other I/O Issues

These topics aren't really arguments related to C<open> or C<sysopen>,
but they do affect what you do with your open files.

=head2 Opening Non-File Files

When is a file not a file?  Well, you could say when it exists but
isn't a plain file.   We'll check whether it's a symbolic link first,
just in case.

    if (-l $file || ! -f _) {
        print "$file is not a plain file\n";
    }

What other kinds of files are there than, well, files?  Directories,
symbolic links, named pipes, Unix-domain sockets, and block and character
devices.  Those are all files, too--just not I<plain> files.  This isn't
the same issue as being a text file. Not all text files are plain files.
Not all plain files are text files.  That's why there are separate C<-f>
and C<-T> file tests.

To open a directory, you should use the C<opendir> function, then
process it with C<readdir>, carefully restoring the directory
name if necessary:

    opendir(DIR, $dirname) or die "can't opendir $dirname: $!";
    while (defined($file = readdir(DIR))) {
        # do something with "$dirname/$file"
    }
    closedir(DIR);
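
"Restoring the directory name" matters because C<readdir> returns bare
names, so file tests on them only work relative to the directory being
read.  A minimal sketch, with a made-up helper C<plain_files> and a
scratch directory created just for the demonstration:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Return the sorted names of plain files directly inside $dirname.
sub plain_files {
    my ($dirname) = @_;
    opendir(my $dh, $dirname) or die "can't opendir $dirname: $!";
    # readdir returns bare names, so prepend the directory before testing.
    my @files = grep { -f "$dirname/$_" } readdir($dh);
    closedir($dh);
    @files = sort @files;
    return @files;
}

# Hypothetical scratch directory with two files and one subdirectory.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(b.txt a.txt)) {
    open(my $fh, ">", "$dir/$name") or die "can't create $dir/$name: $!";
    close($fh);
}
mkdir("$dir/sub") or die "can't mkdir $dir/sub: $!";

my @found = plain_files($dir);
print "@found\n";
```

The C<-f> test also drops the F<.> and F<..> entries, since both are
directories.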

If you want to process directories recursively, it's better to use the
File::Find module.  For example, this prints out all files recursively
and adds a slash to their names if the file is a directory.

    @ARGV = qw(.) unless @ARGV;
    use File::Find;
    find sub { print $File::Find::name, -d && '/', "\n" }, @ARGV;

This finds all bogus symbolic links beneath a particular directory:

    find sub { print "$File::Find::name\n" if -l && !-e }, $dir;

As you see, with symbolic links, you can just pretend that the link is
what it points to.  Or, if you want to know I<what> it points to, then
C<readlink> is called for:

    if (-l $file) {
        if (defined($whither = readlink($file))) {
            print "$file points to $whither\n";
        } else {
            print "$file points nowhere: $!\n";
        }
    }

=head2 Opening Named Pipes

Named pipes are a different matter.  You pretend they're regular files,
but their opens will normally block until there is both a reader and
a writer.  You can read more about them in L<perlipc/"Named Pipes">.
Unix-domain sockets are rather different beasts as well; they're
described in L<perlipc/"Unix-Domain TCP Clients and Servers">.

When it comes to opening devices, it can be easy and it can be tricky.
We'll assume that if you're opening up a block device, you know what
you're doing.  The character devices are more interesting.  These are
typically used for modems, mice, and some kinds of printers.  This is
described in L<perlfaq8/"How do I read and write the serial port?">.
It's often enough to open them carefully:

    # (O_NOCTTY no longer needed on POSIX systems)
    sysopen(TTYIN, "/dev/ttyS1", O_RDWR | O_NDELAY | O_NOCTTY)
        or die "can't open /dev/ttyS1: $!";
    open(TTYOUT, "+>&TTYIN")
        or die "can't dup TTYIN: $!";

    $ofh = select(TTYOUT); $| = 1; select($ofh);

    print TTYOUT "+++at\015";
    $answer = <TTYIN>;

With descriptors that you haven't opened using C<sysopen>, such as
sockets, you can set them to be non-blocking using C<fcntl>:

    use Fcntl;
    my $old_flags = fcntl($handle, F_GETFL, 0)
        or die "can't get flags: $!";
    fcntl($handle, F_SETFL, $old_flags | O_NONBLOCK)
        or die "can't set non blocking: $!";
1N/ARather than losing yourself in a morass of twisting, turning C<ioctl>s,
1N/Aall dissimilar, if you're going to manipulate ttys, it's best to
1N/Amake calls out to the stty(1) program if you have it, or else use the
1N/Aportable POSIX interface.  To figure this all out, you'll need to read the
1N/Atermios(3) manpage, which describes the POSIX interface to tty devices,
1N/Aand then L<POSIX>, which describes Perl's interface to POSIX.  There are
1N/Aalso some high-level modules on CPAN that can help you with these games.
1N/ACheck out Term::ReadKey and Term::ReadLine.
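
To give a flavour of the POSIX route, here is a minimal sketch that turns
terminal echo off and on again through C<POSIX::Termios>.  The helper name
C<set_echo> is invented for this example, and the code assumes a system
with a working termios(3) implementation; it does nothing unless standard
input is a terminal.

```perl
use strict;
use warnings;
use POSIX qw(:termios_h);       # ECHO, TCSANOW

# Hypothetical helper: switch echo on or off for a tty handle.
# Returns false (with $! set) if the handle is not a terminal.
sub set_echo {
    my ($fh, $on) = @_;
    my $fd      = fileno($fh);
    my $termios = POSIX::Termios->new;
    $termios->getattr($fd) or return;       # fails on non-ttys
    my $lflag = $termios->getlflag;
    $lflag = $on ? ($lflag | ECHO) : ($lflag & ~ECHO);
    $termios->setlflag($lflag);
    return $termios->setattr($fd, TCSANOW);
}

if (-t STDIN) {
    set_echo(\*STDIN, 0) or die "can't turn echo off: $!";
    print "Password: ";
    my $secret = <STDIN>;
    set_echo(\*STDIN, 1) or die "can't turn echo back on: $!";
    print "\n";
}
```

A real program would also want to restore the terminal state if it is
interrupted while echo is off.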

=head2 Opening Sockets

What else can you open?  To open a connection using sockets, you won't use
either of Perl's two open functions; see
L<perlipc/"Sockets: Client/Server Communication"> for the details.  Here's
a short example.  Once the connection is made, you can use FH as a
bidirectional filehandle.

    use IO::Socket;
    local *FH = IO::Socket::INET->new("www.perl.com:80");
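
For a version you can run without touching the outside network, the
sketch below listens on an ephemeral loopback port and then connects to
itself.  It is only an illustration: the address, the messages, and the
variable names are all assumptions made for the example, and a real client
would normally talk to a separate server process.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Listen on an ephemeral loopback port so no outside network is needed.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,              # let the kernel pick a free port
    Proto     => 'tcp',
    Listen    => 1,
) or die "can't listen: $@";

# The client side: this is the "open" step for a socket.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $server->sockport,
    Proto    => 'tcp',
) or die "can't connect: $@";

my $peer = $server->accept or die "can't accept: $!";

# Each socket handle is used both for writing and for reading.
print $client "hello\n";         # IO::Socket turns autoflush on by default
my $line = <$peer>;
print $peer "got: $line";
my $reply = <$client>;
print $reply;                    # prints "got: hello"
```

Note that C<IO::Socket::INET-E<gt>new> reports its failures in C<$@>
rather than in C<$!>.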

For opening up a URL, the LWP modules from CPAN are just what
the doctor ordered.  There's no filehandle interface, but
it's still easy to get the contents of a document:

    use LWP::Simple;
    $doc = get('http://www.linpro.no/lwp/');
=head2 Binary Files

On certain legacy systems with what could charitably be called terminally
convoluted (some would say broken) I/O models, a file isn't a file--at
least, not with respect to the C standard I/O library.  On these old
systems whose libraries (but not kernels) distinguish between text and
binary streams, to get files to behave properly you'll have to bend over
backwards to avoid nasty problems.  On such infelicitous systems, sockets
and pipes are already opened in binary mode, and there is currently no
way to turn that off.  With files, you have more options.

One option is to use the C<binmode> function on the appropriate
handles before doing regular I/O on them:

    binmode(STDIN);
    binmode(STDOUT);
    while (<STDIN>) { print }

Passing C<sysopen> a non-standard flag option will also open the file in
binary mode on those systems that support it.  This is the equivalent of
opening the file normally, then calling C<binmode> on the handle.

    use Fcntl;
    sysopen(BINDAT, "records.data", O_RDWR | O_BINARY)
        || die "can't open records.data: $!";

Now you can use C<read> and C<print> on that handle without worrying
about the non-standard system I/O library breaking your data.  It's not
a pretty picture, but then, legacy systems seldom are.  CP/M will be
with us until the end of days, and after.

On systems with exotic I/O systems, it turns out that, astonishingly
enough, even unbuffered I/O using C<sysread> and C<syswrite> might do
sneaky data mutilation behind your back.

    while (sysread(WHENCE, $buf, 1024)) {
        syswrite(WHITHER, $buf, length($buf));
    }

Depending on the vicissitudes of your runtime system, even these calls
may need C<binmode> or C<O_BINARY> first.  Systems known to be free of
such difficulties include Unix, the Mac OS, Plan 9, and Inferno.
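
Putting those pieces together, a defensive byte-for-byte copy might look
something like the following sketch.  The helper name C<copy_binary> and
the temporary file names are made up for the example.  It relies on
C<binmode> rather than C<O_BINARY>, because the latter is not defined on
every platform.

```perl
use strict;
use warnings;
use Fcntl;
use File::Temp qw(tempdir);

# Hypothetical helper: copy $from to $to without any text-mode translation.
sub copy_binary {
    my ($from, $to) = @_;
    sysopen(my $in,  $from, O_RDONLY)
        or die "can't open $from: $!";
    sysopen(my $out, $to, O_WRONLY | O_CREAT | O_TRUNC)
        or die "can't open $to: $!";
    binmode($in);       # harmless on Unix, needed where text mode exists
    binmode($out);
    while (1) {
        my $got = sysread($in, my $buf, 8192);
        die "can't read $from: $!" unless defined $got;
        last if $got == 0;                    # end of file
        my $off = 0;
        while ($off < $got) {                 # syswrite may write less than asked
            my $put = syswrite($out, $buf, $got - $off, $off);
            die "can't write $to: $!" unless defined $put;
            $off += $put;
        }
    }
    close($out) or die "can't close $to: $!";
    close($in)  or die "can't close $from: $!";
}

my $dir = tempdir(CLEANUP => 1);
my ($src, $dst) = ("$dir/src.bin", "$dir/dst.bin");   # made-up names

open(my $fh, '>', $src) or die "can't write $src: $!";
binmode($fh);
print $fh "line\r\n\x1a\x00tail";     # bytes that text mode tends to mangle
close($fh) or die "can't close $src: $!";

copy_binary($src, $dst);
my $size = -s $dst;
print "$size bytes copied\n";
```

Checking the return value of C<syswrite> matters here, since it is allowed
to write fewer bytes than requested.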

=head2 File Locking

In a multitasking environment, you may need to be careful not to collide
with other processes that want to do I/O on the same files you
are working on.  You'll often need shared or exclusive locks
on files for reading and writing respectively.  If in doubt, you can
simply pretend that only exclusive locks exist.

Never use the existence of a file (C<-e $file>) as a locking indication,
because there is a race condition between the test for the existence of
the file and its creation.  It's possible for another process to create
a file in the slice of time between your existence check and your attempt
to create the file.  Atomicity is critical.
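
If what you actually need is to create a file only when it doesn't exist
yet, the check and the creation have to happen as one operation.  The
sketch below asks the kernel to do both at once with C<O_CREAT | O_EXCL>.
The helper name C<create_exclusively> and the file name are invented for
the example, and this technique may not be reliable on some network
filesystems.

```perl
use strict;
use warnings;
use Fcntl;
use Errno qw(EEXIST);
use File::Temp qw(tempdir);

# Hypothetical helper: try to create $path atomically.
# Returns a filehandle if we created it, undef if it already existed.
sub create_exclusively {
    my ($path) = @_;
    if (sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL)) {
        return $fh;
    }
    return undef if $! == EEXIST;     # somebody else got there first
    die "can't create $path: $!";
}

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/example.dat";        # made-up name

my $first  = create_exclusively($path);
my $second = create_exclusively($path);
print "first attempt:  ", ($first  ? "created" : "already there"), "\n";
print "second attempt: ", ($second ? "created" : "already there"), "\n";
```

There is no separate existence test here, so there is no window in which
another process can slip in between the test and the creation.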

Perl's most portable locking interface is via the C<flock> function,
whose simplicity is emulated on systems that don't directly support it
such as SysV or Windows.  The underlying semantics may affect how
it all works, so you should learn how C<flock> is implemented on your
system's port of Perl.

File locking I<does not> lock out another process that would like to
do I/O.  A file lock only locks out others trying to get a lock, not
processes trying to do I/O.  Because locks are advisory, if one process
uses locking and another doesn't, all bets are off.

By default, the C<flock> call will block until a lock is granted.
A request for a shared lock will be granted as soon as there is no
exclusive locker.  A request for an exclusive lock will be granted as
soon as there is no locker of any kind.  Locks are on file descriptors,
not file names.  You can't lock a file until you open it, and you can't
hold on to a lock once the file has been closed.

Here's how to get a blocking shared lock on a file, typically used
for reading:

    use 5.004;
    use Fcntl qw(:DEFAULT :flock);
    open(FH, "< filename")  or die "can't open filename: $!";
    flock(FH, LOCK_SH)      or die "can't lock filename: $!";
    # now read from FH

You can get a non-blocking lock by using C<LOCK_NB>.

    flock(FH, LOCK_SH | LOCK_NB)
        or die "can't lock filename: $!";

This can be useful for producing more user-friendly behaviour by warning
if you're going to be blocking:

    use 5.004;
    use Fcntl qw(:DEFAULT :flock);
    open(FH, "< filename")  or die "can't open filename: $!";
    unless (flock(FH, LOCK_SH | LOCK_NB)) {
        $| = 1;
        print "Waiting for lock...";
        flock(FH, LOCK_SH)  or die "can't lock filename: $!";
        print "got it.\n"
    }
    # now read from FH
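
To watch a non-blocking request being refused, you can hold an exclusive
lock through one handle and ask for a shared lock through a second,
independent open of the same file.  This is only a sketch: the second
handle stands in for another process, and the variable names are made up.
On a system with native flock(2) semantics one would expect the first
attempt to fail and the second to succeed; where C<flock> is emulated with
locks that belong to the process rather than the handle, both attempts may
succeed.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use File::Temp qw(tempfile);

my ($holder, $path) = tempfile(UNLINK => 1);
flock($holder, LOCK_EX) or die "can't lock $path: $!";

# A second, independent open of the same file stands in for another process.
open(my $other, '<', $path) or die "can't open $path: $!";

# Ask without waiting while the exclusive lock is still held ...
my $got_while_held = flock($other, LOCK_SH | LOCK_NB) ? 1 : 0;

# ... and again once the holder has let go.
flock($holder, LOCK_UN) or die "can't unlock $path: $!";
my $got_after_release = flock($other, LOCK_SH | LOCK_NB) ? 1 : 0;

print "while held: $got_while_held, after release: $got_after_release\n";
```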

To get an exclusive lock, typically used for writing, you have to be
careful.  We C<sysopen> the file so it can be locked before it gets
emptied.  You can get a non-blocking version using C<LOCK_EX | LOCK_NB>.

    use 5.004;
    use Fcntl qw(:DEFAULT :flock);
    sysopen(FH, "filename", O_WRONLY | O_CREAT)
        or die "can't open filename: $!";
    flock(FH, LOCK_EX)
        or die "can't lock filename: $!";
    truncate(FH, 0)
        or die "can't truncate filename: $!";
    # now write to FH

Finally, due to the uncounted millions who cannot be dissuaded from
wasting cycles on useless vanity devices called hit counters, here's
how to increment a number in a file safely:

    use Fcntl qw(:DEFAULT :flock);

    sysopen(FH, "numfile", O_RDWR | O_CREAT)
        or die "can't open numfile: $!";
    # autoflush FH
    $ofh = select(FH); $| = 1; select($ofh);
    flock(FH, LOCK_EX)
        or die "can't write-lock numfile: $!";

    $num = <FH> || 0;
    seek(FH, 0, 0)
        or die "can't rewind numfile: $!";
    print FH $num+1, "\n"
        or die "can't write numfile: $!";

    truncate(FH, tell(FH))
        or die "can't truncate numfile: $!";
    close(FH)
        or die "can't close numfile: $!";

=head2 IO Layers

In Perl 5.8.0 a new I/O framework called "PerlIO" was introduced.
This is new "plumbing" for all the I/O happening in Perl; for the
most part everything will work just as it did, but PerlIO also brought
in some new features, such as the ability to think of I/O as "layers".
Besides just moving the data, an I/O layer may also transform it.
Such transformations may include compression and decompression,
encryption and decryption, and conversion between various character
encodings.

A full discussion of the features of PerlIO is beyond the scope of this
tutorial, but here is how to recognize the layers being used:

=over 4

=item *

The three-(or more)-argument form of C<open> is being used and the
second argument contains something else in addition to the usual
C<< '<' >>, C<< '>' >>, C<< '>>' >>, C<< '|' >> and their variants,
for example:

    open(my $fh, "<:utf8", $fn);

=item *

The two-argument form of C<binmode> is being used, for example:

    binmode($fh, ":encoding(utf16)");

=back

For more detailed discussion about PerlIO see L<PerlIO>;
for more detailed discussion about Unicode and I/O see L<perluniintro>.
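
As a small illustration of a layer doing a transformation, the sketch
below writes a string through an C<:encoding(UTF-8)> layer and then reads
it back twice: once through the same layer, and once as raw bytes.  The
file name and sample text are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/layers.txt";          # made-up name
my $text = "caf\x{e9} \x{263A}";       # "cafe" with e-acute, then a smiley

# Write characters; the layer turns them into UTF-8 bytes.
open(my $out, '>:encoding(UTF-8)', $path) or die "can't write $path: $!";
print $out $text;
close($out) or die "can't close $path: $!";

# Read them back as characters through the same layer ...
open(my $in, '<:encoding(UTF-8)', $path) or die "can't read $path: $!";
my @layers = PerlIO::get_layers($in);  # ask the handle which layers it has
my $chars  = <$in>;
close($in) or die "can't close $path: $!";

# ... and as raw bytes, with no decoding at all.
open(my $raw, '<:raw', $path) or die "can't read $path: $!";
my $bytes = <$raw>;
close($raw) or die "can't close $path: $!";

printf "%d characters, %d bytes\n", length($chars), length($bytes);
print "layers: @layers\n";
```

The first line printed is C<6 characters, 9 bytes>: the two non-ASCII
characters take two and three bytes respectively once encoded.  The exact
list of layer names reported may vary between builds of Perl.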

=head1 SEE ALSO

The C<open> and C<sysopen> functions in perlfunc(1);
the system open(2), dup(2), fopen(3), and fdopen(3) manpages;
the POSIX documentation.

=head1 AUTHOR and COPYRIGHT

Copyright 1998 Tom Christiansen.

This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.

Irrespective of its distribution, all code examples in these files are
hereby placed into the public domain.  You are permitted and
encouraged to use this code in your own programs for fun or for profit
as you see fit.  A simple comment in the code giving credit would be
courteous but is not required.

=head1 HISTORY

First release: Sat Jan  9 08:09:11 MST 1999