# @(#)POSIX 8.1 (Berkeley) 6/6/93
# $FreeBSD$

Comments on the IEEE P1003.2 Draft 12
     Part 2: Shell and Utilities
  Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard.  All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or as criticism of the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1. 32V and BSD derived implementations of sed strip the text
    arguments of the a, c and i commands of their initial blanks,
    i.e.

        #!/bin/sed -f
        a\
            foo\
            \  indent\
            bar

    produces:

        foo
          indent
        bar

    POSIX does not specify this behavior as the System V versions of
    sed do not do this stripping.  The argument against stripping is
    that it is difficult to write sed scripts that have leading blanks
    if they are stripped.  The argument for stripping is that it is
    difficult to write readable sed scripts unless indentation is allowed
    and ignored, and leading whitespace is obtainable by entering a
    backslash in front of it.  This implementation follows the BSD
    historic practice.

 2. Historical versions of sed required that the w flag be the last
    flag to an s command as it takes an additional argument.  This
    is obvious, but not specified in POSIX.

 3. Historical versions of sed required that whitespace follow a w
    flag to an s command.  This is not specified in POSIX.  This
    implementation permits whitespace but does not require it.

 4. Historical versions of sed permitted any number of whitespace
    characters to follow the w command.  This is not specified in
    POSIX.  This implementation permits whitespace but does not
    require it.

 5. The rule for the l command differs from historic practice.  Table
    2-15 includes the various ANSI C escape sequences, including \\
    for backslash.  Some historical versions of sed also displayed
    two-digit octal numbers, not the three-digit ones specified by
    POSIX.  POSIX is a cleanup, and is followed by this implementation.

 6. The POSIX specification for ! does not state that a single command
    following ! must not contain an address specification, whereas a
    command list can contain address specifications.  The
    specification for ! implies that "3!/hello/p" works, and it never
    has, historically.  Note,

        3!{
            /hello/p
        }

    does work.
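
    The working form above can be tried from the shell.  This is only a
    minimal sketch: the function name print_hello_except_line3 and the
    sample input are hypothetical, and -n is added so that only the p
    command produces output.

```shell
# Hypothetical helper for illustration: print lines matching /hello/,
# except when the current line is line 3.
print_hello_except_line3() {
    sed -n '3!{
/hello/p
}'
}

# Lines 1 and 3 both contain "hello"; line 3 is excluded by "3!".
printf 'hello\nworld\nhello\n' | print_hello_except_line3
# prints "hello" once
```

    Only the first input line is printed, because the address inside the
    braces is evaluated solely for lines other than line 3.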

 7. POSIX does not specify what happens with consecutive ! commands
    (e.g. /foo/!!!p).  Historic implementations allow any number of
    !'s without changing the behavior.  (It seems logical that each
    one might reverse the behavior.)  This implementation follows
    historic practice.

 8. Historic versions of sed permitted commands to be separated
    by semi-colons, e.g. "sed -ne '1p;2p;3q'" printed the first
    two lines of a file and quit at the third.  This is not specified
    by POSIX.  Note, the ; command separator is not allowed for the
    commands a, c, i, w, r, :, b, t, # and at the end of a w flag in
    the s command.  This implementation follows historic practice and
    implements the ; separator.
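
    A minimal sketch of the ; separator follows.  The function name
    first_lines and the sample input are hypothetical.  Because -n
    suppresses the default output, the q at line 3 quits without
    printing that line.

```shell
# Hypothetical helper for illustration: three commands separated by
# semi-colons in a single script argument.
first_lines() {
    sed -n '1p;2p;3q'
}

printf '1\n2\n3\n4\n' | first_lines
# prints:
# 1
# 2
```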

 9. Historic versions of sed terminated the script if EOF was reached
    during the execution of the 'n' command, i.e.:

        sed -e '
        n
        i\
        hello
        ' </dev/null

    did not produce any output.  POSIX does not specify this behavior.
    This implementation follows historic practice.
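
    The same script can be run against a one-line input to make the
    effect visible.  This is a sketch under the assumption that the sed
    in use follows the historic practice described above; the function
    name n_at_eof is hypothetical.

```shell
# Hypothetical helper for illustration: 'n' hits EOF on a one-line input,
# so the script ends before the i command is reached.
n_at_eof() {
    sed -e 'n
i\
hello
'
}

printf 'x\n' | n_at_eof
# prints "x"; the text "hello" is never inserted
```

    The single input line is still written by the default output; only
    the commands after n are skipped.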

10. Deleted.

11. Historical implementations do not output the change text of a c
    command in the case of an address range whose first line number
    is greater than the second (e.g. 3,1).  POSIX requires that the
    text be output.  Since the historic behavior doesn't seem to have
    any particular purpose, this implementation follows the POSIX
    behavior.
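
    A minimal sketch of the POSIX behavior, assuming a sed that treats
    the range 3,1 as matching line 3 only.  The function name
    change_reversed_range and the sample input are hypothetical.

```shell
# Hypothetical helper for illustration: a c command whose range has a
# first line number greater than the second.
change_reversed_range() {
    sed '3,1c\
text'
}

printf '1\n2\n3\n4\n' | change_reversed_range
# with the POSIX behavior, prints:
# 1
# 2
# text
# 4
```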

12. POSIX does not specify whether address ranges are checked and
    reset if a command is not executed due to a jump.  The following
    program will behave in different ways depending on whether the
    'c' command is triggered at the third line, i.e. will the text
    be output even though line 3 of the input will never logically
    encounter that command.

        2,4b
        1,3c\
            text

    Historic implementations did not output the text in the above
    example.  Therefore it was believed that a range whose second
    address was never matched extended to the end of the input.
    However, the current practice adopted by this implementation,
    as well as by those from GNU and Sun, is as follows: the text
    from the 'c' command still isn't output because the second address
    isn't actually matched, but the range is reset after all if its
    second address is a line number.  In the above example, only the
    first line of the input will be deleted.

13. Historical implementations allow an output suppressing #n at the
    beginning of -e arguments as well as in a script file.  POSIX
    does not specify this.  This implementation follows historical
    practice.

14. POSIX does not explicitly specify how sed behaves if no script is
    specified.  Since the sed Synopsis permits this form of the command,
    and the language in the Description section states that the input
    is output, it seems reasonable that it behave like the cat(1)
    command.  Historic sed implementations behave differently for "ls |
    sed", where they produce no output, and "ls | sed -e#", where they
    behave like cat.  This implementation behaves like cat in both cases.

15. The POSIX requirement to open all w files at the beginning makes
    sed behave nonintuitively when the w commands are preceded by
    addresses or are within conditional blocks.  This implementation
    follows historic practice and POSIX, by default, and provides the
    -a option which opens the files only when they are needed.

16. POSIX does not specify how escape sequences other than \n and \D
    (where D is the delimiter character) are to be treated.  This is
    reasonable; however, it also doesn't state that the backslash is
    to be discarded from the output regardless.  A strict reading of
    POSIX would be that "echo xyz | sed 's/./\a/'" would display "\ayz".
    As historic sed implementations always discarded the backslash,
    this implementation does as well.

17. POSIX specifies that an address can be "empty".  This implies
    that constructs like ",d" or "1,d" and ",5d" are allowed.  This
    is not true for historic implementations or this implementation
    of sed.

18. The b, t and : commands are documented in POSIX to ignore leading
    white space, but no mention is made of trailing white space.
    Historic implementations of sed assigned different locations to
    the labels "x" and "x ".  This is not useful, and leads to subtle
    programming errors, but it is historic practice and changing it
    could theoretically break working scripts.  This implementation
    follows historic practice.

19. Although POSIX specifies that reading from files that do not exist
    from within the script must not terminate the script, it does not
    specify what happens if a write command fails.  Historic practice
    is to fail immediately if the file cannot be opened or written.
    This implementation follows historic practice.

20. Historic practice is that the \n construct can be used for either
    string1 or string2 of the y command.  This is not specified by
    POSIX.  This implementation follows historic practice.
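
    A minimal sketch of \n in string2 of the y command, assuming a sed
    that follows the historic practice described above.  The function
    name spaces_to_newlines and the sample input are hypothetical.

```shell
# Hypothetical helper for illustration: transliterate each space into a
# newline using \n in string2 of the y command.
spaces_to_newlines() {
    sed 'y/ /\n/'
}

printf 'a b\n' | spaces_to_newlines
# prints:
# a
# b
```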

21. Deleted.

22. Historic implementations of sed ignore the RE delimiter characters
    within character classes.  This is not specified in POSIX.  This
    implementation follows historic practice.

23. Historic implementations handle empty REs in a special way: the
    empty RE is interpreted as if it were the last RE encountered,
    whether in an address or elsewhere.  POSIX does not document this
    behavior.  For example the command:

        sed -e /abc/s//XXX/

    substitutes XXX for the pattern abc.  The semantics of "the last
    RE" can be defined in two different ways:

    1. The last RE encountered when compiling (lexical/static scope).
    2. The last RE encountered while running (dynamic scope).

    While many historical implementations fail on programs depending
    on scope differences, the SunOS version exhibited dynamic scope
    behavior.  This implementation does dynamic scoping, as this seems
    the most useful and in order to remain consistent with historical
    practice.
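
    The empty RE example can be run as follows.  This is only a sketch:
    the function name reuse_last_re and the sample input are
    hypothetical, and the script is quoted to keep the shell from
    interpreting it.  It does not distinguish static from dynamic
    scope, since both give the same result here.

```shell
# Hypothetical helper for illustration: the empty RE in s//XXX/ reuses
# the RE from the /abc/ address.
reuse_last_re() {
    sed -e '/abc/s//XXX/'
}

printf 'abc def\nxyz\n' | reuse_last_re
# prints:
# XXX def
# xyz
```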