sysid.htm revision 7c478bd95313f5f23a4c958a745db2134aa03244
<!-- SCCS keyword
#pragma ident "%Z%%M% %I% %E% SMI"
-->
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<HTML>
<HEAD>
<TITLE>SP - System identifiers</TITLE>
</HEAD>
<BODY>
<H1>System identifiers</H1>
<P>
There are two kinds of system identifier: formal system identifiers
and simple system identifiers. A system identifier that does not
start with <SAMP><</SAMP> will always be interpreted as a simple
system identifier. A simple system identifier will always be
interpreted either as a filename or as a URL.
<H2>Formal system identifiers</H2>
<P>
Formal system identifiers are based on the
Corrigendum 1, Annex D.
A system identifier that is a formal system
identifier consists of a sequence of one or more storage object
specifications. The objects specified by the storage object
specifications are concatenated to form the entity. A storage object
specification consists of an SGML start-tag in the reference concrete
syntax followed by character data content. The generic identifier of
the start-tag is the name of a storage manager. The content is a
storage object identifier which identifies the storage object in a
manner dependent on the storage manager. The start-tag can also
specify attributes giving additional information about the storage
object. Numeric character references are recognized in storage object
identifiers and attribute value literals in the start-tag. Record
ends are ignored in the storage object identifier as with SGML. A
system identifier will be interpreted as a formal system identifier if
it starts with a <SAMP><</SAMP> followed by a storage manager name,
followed by either <SAMP>></SAMP> or white-space; otherwise it will be
interpreted as a simple system identifier. A storage object
identifier extends until the end of the system identifier or until the
first occurrence of <SAMP><</SAMP> followed by a storage manager
name, followed by either <SAMP>></SAMP> or white-space.
<P>
The following storage managers are available:
<DL>
<DT>
<A NAME="osfile"><SAMP>osfile</SAMP></A>
<DD>
The storage object identifier is a filename. If the filename is
relative it is resolved using a base filename. Normally the base
filename is the name of the file in which the storage object
identifier was specified, but this can be changed using the
<SAMP>base</SAMP> attribute. The filename will be searched for first
in the directory of the base filename. If it is not found there, then
it will be searched for in directories specified with the
<SAMP>-D</SAMP> option in the order in which they were specified on
the command line, and then in the list of directories specified by the
environment variable <SAMP>SGML_SEARCH_PATH</SAMP>. The list
is separated by colons under Unix and by semi-colons under MSDOS.
<DT>
<SAMP>osfd</SAMP>
<DD>
The storage object identifier is an integer specifying a file
descriptor. Thus a system identifier of <SAMP><osfd>0</SAMP> will
refer to the standard input.
<DT>
<SAMP>url</SAMP>
<DD>
The storage object identifier is a URL. Only the <SAMP>http</SAMP>
scheme is currently supported and not on all systems.
<DT>
<SAMP>neutral</SAMP>
<DD>
The storage manager is the storage manager of storage object in which
the system identifier was specified (the <I>underlying storage
manager</I>). However if the underlying storage manager does not
support named storage objects (ie it is <SAMP>osfd</SAMP>), then the
storage manager will be <SAMP>osfile</SAMP>. The storage object
identifier is treated as a relative, hierarchical name separated by
slashes (<SAMP>/</SAMP>) and will be transformed as appropriate for
the underlying storage manager.
<DT>
<SAMP>literal</SAMP>
<DD>
The bit combinations of the storage object identifier are
the contents of the storage object.
</DL>
<P>
The following attributes are supported:
<DL>
<DT>
<SAMP>records</SAMP>
<DD>
This describes how records are delimited in the storage object:
<DL>
<DT><SAMP>cr</SAMP>
<DD>
Records are terminated by a carriage return.
<DT>
<SAMP>lf</SAMP>
<DD>
Records are terminated by a line feed.
<DT>
<SAMP>crlf</SAMP>
<DD>
Records are terminated by a carriage return followed by a line feed.
<DT>
<SAMP>find</SAMP>
<DD>
Records are terminated by whichever of
<SAMP>cr</SAMP>,
<SAMP>lf</SAMP>
or
<SAMP>crlf</SAMP>
is first encountered in the storage object.
<DT>
<SAMP>asis</SAMP>
<DD>
No recognition of records is performed.
</DL>
<P>
The default is <SAMP>find</SAMP> except for NDATA entities for which
the default is <SAMP>asis</SAMP>. This attribute is not applicable to
the <SAMP>literal</SAMP> storage manager.
<P>
When records are recognized in a storage object, a record start is
inserted at the beginning of each record, and a record end at the end
of each record. If there is a partial record (a record that doesn't
end with the record terminator) at the end of the entity, then a
record start will be inserted before it but no record end will be
inserted after it.
<P>
The attribute name and <SAMP>=</SAMP> can be omitted for this attribute.
<DT>
<SAMP>zapeof</SAMP>
<DD>
This specifies whether a Control-Z character that occurs as the final byte
in the storage object should be stripped.
The following values are allowed:
<DL>
<DT><SAMP>zapeof</SAMP>
<DD>
A final Control-Z should be stripped.
<DT><SAMP>nozapeof</SAMP>
<DD>
A final Control-Z should not be stripped.
</DL>
<P>
The default is <SAMP>zapeof</SAMP> except for NDATA entities, entities
declared in storage objects with <SAMP>zapeof=nozapeof</SAMP> and
storage objects with <SAMP>records=asis</SAMP>. This attribute is not
applicable to the <SAMP>literal</SAMP> storage manager.
<P>
The attribute name and <SAMP>=</SAMP> can be omitted for this
attribute.
<DT>
<A NAME="bctf"><SAMP>bctf</SAMP></A>
<DD>
The bctf (bit combination transformation format) attribute describes
how the bit combinations of the storage object are transformed into
the sequence of bytes that are contained in the object identified by
the storage object identifier. This inverse of this transformation is
performed when the entity manager reads the storage object. It has
one of the following values:
<DL>
<DT>
<SAMP>identity</SAMP>
<DD>
Each bit combination is represented by a single byte.
<DT>
<SAMP>fixed-2</SAMP>
<DD>
Each bit combination is represented by exactly 2
bytes, with the more significant byte first.
<DT>
<SAMP>utf-8</SAMP>
<DD>
Each bit combination is represented by a variable number of bytes
according to UCS Transformation Format 8 defined in Annex P to be
10646-1:1993.
<DT>
<SAMP>euc-jp</SAMP>
<DD>
Each bit combination is treated as a pair of bytes, most significant
byte first, encoding a character using the
Extended_UNIX_Code_Fixed_Width_for_Japanese Internet charset, and is
transformed into the variable length sequence of octets that would
encode that character using the
Extended_UNIX_Code_Packed_Format_for_Japanese Internet charset.
<DT>
<SAMP>sjis</SAMP>
<DD>
Each bit combination is treated as a pair of bytes, most significant
byte first, encoding a character using the
Extended_UNIX_Code_Fixed_Width_for_Japanese Internet charset, and is
transformed into the variable length sequence of bytes that would
encode that character using the Shift_JIS Internet charset.
<DT>
<SAMP>unicode</SAMP>
<DD>
Each bit combination is represented by 2 bytes. The bytes
representing the entire storage object may be preceded by a pair of
bytes representing the byte order mark character (0xFEFF). The bytes
representing each bit combination are in the system byte order, unless
the byte order mark character is present, in which case the order of
its bytes determines the byte order. When the storage object is read,
any byte order mark character is discarded.
<DT>
<SAMP>is8859-<VAR>n</VAR></SAMP>
<DD>
<SAMP><VAR>n</VAR></SAMP> can be any single digit other than 0. Each
10646 and is represented by the single byte that would encode that
character in ISO 8859-<VAR>n</VAR>. These values are not supported
with the <SAMP>-b</SAMP> option.
</DL>
<P>
Values other than <SAMP>identity</SAMP> are supported only with the
multi-byte version of nsgmls. This attribute is not applicable to the
<SAMP>literal</SAMP> storage manager.
<DT>
<SAMP>tracking</SAMP>
<DD>
This specifies whether line boundaries should be tracked for this
object: a value of <SAMP>track</SAMP> specifies that they should; a
value of <SAMP>notrack</SAMP> specifies that they should not. The
default value is <SAMP>track</SAMP>. Keeping track of where line
boundaries occur in a storage object requires approximately one byte
of storage per line and it may be desirable to disable this for very
large storage objects.
<P>
The attribute name and
<SAMP>=</SAMP>
can be omitted for this attribute.
<DT>
<SAMP>base</SAMP>
<DD>
When the storage object identifier specified in the content of the
storage object specification is relative, this specifies the base
storage object identifier relative to which that storage object
identifier should be resolved.
When not specified a storage object identifier is interpreted
relative to the storage object in which it is specified,
provided that this has the same storage manager.
This applies both to system identifiers specified in SGML
documents and to system identifiers specified in the catalog entry
files.
<DT>
<SAMP>smcrd</SAMP>
<DD>
The value is a single character that will be recognized in storage
object identifiers (both in the content of storage object
specifications and in the value of <SAMP>base</SAMP> attributes) as a
storage manager character reference delimiter when followed by a
digit. A storage manager character reference is like an SGML numeric
character reference except that the number is interpreted as a
character number in the inherent character set of the storage manager
rather than the document character set. The default is for no
character to be recognized as a storage manager character reference
delimiter. Numeric character references cannot be used to prevent
recognition of storage manager character reference delimiters.
<DT>
<SAMP>fold</SAMP>
<DD>
This applies only to the <SAMP>neutral</SAMP> storage manager. It
specifies whether the storage object identifier should be folded to
the customary case of the underlying storage manager if storage object
identifiers for the underlying storage manager are case sensitive.
The following values are allowed:
<DL>
<DT><SAMP>fold</SAMP>
<DD>
The storage object identifier will be folded.
<DT>
<SAMP>nofold</SAMP>
<DD>
The storage object identifier will not be folded.
</DL>
<P>
The default value is <SAMP>fold</SAMP>. The attribute name and
<SAMP>=</SAMP> can be omitted for this attribute.
<P>
For example, on Unix filenames are case-sensitive and the customary
case is lower-case. So if the underlying storage manager were
<SAMP>osfile</SAMP> and the system was a Unix system, then
</DL>
<H2>Simple system identfiers</H2>
<P>
A simple system identifier is interpreted as a storage object
identifier with a storage manager that depends on where the system
identifier was specified: if it was specified in a storage object
whose storage manager was <SAMP>url</SAMP> or if the system identifier
looks like an absolute URL in a supported scheme, the storage manager
will be <SAMP>url</SAMP>; otherwise the storage manager will be
<SAMP>osfile</SAMP>. The storage manager attributes are defaulted as
for a formal system identifier. Numeric character references are not
recognized in simple system identifiers.
<P>
<ADDRESS>
James Clark<BR>
jjc@jclark.com
</ADDRESS>
</BODY>
</HTML>