=head1 NAME

perlebcdic - Considerations for running Perl on EBCDIC platforms

=head1 DESCRIPTION

An exploration of some of the issues facing Perl programmers
on EBCDIC-based computers.  We do not cover localization,
internationalization, or multibyte character set issues other
than some discussion of UTF-8 and UTF-EBCDIC.

Portions that are still incomplete are marked with XXX.

=head1 COMMON CHARACTER CODE SETS

=head2 ASCII

The American Standard Code for Information Interchange (ASCII) is a
set of integers running from 0 to 127 (decimal) that imply character
interpretation by the display and other systems of computers.
The range 0..127 can be covered by setting the bits in a 7-bit binary
number, hence the set is sometimes referred to as "7-bit ASCII".
ASCII was described by the American National Standards Institute
document ANSI X3.4-1986.  It was also described by ISO 646:1991
(with localization for currency symbols).  The full ASCII set is
given in the table below as the first 128 elements.  Languages that
can be written adequately with the characters in ASCII include
English, Hawaiian, Indonesian, Swahili and some Native American
languages.

There are many character sets that extend the range of integers
from 0..2**7-1 up to 2**8-1, or 8-bit bytes (octets if you prefer).
One common one is the ISO 8859-1 character set.

=head2 ISO 8859

The ISO 8859-$n are a collection of character code sets from the
International Organization for Standardization (ISO), each of which
adds characters to the ASCII set that are typically found in European
languages, many of which are based on the Roman, or Latin, alphabet.

=head2 Latin 1 (ISO 8859-1)

A particular 8-bit extension to ASCII that includes grave and acute
accented Latin characters.  Languages that can employ ISO 8859-1
include all the languages covered by ASCII as well as Afrikaans,
Albanian, Basque, Catalan, Danish, Faroese, Finnish, Norwegian,
Portuguese, Spanish, and Swedish.  Dutch is covered albeit without
the ij ligature.  French is covered too but without the oe ligature.
German can use ISO 8859-1 but must do so without German-style
quotation marks.  This set is based on Western European extensions
to ASCII and is commonly encountered in World Wide Web work.
In IBM character code set identification terminology ISO 8859-1 is
also known as CCSID 819 (or sometimes 0819 or even 00819).
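
Because ISO 8859-1 gives each octet the same number as the
corresponding Unicode code point, decoding Latin-1 is an identity
mapping.  The following is a minimal sketch, assuming an ASCII-based
platform and the core C<Encode> module, that checks this for all 256
octets and shows that I<e with acute> (233 in the table below) occupies
a single octet.

```perl
    use strict;
    use warnings;
    use Encode qw(decode encode);

    # On an ASCII-based platform, octet N decoded as Latin-1 is chr(N).
    my @mismatch = grep { decode('latin1', chr $_) ne chr $_ } 0 .. 255;

    # e WITH ACUTE (code point 233) is a single octet in Latin-1.
    my $octets = encode('latin1', "\x{e9}");

    printf "identity map: %s; e with acute -> %d (%d octet)\n",
        (@mismatch ? 'no' : 'yes'), ord($octets), length($octets);
```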

=head2 EBCDIC

The Extended Binary Coded Decimal Interchange Code refers to a
large collection of slightly different single and multibyte
coded character sets that are different from ASCII or ISO 8859-1
and are typically used on host computers.  The EBCDIC encodings derive
from 8-bit byte extensions of Hollerith punched card encodings.
The layout on the cards was such that high bits were set for the
upper and lower case alphabet characters [a-z] and [A-Z], but there
were gaps within each Latin alphabet range.
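
The gaps are easy to see by decoding every octet between EBCDIC "a"
(129) and "z" (169).  This sketch assumes the core C<Encode> module and
its C<cp1047> mapping; it sorts the octets in that range into lowercase
letters and non-letters.

```perl
    use strict;
    use warnings;
    use Encode qw(decode);

    # 0x81 is 'a' and 0xA9 is 'z' in CCSID 1047 (see the table below).
    my ($lo, $hi) = (0x81, 0xA9);

    # Decode each octet in that range and sort it into letters and gaps.
    my (@letters, @gaps);
    for my $octet ($lo .. $hi) {
        my $char = decode('cp1047', chr $octet);
        if ($char =~ /\A[a-z]\z/) { push @letters, $octet }
        else                      { push @gaps,    $octet }
    }

    printf "%d octets lie between 'a' and 'z', but only %d are letters\n",
        $hi - $lo + 1, scalar @letters;
    print "gaps: @gaps\n";
```

The two gaps (138..144 and 154..161) are why a numeric range from "a"
to "z" is not the same as the set of lowercase letters in EBCDIC.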

Some IBM EBCDIC character sets may be known by character code set
identification numbers (CCSID numbers) or code page numbers.  Leading
zero digits in CCSID numbers within this document are insignificant.
For example, CCSID 0037 may be referred to as 37 in places.
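
When CCSIDs arrive as strings, leading zeros can be dropped by numeric
conversion, since Perl converts numeric strings as decimal (not octal).
The helper C<normalize_ccsid> below is hypothetical and only sketches
the idea.

```perl
    use strict;
    use warnings;

    # Hypothetical helper: normalize a CCSID written as "0037", "037" or "37".
    # Numeric conversion of a string is decimal, so leading zeros are harmless.
    sub normalize_ccsid {
        my ($id) = @_;
        die "not a CCSID: $id\n" unless $id =~ /\A\d+\z/;
        return 0 + $id;
    }

    print join(' ', map { normalize_ccsid($_) } qw(0037 037 37 0819 00819)), "\n";
```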

=head2 13 variant characters

Among IBM EBCDIC character code sets there are 13 characters that
are often mapped to different integer values.  Those characters
are known as the 13 "variant" characters and are:

    \ [ ] { } ^ ~ ! # | $ @ `
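
To see how the variant characters move between code pages, the sketch
below encodes each of them under CCSID 0037 and 1047 with the core
C<Encode> module.  It assumes an ASCII-based platform (so the literal
characters are Unicode code points) and compares only those two code
pages; the POSIX-BC differences are the ones flagged with ### in the
table below.

```perl
    use strict;
    use warnings;
    use Encode qw(encode);

    # The 13 variant characters listed above.
    my @variants = split ' ', q(\ [ ] { } ^ ~ ! # | $ @ `);

    my @differ;
    for my $char (@variants) {
        # Code point of $char in each EBCDIC code page.
        my ($cp37, $cp1047) = map { ord(encode($_, $char)) } qw(cp37 cp1047);
        printf "%-2s cp37=%-4d cp1047=%d\n", $char, $cp37, $cp1047;
        push @differ, $char if $cp37 != $cp1047;
    }
    print "differ between 0037 and 1047: @differ\n";
```

Only "[", "]" and "^" move between 0037 and 1047, which matches the
*** flags on those rows of the table.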

=head2 0037

Character code set ID 0037 is a mapping of the ASCII plus Latin-1
characters (i.e. ISO 8859-1) to an EBCDIC set.  0037 is used
in North American English locales on the OS/400 operating system
that runs on AS/400 computers.  CCSID 0037 differs from ISO 8859-1
in 236 places; in other words, they agree on only 20 code point values.
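
The agreement count can be checked by decoding every octet as CCSID
0037 and comparing the resulting Unicode (ISO 8859-1) code point with
the octet itself.  This is a sketch assuming the core C<Encode>
module's C<cp37> mapping.

```perl
    use strict;
    use warnings;
    use Encode qw(decode);

    # Octets whose CCSID 0037 meaning has the same number in ISO 8859-1.
    my @same = grep { ord(decode('cp37', chr $_)) == $_ } 0 .. 255;

    printf "agree on %d values, differ on %d\n", scalar @same, 256 - @same;
    print "@same\n";
```

The agreeing values are the control characters 0..3, 11..19, 24, 25 and
28..31, plus the PARAGRAPH SIGN at 182, as the table below also shows.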

=head2 1047

Character code set ID 1047 is also a mapping of the ASCII plus
Latin-1 characters (i.e. ISO 8859-1) to an EBCDIC set.  1047 is
used under Unix System Services for OS/390 or z/OS, and OpenEdition
for VM/ESA.  CCSID 1047 differs from CCSID 0037 in eight places.

=head2 POSIX-BC

The EBCDIC code page in use on Siemens' BS2000 system is distinct from
1047 and 0037.  It is identified below as the POSIX-BC set.

=head2 Unicode code points versus EBCDIC code points

In Unicode terminology a I<code point> is the number assigned to a
character: for example, in EBCDIC the character "A" is usually assigned
the number 193.  In Unicode the character "A" is assigned the number 65.
This causes a problem with the semantics of the pack/unpack "U" format,
which is supposed to pack Unicode code points to characters and back to numbers.
The problem is: which code points to use for code points less than 256?
(For 256 and over there's no problem: Unicode code points are used.)
In EBCDIC, for the low 256 the EBCDIC code points are used.  This
means that the equivalences

    pack("U", ord($character)) eq $character
    unpack("U", $character) == ord $character

will hold.  (If Unicode code points were applied consistently over
all the possible code points, pack("U",ord("A")) would in EBCDIC
equal I<A with acute> or chr(101), and unpack("U", "A") would equal
65, or I<non-breaking space>, not 193, or ord "A".)
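
The two equivalences can be checked mechanically for every code point
below 256.  The following is a minimal sketch; on an ASCII-based
platform it checks the trivial case, and on an EBCDIC platform it would
exercise the native-code-point rule described above.

```perl
    use strict;
    use warnings;

    # Collect every code point below 256 for which either equivalence fails.
    my @failures = grep {
        my $character = chr $_;
        !(    pack("U", ord($character)) eq $character
           && unpack("U", $character) == ord($character) );
    } 0 .. 255;

    print @failures ? "fails for: @failures\n"
                    : "both equivalences hold for 0..255\n";
```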

=head2 Remaining Perl Unicode problems in EBCDIC

=over 4

=item *

Many of the remaining problems seem to be related to case-insensitive
matching: for example, C<< /[\x{131}]/ >> (LATIN SMALL LETTER DOTLESS I)
does not match "I" case-insensitively, as it should under Unicode.
(The match succeeds on ASCII-derived platforms.)

=item *

The extensions Unicode::Collate and Unicode::Normalize are not
supported under EBCDIC; likewise for the encoding pragma.

=back

=head2 Unicode and UTF

UTF is a Unicode Transformation Format.  UTF-8 is a Unicode-conforming
encoding in which the 128 ASCII characters keep their single-octet
ASCII values, so ASCII text is also valid UTF-8.  UTF-EBCDIC is an
attempt to represent Unicode characters in an EBCDIC-transparent
manner.
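
The UTF-8 column of the table below writes multi-octet sequences in a
dotted decimal notation such as 195.169.  This sketch, assuming an
ASCII-based platform and the core C<Encode> module, shows that ASCII
text is unchanged by UTF-8 encoding and reproduces the dotted form for
I<e with acute>; the helper C<dotted> is hypothetical.

```perl
    use strict;
    use warnings;
    use Encode qw(encode);

    # Hypothetical helper: render octets in the dotted notation used by the
    # UTF-8 and UTF-EBCDIC columns of the table below.
    sub dotted { join '.', map { ord($_) } split //, $_[0] }

    my $ascii = encode('UTF-8', 'Perl');      # ASCII text is unchanged
    my $e     = encode('UTF-8', "\x{e9}");    # e WITH ACUTE, code point 233
    print "$ascii\n";                          # Perl
    print dotted($e), "\n";                    # 195.169
```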

=head2 Using Encode

Starting from Perl 5.8 you can use the new standard module Encode
to translate from EBCDIC to Latin-1 code points.  The C<%ebcdic> hash
below is keyed on C<ord '^'>, which is 176 under CCSID 0037, 95 under
1047, and 106 under POSIX-BC, so it selects the code page the program
is running under:

    use Encode 'from_to';

    my %ebcdic = ( 176 => 'cp37', 95 => 'cp1047', 106 => 'posix-bc' );

    # $data is in EBCDIC code points
    from_to($data, $ebcdic{ord '^'}, 'latin1');
    # $data is in ISO 8859-1 code points

and from Latin-1 code points to EBCDIC code points:

    use Encode 'from_to';

    my %ebcdic = ( 176 => 'cp37', 95 => 'cp1047', 106 => 'posix-bc' );

    # $data is in ISO 8859-1 code points
    from_to($data, 'latin1', $ebcdic{ord '^'});
    # $data is in EBCDIC code points
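
A round trip through C<from_to> can be tried on any platform.  On an
ASCII-based platform C<ord '^'> is 94, which is not a key of
C<%ebcdic>, so this sketch falls back to C<cp37>; that default is an
assumption made only for the demonstration.

```perl
    use strict;
    use warnings;
    use Encode qw(from_to);

    my %ebcdic = ( 176 => 'cp37', 95 => 'cp1047', 106 => 'posix-bc' );

    # Hypothetical fallback: use cp37 when not running on an EBCDIC platform.
    my $codepage = $ebcdic{ord '^'} // 'cp37';

    my $data = 'Hello';
    from_to($data, 'latin1', $codepage);    # $data now holds EBCDIC octets
    my @octets = map { ord($_) } split //, $data;
    from_to($data, $codepage, 'latin1');    # and back to Latin-1
    print "@octets -> $data\n";
```

With C<cp37> the octets are 200 133 147 147 150, matching the 0037
column of the table below for "H", "e", "l" and "o".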

For doing I/O it is suggested that you use the autotranslating features
of PerlIO; see L<perluniintro>.

Since version 5.8 Perl uses the new PerlIO I/O library.  This enables
you to use different encodings per IO channel.  For example you may use

    use Encode;
    open(my $ascii,  ">:encoding(ascii)",  "test.ascii")  or die $!;
    print $ascii "Hello World!\n";
    open(my $ebcdic, ">:encoding(cp37)",   "test.ebcdic") or die $!;
    print $ebcdic "Hello World!\n";
    open(my $latin1, ">:encoding(latin1)", "test.latin1") or die $!;
    print $latin1 "Hello World!\n";
    open(my $utf,    ">:encoding(utf8)",   "test.utf8")   or die $!;
    print $utf "Hello World!\n";
    close($_) or die $! for $ascii, $ebcdic, $latin1, $utf;

to get four files containing "Hello World!\n" in ASCII, CP 37 EBCDIC,
ISO 8859-1 (Latin-1) (in this example identical to ASCII), and, on an
EBCDIC platform, UTF-EBCDIC (in this example identical to normal
EBCDIC), respectively.  See the documentation of Encode::PerlIO for
details.

As the PerlIO layer uses raw IO (bytes) internally, all this totally
ignores things like the type of your filesystem (ASCII or EBCDIC).
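
Since the layer only produces octets, its output can be inspected
without touching the filesystem by writing to an in-memory file.  This
sketch assumes an ASCII-based platform and uses the same C<cp37>
layer as above.

```perl
    use strict;
    use warnings;

    # Write through an :encoding(cp37) layer into an in-memory file so the
    # resulting octets can be inspected directly.
    my $buffer = '';
    open(my $fh, '>:encoding(cp37)', \$buffer) or die "Cannot open: $!";
    print $fh "Hello World!\n";
    close($fh) or die "Cannot close: $!";

    # "H" is 200 and "\n" (LINE FEED) is 37 in CCSID 0037.
    print join(' ', map { ord($_) } split //, $buffer), "\n";
```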

=head1 SINGLE OCTET TABLES

The following tables list the ASCII and Latin 1 ordered sets including
the subsets: C0 controls (0..31), ASCII graphics (32..126), delete (127),
C1 controls (128..159), and Latin-1 (a.k.a. ISO 8859-1) (160..255).  In
the table, non-printing control character names as well as the Latin 1
extensions to ASCII have been labelled with character names roughly
corresponding to I<The Unicode Standard, Version 3.0> albeit with
substitutions such as s/LATIN// and s/VULGAR// in all cases,
s/CAPITAL LETTER// in some cases, and s/SMALL LETTER ([A-Z])/\l$1/
in some other cases (the C<charnames> pragma names unfortunately do
not list explicit names for the C0 or C1 control characters).  The
"names" of the C1 control set (128..159 in ISO 8859-1) listed here are
somewhat arbitrary.  The differences between the 0037 and 1047 sets are
flagged with ***.  The differences between the 1047 and POSIX-BC sets
are flagged with ###.  All ord() numbers listed are decimal.  If you
would rather see this table listing octal values, then run the table
(that is, the pod version of this document, since this recipe may not
work with a pod2_other_format translation) through:

=over 4

=item recipe 0

=back

    perl -ne 'if(/(.{33})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)' \
     -e '{printf("%s%-9o%-9o%-9o%o\n",$1,$2,$3,$4,$5)}' perlebcdic.pod

If you want to retain the UTF-x code points, then in script form you
might want to write:

=over 4

=item recipe 1

=back

    open(my $fh, '<', 'perlebcdic.pod') or die "Could not open perlebcdic.pod: $!";
    while (<$fh>) {
        if (/(.{33})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\.?(\d*)\s+(\d+)\.?(\d*)/)  {
            if ($7 ne '' && $9 ne '') {
                printf("%s%-9o%-9o%-9o%-9o%-3o.%-5o%-3o.%o\n",$1,$2,$3,$4,$5,$6,$7,$8,$9);
            }
            elsif ($7 ne '') {
                printf("%s%-9o%-9o%-9o%-9o%-3o.%-5o%o\n",$1,$2,$3,$4,$5,$6,$7,$8);
            }
            else {
                printf("%s%-9o%-9o%-9o%-9o%-9o%o\n",$1,$2,$3,$4,$5,$6,$8);
            }
        }
    }
    close($fh);

If you would rather see this table listing hexadecimal values then
run the table through:

=over 4

=item recipe 2

=back

    perl -ne 'if(/(.{33})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)' \
     -e '{printf("%s%-9X%-9X%-9X%X\n",$1,$2,$3,$4,$5)}' perlebcdic.pod

Or, in order to retain the UTF-x code points in hexadecimal:

=over 4

=item recipe 3

=back

    open(my $fh, '<', 'perlebcdic.pod') or die "Could not open perlebcdic.pod: $!";
    while (<$fh>) {
        if (/(.{33})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\.?(\d*)\s+(\d+)\.?(\d*)/)  {
            if ($7 ne '' && $9 ne '') {
                printf("%s%-9X%-9X%-9X%-9X%-2X.%-6X%-2X.%X\n",$1,$2,$3,$4,$5,$6,$7,$8,$9);
            }
            elsif ($7 ne '') {
                printf("%s%-9X%-9X%-9X%-9X%-2X.%-6X%X\n",$1,$2,$3,$4,$5,$6,$7,$8);
            }
            else {
                printf("%s%-9X%-9X%-9X%-9X%-9X%X\n",$1,$2,$3,$4,$5,$6,$8);
            }
        }
    }
    close($fh);

                                                                     incomp-  incomp-
                                 8859-1                              lete     lete
    chr                          0819     0037     1047     POSIX-BC UTF-8    UTF-EBCDIC
    ------------------------------------------------------------------------------------
    <NULL>                       0        0        0        0        0        0
    <START OF HEADING>           1        1        1        1        1        1
    <START OF TEXT>              2        2        2        2        2        2
    <END OF TEXT>                3        3        3        3        3        3
    <END OF TRANSMISSION>        4        55       55       55       4        55
    <ENQUIRY>                    5        45       45       45       5        45
    <ACKNOWLEDGE>                6        46       46       46       6        46
    <BELL>                       7        47       47       47       7        47
    <BACKSPACE>                  8        22       22       22       8        22
    <HORIZONTAL TABULATION>      9        5        5        5        9        5
    <LINE FEED>                  10       37       21       21       10       21       ***
    <VERTICAL TABULATION>        11       11       11       11       11       11
    <FORM FEED>                  12       12       12       12       12       12
    <CARRIAGE RETURN>            13       13       13       13       13       13
    <SHIFT OUT>                  14       14       14       14       14       14
    <SHIFT IN>                   15       15       15       15       15       15
    <DATA LINK ESCAPE>           16       16       16       16       16       16
    <DEVICE CONTROL ONE>         17       17       17       17       17       17
    <DEVICE CONTROL TWO>         18       18       18       18       18       18
    <DEVICE CONTROL THREE>       19       19       19       19       19       19
    <DEVICE CONTROL FOUR>        20       60       60       60       20       60
    <NEGATIVE ACKNOWLEDGE>       21       61       61       61       21       61
    <SYNCHRONOUS IDLE>           22       50       50       50       22       50
    <END OF TRANSMISSION BLOCK>  23       38       38       38       23       38
    <CANCEL>                     24       24       24       24       24       24
    <END OF MEDIUM>              25       25       25       25       25       25
    <SUBSTITUTE>                 26       63       63       63       26       63
    <ESCAPE>                     27       39       39       39       27       39
    <FILE SEPARATOR>             28       28       28       28       28       28
    <GROUP SEPARATOR>            29       29       29       29       29       29
    <RECORD SEPARATOR>           30       30       30       30       30       30
    <UNIT SEPARATOR>             31       31       31       31       31       31
    <SPACE>                      32       64       64       64       32       64
    !                            33       90       90       90       33       90
    "                            34       127      127      127      34       127
    #                            35       123      123      123      35       123
    $                            36       91       91       91       36       91
    %                            37       108      108      108      37       108
    &                            38       80       80       80       38       80
    '                            39       125      125      125      39       125
    (                            40       77       77       77       40       77
    )                            41       93       93       93       41       93
    *                            42       92       92       92       42       92
    +                            43       78       78       78       43       78
    ,                            44       107      107      107      44       107
    -                            45       96       96       96       45       96
    .                            46       75       75       75       46       75
    /                            47       97       97       97       47       97
    0                            48       240      240      240      48       240
    1                            49       241      241      241      49       241
    2                            50       242      242      242      50       242
    3                            51       243      243      243      51       243
    4                            52       244      244      244      52       244
    5                            53       245      245      245      53       245
    6                            54       246      246      246      54       246
    7                            55       247      247      247      55       247
    8                            56       248      248      248      56       248
    9                            57       249      249      249      57       249
    :                            58       122      122      122      58       122
    ;                            59       94       94       94       59       94
    <                            60       76       76       76       60       76
    =                            61       126      126      126      61       126
    >                            62       110      110      110      62       110
    ?                            63       111      111      111      63       111
    @                            64       124      124      124      64       124
    A                            65       193      193      193      65       193
    B                            66       194      194      194      66       194
    C                            67       195      195      195      67       195
    D                            68       196      196      196      68       196
    E                            69       197      197      197      69       197
    F                            70       198      198      198      70       198
    G                            71       199      199      199      71       199
    H                            72       200      200      200      72       200
    I                            73       201      201      201      73       201
    J                            74       209      209      209      74       209
    K                            75       210      210      210      75       210
    L                            76       211      211      211      76       211
    M                            77       212      212      212      77       212
    N                            78       213      213      213      78       213
    O                            79       214      214      214      79       214
    P                            80       215      215      215      80       215
    Q                            81       216      216      216      81       216
    R                            82       217      217      217      82       217
    S                            83       226      226      226      83       226
    T                            84       227      227      227      84       227
    U                            85       228      228      228      85       228
    V                            86       229      229      229      86       229
    W                            87       230      230      230      87       230
    X                            88       231      231      231      88       231
    Y                            89       232      232      232      89       232
    Z                            90       233      233      233      90       233
    [                            91       186      173      187      91       173      *** ###
    \                            92       224      224      188      92       224      ###
    ]                            93       187      189      189      93       189      ***
    ^                            94       176      95       106      94       95       *** ###
    _                            95       109      109      109      95       109
    `                            96       121      121      74       96       121      ###
    a                            97       129      129      129      97       129
    b                            98       130      130      130      98       130
    c                            99       131      131      131      99       131
    d                            100      132      132      132      100      132
    e                            101      133      133      133      101      133
    f                            102      134      134      134      102      134
    g                            103      135      135      135      103      135
    h                            104      136      136      136      104      136
    i                            105      137      137      137      105      137
    j                            106      145      145      145      106      145
    k                            107      146      146      146      107      146
    l                            108      147      147      147      108      147
    m                            109      148      148      148      109      148
    n                            110      149      149      149      110      149
    o                            111      150      150      150      111      150
    p                            112      151      151      151      112      151
    q                            113      152      152      152      113      152
    r                            114      153      153      153      114      153
    s                            115      162      162      162      115      162
    t                            116      163      163      163      116      163
    u                            117      164      164      164      117      164
    v                            118      165      165      165      118      165
    w                            119      166      166      166      119      166
    x                            120      167      167      167      120      167
    y                            121      168      168      168      121      168
    z                            122      169      169      169      122      169
    {                            123      192      192      251      123      192      ###
    |                            124      79       79       79       124      79
    }                            125      208      208      253      125      208      ###
    ~                            126      161      161      255      126      161      ###
    <DELETE>                     127      7        7        7        127      7
    <C1 0>                       128      32       32       32       194.128  32
    <C1 1>                       129      33       33       33       194.129  33
    <C1 2>                       130      34       34       34       194.130  34
    <C1 3>                       131      35       35       35       194.131  35
    <C1 4>                       132      36       36       36       194.132  36
    <C1 5>                       133      21       37       37       194.133  37       ***
    <C1 6>                       134      6        6        6        194.134  6
    <C1 7>                       135      23       23       23       194.135  23
    <C1 8>                       136      40       40       40       194.136  40
    <C1 9>                       137      41       41       41       194.137  41
    <C1 10>                      138      42       42       42       194.138  42
    <C1 11>                      139      43       43       43       194.139  43
    <C1 12>                      140      44       44       44       194.140  44
    <C1 13>                      141      9        9        9        194.141  9
    <C1 14>                      142      10       10       10       194.142  10
    <C1 15>                      143      27       27       27       194.143  27
    <C1 16>                      144      48       48       48       194.144  48
    <C1 17>                      145      49       49       49       194.145  49
    <C1 18>                      146      26       26       26       194.146  26
    <C1 19>                      147      51       51       51       194.147  51
    <C1 20>                      148      52       52       52       194.148  52
    <C1 21>                      149      53       53       53       194.149  53
    <C1 22>                      150      54       54       54       194.150  54
    <C1 23>                      151      8        8        8        194.151  8
    <C1 24>                      152      56       56       56       194.152  56
    <C1 25>                      153      57       57       57       194.153  57
    <C1 26>                      154      58       58       58       194.154  58
    <C1 27>                      155      59       59       59       194.155  59
    <C1 28>                      156      4        4        4        194.156  4
    <C1 29>                      157      20       20       20       194.157  20
    <C1 30>                      158      62       62       62       194.158  62
    <C1 31>                      159      255      255      95       194.159  255      ###
    <NON-BREAKING SPACE>         160      65       65       65       194.160  128.65
    <INVERTED EXCLAMATION MARK>  161      170      170      170      194.161  128.66
    <CENT SIGN>                  162      74       74       176      194.162  128.67   ###
    <POUND SIGN>                 163      177      177      177      194.163  128.68
    <CURRENCY SIGN>              164      159      159      159      194.164  128.69
    <YEN SIGN>                   165      178      178      178      194.165  128.70
    <BROKEN BAR>                 166      106      106      208      194.166  128.71   ###
    <SECTION SIGN>               167      181      181      181      194.167  128.72
    <DIAERESIS>                  168      189      187      121      194.168  128.73   *** ###
    <COPYRIGHT SIGN>             169      180      180      180      194.169  128.74
    <FEMININE ORDINAL INDICATOR> 170      154      154      154      194.170  128.81
    <LEFT POINTING GUILLEMET>    171      138      138      138      194.171  128.82
    <NOT SIGN>                   172      95       176      186      194.172  128.83   *** ###
    <SOFT HYPHEN>                173      202      202      202      194.173  128.84
    <REGISTERED TRADE MARK SIGN> 174      175      175      175      194.174  128.85
    <MACRON>                     175      188      188      161      194.175  128.86   ###
    <DEGREE SIGN>                176      144      144      144      194.176  128.87
    <PLUS-OR-MINUS SIGN>         177      143      143      143      194.177  128.88
    <SUPERSCRIPT TWO>            178      234      234      234      194.178  128.89
    <SUPERSCRIPT THREE>          179      250      250      250      194.179  128.98
    <ACUTE ACCENT>               180      190      190      190      194.180  128.99
    <MICRO SIGN>                 181      160      160      160      194.181  128.100
    <PARAGRAPH SIGN>             182      182      182      182      194.182  128.101
    <MIDDLE DOT>                 183      179      179      179      194.183  128.102
    <CEDILLA>                    184      157      157      157      194.184  128.103
    <SUPERSCRIPT ONE>            185      218      218      218      194.185  128.104
    <MASC. ORDINAL INDICATOR>    186      155      155      155      194.186  128.105
    <RIGHT POINTING GUILLEMET>   187      139      139      139      194.187  128.106
    <FRACTION ONE QUARTER>       188      183      183      183      194.188  128.112
    <FRACTION ONE HALF>          189      184      184      184      194.189  128.113
    <FRACTION THREE QUARTERS>    190      185      185      185      194.190  128.114
    <INVERTED QUESTION MARK>     191      171      171      171      194.191  128.115
    <A WITH GRAVE>               192      100      100      100      195.128  138.65
    <A WITH ACUTE>               193      101      101      101      195.129  138.66
    <A WITH CIRCUMFLEX>          194      98       98       98       195.130  138.67
    <A WITH TILDE>               195      102      102      102      195.131  138.68
    <A WITH DIAERESIS>           196      99       99       99       195.132  138.69
    <A WITH RING ABOVE>          197      103      103      103      195.133  138.70
    <CAPITAL LIGATURE AE>        198      158      158      158      195.134  138.71
    <C WITH CEDILLA>             199      104      104      104      195.135  138.72
    <E WITH GRAVE>               200      116      116      116      195.136  138.73
    <E WITH ACUTE>               201      113      113      113      195.137  138.74
    <E WITH CIRCUMFLEX>          202      114      114      114      195.138  138.81
    <E WITH DIAERESIS>           203      115      115      115      195.139  138.82
    <I WITH GRAVE>               204      120      120      120      195.140  138.83
    <I WITH ACUTE>               205      117      117      117      195.141  138.84
    <I WITH CIRCUMFLEX>          206      118      118      118      195.142  138.85
    <I WITH DIAERESIS>           207      119      119      119      195.143  138.86
    <CAPITAL LETTER ETH>         208      172      172      172      195.144  138.87
    <N WITH TILDE>               209      105      105      105      195.145  138.88
    <O WITH GRAVE>               210      237      237      237      195.146  138.89
    <O WITH ACUTE>               211      238      238      238      195.147  138.98
    <O WITH CIRCUMFLEX>          212      235      235      235      195.148  138.99
    <O WITH TILDE>               213      239      239      239      195.149  138.100
    <O WITH DIAERESIS>           214      236      236      236      195.150  138.101
    <MULTIPLICATION SIGN>        215      191      191      191      195.151  138.102
    <O WITH STROKE>              216      128      128      128      195.152  138.103
    <U WITH GRAVE>               217      253      253      224      195.153  138.104  ###
    <U WITH ACUTE>               218      254      254      254      195.154  138.105
    <U WITH CIRCUMFLEX>          219      251      251      221      195.155  138.106  ###
    <U WITH DIAERESIS>           220      252      252      252      195.156  138.112
    <Y WITH ACUTE>               221      173      186      173      195.157  138.113  *** ###
    <CAPITAL LETTER THORN>       222      174      174      174      195.158  138.114
    <SMALL LETTER SHARP S>       223      89       89       89       195.159  138.115
    <a WITH GRAVE>               224      68       68       68       195.160  139.65
    <a WITH ACUTE>               225      69       69       69       195.161  139.66
    <a WITH CIRCUMFLEX>          226      66       66       66       195.162  139.67
    <a WITH TILDE>               227      70       70       70       195.163  139.68
    <a WITH DIAERESIS>           228      67       67       67       195.164  139.69
    <a WITH RING ABOVE>          229      71       71       71       195.165  139.70
    <SMALL LIGATURE ae>          230      156      156      156      195.166  139.71
    <c WITH CEDILLA>             231      72       72       72       195.167  139.72
    <e WITH GRAVE>               232      84       84       84       195.168  139.73
    <e WITH ACUTE>               233      81       81       81       195.169  139.74
    <e WITH CIRCUMFLEX>          234      82       82       82       195.170  139.81
    <e WITH DIAERESIS>           235      83       83       83       195.171  139.82
    <i WITH GRAVE>               236      88       88       88       195.172  139.83
    <i WITH ACUTE>               237      85       85       85       195.173  139.84
    <i WITH CIRCUMFLEX>          238      86       86       86       195.174  139.85
    <i WITH DIAERESIS>           239      87       87       87       195.175  139.86
    <SMALL LETTER eth>           240      140      140      140      195.176  139.87
    <n WITH TILDE>               241      73       73       73       195.177  139.88
    <o WITH GRAVE>               242      205      205      205      195.178  139.89
    <o WITH ACUTE>               243      206      206      206      195.179  139.98
    <o WITH CIRCUMFLEX>          244      203      203      203      195.180  139.99
    <o WITH TILDE>               245      207      207      207      195.181  139.100
    <o WITH DIAERESIS>           246      204      204      204      195.182  139.101
    <DIVISION SIGN>              247      225      225      225      195.183  139.102
    <o WITH STROKE>              248      112      112      112      195.184  139.103
    <u WITH GRAVE>               249      221      221      192      195.185  139.104  ###
    <u WITH ACUTE>               250      222      222      222      195.186  139.105
    <u WITH CIRCUMFLEX>          251      219      219      219      195.187  139.106
    <u WITH DIAERESIS>           252      220      220      220      195.188  139.112
1N/A    <y WITH ACUTE>               253      141      141      141      195.189  139.113
1N/A    <SMALL LETTER thorn>         254      142      142      142      195.190  139.114
1N/A    <y WITH DIAERESIS>           255      223      223      223      195.191  139.115
1N/A
If you would rather see the above table in CCSID 0037 order instead of
ASCII + Latin-1 order, run the table through:
1N/A
1N/A=over 4
1N/A
1N/A=item recipe 4
1N/A
1N/A=back
1N/A
1N/A    perl -ne 'if(/.{33}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}/)'\
1N/A     -e '{push(@l,$_)}' \
1N/A     -e 'END{print map{$_->[0]}' \
1N/A     -e '          sort{$a->[1] <=> $b->[1]}' \
1N/A     -e '          map{[$_,substr($_,42,3)]}@l;}' perlebcdic.pod
1N/A
If you would rather see it in CCSID 1047 order, change the number
42 in the last line to 51, like this:
1N/A
1N/A=over 4
1N/A
1N/A=item recipe 5
1N/A
1N/A=back
1N/A
1N/A    perl -ne 'if(/.{33}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}/)'\
1N/A     -e '{push(@l,$_)}' \
1N/A     -e 'END{print map{$_->[0]}' \
1N/A     -e '          sort{$a->[1] <=> $b->[1]}' \
1N/A     -e '          map{[$_,substr($_,51,3)]}@l;}' perlebcdic.pod
1N/A
If you would rather see it in POSIX-BC order, change the number
51 in the last line to 60, like this:
1N/A
1N/A=over 4
1N/A
1N/A=item recipe 6
1N/A
1N/A=back
1N/A
1N/A    perl -ne 'if(/.{33}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}/)'\
1N/A     -e '{push(@l,$_)}' \
1N/A     -e 'END{print map{$_->[0]}' \
1N/A     -e '          sort{$a->[1] <=> $b->[1]}' \
1N/A     -e '          map{[$_,substr($_,60,3)]}@l;}' perlebcdic.pod
1N/A
1N/A
1N/A=head1 IDENTIFYING CHARACTER CODE SETS
1N/A
To determine from perl which character set you are running under, you
could use the return value of ord() or chr() to test one or more
character values.  For example:
1N/A
1N/A    $is_ascii  = "A" eq chr(65);
1N/A    $is_ebcdic = "A" eq chr(193);
1N/A
1N/AAlso, "\t" is a C<HORIZONTAL TABULATION> character so that:
1N/A
1N/A    $is_ascii  = ord("\t") == 9;
1N/A    $is_ebcdic = ord("\t") == 5;
1N/A
To distinguish between EBCDIC code pages, look at one or more of
the characters that differ between them.  For example:
1N/A
1N/A    $is_ebcdic_37   = "\n" eq chr(37);
1N/A    $is_ebcdic_1047 = "\n" eq chr(21);
1N/A
Or better still, choose a character that is encoded differently in each
of the code sets, e.g.:
1N/A
1N/A    $is_ascii           = ord('[') == 91;
1N/A    $is_ebcdic_37       = ord('[') == 186;
1N/A    $is_ebcdic_1047     = ord('[') == 173;
1N/A    $is_ebcdic_POSIX_BC = ord('[') == 187;
1N/A
1N/AHowever, it would be unwise to write tests such as:
1N/A
1N/A    $is_ascii = "\r" ne chr(13);  #  WRONG
1N/A    $is_ascii = "\n" ne chr(10);  #  ILL ADVISED
1N/A
Obviously the first of these will fail to distinguish most ASCII machines
from a CCSID 0037, 1047, or POSIX-BC EBCDIC machine, since "\r" eq
chr(13) under all of those coded character sets.  But note too that
because "\n" is chr(13) and "\r" is chr(10) on the classic (pre-OS X)
Macintosh (which is an ASCII machine), the second C<$is_ascii> test will
lead to trouble there.
1N/A
1N/ATo determine whether or not perl was built under an EBCDIC
1N/Acode page you can use the Config module like so:
1N/A
1N/A    use Config;
1N/A    $is_ebcdic = $Config{'ebcdic'} eq 'define';
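
The single character tests above can be combined into one lookup.  The
following is only a sketch: the sub name C<code_set> is hypothetical, and
it knows nothing beyond the four coded character sets discussed in this
document, relying on the C<ord('[')> values listed above.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: name the running code set from the ordinal
    # of '[', using the four values listed above.  Anything else is
    # reported as 'unknown'.
    my %code_set_by_bracket = (
        91  => 'ASCII',
        186 => 'EBCDIC 0037',
        173 => 'EBCDIC 1047',
        187 => 'EBCDIC POSIX-BC',
    );

    sub code_set {
        my $name = $code_set_by_bracket{ ord('[') };
        return defined $name ? $name : 'unknown';
    }

    print "Running under: ", code_set(), "\n";

```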
1N/A
1N/A=head1 CONVERSIONS
1N/A
1N/A=head2 tr///
1N/A
1N/AIn order to convert a string of characters from one character set to
1N/Aanother a simple list of numbers, such as in the right columns in the
1N/Aabove table, along with perl's tr/// operator is all that is needed.
1N/AThe data in the table are in ASCII order hence the EBCDIC columns
1N/Aprovide easy to use ASCII to EBCDIC operations that are also easily
1N/Areversed.
1N/A
1N/AFor example, to convert ASCII to code page 037 take the output of the second
1N/Acolumn from the output of recipe 0 (modified to add \\ characters) and use
1N/Ait in tr/// like so:
1N/A
1N/A    $cp_037 =
1N/A    '\000\001\002\003\234\011\206\177\227\215\216\013\014\015\016\017' .
1N/A    '\020\021\022\023\235\205\010\207\030\031\222\217\034\035\036\037' .
1N/A    '\200\201\202\203\204\012\027\033\210\211\212\213\214\005\006\007' .
1N/A    '\220\221\026\223\224\225\226\004\230\231\232\233\024\025\236\032' .
1N/A    '\040\240\342\344\340\341\343\345\347\361\242\056\074\050\053\174' .
1N/A    '\046\351\352\353\350\355\356\357\354\337\041\044\052\051\073\254' .
1N/A    '\055\057\302\304\300\301\303\305\307\321\246\054\045\137\076\077' .
1N/A    '\370\311\312\313\310\315\316\317\314\140\072\043\100\047\075\042' .
1N/A    '\330\141\142\143\144\145\146\147\150\151\253\273\360\375\376\261' .
1N/A    '\260\152\153\154\155\156\157\160\161\162\252\272\346\270\306\244' .
1N/A    '\265\176\163\164\165\166\167\170\171\172\241\277\320\335\336\256' .
1N/A    '\136\243\245\267\251\247\266\274\275\276\133\135\257\250\264\327' .
1N/A    '\173\101\102\103\104\105\106\107\110\111\255\364\366\362\363\365' .
1N/A    '\175\112\113\114\115\116\117\120\121\122\271\373\374\371\372\377' .
1N/A    '\134\367\123\124\125\126\127\130\131\132\262\324\326\322\323\325' .
1N/A    '\060\061\062\063\064\065\066\067\070\071\263\333\334\331\332\237' ;
1N/A
1N/A    my $ebcdic_string = $ascii_string;
1N/A    eval '$ebcdic_string =~ tr/' . $cp_037 . '/\000-\377/';
1N/A
1N/ATo convert from EBCDIC 037 to ASCII just reverse the order of the tr///
1N/Aarguments like so:
1N/A
1N/A    my $ascii_string = $ebcdic_string;
1N/A    eval '$ascii_string =~ tr/\000-\377/' . $cp_037 . '/';
1N/A
1N/ASimilarly one could take the output of the third column from recipe 0 to
1N/Aobtain a C<$cp_1047> table.  The fourth column of the output from recipe
1N/A0 could provide a C<$cp_posix_bc> table suitable for transcoding as well.
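
Rather than typing such an escape string by hand, you can generate it from
a list of numbers.  This sketch assumes a 256 entry list laid out like
C<$cp_037> (index = EBCDIC code point, value = ASCII code point).  The sub
name C<make_tr_converters> is hypothetical.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: given a 256 entry EBCDIC-to-ASCII list
    # (index = EBCDIC code point, value = ASCII code point, the same
    # layout as $cp_037), build the tr/// table string and return a pair
    # of converter subs.  tr/// does not interpolate, hence string eval.
    sub make_tr_converters {
        my ($e2a) = @_;
        die "expected 256 entries\n" unless @$e2a == 256;

        # Three digit octal escapes such as '\101', as in $cp_037.
        my $table = join '', map { sprintf '\\%03o', $_ } @$e2a;

        my $to_ebcdic = eval
            'sub { (my $s = shift) =~ tr/' . $table . '/\000-\377/; $s }'
            or die $@;
        my $to_ascii = eval
            'sub { (my $s = shift) =~ tr/\000-\377/' . $table . '/; $s }'
            or die $@;
        return ($to_ebcdic, $to_ascii);
    }

```

For instance, C<make_tr_converters(\@e2a_1047)>, using the 1047 array shown
under L</"URL decoding and encoding">, should yield converters for code
page 1047.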
1N/A
1N/A=head2 iconv
1N/A
Systems that conform to the X/Open Portability Guide (XPG) often provide
an I<iconv> utility from the shell and an iconv() function in the C
library.  Consult your system's documentation for information on iconv.
1N/A
1N/AOn OS/390 or z/OS see the iconv(1) manpage.  One way to invoke the iconv
1N/Ashell utility from within perl would be to:
1N/A
    # OS/390 or z/OS example
    # (breaks if the data contain a single quote; echo appends a newline)
    $ascii_data = `echo '$ebcdic_data'| iconv -f IBM-1047 -t ISO8859-1`;

or the inverse map:

    # OS/390 or z/OS example
    $ebcdic_data = `echo '$ascii_data'| iconv -f ISO8859-1 -t IBM-1047`;
1N/A
1N/AFor other perl based conversion options see the Convert::* modules on CPAN.
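
As one such option, modern perls ship the core Encode module, whose
Encode::EBCDIC part provides EBCDIC encodings including C<cp37> and
C<cp1047>.  Here is a minimal sketch, assuming Encode is installed.  It
runs on an ASCII host too.

```perl

    use strict;
    use warnings;
    use Encode qw(encode decode);

    # Perl character strings go in, EBCDIC octet strings come out,
    # and decode() reverses the mapping.
    my $text = 'Hello, [World]';

    for my $enc ('cp37', 'cp1047') {
        my $octets = encode($enc, $text);
        printf "%-7s %s\n", $enc,
            join ' ', map { sprintf '%02X', ord } split //, $octets;
        die "round trip failed for $enc\n"
            unless decode($enc, $octets) eq $text;
    }

```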
1N/A
1N/A=head2 C RTL
1N/A
1N/AThe OS/390 and z/OS C run time libraries provide _atoe() and _etoa() functions.
1N/A
1N/A=head1 OPERATOR DIFFERENCES
1N/A
The C<..> range operator treats certain character ranges with
care on EBCDIC machines.  For example, the following array
will have twenty-six elements on either an EBCDIC machine
or an ASCII machine:
1N/A
1N/A    @alphabet = ('A'..'Z');   #  $#alphabet == 25
1N/A
1N/AThe bitwise operators such as & ^ | may return different results
1N/Awhen operating on string or character data in a perl program running
1N/Aon an EBCDIC machine than when run on an ASCII machine.  Here is
1N/Aan example adapted from the one in L<perlop>:
1N/A
1N/A    # EBCDIC-based examples
1N/A    print "j p \n" ^ " a h";                      # prints "JAPH\n"
1N/A    print "JA" | "  ph\n";                        # prints "japh\n"
1N/A    print "JAPH\nJunk" & "\277\277\277\277\277";  # prints "japh\n";
1N/A    print 'p N$' ^ " E<H\n";                      # prints "Perl\n";
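
These examples differ from their ASCII counterparts largely because the bit
that separates upper from lower case is 0x20 in ASCII but 0x40 in EBCDIC.
The following sketch computes that bit at run time rather than hard coding
it.  C<flip_case_letters> is a hypothetical name, and the sub is only
meaningful for the letters A to Z and a to z.

```perl

    use strict;
    use warnings;

    # ord('a') ^ ord('A') is 0x20 on ASCII and 0x40 on EBCDIC.
    my $case_bit = ord('a') ^ ord('A');

    sub flip_case_letters {
        my ($s) = @_;
        # String XOR of every character with the case bit; callers
        # must pass letters only.
        return $s ^ (chr($case_bit) x length($s));
    }

    print flip_case_letters('japh'), "\n";   # prints "JAPH"
    print flip_case_letters('Perl'), "\n";   # prints "pERL"

```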
1N/A
1N/AAn interesting property of the 32 C0 control characters
1N/Ain the ASCII table is that they can "literally" be constructed
as control characters in perl, e.g. C<(chr(0) eq "\c@")>,
1N/AC<(chr(1) eq "\cA")>, and so on.  Perl on EBCDIC machines has been
1N/Aported to take "\c@" to chr(0) and "\cA" to chr(1) as well, but the
1N/Athirty three characters that result depend on which code page you are
1N/Ausing.  The table below uses the character names from the previous table
1N/Abut with substitutions such as s/START OF/S.O./; s/END OF /E.O./;
1N/As/TRANSMISSION/TRANS./; s/TABULATION/TAB./; s/VERTICAL/VERT./;
1N/As/HORIZONTAL/HORIZ./; s/DEVICE CONTROL/D.C./; s/SEPARATOR/SEP./;
1N/As/NEGATIVE ACKNOWLEDGE/NEG. ACK./;.  The POSIX-BC and 1047 sets are
1N/Aidentical throughout this range and differ from the 0037 set at only
1N/Aone spot (21 decimal).  Note that the C<LINE FEED> character
1N/Amay be generated by "\cJ" on ASCII machines but by "\cU" on 1047 or POSIX-BC
1N/Amachines and cannot be generated as a C<"\c.letter."> control character on
0037 machines.  Note also that "\c\\" maps to two characters,
not one.
1N/A
1N/A    chr   ord  8859-1               0037                1047 && POSIX-BC
1N/A    ------------------------------------------------------------------------
1N/A    "\c?" 127  <DELETE>             "                   "              ***><
1N/A    "\c@"   0  <NULL>               <NULL>              <NULL>         ***><
1N/A    "\cA"   1  <S.O. HEADING>       <S.O. HEADING>      <S.O. HEADING>
1N/A    "\cB"   2  <S.O. TEXT>          <S.O. TEXT>         <S.O. TEXT>
1N/A    "\cC"   3  <E.O. TEXT>          <E.O. TEXT>         <E.O. TEXT>
1N/A    "\cD"   4  <E.O. TRANS.>        <C1 28>             <C1 28>
1N/A    "\cE"   5  <ENQUIRY>            <HORIZ. TAB.>       <HORIZ. TAB.>
1N/A    "\cF"   6  <ACKNOWLEDGE>        <C1 6>              <C1 6>
1N/A    "\cG"   7  <BELL>               <DELETE>            <DELETE>
1N/A    "\cH"   8  <BACKSPACE>          <C1 23>             <C1 23>
1N/A    "\cI"   9  <HORIZ. TAB.>        <C1 13>             <C1 13>
1N/A    "\cJ"  10  <LINE FEED>          <C1 14>             <C1 14>
1N/A    "\cK"  11  <VERT. TAB.>         <VERT. TAB.>        <VERT. TAB.>
1N/A    "\cL"  12  <FORM FEED>          <FORM FEED>         <FORM FEED>
1N/A    "\cM"  13  <CARRIAGE RETURN>    <CARRIAGE RETURN>   <CARRIAGE RETURN>
1N/A    "\cN"  14  <SHIFT OUT>          <SHIFT OUT>         <SHIFT OUT>
1N/A    "\cO"  15  <SHIFT IN>           <SHIFT IN>          <SHIFT IN>
1N/A    "\cP"  16  <DATA LINK ESCAPE>   <DATA LINK ESCAPE>  <DATA LINK ESCAPE>
1N/A    "\cQ"  17  <D.C. ONE>           <D.C. ONE>          <D.C. ONE>
1N/A    "\cR"  18  <D.C. TWO>           <D.C. TWO>          <D.C. TWO>
1N/A    "\cS"  19  <D.C. THREE>         <D.C. THREE>        <D.C. THREE>
1N/A    "\cT"  20  <D.C. FOUR>          <C1 29>             <C1 29>
1N/A    "\cU"  21  <NEG. ACK.>          <C1 5>              <LINE FEED>    ***
1N/A    "\cV"  22  <SYNCHRONOUS IDLE>   <BACKSPACE>         <BACKSPACE>
1N/A    "\cW"  23  <E.O. TRANS. BLOCK>  <C1 7>              <C1 7>
1N/A    "\cX"  24  <CANCEL>             <CANCEL>            <CANCEL>
1N/A    "\cY"  25  <E.O. MEDIUM>        <E.O. MEDIUM>       <E.O. MEDIUM>
1N/A    "\cZ"  26  <SUBSTITUTE>         <C1 18>             <C1 18>
1N/A    "\c["  27  <ESCAPE>             <C1 15>             <C1 15>
1N/A    "\c\\" 28  <FILE SEP.>\         <FILE SEP.>\        <FILE SEP.>\
1N/A    "\c]"  29  <GROUP SEP.>         <GROUP SEP.>        <GROUP SEP.>
1N/A    "\c^"  30  <RECORD SEP.>        <RECORD SEP.>       <RECORD SEP.>  ***><
1N/A    "\c_"  31  <UNIT SEP.>          <UNIT SEP.>         <UNIT SEP.>    ***><
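
You can spot check the table against the perl you are actually running.
This sketch is an illustration, not an established idiom.  It asks perl
which C<"\c"> letter escape, if any, produces C<"\n">.  According to the
table, the answer should be J on ASCII, U on 1047 or POSIX-BC, and none
on 0037.

```perl

    use strict;
    use warnings;

    # Build each "\cX" escape as source text and let perl compile it,
    # so the probe exercises perl's own "\c" handling.
    my ($lf_letter) = grep { eval(qq{"\\c$_"}) eq "\n" } ('@', 'A' .. 'Z');

    print defined $lf_letter
        ? "\"\\c$lf_letter\" is LINE FEED here\n"
        : "no \"\\c<letter>\" escape yields LINE FEED here\n";

```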
1N/A
1N/A
1N/A=head1 FUNCTION DIFFERENCES
1N/A
1N/A=over 8
1N/A
1N/A=item chr()
1N/A
1N/Achr() must be given an EBCDIC code number argument to yield a desired
1N/Acharacter return value on an EBCDIC machine.  For example:
1N/A
1N/A    $CAPITAL_LETTER_A = chr(193);
1N/A
1N/A=item ord()
1N/A
1N/Aord() will return EBCDIC code number values on an EBCDIC machine.
1N/AFor example:
1N/A
1N/A    $the_number_193 = ord("A");
1N/A
1N/A=item pack()
1N/A
1N/AThe c and C templates for pack() are dependent upon character set
1N/Aencoding.  Examples of usage on EBCDIC include:
1N/A
1N/A    $foo = pack("CCCC",193,194,195,196);
1N/A    # $foo eq "ABCD"
1N/A    $foo = pack("C4",193,194,195,196);
1N/A    # same thing
1N/A
    $foo = pack("CCxxCC",193,194,195,196);
1N/A    # $foo eq "AB\0\0CD"
1N/A
1N/A=item print()
1N/A
1N/AOne must be careful with scalars and strings that are passed to
1N/Aprint that contain ASCII encodings.  One common place
1N/Afor this to occur is in the output of the MIME type header for
1N/ACGI script writing.  For example, many perl programming guides
1N/Arecommend something similar to:
1N/A
1N/A    print "Content-type:\ttext/html\015\012\015\012";
1N/A    # this may be wrong on EBCDIC
1N/A
1N/AUnder the IBM OS/390 USS Web Server or WebSphere on z/OS for example
1N/Ayou should instead write that as:
1N/A
1N/A    print "Content-type:\ttext/html\r\n\r\n"; # OK for DGW et alia
1N/A
1N/AThat is because the translation from EBCDIC to ASCII is done
1N/Aby the web server in this case (such code will not be appropriate for
1N/Athe Macintosh however).  Consult your web server's documentation for
1N/Afurther details.
1N/A
1N/A=item printf()
1N/A
1N/AThe formats that can convert characters to numbers and vice versa
1N/Awill be different from their ASCII counterparts when executed
1N/Aon an EBCDIC machine.  Examples include:
1N/A
1N/A    printf("%c%c%c",193,194,195);  # prints ABC
1N/A
1N/A=item sort()
1N/A
1N/AEBCDIC sort results may differ from ASCII sort results especially for
1N/Amixed case strings.  This is discussed in more detail below.
1N/A
1N/A=item sprintf()
1N/A
1N/ASee the discussion of printf() above.  An example of the use
1N/Aof sprintf would be:
1N/A
1N/A    $CAPITAL_LETTER_A = sprintf("%c",193);
1N/A
1N/A=item unpack()
1N/A
1N/ASee the discussion of pack() above.
1N/A
1N/A=back
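
Because the C template of unpack() returns native code numbers, it makes a
simple debugging aid when you compare data across machines.  Here is a
sketch; the sub name is hypothetical.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: show the native code point of each
    # character of a single octet string, in hex.
    sub native_hex_dump {
        my ($s) = @_;
        return join ' ', map { sprintf '%02X', $_ } unpack('C*', $s);
    }

    print native_hex_dump("ABC\n"), "\n";  # "41 42 43 0A" on ASCII,
                                           # "C1 C2 C3 15" on EBCDIC 1047

```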
1N/A
1N/A=head1 REGULAR EXPRESSION DIFFERENCES
1N/A
As of perl 5.005_03 the letter range regular expressions such as
[A-Z] and [a-z] have been specially coded to not pick up gap
characters.  For example, characters such as E<ocirc> C<o WITH CIRCUMFLEX>
that lie between I and J would not be matched by the
regular expression range C</[H-K]/>.  This works in
the other direction, too: if either of the range end points is
explicitly numeric, then C<[\x89-\x91]> will match C<\x8e>, even
though C<\x89> is C<i> and C<\x91> is C<j>, and C<\x8e>
is a gap character from the alphabetic viewpoint.
1N/A
1N/AIf you do want to match the alphabet gap characters in a single octet
1N/Aregular expression try matching the hex or octal code such
1N/Aas C</\313/> on EBCDIC or C</\364/> on ASCII machines to
1N/Ahave your regular expression match C<o WITH CIRCUMFLEX>.
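
Alternatively, perls that provide C<utf8::unicode_to_native()> (modern
versions do) can compute the native code point at run time instead of
hard coding either octal value.  Here is a sketch under that assumption;
the sub name is hypothetical.

```perl

    use strict;
    use warnings;

    # o WITH CIRCUMFLEX is U+00F4.  utf8::unicode_to_native turns that
    # Latin-1/Unicode code point into the native one (\364 on ASCII,
    # \313 on EBCDIC 0037 and 1047); on ASCII it is the identity.
    my $o_circumflex    = chr utf8::unicode_to_native(0xF4);
    my $o_circumflex_re = qr/\Q$o_circumflex\E/;

    sub has_o_circumflex {
        my ($s) = @_;
        return $s =~ $o_circumflex_re ? 1 : 0;
    }

    printf "native code point: %d\n", ord $o_circumflex;

```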
1N/A
1N/AAnother construct to be wary of is the inappropriate use of hex or
1N/Aoctal constants in regular expressions.  Consider the following
1N/Aset of subs:
1N/A
1N/A    sub is_c0 {
1N/A        my $char = substr(shift,0,1);
1N/A        $char =~ /[\000-\037]/;
1N/A    }
1N/A
1N/A    sub is_print_ascii {
1N/A        my $char = substr(shift,0,1);
1N/A        $char =~ /[\040-\176]/;
1N/A    }
1N/A
1N/A    sub is_delete {
1N/A        my $char = substr(shift,0,1);
1N/A        $char eq "\177";
1N/A    }
1N/A
1N/A    sub is_c1 {
1N/A        my $char = substr(shift,0,1);
1N/A        $char =~ /[\200-\237]/;
1N/A    }
1N/A
1N/A    sub is_latin_1 {
1N/A        my $char = substr(shift,0,1);
1N/A        $char =~ /[\240-\377]/;
1N/A    }
1N/A
1N/AThe above would be adequate if the concern was only with numeric code points.
1N/AHowever, the concern may be with characters rather than code points
1N/Aand on an EBCDIC machine it may be desirable for constructs such as
1N/AC<if (is_print_ascii("A")) {print "A is a printable character\n";}> to print
1N/Aout the expected message.  One way to represent the above collection
1N/Aof character classification subs that is capable of working across the
1N/Afour coded character sets discussed in this document is as follows:
1N/A
1N/A    sub Is_c0 {
1N/A        my $char = substr(shift,0,1);
1N/A        if (ord('^')==94)  { # ascii
1N/A            return $char =~ /[\000-\037]/;
1N/A        }
1N/A        if (ord('^')==176) { # 37
1N/A            return $char =~ /[\000-\003\067\055-\057\026\005\045\013-\023\074\075\062\046\030\031\077\047\034-\037]/;
1N/A        }
1N/A        if (ord('^')==95 || ord('^')==106) { # 1047 || posix-bc
1N/A            return $char =~ /[\000-\003\067\055-\057\026\005\025\013-\023\074\075\062\046\030\031\077\047\034-\037]/;
1N/A        }
1N/A    }
1N/A
1N/A    sub Is_print_ascii {
1N/A        my $char = substr(shift,0,1);
1N/A        $char =~ /[ !"\#\$%&'()*+,\-.\/0-9:;<=>?\@A-Z[\\\]^_`a-z{|}~]/;
1N/A    }
1N/A
1N/A    sub Is_delete {
1N/A        my $char = substr(shift,0,1);
1N/A        if (ord('^')==94)  { # ascii
1N/A            return $char eq "\177";
1N/A        }
1N/A        else  {              # ebcdic
1N/A            return $char eq "\007";
1N/A        }
1N/A    }
1N/A
1N/A    sub Is_c1 {
1N/A        my $char = substr(shift,0,1);
1N/A        if (ord('^')==94)  { # ascii
1N/A            return $char =~ /[\200-\237]/;
1N/A        }
1N/A        if (ord('^')==176) { # 37
            return $char =~ /[\040-\044\025\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\004\024\076\377]/;
        }
        if (ord('^')==95)  { # 1047
            return $char =~ /[\040-\045\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\004\024\076\377]/;
        }
        if (ord('^')==106) { # posix-bc
            return $char =~
              /[\040-\045\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\004\024\076\137]/;
1N/A        }
1N/A    }
1N/A
1N/A    sub Is_latin_1 {
1N/A        my $char = substr(shift,0,1);
1N/A        if (ord('^')==94)  { # ascii
1N/A            return $char =~ /[\240-\377]/;
1N/A        }
1N/A        if (ord('^')==176) { # 37
1N/A            return $char =~
1N/A              /[\101\252\112\261\237\262\152\265\275\264\232\212\137\312\257\274\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\375\376\373\374\255\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\335\336\333\334\215\216\337]/;
1N/A        }
1N/A        if (ord('^')==95)  { # 1047
1N/A            return $char =~
1N/A              /[\101\252\112\261\237\262\152\265\273\264\232\212\260\312\257\274\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\375\376\373\374\272\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\335\336\333\334\215\216\337]/;
1N/A        }
1N/A        if (ord('^')==106) { # posix-bc
1N/A            return $char =~
1N/A              /[\101\252\260\261\237\262\320\265\171\264\232\212\272\312\257\241\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\340\376\335\374\255\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\300\336\333\334\215\216\337]/;
1N/A        }
1N/A    }
1N/A
Note however that only the C<Is_print_ascii()> sub is really independent
of coded character set.  Another way to write C<Is_latin_1()> would be
to use the characters in the range explicitly:
1N/A
1N/A    sub Is_latin_1 {
1N/A        my $char = substr(shift,0,1);
        $char =~ /[ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]/;
1N/A    }
1N/A
That form may, however, run into trouble in network transit (due to the
presence of 8 bit characters) or on non ISO-Latin character sets.
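
A third technique differs from both forms above: build each class at run
time from its Latin-1 code points and translate every code point with
C<utf8::unicode_to_native()>.  This sketch assumes that function is
available, which is the case in modern perls.  The sub names are
hypothetical, and the result depends entirely on perl's own native
mapping being right for your code page.

```perl

    use strict;
    use warnings;

    # Build a character class from Unicode (= Latin-1 for 0..255) code
    # points.  Each one becomes a \x{..} escape of its native value, so
    # no regex metacharacter can slip in unescaped.
    sub native_class {
        my $class = join '',
            map { sprintf '\\x{%X}', utf8::unicode_to_native($_) } @_;
        return qr/[$class]/;
    }

    my $C0_RE      = native_class(0x00 .. 0x1F);
    my $C1_RE      = native_class(0x80 .. 0x9F);
    my $LATIN_1_RE = native_class(0xA0 .. 0xFF);
    my $DELETE     = chr utf8::unicode_to_native(0x7F);

    sub is_c0_native      { substr($_[0], 0, 1) =~ $C0_RE }
    sub is_c1_native      { substr($_[0], 0, 1) =~ $C1_RE }
    sub is_latin_1_native { substr($_[0], 0, 1) =~ $LATIN_1_RE }
    sub is_delete_native  { substr($_[0], 0, 1) eq $DELETE }

```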
1N/A
1N/A=head1 SOCKETS
1N/A
1N/AMost socket programming assumes ASCII character encodings in network
1N/Abyte order.  Exceptions can include CGI script writing under a
1N/Ahost web server where the server may take care of translation for you.
1N/AMost host web servers convert EBCDIC data to ISO-8859-1 or Unicode on
1N/Aoutput.
1N/A
1N/A=head1 SORTING
1N/A
One big difference between ASCII based character sets and EBCDIC ones
is the relative positions of upper and lower case letters and the
letters compared to the digits.  If sorted on an ASCII based machine the
two letter abbreviation for a physician comes before the two letter
abbreviation for drive, that is:
1N/A
1N/A    @sorted = sort(qw(Dr. dr.));  # @sorted holds ('Dr.','dr.') on ASCII,
1N/A                                  # but ('dr.','Dr.') on EBCDIC
1N/A
1N/AThe property of lower case before uppercase letters in EBCDIC is
1N/Aeven carried to the Latin 1 EBCDIC pages such as 0037 and 1047.
1N/AAn example would be that E<Euml> C<E WITH DIAERESIS> (203) comes
1N/Abefore E<euml> C<e WITH DIAERESIS> (235) on an ASCII machine, but
1N/Athe latter (83) comes before the former (115) on an EBCDIC machine.
1N/A(Astute readers will note that the upper case version of E<szlig>
1N/AC<SMALL LETTER SHARP S> is simply "SS" and that the upper case version of
E<yuml> C<y WITH DIAERESIS> is not in the 0..255 range but it is
at U+0178 in Unicode, or C<"\x{178}"> in a Unicode enabled Perl).
1N/A
1N/AThe sort order will cause differences between results obtained on
1N/AASCII machines versus EBCDIC machines.  What follows are some suggestions
1N/Aon how to deal with these differences.
1N/A
1N/A=head2 Ignore ASCII vs. EBCDIC sort differences.
1N/A
1N/AThis is the least computationally expensive strategy.  It may require
1N/Asome user education.
1N/A
1N/A=head2 MONO CASE then sort data.
1N/A
In order to minimize the expense of mono casing mixed-case text, try to
C<tr///> towards the character set case most employed within the data.
If the data are primarily UPPERCASE non Latin 1 then apply tr/a-z/A-Z/
then sort().  If the data are primarily lowercase non Latin 1 then
apply tr/A-Z/a-z/ before sorting.  If the data are primarily UPPERCASE
and include Latin-1 characters then apply:

    tr/a-z/A-Z/;
    tr/àáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ/ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ/;
    s/ß/SS/g;
1N/A
1N/Athen sort().  Do note however that such Latin-1 manipulation does not
1N/Aaddress the E<yuml> C<y WITH DIAERESIS> character that will remain at
1N/Acode point 255 on ASCII machines, but 223 on most EBCDIC machines
1N/Awhere it will sort to a place less than the EBCDIC numerals.  With a
1N/AUnicode enabled Perl you might try:
1N/A
    tr/ÿ/\x{178}/;
1N/A
1N/AThe strategy of mono casing data before sorting does not preserve the case
1N/Aof the data and may not be acceptable for that reason.
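
If the case of the data must be preserved, one option is to mono case
only a sort key and return the original strings.  Here is a sketch; the
sub name is hypothetical.  It does nothing about EBCDIC placing letters
before digits.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: order strings by their lower cased form but
    # return them with their original case.  Ties (same letters,
    # different case) fall back to a plain cmp, which is still code set
    # dependent.
    sub sort_ignoring_case {
        return map  { $_->[1] }
               sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] }
               map  { [ lc($_), $_ ] } @_;
    }

    print join(' ', sort_ignoring_case(qw(b A a B))), "\n";
    # "A a B b" on ASCII, "a A b B" on EBCDIC

```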
1N/A
=head2 Convert, sort data, then re-convert.
1N/A
1N/AThis is the most expensive proposition that does not employ a network
1N/Aconnection.
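
A lighter variant of this idea converts only a sort key and never touches
the data.  The sketch below assumes C<utf8::native_to_unicode()> is
available.  Under that assumption it should yield ASCII (Latin-1) order on
either kind of machine.  The sub name is hypothetical.

```perl

    use strict;
    use warnings;

    # Each string gets a key whose characters are the Unicode code
    # points of the original characters (the same as the native ones
    # on ASCII platforms).  The strings are returned unchanged.
    sub sort_in_ascii_order {
        return map  { $_->[0] }
               sort { $a->[1] cmp $b->[1] }
               map  {
                   my $s = $_;
                   [ $s, join '', map { chr utf8::native_to_unicode(ord $_) } split //, $s ]
               } @_;
    }

    print join(' ', sort_in_ascii_order(qw(dr. Dr. 10 a A))), "\n";
    # "10 A Dr. a dr."

```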
1N/A
1N/A=head2 Perform sorting on one type of machine only.
1N/A
1N/AThis strategy can employ a network connection.  As such
1N/Ait would be computationally expensive.
1N/A
1N/A=head1 TRANSFORMATION FORMATS
1N/A
1N/AThere are a variety of ways of transforming data with an intra character set
1N/Amapping that serve a variety of purposes.  Sorting was discussed in the
1N/Aprevious section and a few of the other more popular mapping techniques are
1N/Adiscussed next.
1N/A
1N/A=head2 URL decoding and encoding
1N/A
Note that some URLs have hexadecimal ASCII code points in them in an
attempt to overcome character or protocol limitation issues.  For example,
the tilde character is not on every keyboard, hence a URL of the form:
1N/A
1N/A    http://www.pvhp.com/~pvhp/
1N/A
1N/Amay also be expressed as either of:
1N/A
1N/A    http://www.pvhp.com/%7Epvhp/
1N/A
1N/A    http://www.pvhp.com/%7epvhp/
1N/A
1N/Awhere 7E is the hexadecimal ASCII code point for '~'.  Here is an example
1N/Aof decoding such a URL under CCSID 1047:
1N/A
1N/A    $url = 'http://www.pvhp.com/%7Epvhp/';
1N/A    # this array assumes code page 1047
1N/A    my @a2e_1047 = (
1N/A          0,  1,  2,  3, 55, 45, 46, 47, 22,  5, 21, 11, 12, 13, 14, 15,
1N/A         16, 17, 18, 19, 60, 61, 50, 38, 24, 25, 63, 39, 28, 29, 30, 31,
1N/A         64, 90,127,123, 91,108, 80,125, 77, 93, 92, 78,107, 96, 75, 97,
1N/A        240,241,242,243,244,245,246,247,248,249,122, 94, 76,126,110,111,
1N/A        124,193,194,195,196,197,198,199,200,201,209,210,211,212,213,214,
1N/A        215,216,217,226,227,228,229,230,231,232,233,173,224,189, 95,109,
1N/A        121,129,130,131,132,133,134,135,136,137,145,146,147,148,149,150,
1N/A        151,152,153,162,163,164,165,166,167,168,169,192, 79,208,161,  7,
1N/A         32, 33, 34, 35, 36, 37,  6, 23, 40, 41, 42, 43, 44,  9, 10, 27,
1N/A         48, 49, 26, 51, 52, 53, 54,  8, 56, 57, 58, 59,  4, 20, 62,255,
1N/A         65,170, 74,177,159,178,106,181,187,180,154,138,176,202,175,188,
1N/A        144,143,234,250,190,160,182,179,157,218,155,139,183,184,185,171,
1N/A        100,101, 98,102, 99,103,158,104,116,113,114,115,120,117,118,119,
1N/A        172,105,237,238,235,239,236,191,128,253,254,251,252,186,174, 89,
1N/A         68, 69, 66, 70, 67, 71,156, 72, 84, 81, 82, 83, 88, 85, 86, 87,
1N/A        140, 73,205,206,203,207,204,225,112,221,222,219,220,141,142,223
1N/A    );
    $url =~ s/%([0-9a-fA-F]{2})/chr($a2e_1047[hex($1)])/ge;
1N/A
1N/AConversely, here is a partial solution for the task of encoding such
1N/Aa URL under the 1047 code page:
1N/A
1N/A    $url = 'http://www.pvhp.com/~pvhp/';
1N/A    # this array assumes code page 1047
1N/A    my @e2a_1047 = (
1N/A          0,  1,  2,  3,156,  9,134,127,151,141,142, 11, 12, 13, 14, 15,
1N/A         16, 17, 18, 19,157, 10,  8,135, 24, 25,146,143, 28, 29, 30, 31,
1N/A        128,129,130,131,132,133, 23, 27,136,137,138,139,140,  5,  6,  7,
1N/A        144,145, 22,147,148,149,150,  4,152,153,154,155, 20, 21,158, 26,
1N/A         32,160,226,228,224,225,227,229,231,241,162, 46, 60, 40, 43,124,
1N/A         38,233,234,235,232,237,238,239,236,223, 33, 36, 42, 41, 59, 94,
1N/A         45, 47,194,196,192,193,195,197,199,209,166, 44, 37, 95, 62, 63,
1N/A        248,201,202,203,200,205,206,207,204, 96, 58, 35, 64, 39, 61, 34,
1N/A        216, 97, 98, 99,100,101,102,103,104,105,171,187,240,253,254,177,
1N/A        176,106,107,108,109,110,111,112,113,114,170,186,230,184,198,164,
1N/A        181,126,115,116,117,118,119,120,121,122,161,191,208, 91,222,174,
1N/A        172,163,165,183,169,167,182,188,189,190,221,168,175, 93,180,215,
1N/A        123, 65, 66, 67, 68, 69, 70, 71, 72, 73,173,244,246,242,243,245,
1N/A        125, 74, 75, 76, 77, 78, 79, 80, 81, 82,185,251,252,249,250,255,
1N/A         92,247, 83, 84, 85, 86, 87, 88, 89, 90,178,212,214,210,211,213,
1N/A         48, 49, 50, 51, 52, 53, 54, 55, 56, 57,179,219,220,217,218,159
1N/A    );
1N/A    # The following regular expression does not address the
1N/A    # mappings for: ('.' => '%2E', '/' => '%2F', ':' => '%3A')
1N/A    $url =~ s/([\t "#%&\(\),;<=>\?\@\[\\\]^`{|}~])/sprintf("%%%02X",$e2a_1047[ord($1)])/ge;
1N/A
1N/Awhere a more complete solution would split the URL into components
1N/Aand apply a full s/// substitution only to the appropriate parts.
1N/A
1N/AIn the remaining examples a @e2a or @a2e array may be employed
1N/Abut the assignment will not be shown explicitly.  For code page 1047
1N/Ayou could use the @a2e_1047 or @e2a_1047 arrays just shown.
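
Where C<utf8::unicode_to_native()> and C<utf8::native_to_unicode()> are
available, you can avoid the tables altogether.  Here is a sketch with
hypothetical sub names.  It is limited to single octet characters and
neither produces nor interprets the UTF-8 octets that modern URLs usually
carry.

```perl

    use strict;
    use warnings;

    # %XX escapes hold ASCII (here: Latin-1) code points; translate
    # them to and from native code points.
    sub url_decode_native {
        my ($url) = @_;
        $url =~ s/%([0-9A-Fa-f]{2})/chr utf8::unicode_to_native(hex $1)/ge;
        return $url;
    }

    # Escape everything except the RFC 3986 unreserved characters, so
    # this suits a single path segment or query value, not a whole URL.
    sub url_encode_component {
        my ($s) = @_;
        $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', utf8::native_to_unicode(ord $1))/ge;
        return $s;
    }

    print url_decode_native('http://www.pvhp.com/%7Epvhp/'), "\n";
    print url_encode_component('a b/c'), "\n";

```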
1N/A
1N/A=head2 uu encoding and decoding
1N/A
1N/AThe C<u> template to pack() or unpack() will render EBCDIC data in EBCDIC
1N/Acharacters equivalent to their ASCII counterparts.  For example, the
1N/Afollowing will print "Yes indeed\n" on either an ASCII or EBCDIC computer:
1N/A
1N/A    $all_byte_chrs = '';
1N/A    for (0..255) { $all_byte_chrs .= chr($_); }
1N/A    $uuencode_byte_chrs = pack('u', $all_byte_chrs);
1N/A    ($uu = <<'ENDOFHEREDOC') =~ s/^\s*//gm;
1N/A    M``$"`P0%!@<("0H+#`T.#Q`1$A,4%187&!D:&QP='A\@(2(C)"4F)R@I*BLL
1N/A    M+2XO,#$R,S0U-C<X.3H[/#T^/T!!0D-$149'2$E*2TQ-3D]045)35%565UA9
1N/A    M6EM<75Y?8&%B8V1E9F=H:6IK;&UN;W!Q<G-T=79W>'EZ>WQ]?G^`@8*#A(6&
1N/A    MAXB)BHN,C8Z/D)&2DY25EI>8F9J;G)V>GZ"AHJ.DI::GJ*FJJZRMKJ^PL;*S
1N/A    MM+6VM[BYNKN\O;Z_P,'"P\3%QL?(R<K+S,W.S]#1TM/4U=;7V-G:V]S=WM_@
1N/A    ?X>+CY.7FY^CIZNOL[>[O\/'R\_3U]O?X^?K[_/W^_P``
1N/A    ENDOFHEREDOC
1N/A    if ($uuencode_byte_chrs eq $uu) {
1N/A        print "Yes ";
1N/A    }
1N/A    $uudecode_byte_chrs = unpack('u', $uuencode_byte_chrs);
1N/A    if ($uudecode_byte_chrs eq $all_byte_chrs) {
1N/A        print "indeed\n";
1N/A    }
1N/A
1N/AHere is a very spartan uudecoder that will work on EBCDIC provided
1N/Athat the @e2a array is filled in appropriately:
1N/A
1N/A    #!/usr/local/bin/perl
1N/A    @e2a = ( # this must be filled in
1N/A           );
1N/A    $_ = <> until ($mode,$file) = /^begin\s*(\d*)\s*(\S*)/;
1N/A    open(OUT, "> $file") if $file ne "";
1N/A    while(<>) {
1N/A        last if /^end/;
1N/A        next if /[a-z]/;
1N/A        next unless int(((($e2a[ord()] - 32 ) & 077) + 2) / 3) ==
1N/A            int(length() / 4);
1N/A        print OUT unpack("u", $_);
1N/A    }
1N/A    close(OUT);
1N/A    chmod oct($mode), $file;
1N/A
1N/A
1N/A=head2 Quoted-Printable encoding and decoding
1N/A
On ASCII encoded machines it is possible to encode characters outside of
the printable set using:
1N/A
1N/A    # This QP encoder works on ASCII only
1N/A    $qp_string =~ s/([=\x00-\x1F\x80-\xFF])/sprintf("=%02X",ord($1))/ge;
1N/A
Whereas a QP encoder that works on both ASCII and EBCDIC machines
would look somewhat like the following (where the EBCDIC branch
borrows the @e2a_1047 array shown earlier; substitute the map for
your own code page):

    if (ord('A') == 65) {    # ASCII
        $delete = "\x7F";    # ASCII
        @e2a = (0 .. 255);   # ASCII to ASCII identity map
    }
    else {                   # EBCDIC
        $delete = "\x07";    # EBCDIC
        @e2a = @e2a_1047;    # EBCDIC to ASCII map (1047 shown above)
    }
    $qp_string =~
      s/([^ !"\#\$%&'()*+,\-.\/0-9:;<>?\@A-Z[\\\]^_`a-z{|}~$delete])/sprintf("=%02X",$e2a[ord($1)])/ge;
1N/A
1N/A(although in production code the substitutions might be done
1N/Ain the EBCDIC branch with the @e2a array and separately in the
1N/AASCII branch without the expense of the identity map).
1N/A
1N/ASuch QP strings can be decoded with:
1N/A
1N/A    # This QP decoder is limited to ASCII only
1N/A    $string =~ s/=([0-9A-Fa-f][0-9A-Fa-f])/chr hex $1/ge;
1N/A    $string =~ s/=[\n\r]+$//;
1N/A
1N/AWhereas a QP decoder that works on both ASCII and EBCDIC machines
1N/Awould look somewhat like the following (where the @a2e array is
1N/Aomitted for brevity):
1N/A
1N/A    $string =~ s/=([0-9A-Fa-f][0-9A-Fa-f])/chr $a2e[hex $1]/ge;
1N/A    $string =~ s/=[\n\r]+$//;
1N/A
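
One way to check that a QP encoder and decoder are inverses is to wrap them in subroutines and run them on a sample string. This is only a sketch, under these assumptions:

=over 4

=item *

qp_encode and qp_decode are hypothetical names.

=item *

The tables are ASCII identity maps. On an EBCDIC machine they would be replaced by real @e2a and @a2e tables, and $delete would be "\x07".

=item *

The decoder removes a trailing soft line break before it decodes hex escapes. This prevents an encoded C<=3D=0A> at the end of the data from being mistaken for a soft line break.

=back

    use strict;
    use warnings;

    # Identity tables, as on an ASCII machine
    my @e2a = (0 .. 255);
    my @a2e = (0 .. 255);
    my $delete = "\x7F";

    # Hypothetical helper: escape everything outside the printable set
    sub qp_encode {
        my ($s) = @_;
        $s =~ s/([^ !"\#\$%&'()*+,\-.\/0-9:;<>?\@A-Z[\\\]^_`a-z{|}~$delete])/sprintf("=%02X",$e2a[ord($1)])/ge;
        return $s;
    }

    # Hypothetical helper: soft line break first, then hex escapes
    sub qp_decode {
        my ($s) = @_;
        $s =~ s/=[\n\r]+$//;
        $s =~ s/=([0-9A-Fa-f][0-9A-Fa-f])/chr $a2e[hex $1]/ge;
        return $s;
    }

    my $encoded = qp_encode("caf\xE9 = 1\n");
    print "$encoded\n";    # caf=E9 =3D 1=0A
    print qp_decode($encoded) eq "caf\xE9 = 1\n" ? "round trip ok\n" : "mismatch\n";
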
=head2 Caesarian ciphers

The practice of shifting an alphabet by one or more characters for
encipherment dates back over two thousand years and is traditionally
attributed to Gaius Julius Caesar, whose use of a three-letter shift
is described by Suetonius.  A single alphabet shift is sometimes
called a rotation, and the shift amount $n is appended to the string
'rot', giving "rot$n".  Rot0 and rot26 would designate identity maps
on the 26-letter English version of the Latin alphabet.  Rot13 has the
interesting property that applying it twice yields the identity map
(thus rot13 is its own non-trivial inverse in the group of 26 alphabet
rotations).  Hence the following program serves as both a rot13
encoder and a rot13 decoder, and it will work on ASCII and EBCDIC
machines:

    #!/usr/local/bin/perl

    while(<>){
        tr/n-za-mN-ZA-M/a-zA-Z/;
        print;
    }

In one-liner form:

    perl -ne 'tr/n-za-mN-ZA-M/a-zA-Z/;print'

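
The tr/// program above is fixed to one rotation. A general rot$n can be sketched by building the shifted alphabets at run time. rot_n is a hypothetical helper, not a standard function. It takes its alphabets from Perl's magical string increment ('a' .. 'z') rather than from code point arithmetic, assuming this yields exactly the 26 letters on EBCDIC as well, where the letters are not contiguous. A hash lookup replaces tr/// because tr/// does not interpolate variables.

    use strict;
    use warnings;

    # Hypothetical helper: rotate each letter by $n places, keep case,
    # and pass every other character through unchanged
    sub rot_n {
        my ($n, $text) = @_;
        my %map;
        for my $alphabet (['a' .. 'z'], ['A' .. 'Z']) {
            my @letters = @$alphabet;
            # Perl's % returns a non-negative result for a positive
            # right operand, so negative shifts work too
            @map{@letters} = @letters[ map { ($_ + $n) % 26 } 0 .. 25 ];
        }
        return join '', map { exists $map{$_} ? $map{$_} : $_ } split //, $text;
    }

    print rot_n(13, "Hello, World"), "\n";    # Uryyb, Jbeyq

Since rot_n(-$n, ...) undoes rot_n($n, ...), rot_n(13, ...) is its own inverse, just like the tr/// version.
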

=head1 Hashing order and checksums

To the extent that code depends on hashing order, it may behave
differently on an ASCII based machine than on an EBCDIC based machine,
since the same keys are made up of different byte values on the two.
XXX

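
Checksums over character data are affected in the same way, because they are computed from byte values. The sketch below assumes an ASCII-based perl and uses the cp1047 encoding from the core Encode module to stand in for an EBCDIC machine. It computes a simple 32-bit additive checksum with unpack's C<%32C*> template over the same three letters in both encodings.

    use strict;
    use warnings;
    use Encode qw(encode);

    # Assumes an ASCII-based perl; cp1047 stands in for an EBCDIC machine
    my $text = "ABC";

    # %32C* adds up the byte values and keeps the low 32 bits of the sum
    my $ascii_sum  = unpack('%32C*', $text);
    my $ebcdic_sum = unpack('%32C*', encode('cp1047', $text));

    print "ASCII: $ascii_sum, CP 1047: $ebcdic_sum\n";    # ASCII: 198, CP 1047: 582
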
=head1 I18N AND L10N

Internationalization (I18N) and localization (L10N) are supported at
least in principle even on EBCDIC machines.  The details are system
dependent and are discussed under the L<perlebcdic/OS ISSUES> section
below.

=head1 MULTI OCTET CHARACTER SETS

Perl may work with an internal UTF-EBCDIC encoding form for wide characters
on EBCDIC platforms in a manner analogous to the way that it works with
the UTF-8 internal encoding form on ASCII based platforms.

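
One way to look at the internal form is to convert a wide-character string to its internal octets with utf8::encode and dump them in hexadecimal. On an ASCII-based perl, this sketch shows the UTF-8 octets C4 80 for U+0100. On an EBCDIC-based perl, the same code would be expected to show UTF-EBCDIC octets, which differ.

    use strict;
    use warnings;

    my $s = "\x{100}";    # LATIN CAPITAL LETTER A WITH MACRON (U+0100)
    utf8::encode($s);     # $s now holds the octets of the internal encoding
    # On an ASCII-based perl this prints: 2 octets: c480
    printf "%d octets: %s\n", length($s), unpack('H*', $s);
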
Legacy multi-byte EBCDIC code pages XXX.

=head1 OS ISSUES

There may be a few system-dependent issues
of concern to EBCDIC Perl programmers.

=head2 OS/400

=over 8

=item PASE

The PASE environment is a runtime environment for OS/400 that can run
executables built for PowerPC AIX in OS/400; see L<perlos400>.  Unlike
the ILE, which is EBCDIC-based, PASE is ASCII-based.

=item IFS access

XXX.

=back

=head2 OS/390, z/OS

Perl runs under UNIX System Services (USS).

=over 8

=item chcp

B<chcp> is supported as a shell utility for displaying and changing
one's code page.  See also L<chcp(1)>.

=item dataset access

For sequential data set access try:

    my @ds_records = `cat //DSNAME`;

or, for a fully qualified data set name (the double quotes keep the
shell from removing the single quotes):

    my @ds_records = `cat "//'HLQ.DSNAME'"`;

See also the OS390::Stdio module on CPAN.

=item OS/390, z/OS iconv

B<iconv> is supported as both a shell utility and a C RTL routine.
See also the iconv(1) and iconv(3) manual pages.

=item locales

On OS/390 or z/OS see L<perllocale> for information on locales.  The
L10N files are in F</usr/nls/locale>.  C<$Config{d_setlocale}> is
'define' on OS/390 or z/OS.

=back

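
The backtick forms shown for data set access pass the command line to a shell, so quoting matters, and they do not check whether cat succeeded. An alternative is to run cat directly without a shell and check its exit status. This sketch assumes a perl with list-form pipe opens (available since 5.8 on systems with fork, which includes USS). read_dataset is a hypothetical helper name, and HLQ.DSNAME remains a placeholder.

    use strict;
    use warnings;

    # Hypothetical helper: return the records of a data set (or any
    # file) by running cat(1) directly, without a shell
    sub read_dataset {
        my ($name) = @_;
        open(my $fh, '-|', 'cat', $name) or die "Cannot run cat: $!\n";
        my @records = <$fh>;
        # close() on a pipe fails if cat exited with a non-zero status
        close($fh) or die "cat $name failed (status $?)\n";
        return @records;
    }

    # No shell is involved, so the single quotes reach cat intact:
    # my @ds_records = read_dataset("//'HLQ.DSNAME'");
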
=head2 VM/ESA?

XXX.

=head2 POSIX-BC?

XXX.

=head1 BUGS

This pod document contains literal Latin 1 characters and may encounter
translation difficulties.  In particular one popular nroff implementation
was known to strip accented characters to their unaccented counterparts
while attempting to view this document through the B<pod2man> program
(for example, you may see a plain C<y> rather than one with a diaeresis
as in E<yuml>).  Another nroff truncated the resulting manpage at
the first occurrence of an 8-bit character.

Not all shells will allow multiple C<-e> string arguments to perl to
be concatenated together properly as recipes 0, 2, 4, 5, and 6 might
seem to imply.

=head1 SEE ALSO

L<perllocale>, L<perlfunc>, L<perlunicode>, L<utf8>.

=head1 REFERENCES

http://anubis.dkuug.dk/i18n/charmaps

http://www.unicode.org/

http://www.unicode.org/unicode/reports/tr16/

http://www.wps.com/texts/codes/
B<ASCII: American Standard Code for Information Infiltration> Tom Jennings,
September 1999.

B<The Unicode Standard, Version 3.0> The Unicode Consortium, Lisa Moore ed.,
ISBN 0-201-61633-5, Addison Wesley Developers Press, February 2000.

B<CDRA: IBM - Character Data Representation Architecture -
Reference and Registry>, IBM SC09-2190-00, December 1996.

"Demystifying Character Sets", Andrea Vine, Multilingual Computing
& Technology, B<#26 Vol. 10 Issue 4>, August/September 1999;
ISSN 1523-0309; Multilingual Computing Inc. Sandpoint ID, USA.

B<Codes, Ciphers, and Other Cryptic and Clandestine Communication>
Fred B. Wrixon, ISBN 1-57912-040-7, Black Dog & Leventhal Publishers,
1998.

http://www.bobbemer.com/P-BIT.HTM
B<IBM - EBCDIC and the P-bit; The biggest Computer Goof Ever> Robert Bemer.

=head1 HISTORY

15 April 2001: added UTF-8 and UTF-EBCDIC to main table, pvhp.

=head1 AUTHOR

Peter Prymmer pvhp@best.com wrote this in 1999 and 2000
with CCSID 0819 and 0037 help from Chris Leach and
AndrE<eacute> Pirard A.Pirard@ulg.ac.be as well as POSIX-BC
help from Thomas Dorner Thomas.Dorner@start.de.
Thanks also to Vickie Cooper, Philip Newton, William Raffloer, and
Joe Smith.  Trademarks, registered trademarks, service marks and
registered service marks used in this document are the property of
their respective owners.
