1N/A=head1 NAME
1N/A
1N/Aperlebcdic - Considerations for running Perl on EBCDIC platforms
1N/A
1N/A=head1 DESCRIPTION
1N/A
This document explores some of the issues facing Perl programmers
on EBCDIC-based computers. It does not cover localization,
internationalization, or multibyte character set issues, apart from
some discussion of UTF-8 and UTF-EBCDIC.

Portions that are still incomplete are marked with XXX.
1N/A
1N/A=head1 COMMON CHARACTER CODE SETS
1N/A
1N/A=head2 ASCII
1N/A
The American Standard Code for Information Interchange is a set of
integers running from 0 to 127 (decimal), each of which the display
and other systems of a computer interpret as a character.
The range 0..127 can be covered by the bits of a 7-bit binary
number, hence the set is sometimes referred to as "7-bit ASCII".
ASCII was described by the American National Standards Institute
document ANSI X3.4-1986. It was also described by ISO 646:1991
(with localization for currency symbols). The full ASCII set is
given in the table below as the first 128 elements. Languages that
can be written adequately with the characters in ASCII include
English, Hawaiian, Indonesian, Swahili and some Native American
languages.
1N/A
Many character sets extend the range of integers from 0..2**7-1
up to 2**8-1, so that each character occupies one 8-bit byte (an
octet, if you prefer). A common one is the ISO 8859-1 character set.
1N/A
1N/A=head2 ISO 8859
1N/A
The ISO 8859-$n standards are a collection of character code sets from
the International Organization for Standardization (ISO), each of which
adds to the ASCII set characters that are typically found in European
languages, many of which are based on the Roman, or Latin, alphabet.
1N/A
1N/A=head2 Latin 1 (ISO 8859-1)
1N/A
ISO 8859-1 is a particular 8-bit extension to ASCII that includes grave
and acute accented Latin characters. Languages that can employ ISO 8859-1
include all the languages covered by ASCII as well as Afrikaans,
Albanian, Basque, Catalan, Danish, Faroese, Finnish, Norwegian,
Portuguese, Spanish, and Swedish. Dutch is covered, albeit without
the ij ligature. French is covered too, but without the oe ligature.
German can use ISO 8859-1 but must do so without German-style
quotation marks. This set is based on Western European extensions
to ASCII and is commonly encountered in World Wide Web work.
In IBM character code set identification terminology ISO 8859-1 is
also known as CCSID 819 (or sometimes 0819 or even 00819).
1N/A
1N/A=head2 EBCDIC
1N/A
The Extended Binary Coded Decimal Interchange Code refers to a
large collection of slightly different single- and multibyte
coded character sets that differ from ASCII and ISO 8859-1
and are typically used on host computers. The EBCDIC encodings derive
from 8-bit byte extensions of Hollerith punched card encodings.
The layout on the cards was such that high bits were set for the
upper and lower case alphabet characters [a-z] and [A-Z], but there
were gaps within each Latin alphabet range.
1N/A
1N/ASome IBM EBCDIC character sets may be known by character code set
1N/Aidentification numbers (CCSID numbers) or code page numbers. Leading
1N/Azero digits in CCSID numbers within this document are insignificant.
1N/AE.g. CCSID 0037 may be referred to as 37 in places.
1N/A
1N/A=head2 13 variant characters
1N/A
1N/AAmong IBM EBCDIC character code sets there are 13 characters that
1N/Aare often mapped to different integer values. Those characters
1N/Aare known as the 13 "variant" characters and are:
1N/A
1N/A \ [ ] { } ^ ~ ! # | $ @ `
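
The following sketch prints the code assigned to each variant character
in three EBCDIC code pages. It assumes that the Encode module is installed
and that it knows these code pages under the names C<cp37>, C<cp1047> and
C<posix-bc>; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The 13 variant characters listed above.
my @variants  = ('\\', '[', ']', '{', '}', '^', '~', '!', '#', '|', '$', '@', '`');
my @encodings = qw(cp37 cp1047 posix-bc);    # names as known to Encode (assumption)

# $code{$encoding}{$char} is the byte value of $char in that code page.
my %code;
for my $enc (@encodings) {
    for my $char (@variants) {
        $code{$enc}{$char} = ord encode($enc, $char);
    }
}

# One row per variant character, one column per code page.
printf "%-4s %6s %6s %9s\n", 'chr', @encodings;
for my $char (@variants) {
    printf "%-4s %6d %6d %9d\n", $char, map { $code{$_}{$char} } @encodings;
}
```

Characters whose three values agree are variant only with respect to other
EBCDIC code pages that are not shown here.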
1N/A
1N/A=head2 0037
1N/A
Character code set ID 0037 is a mapping of the ASCII plus Latin-1
characters (i.e. ISO 8859-1) to an EBCDIC set. 0037 is used
in North American English locales on the OS/400 operating system
that runs on AS/400 computers. CCSID 37 differs from ISO 8859-1
in 236 places; in other words they agree on only 20 code point values
(0..3, 11..19, 24, 25, 28..31, and 182 in the table below).
1N/A
1N/A=head2 1047
1N/A
1N/ACharacter code set ID 1047 is also a mapping of the ASCII plus
1N/ALatin-1 characters (i.e. ISO 8859-1) to an EBCDIC set. 1047 is
1N/Aused under Unix System Services for OS/390 or z/OS, and OpenEdition
1N/Afor VM/ESA. CCSID 1047 differs from CCSID 0037 in eight places.
1N/A
1N/A=head2 POSIX-BC
1N/A
1N/AThe EBCDIC code page in use on Siemens' BS2000 system is distinct from
1N/A1047 and 0037. It is identified below as the POSIX-BC set.
1N/A
1N/A=head2 Unicode code points versus EBCDIC code points
1N/A
In Unicode terminology a I<code point> is the number assigned to a
character: for example, in EBCDIC the character "A" is usually assigned
the number 193. In Unicode the character "A" is assigned the number 65.
This causes a problem with the semantics of the pack/unpack "U" template,
which is supposed to pack Unicode code points to characters and unpack
them back to numbers. The problem is: which code points should be used
for values less than 256? (For 256 and over there is no problem: Unicode
code points are used.) In EBCDIC, the EBCDIC code points are used for
the low 256. This means that the equivalences

    pack("U", ord($character)) eq $character
    unpack("U", $character) == ord $character

will hold. (If Unicode code points were applied consistently over
all the possible code points, pack("U",ord("A")) would in EBCDIC
equal I<A with acute>, or chr(101), and unpack("U", "A") would equal
65, which is the EBCDIC code of I<non-breaking space>, rather than 193,
which is ord("A").)
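
As a quick check, the two equivalences can be tested for a handful of
characters that exist on both ASCII and EBCDIC platforms. This is only a
sketch; the sample characters and the C<@failures> array are chosen for
illustration.

```perl
use strict;
use warnings;

# Characters present in ASCII and in the EBCDIC code pages discussed here.
my @samples = ('A', 'z', '0', ' ', '[');

my @failures;
for my $character (@samples) {
    my $ok = pack("U", ord($character)) eq $character
          && unpack("U", $character) == ord($character);
    push @failures, $character unless $ok;
}

print @failures
    ? "equivalences fail for: @failures\n"
    : "equivalences hold for all samples\n";
```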
1N/A
1N/A=head2 Remaining Perl Unicode problems in EBCDIC
1N/A
1N/A=over 4
1N/A
1N/A=item *
1N/A
Many of the remaining problems seem to be related to case-insensitive
matching: for example, C<< /[\x{131}]/ >> (LATIN SMALL LETTER DOTLESS I)
does not match "I" case-insensitively, as it should under Unicode.
(The match succeeds on ASCII-derived platforms.)
1N/A
1N/A=item *
1N/A
The extensions Unicode::Collate and Unicode::Normalize are not
supported under EBCDIC; likewise for the encoding pragma.
1N/A
1N/A=back
1N/A
1N/A=head2 Unicode and UTF
1N/A
UTF stands for Unicode Transformation Format. UTF-8 is a
Unicode-conforming encoding of Unicode characters that looks very much
like ASCII: the 128 ASCII characters are encoded as themselves.
UTF-EBCDIC is an attempt to represent Unicode characters in an
EBCDIC-transparent manner.
1N/A
1N/A=head2 Using Encode
1N/A
Starting with Perl 5.8 you can use the standard module Encode
to translate from EBCDIC to Latin-1 code points:
1N/A
1N/A use Encode 'from_to';
1N/A
1N/A my %ebcdic = ( 176 => 'cp37', 95 => 'cp1047', 106 => 'posix-bc' );
1N/A
1N/A # $a is in EBCDIC code points
1N/A from_to($a, $ebcdic{ord '^'}, 'latin1');
1N/A # $a is ISO 8859-1 code points
1N/A
and from Latin-1 code points to EBCDIC code points:
1N/A
1N/A use Encode 'from_to';
1N/A
1N/A my %ebcdic = ( 176 => 'cp37', 95 => 'cp1047', 106 => 'posix-bc' );
1N/A
1N/A # $a is ISO 8859-1 code points
1N/A from_to($a, 'latin1', $ebcdic{ord '^'});
1N/A # $a is in EBCDIC code points
1N/A
For doing I/O it is suggested that you use the autotranslating features
of PerlIO; see L<perluniintro>.

Since version 5.8 Perl uses the new PerlIO I/O library. This enables
you to use a different encoding per I/O channel. For example you may use

    use Encode;
    open(my $f, ">:encoding(ascii)", "test.ascii") or die "test.ascii: $!";
    print $f "Hello World!\n";
    open($f, ">:encoding(cp37)", "test.ebcdic") or die "test.ebcdic: $!";
    print $f "Hello World!\n";
    open($f, ">:encoding(latin1)", "test.latin1") or die "test.latin1: $!";
    print $f "Hello World!\n";
    open($f, ">:encoding(utf8)", "test.utf8") or die "test.utf8: $!";
    print $f "Hello World!\n";
    close($f) or die "test.utf8: $!";

to get four files containing "Hello World!\n" in ASCII, CP 37 EBCDIC,
ISO 8859-1 (Latin-1) (in this example identical to ASCII), and
UTF-EBCDIC (in this example identical to normal EBCDIC), respectively.
See the documentation of Encode::PerlIO for details.

Because the PerlIO layers use raw (byte) I/O internally, none of this
depends on the type of your filesystem (ASCII or EBCDIC).
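
To see what an encoding layer actually stores, one can write through the
layer and read the octets back with the C<:raw> layer. The following
sketch does this for C<cp37> using a temporary file; File::Temp and the
variable names are assumptions of this example, not part of the recipe
above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file that is removed when the program exits.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Write text through an encoding layer ...
open(my $out, '>:encoding(cp37)', $path) or die "Cannot write $path: $!";
print {$out} "Hello";
close($out) or die "Cannot close $path: $!";

# ... then read the raw octets back to see what was stored.
open(my $in, '<:raw', $path) or die "Cannot read $path: $!";
my $octets = do { local $/; <$in> };
close $in;

my @codes = unpack "C*", $octets;
print "@codes\n";
```

The five values printed should match the 0037 column of the table below
for the letters H, e, l, l and o.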
1N/A
1N/A=head1 SINGLE OCTET TABLES
1N/A
The following tables list the ASCII and Latin 1 ordered sets including
the subsets: C0 controls (0..31), ASCII graphics (32..126), delete (127),
C1 controls (128..159), and Latin-1 (a.k.a. ISO 8859-1) (160..255). In the
table, non-printing control character names as well as the Latin 1
extensions to ASCII have been labelled with character names roughly
corresponding to I<The Unicode Standard, Version 3.0>, albeit with
substitutions such as s/LATIN// and s/VULGAR// in all cases,
s/CAPITAL LETTER// in some cases, and s/SMALL LETTER ([A-Z])/\l$1/
in some other cases (the C<charnames> pragma names unfortunately do
not list explicit names for the C0 or C1 control characters). The
"names" of the C1 control set (128..159 in ISO 8859-1) listed here are
somewhat arbitrary. The differences between the 0037 and 1047 sets are
flagged with ***. The differences between the 1047 and POSIX-BC sets
are flagged with ###. All ord() numbers listed are decimal. If you
would rather see this table listing octal values, then run the table
(that is, the pod version of this document, since this recipe may not
work with a pod2_other_format translation) through:
1N/A
1N/A=over 4
1N/A
1N/A=item recipe 0
1N/A
1N/A=back
1N/A
1N/A perl -ne 'if(/(.{33})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)' \
1N/A -e '{printf("%s%-9o%-9o%-9o%o\n",$1,$2,$3,$4,$5)}' perlebcdic.pod
1N/A
1N/AIf you want to retain the UTF-x code points then in script form you
1N/Amight want to write:
1N/A
1N/A=over 4
1N/A
1N/A=item recipe 1
1N/A
1N/A=back
1N/A
    open(my $fh, '<', 'perlebcdic.pod')
        or die "Could not open perlebcdic.pod: $!";
    while (<$fh>) {
        if (/(.{33})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\.?(\d*)\s+(\d+)\.?(\d*)/) {
            if ($7 ne '' && $9 ne '') {
                # both UTF-8 and UTF-EBCDIC are two-byte sequences
                printf("%s%-9o%-9o%-9o%-9o%-3o.%-5o%-3o.%o\n",
                       $1,$2,$3,$4,$5,$6,$7,$8,$9);
            }
            elsif ($7 ne '') {
                # only UTF-8 is a two-byte sequence
                printf("%s%-9o%-9o%-9o%-9o%-3o.%-5o%o\n",
                       $1,$2,$3,$4,$5,$6,$7,$8);
            }
            else {
                printf("%s%-9o%-9o%-9o%-9o%-9o%o\n",$1,$2,$3,$4,$5,$6,$8);
            }
        }
    }
    close($fh);
1N/A
1N/AIf you would rather see this table listing hexadecimal values then
1N/Arun the table through:
1N/A
1N/A=over 4
1N/A
1N/A=item recipe 2
1N/A
1N/A=back
1N/A
1N/A perl -ne 'if(/(.{33})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)' \
1N/A -e '{printf("%s%-9X%-9X%-9X%X\n",$1,$2,$3,$4,$5)}' perlebcdic.pod
1N/A
1N/AOr, in order to retain the UTF-x code points in hexadecimal:
1N/A
1N/A=over 4
1N/A
1N/A=item recipe 3
1N/A
1N/A=back
1N/A
    open(my $fh, '<', 'perlebcdic.pod')
        or die "Could not open perlebcdic.pod: $!";
    while (<$fh>) {
        if (/(.{33})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\.?(\d*)\s+(\d+)\.?(\d*)/) {
            if ($7 ne '' && $9 ne '') {
                # both UTF-8 and UTF-EBCDIC are two-byte sequences
                printf("%s%-9X%-9X%-9X%-9X%-2X.%-6X%-2X.%X\n",
                       $1,$2,$3,$4,$5,$6,$7,$8,$9);
            }
            elsif ($7 ne '') {
                # only UTF-8 is a two-byte sequence
                printf("%s%-9X%-9X%-9X%-9X%-2X.%-6X%X\n",
                       $1,$2,$3,$4,$5,$6,$7,$8);
            }
            else {
                printf("%s%-9X%-9X%-9X%-9X%-9X%X\n",$1,$2,$3,$4,$5,$6,$8);
            }
        }
    }
    close($fh);
1N/A
1N/A
                                                                     (both incomplete)
                                 8859-1
    chr                          0819     0037     1047     POSIX-BC UTF-8    UTF-EBCDIC
    ------------------------------------------------------------------------------------
    <NULL>                       0        0        0        0        0        0
    <START OF HEADING>           1        1        1        1        1        1
    <START OF TEXT>              2        2        2        2        2        2
    <END OF TEXT>                3        3        3        3        3        3
    <END OF TRANSMISSION>        4        55       55       55       4        55
    <ENQUIRY>                    5        45       45       45       5        45
    <ACKNOWLEDGE>                6        46       46       46       6        46
    <BELL>                       7        47       47       47       7        47
    <BACKSPACE>                  8        22       22       22       8        22
    <HORIZONTAL TABULATION>      9        5        5        5        9        5
    <LINE FEED>                  10       37       21       21       10       21       ***
    <VERTICAL TABULATION>        11       11       11       11       11       11
    <FORM FEED>                  12       12       12       12       12       12
    <CARRIAGE RETURN>            13       13       13       13       13       13
    <SHIFT OUT>                  14       14       14       14       14       14
    <SHIFT IN>                   15       15       15       15       15       15
    <DATA LINK ESCAPE>           16       16       16       16       16       16
    <DEVICE CONTROL ONE>         17       17       17       17       17       17
    <DEVICE CONTROL TWO>         18       18       18       18       18       18
    <DEVICE CONTROL THREE>       19       19       19       19       19       19
    <DEVICE CONTROL FOUR>        20       60       60       60       20       60
    <NEGATIVE ACKNOWLEDGE>       21       61       61       61       21       61
    <SYNCHRONOUS IDLE>           22       50       50       50       22       50
    <END OF TRANSMISSION BLOCK>  23       38       38       38       23       38
    <CANCEL>                     24       24       24       24       24       24
    <END OF MEDIUM>              25       25       25       25       25       25
    <SUBSTITUTE>                 26       63       63       63       26       63
    <ESCAPE>                     27       39       39       39       27       39
    <FILE SEPARATOR>             28       28       28       28       28       28
    <GROUP SEPARATOR>            29       29       29       29       29       29
    <RECORD SEPARATOR>           30       30       30       30       30       30
    <UNIT SEPARATOR>             31       31       31       31       31       31
    <SPACE>                      32       64       64       64       32       64
    !                            33       90       90       90       33       90
    "                            34       127      127      127      34       127
    #                            35       123      123      123      35       123
    $                            36       91       91       91       36       91
    %                            37       108      108      108      37       108
    &                            38       80       80       80       38       80
    '                            39       125      125      125      39       125
    (                            40       77       77       77       40       77
    )                            41       93       93       93       41       93
    *                            42       92       92       92       42       92
    +                            43       78       78       78       43       78
    ,                            44       107      107      107      44       107
    -                            45       96       96       96       45       96
    .                            46       75       75       75       46       75
    /                            47       97       97       97       47       97
    0                            48       240      240      240      48       240
    1                            49       241      241      241      49       241
    2                            50       242      242      242      50       242
    3                            51       243      243      243      51       243
    4                            52       244      244      244      52       244
    5                            53       245      245      245      53       245
    6                            54       246      246      246      54       246
    7                            55       247      247      247      55       247
    8                            56       248      248      248      56       248
    9                            57       249      249      249      57       249
    :                            58       122      122      122      58       122
    ;                            59       94       94       94       59       94
    <                            60       76       76       76       60       76
    =                            61       126      126      126      61       126
    >                            62       110      110      110      62       110
    ?                            63       111      111      111      63       111
    @                            64       124      124      124      64       124
    A                            65       193      193      193      65       193
    B                            66       194      194      194      66       194
    C                            67       195      195      195      67       195
    D                            68       196      196      196      68       196
    E                            69       197      197      197      69       197
    F                            70       198      198      198      70       198
    G                            71       199      199      199      71       199
    H                            72       200      200      200      72       200
    I                            73       201      201      201      73       201
    J                            74       209      209      209      74       209
    K                            75       210      210      210      75       210
    L                            76       211      211      211      76       211
    M                            77       212      212      212      77       212
    N                            78       213      213      213      78       213
    O                            79       214      214      214      79       214
    P                            80       215      215      215      80       215
    Q                            81       216      216      216      81       216
    R                            82       217      217      217      82       217
    S                            83       226      226      226      83       226
    T                            84       227      227      227      84       227
    U                            85       228      228      228      85       228
    V                            86       229      229      229      86       229
    W                            87       230      230      230      87       230
    X                            88       231      231      231      88       231
    Y                            89       232      232      232      89       232
    Z                            90       233      233      233      90       233
    [                            91       186      173      187      91       173      *** ###
    \                            92       224      224      188      92       224          ###
    ]                            93       187      189      189      93       189      ***
    ^                            94       176      95       106      94       95       *** ###
    _                            95       109      109      109      95       109
    `                            96       121      121      74       96       121          ###
    a                            97       129      129      129      97       129
    b                            98       130      130      130      98       130
    c                            99       131      131      131      99       131
    d                            100      132      132      132      100      132
    e                            101      133      133      133      101      133
    f                            102      134      134      134      102      134
    g                            103      135      135      135      103      135
    h                            104      136      136      136      104      136
    i                            105      137      137      137      105      137
    j                            106      145      145      145      106      145
    k                            107      146      146      146      107      146
    l                            108      147      147      147      108      147
    m                            109      148      148      148      109      148
    n                            110      149      149      149      110      149
    o                            111      150      150      150      111      150
    p                            112      151      151      151      112      151
    q                            113      152      152      152      113      152
    r                            114      153      153      153      114      153
    s                            115      162      162      162      115      162
    t                            116      163      163      163      116      163
    u                            117      164      164      164      117      164
    v                            118      165      165      165      118      165
    w                            119      166      166      166      119      166
    x                            120      167      167      167      120      167
    y                            121      168      168      168      121      168
    z                            122      169      169      169      122      169
    {                            123      192      192      251      123      192          ###
    |                            124      79       79       79       124      79
    }                            125      208      208      253      125      208          ###
    ~                            126      161      161      255      126      161          ###
    <DELETE>                     127      7        7        7        127      7
    <C1 0>                       128      32       32       32       194.128  32
    <C1 1>                       129      33       33       33       194.129  33
    <C1 2>                       130      34       34       34       194.130  34
    <C1 3>                       131      35       35       35       194.131  35
    <C1 4>                       132      36       36       36       194.132  36
    <C1 5>                       133      21       37       37       194.133  37       ***
    <C1 6>                       134      6        6        6        194.134  6
    <C1 7>                       135      23       23       23       194.135  23
    <C1 8>                       136      40       40       40       194.136  40
    <C1 9>                       137      41       41       41       194.137  41
    <C1 10>                      138      42       42       42       194.138  42
    <C1 11>                      139      43       43       43       194.139  43
    <C1 12>                      140      44       44       44       194.140  44
    <C1 13>                      141      9        9        9        194.141  9
    <C1 14>                      142      10       10       10       194.142  10
    <C1 15>                      143      27       27       27       194.143  27
    <C1 16>                      144      48       48       48       194.144  48
    <C1 17>                      145      49       49       49       194.145  49
    <C1 18>                      146      26       26       26       194.146  26
    <C1 19>                      147      51       51       51       194.147  51
    <C1 20>                      148      52       52       52       194.148  52
    <C1 21>                      149      53       53       53       194.149  53
    <C1 22>                      150      54       54       54       194.150  54
    <C1 23>                      151      8        8        8        194.151  8
    <C1 24>                      152      56       56       56       194.152  56
    <C1 25>                      153      57       57       57       194.153  57
    <C1 26>                      154      58       58       58       194.154  58
    <C1 27>                      155      59       59       59       194.155  59
    <C1 28>                      156      4        4        4        194.156  4
    <C1 29>                      157      20       20       20       194.157  20
    <C1 30>                      158      62       62       62       194.158  62
    <C1 31>                      159      255      255      95       194.159  255          ###
    <NON-BREAKING SPACE>         160      65       65       65       194.160  128.65
    <INVERTED EXCLAMATION MARK>  161      170      170      170      194.161  128.66
    <CENT SIGN>                  162      74       74       176      194.162  128.67       ###
    <POUND SIGN>                 163      177      177      177      194.163  128.68
    <CURRENCY SIGN>              164      159      159      159      194.164  128.69
    <YEN SIGN>                   165      178      178      178      194.165  128.70
    <BROKEN BAR>                 166      106      106      208      194.166  128.71       ###
    <SECTION SIGN>               167      181      181      181      194.167  128.72
    <DIAERESIS>                  168      189      187      121      194.168  128.73   *** ###
    <COPYRIGHT SIGN>             169      180      180      180      194.169  128.74
    <FEMININE ORDINAL INDICATOR> 170      154      154      154      194.170  128.81
    <LEFT POINTING GUILLEMET>    171      138      138      138      194.171  128.82
    <NOT SIGN>                   172      95       176      186      194.172  128.83   *** ###
    <SOFT HYPHEN>                173      202      202      202      194.173  128.84
    <REGISTERED TRADE MARK SIGN> 174      175      175      175      194.174  128.85
    <MACRON>                     175      188      188      161      194.175  128.86       ###
    <DEGREE SIGN>                176      144      144      144      194.176  128.87
    <PLUS-OR-MINUS SIGN>         177      143      143      143      194.177  128.88
    <SUPERSCRIPT TWO>            178      234      234      234      194.178  128.89
    <SUPERSCRIPT THREE>          179      250      250      250      194.179  128.98
    <ACUTE ACCENT>               180      190      190      190      194.180  128.99
    <MICRO SIGN>                 181      160      160      160      194.181  128.100
    <PARAGRAPH SIGN>             182      182      182      182      194.182  128.101
    <MIDDLE DOT>                 183      179      179      179      194.183  128.102
    <CEDILLA>                    184      157      157      157      194.184  128.103
    <SUPERSCRIPT ONE>            185      218      218      218      194.185  128.104
    <MASC. ORDINAL INDICATOR>    186      155      155      155      194.186  128.105
    <RIGHT POINTING GUILLEMET>   187      139      139      139      194.187  128.106
    <FRACTION ONE QUARTER>       188      183      183      183      194.188  128.112
    <FRACTION ONE HALF>          189      184      184      184      194.189  128.113
    <FRACTION THREE QUARTERS>    190      185      185      185      194.190  128.114
    <INVERTED QUESTION MARK>     191      171      171      171      194.191  128.115
    <A WITH GRAVE>               192      100      100      100      195.128  138.65
    <A WITH ACUTE>               193      101      101      101      195.129  138.66
    <A WITH CIRCUMFLEX>          194      98       98       98       195.130  138.67
    <A WITH TILDE>               195      102      102      102      195.131  138.68
    <A WITH DIAERESIS>           196      99       99       99       195.132  138.69
    <A WITH RING ABOVE>          197      103      103      103      195.133  138.70
    <CAPITAL LIGATURE AE>        198      158      158      158      195.134  138.71
    <C WITH CEDILLA>             199      104      104      104      195.135  138.72
    <E WITH GRAVE>               200      116      116      116      195.136  138.73
    <E WITH ACUTE>               201      113      113      113      195.137  138.74
    <E WITH CIRCUMFLEX>          202      114      114      114      195.138  138.81
    <E WITH DIAERESIS>           203      115      115      115      195.139  138.82
    <I WITH GRAVE>               204      120      120      120      195.140  138.83
    <I WITH ACUTE>               205      117      117      117      195.141  138.84
    <I WITH CIRCUMFLEX>          206      118      118      118      195.142  138.85
    <I WITH DIAERESIS>           207      119      119      119      195.143  138.86
    <CAPITAL LETTER ETH>         208      172      172      172      195.144  138.87
    <N WITH TILDE>               209      105      105      105      195.145  138.88
    <O WITH GRAVE>               210      237      237      237      195.146  138.89
    <O WITH ACUTE>               211      238      238      238      195.147  138.98
    <O WITH CIRCUMFLEX>          212      235      235      235      195.148  138.99
    <O WITH TILDE>               213      239      239      239      195.149  138.100
    <O WITH DIAERESIS>           214      236      236      236      195.150  138.101
    <MULTIPLICATION SIGN>        215      191      191      191      195.151  138.102
    <O WITH STROKE>              216      128      128      128      195.152  138.103
    <U WITH GRAVE>               217      253      253      224      195.153  138.104      ###
    <U WITH ACUTE>               218      254      254      254      195.154  138.105
    <U WITH CIRCUMFLEX>          219      251      251      221      195.155  138.106      ###
    <U WITH DIAERESIS>           220      252      252      252      195.156  138.112
    <Y WITH ACUTE>               221      173      186      173      195.157  138.113  *** ###
    <CAPITAL LETTER THORN>       222      174      174      174      195.158  138.114
    <SMALL LETTER SHARP S>       223      89       89       89       195.159  138.115
    <a WITH GRAVE>               224      68       68       68       195.160  139.65
    <a WITH ACUTE>               225      69       69       69       195.161  139.66
    <a WITH CIRCUMFLEX>          226      66       66       66       195.162  139.67
    <a WITH TILDE>               227      70       70       70       195.163  139.68
    <a WITH DIAERESIS>           228      67       67       67       195.164  139.69
    <a WITH RING ABOVE>          229      71       71       71       195.165  139.70
    <SMALL LIGATURE ae>          230      156      156      156      195.166  139.71
    <c WITH CEDILLA>             231      72       72       72       195.167  139.72
    <e WITH GRAVE>               232      84       84       84       195.168  139.73
    <e WITH ACUTE>               233      81       81       81       195.169  139.74
    <e WITH CIRCUMFLEX>          234      82       82       82       195.170  139.81
    <e WITH DIAERESIS>           235      83       83       83       195.171  139.82
    <i WITH GRAVE>               236      88       88       88       195.172  139.83
    <i WITH ACUTE>               237      85       85       85       195.173  139.84
    <i WITH CIRCUMFLEX>          238      86       86       86       195.174  139.85
    <i WITH DIAERESIS>           239      87       87       87       195.175  139.86
    <SMALL LETTER eth>           240      140      140      140      195.176  139.87
    <n WITH TILDE>               241      73       73       73       195.177  139.88
    <o WITH GRAVE>               242      205      205      205      195.178  139.89
    <o WITH ACUTE>               243      206      206      206      195.179  139.98
    <o WITH CIRCUMFLEX>          244      203      203      203      195.180  139.99
    <o WITH TILDE>               245      207      207      207      195.181  139.100
    <o WITH DIAERESIS>           246      204      204      204      195.182  139.101
    <DIVISION SIGN>              247      225      225      225      195.183  139.102
    <o WITH STROKE>              248      112      112      112      195.184  139.103
    <u WITH GRAVE>               249      221      221      192      195.185  139.104      ###
    <u WITH ACUTE>               250      222      222      222      195.186  139.105
    <u WITH CIRCUMFLEX>          251      219      219      219      195.187  139.106
    <u WITH DIAERESIS>           252      220      220      220      195.188  139.112
    <y WITH ACUTE>               253      141      141      141      195.189  139.113
    <SMALL LETTER thorn>         254      142      142      142      195.190  139.114
    <y WITH DIAERESIS>           255      223      223      223      195.191  139.115
1N/A
1N/AIf you would rather see the above table in CCSID 0037 order rather than
1N/AASCII + Latin-1 order then run the table through:
1N/A
1N/A=over 4
1N/A
1N/A=item recipe 4
1N/A
1N/A=back
1N/A
1N/A perl -ne 'if(/.{33}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}/)'\
1N/A -e '{push(@l,$_)}' \
1N/A -e 'END{print map{$_->[0]}' \
1N/A -e ' sort{$a->[1] <=> $b->[1]}' \
1N/A -e ' map{[$_,substr($_,42,3)]}@l;}' perlebcdic.pod
1N/A
If you would rather see it in CCSID 1047 order then change the number
42 in the last line to 51, like this:
1N/A
1N/A=over 4
1N/A
1N/A=item recipe 5
1N/A
1N/A=back
1N/A
1N/A perl -ne 'if(/.{33}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}/)'\
1N/A -e '{push(@l,$_)}' \
1N/A -e 'END{print map{$_->[0]}' \
1N/A -e ' sort{$a->[1] <=> $b->[1]}' \
1N/A -e ' map{[$_,substr($_,51,3)]}@l;}' perlebcdic.pod
1N/A
If you would rather see it in POSIX-BC order then change the number
51 in the last line to 60, like this:
1N/A
1N/A=over 4
1N/A
1N/A=item recipe 6
1N/A
1N/A=back
1N/A
1N/A perl -ne 'if(/.{33}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}/)'\
1N/A -e '{push(@l,$_)}' \
1N/A -e 'END{print map{$_->[0]}' \
1N/A -e ' sort{$a->[1] <=> $b->[1]}' \
1N/A -e ' map{[$_,substr($_,60,3)]}@l;}' perlebcdic.pod
1N/A
1N/A
1N/A=head1 IDENTIFYING CHARACTER CODE SETS
1N/A
1N/ATo determine the character set you are running under from perl one
1N/Acould use the return value of ord() or chr() to test one or more
1N/Acharacter values. For example:
1N/A
1N/A $is_ascii = "A" eq chr(65);
1N/A $is_ebcdic = "A" eq chr(193);
1N/A
1N/AAlso, "\t" is a C<HORIZONTAL TABULATION> character so that:
1N/A
1N/A $is_ascii = ord("\t") == 9;
1N/A $is_ebcdic = ord("\t") == 5;
1N/A
1N/ATo distinguish EBCDIC code pages try looking at one or more of
1N/Athe characters that differ between them. For example:
1N/A
1N/A $is_ebcdic_37 = "\n" eq chr(37);
1N/A $is_ebcdic_1047 = "\n" eq chr(21);
1N/A
1N/AOr better still choose a character that is uniquely encoded in any
1N/Aof the code sets, e.g.:
1N/A
1N/A $is_ascii = ord('[') == 91;
1N/A $is_ebcdic_37 = ord('[') == 186;
1N/A $is_ebcdic_1047 = ord('[') == 173;
1N/A $is_ebcdic_POSIX_BC = ord('[') == 187;
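
The four tests above can be folded into a single lookup. The sketch
below is one way to do so; the helper name C<native_code_set> and the
hash C<%code_set_for> are hypothetical, and only the four code sets
discussed in this document are recognized.

```perl
use strict;
use warnings;

# Code of "[" in each code set, taken from the table in this document.
my %code_set_for = (
    91  => 'ascii',       # ASCII and ISO 8859-1
    186 => 'cp37',
    173 => 'cp1047',
    187 => 'posix-bc',
);

# Hypothetical helper: maps the code of "[" to a code set name.
# With no argument it inspects the running perl.
sub native_code_set {
    my ($bracket) = @_;
    $bracket = ord '[' unless defined $bracket;
    return $code_set_for{$bracket} || 'unknown';
}

print "This perl appears to use: ", native_code_set(), "\n";
```

Passing the code explicitly, as in C<native_code_set(186)>, makes the
helper easy to exercise on a platform other than the one being detected.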
1N/A
1N/AHowever, it would be unwise to write tests such as:
1N/A
1N/A $is_ascii = "\r" ne chr(13); # WRONG
1N/A $is_ascii = "\n" ne chr(10); # ILL ADVISED
1N/A
Obviously the first of these will fail to distinguish most ASCII machines
from either a CCSID 0037, a 1047, or a POSIX-BC EBCDIC machine, since
"\r" eq chr(13) under all of those coded character sets. But note too that
because "\n" is chr(13) and "\r" is chr(10) on the Macintosh (which is an
ASCII machine), the second C<$is_ascii> test will lead to trouble there.
1N/A
1N/ATo determine whether or not perl was built under an EBCDIC
1N/Acode page you can use the Config module like so:
1N/A
1N/A use Config;
1N/A $is_ebcdic = $Config{'ebcdic'} eq 'define';
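
Under C<use warnings> the comparison above may complain if the C<ebcdic>
entry happens to be undefined. A slightly more defensive sketch, which
also cross-checks the result against ord("A"), might look like this (the
variable names are illustrative):

```perl
use strict;
use warnings;
use Config;

# Guard against an undefined entry before comparing.
my $ebcdic_by_config = (defined $Config{ebcdic} && $Config{ebcdic} eq 'define') ? 1 : 0;

# "A" is 193 in the EBCDIC code pages discussed here and 65 in ASCII.
my $ebcdic_by_ord = (ord('A') == 193) ? 1 : 0;

warn "Config and ord() disagree about EBCDIC\n"
    if $ebcdic_by_config != $ebcdic_by_ord;
```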
1N/A
1N/A=head1 CONVERSIONS
1N/A
1N/A=head2 tr///
1N/A
In order to convert a string of characters from one character set to
another, a simple list of numbers, such as in the right columns of the
above table, along with perl's tr/// operator is all that is needed.
The data in the table are in ASCII order, hence the EBCDIC columns
provide easy-to-use ASCII-to-EBCDIC operations that are also easily
reversed.

For example, to convert ASCII to code page 037, list the ISO 8859-1 code
of each character in CCSID 0037 order (that is, the first numeric column
of the output of recipe 4, written as backslash-escaped octal numbers)
and use the list in tr/// like so:
1N/A
1N/A $cp_037 =
1N/A '\000\001\002\003\234\011\206\177\227\215\216\013\014\015\016\017' .
1N/A '\020\021\022\023\235\205\010\207\030\031\222\217\034\035\036\037' .
1N/A '\200\201\202\203\204\012\027\033\210\211\212\213\214\005\006\007' .
1N/A '\220\221\026\223\224\225\226\004\230\231\232\233\024\025\236\032' .
1N/A '\040\240\342\344\340\341\343\345\347\361\242\056\074\050\053\174' .
1N/A '\046\351\352\353\350\355\356\357\354\337\041\044\052\051\073\254' .
1N/A '\055\057\302\304\300\301\303\305\307\321\246\054\045\137\076\077' .
1N/A '\370\311\312\313\310\315\316\317\314\140\072\043\100\047\075\042' .
1N/A '\330\141\142\143\144\145\146\147\150\151\253\273\360\375\376\261' .
1N/A '\260\152\153\154\155\156\157\160\161\162\252\272\346\270\306\244' .
1N/A '\265\176\163\164\165\166\167\170\171\172\241\277\320\335\336\256' .
1N/A '\136\243\245\267\251\247\266\274\275\276\133\135\257\250\264\327' .
1N/A '\173\101\102\103\104\105\106\107\110\111\255\364\366\362\363\365' .
1N/A '\175\112\113\114\115\116\117\120\121\122\271\373\374\371\372\377' .
1N/A '\134\367\123\124\125\126\127\130\131\132\262\324\326\322\323\325' .
1N/A '\060\061\062\063\064\065\066\067\070\071\263\333\334\331\332\237' ;
1N/A
1N/A my $ebcdic_string = $ascii_string;
1N/A eval '$ebcdic_string =~ tr/' . $cp_037 . '/\000-\377/';
1N/A
1N/ATo convert from EBCDIC 037 to ASCII just reverse the order of the tr///
1N/Aarguments like so:
1N/A
1N/A my $ascii_string = $ebcdic_string;
1N/A eval '$ascii_string =~ tr/\000-\377/' . $cp_037 . '/';
1N/A
Similarly, listing the ISO 8859-1 codes in CCSID 1047 order (see recipe 5)
yields a C<$cp_1047> table, and listing them in POSIX-BC order (see
recipe 6) yields a C<$cp_posix_bc> table suitable for transcoding as well.
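
Where eval is undesirable, the same table-driven idea can be expressed
with a plain array lookup instead of tr///. The sketch below is such an
alternative, not the method shown above. It assumes an ASCII-based perl
(so that chr() takes ISO 8859-1 codes), assumes the Encode module's
C<cp37> table is one-to-one and agrees with the 0037 column of this
document, and uses a hypothetical helper named C<transcode>.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Forward table: index is the ISO 8859-1 code, value is the CCSID 0037 code.
my @latin1_to_037 = map { ord encode('cp37', chr $_) } 0 .. 255;

# Reverse table, built by inverting the forward table.
my @cp037_to_latin1;
$cp037_to_latin1[ $latin1_to_037[$_] ] = $_ for 0 .. 255;

# Hypothetical helper: replace every character code in $string
# by the corresponding entry of the table.
sub transcode {
    my ($string, $table) = @_;
    return join '', map { chr $table->[ord $_] } split //, $string;
}

my $ascii_string  = pack "C*", 80, 101, 114, 108;    # "Perl" in ASCII
my $ebcdic_string = transcode($ascii_string,  \@latin1_to_037);
my $round_trip    = transcode($ebcdic_string, \@cp037_to_latin1);

print join(' ', unpack "C*", $ebcdic_string), "\n";
```

A lookup of this kind is slower than tr/// on long strings, but the
tables can be built and inspected at run time.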
1N/A
1N/A=head2 iconv
1N/A
1N/AXPG operability often implies the presence of an I<iconv> utility
1N/Aavailable from the shell or from the C library. Consult your system's
1N/Adocumentation for information on iconv.
1N/A
On OS/390 or z/OS see the iconv(1) manpage. One way to invoke the iconv
shell utility from within perl would be:

    # OS/390 or z/OS example
    $ascii_data = `echo '$ebcdic_data'| iconv -f IBM-1047 -t ISO8859-1`;

or the inverse map:

    # OS/390 or z/OS example
    $ebcdic_data = `echo '$ascii_data'| iconv -f ISO8859-1 -t IBM-1047`;
1N/A
1N/AFor other perl based conversion options see the Convert::* modules on CPAN.
1N/A
1N/A=head2 C RTL
1N/A
1N/AThe OS/390 and z/OS C run time libraries provide _atoe() and _etoa() functions.
1N/A
1N/A=head1 OPERATOR DIFFERENCES
1N/A
1N/AThe C<..> range operator treats certain character ranges with
1N/Acare on EBCDIC machines. For example the following array
1N/Awill have twenty six elements on either an EBCDIC machine
1N/Aor an ASCII machine:
1N/A
1N/A @alphabet = ('A'..'Z'); # $#alphabet == 25
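
The care taken by C<..> matters because the letters are not contiguous in
EBCDIC. The sketch below contrasts the character range with a range over
the numeric codes; on an ASCII machine both lists have 26 elements, whereas
on the EBCDIC code pages tabulated above the numeric range 193..233 would
span 41 codes. The variable names are illustrative.

```perl
use strict;
use warnings;

my @alphabet = ('A' .. 'Z');                          # letters only
my @by_code  = map { chr $_ } ord('A') .. ord('Z');   # every code in between

printf "'A'..'Z' yields %d elements\n", scalar @alphabet;
printf "ord('A')..ord('Z') spans %d codes\n", scalar @by_code;
```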
1N/A
1N/AThe bitwise operators such as & ^ | may return different results
1N/Awhen operating on string or character data in a perl program running
1N/Aon an EBCDIC machine than when run on an ASCII machine. Here is
1N/Aan example adapted from the one in L<perlop>:
1N/A
1N/A # EBCDIC-based examples
1N/A print "j p \n" ^ " a h"; # prints "JAPH\n"
1N/A print "JA" | " ph\n"; # prints "japh\n"
1N/A print "JAPH\nJunk" & "\277\277\277\277\277"; # prints "japh\n";
1N/A print 'p N$' ^ " E<H\n"; # prints "Perl\n";
1N/A
An interesting property of the 32 C0 control characters
in the ASCII table is that they can "literally" be constructed
as control characters in perl, e.g. C<(chr(0) eq "\c@")>,
C<(chr(1) eq "\cA")>, and so on. Perl on EBCDIC machines has been
ported to take "\c@" to chr(0) and "\cA" to chr(1) as well, but the
thirty-three characters that result (the 32 above plus "\c?") depend on
which code page you are using. The table below uses the character names
from the previous table but with substitutions such as s/START OF/S.O./;
s/END OF /E.O./; s/TRANSMISSION/TRANS./; s/TABULATION/TAB./;
s/VERTICAL/VERT./; s/HORIZONTAL/HORIZ./; s/DEVICE CONTROL/D.C./;
s/SEPARATOR/SEP./; s/NEGATIVE ACKNOWLEDGE/NEG. ACK./;. The POSIX-BC and
1047 sets are identical throughout this range and differ from the 0037
set at only one spot (21 decimal). Note that the C<LINE FEED> character
may be generated by "\cJ" on ASCII machines but by "\cU" on 1047 or
POSIX-BC machines, and cannot be generated as a C<"\c.letter."> control
character on 0037 machines. Note also that "\c\\" maps to two characters,
not one.
1N/A
    chr     ord   8859-1               0037                 1047 && POSIX-BC
    ------------------------------------------------------------------------
    "\c?"   127   <DELETE>             "                    "                    ***><
    "\c@"   0     <NULL>               <NULL>               <NULL>               ***><
    "\cA"   1     <S.O. HEADING>       <S.O. HEADING>       <S.O. HEADING>
    "\cB"   2     <S.O. TEXT>          <S.O. TEXT>          <S.O. TEXT>
    "\cC"   3     <E.O. TEXT>          <E.O. TEXT>          <E.O. TEXT>
    "\cD"   4     <E.O. TRANS.>        <C1 28>              <C1 28>
    "\cE"   5     <ENQUIRY>            <HORIZ. TAB.>        <HORIZ. TAB.>
    "\cF"   6     <ACKNOWLEDGE>        <C1 6>               <C1 6>
    "\cG"   7     <BELL>               <DELETE>             <DELETE>
    "\cH"   8     <BACKSPACE>          <C1 23>              <C1 23>
    "\cI"   9     <HORIZ. TAB.>        <C1 13>              <C1 13>
    "\cJ"   10    <LINE FEED>          <C1 14>              <C1 14>
    "\cK"   11    <VERT. TAB.>         <VERT. TAB.>         <VERT. TAB.>
    "\cL"   12    <FORM FEED>          <FORM FEED>          <FORM FEED>
    "\cM"   13    <CARRIAGE RETURN>    <CARRIAGE RETURN>    <CARRIAGE RETURN>
    "\cN"   14    <SHIFT OUT>          <SHIFT OUT>          <SHIFT OUT>
    "\cO"   15    <SHIFT IN>           <SHIFT IN>           <SHIFT IN>
    "\cP"   16    <DATA LINK ESCAPE>   <DATA LINK ESCAPE>   <DATA LINK ESCAPE>
    "\cQ"   17    <D.C. ONE>           <D.C. ONE>           <D.C. ONE>
    "\cR"   18    <D.C. TWO>           <D.C. TWO>           <D.C. TWO>
    "\cS"   19    <D.C. THREE>         <D.C. THREE>         <D.C. THREE>
    "\cT"   20    <D.C. FOUR>          <C1 29>              <C1 29>
    "\cU"   21    <NEG. ACK.>          <C1 5>               <LINE FEED>          ***
    "\cV"   22    <SYNCHRONOUS IDLE>   <BACKSPACE>          <BACKSPACE>
    "\cW"   23    <E.O. TRANS. BLOCK>  <C1 7>               <C1 7>
    "\cX"   24    <CANCEL>             <CANCEL>             <CANCEL>
    "\cY"   25    <E.O. MEDIUM>        <E.O. MEDIUM>        <E.O. MEDIUM>
    "\cZ"   26    <SUBSTITUTE>         <C1 18>              <C1 18>
    "\c["   27    <ESCAPE>             <C1 15>              <C1 15>
    "\c\\"  28    <FILE SEP.>\         <FILE SEP.>\         <FILE SEP.>\
    "\c]"   29    <GROUP SEP.>         <GROUP SEP.>         <GROUP SEP.>
    "\c^"   30    <RECORD SEP.>        <RECORD SEP.>        <RECORD SEP.>        ***><
    "\c_"   31    <UNIT SEP.>          <UNIT SEP.>          <UNIT SEP.>          ***><
1N/A
1N/A
1N/A=head1 FUNCTION DIFFERENCES
1N/A
1N/A=over 8
1N/A
1N/A=item chr()
1N/A
1N/Achr() must be given an EBCDIC code number argument to yield a desired
1N/Acharacter return value on an EBCDIC machine. For example:
1N/A
1N/A $CAPITAL_LETTER_A = chr(193);
1N/A
1N/A=item ord()
1N/A
1N/Aord() will return EBCDIC code number values on an EBCDIC machine.
1N/AFor example:
1N/A
1N/A $the_number_193 = ord("A");
1N/A
1N/A=item pack()
1N/A
1N/AThe c and C templates for pack() are dependent upon character set
1N/Aencoding. Examples of usage on EBCDIC include:
1N/A
1N/A $foo = pack("CCCC",193,194,195,196);
1N/A # $foo eq "ABCD"
1N/A $foo = pack("C4",193,194,195,196);
1N/A # same thing
1N/A
1N/A $foo = pack("ccxxcc",193,194,195,196);
1N/A # $foo eq "AB\0\0CD"
1N/A
1N/A=item print()
1N/A
1N/ATake care with scalars and strings passed to print() that contain
1N/Ahard-coded ASCII encodings. One common place
1N/Afor this to occur is in the output of the MIME type header for
1N/ACGI script writing. For example, many Perl programming guides
1N/Arecommend something similar to:
1N/A
1N/A print "Content-type:\ttext/html\015\012\015\012";
1N/A # this may be wrong on EBCDIC
1N/A
1N/AUnder the IBM OS/390 USS Web Server or WebSphere on z/OS for example
1N/Ayou should instead write that as:
1N/A
1N/A print "Content-type:\ttext/html\r\n\r\n"; # OK for DGW et alia
1N/A
1N/AThat is because the translation from EBCDIC to ASCII is done
1N/Aby the web server in this case (such code will not be appropriate for
1N/Athe Macintosh however). Consult your web server's documentation for
1N/Afurther details.
1N/A
1N/A=item printf()
1N/A
1N/AThe formats that can convert characters to numbers and vice versa
1N/Awill be different from their ASCII counterparts when executed
1N/Aon an EBCDIC machine. Examples include:
1N/A
1N/A printf("%c%c%c",193,194,195); # prints ABC
1N/A
1N/A=item sort()
1N/A
1N/AEBCDIC sort results may differ from ASCII sort results especially for
1N/Amixed case strings. This is discussed in more detail below.
1N/A
1N/A=item sprintf()
1N/A
1N/ASee the discussion of printf() above. An example of the use
1N/Aof sprintf would be:
1N/A
1N/A $CAPITAL_LETTER_A = sprintf("%c",193);
1N/A
1N/A=item unpack()
1N/A
1N/ASee the discussion of pack() above.
1N/A
1N/A=back
1N/A
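The hard-coded 193 in the examples above is correct only on EBCDIC. The following is a minimal sketch of a portable variant that asks ord() for the native code point instead of assuming one. The variable names are illustrative only, and the platform test recognizes just the two values for "A" mentioned in this document:

```perl
use strict;
use warnings;

# Ask the running perl for the native code point of "A" rather than
# hard-coding 65 (ASCII) or 193 (EBCDIC).
my $code_of_A = ord("A");
my $platform  = $code_of_A == 193 ? "EBCDIC"
              : $code_of_A == 65  ? "ASCII"
              :                     "unknown";

# chr(), sprintf("%c") and pack("C") all accept that native code point.
my $from_chr     = chr($code_of_A);
my $from_sprintf = sprintf("%c", $code_of_A);
my $from_pack    = pack("C4", map { ord($_) } qw(A B C D));

# unpack("C*") hands back the same native code points.
my @codes = unpack("C*", $from_pack);

print "$platform: $from_chr $from_sprintf $from_pack (@codes)\n";
```

The strings produced are the same on either kind of machine; only the numbers in C<@codes> differ.
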
1N/A=head1 REGULAR EXPRESSION DIFFERENCES
1N/A
1N/AAs of perl 5.005_03 letter ranges in regular expressions, such as
1N/A[A-Z] and [a-z], are specially coded so that they do not pick up gap
1N/Acharacters. For example, a character such as E<ocirc> C<o WITH CIRCUMFLEX>,
1N/Awhich lies between I and J, is not matched by the
1N/Aregular expression range C</[H-K]/>. The special handling is switched
1N/Aoff if either of the range end points is
1N/Aexplicitly numeric: C<[\x89-\x91]> will match C<\x8e>, even
1N/Athough C<\x89> is C<i> and C<\x91> is C<j>, and C<\x8e>
1N/Ais a gap character from the alphabetic viewpoint.
1N/A
1N/AIf you do want to match the alphabet gap characters in a single octet
1N/Aregular expression try matching the hex or octal code such
1N/Aas C</\313/> on EBCDIC or C</\364/> on ASCII machines to
1N/Ahave your regular expression match C<o WITH CIRCUMFLEX>.
1N/A
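One way to observe the letter range behaviour is to count how many of the 256 single octet characters a range matches. The sketch below uses only core Perl. On an ASCII machine the count is 26, and going by the description above it should also be 26 on an EBCDIC perl of 5.005_03 or later:

```perl
use strict;
use warnings;

# Collect every single-octet character that the letter range matches.
my @matched = grep { /\A[a-z]\z/ } map { chr($_) } 0 .. 255;

# Gap characters lying between the letter runs in EBCDIC must not
# show up here, so only the 26 letters should remain.
printf "%d characters matched: %s\n", scalar(@matched), join("", @matched);
```
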
1N/AAnother construct to be wary of is the inappropriate use of hex or
1N/Aoctal constants in regular expressions. Consider the following
1N/Aset of subs:
1N/A
1N/A sub is_c0 {
1N/A my $char = substr(shift,0,1);
1N/A $char =~ /[\000-\037]/;
1N/A }
1N/A
1N/A sub is_print_ascii {
1N/A my $char = substr(shift,0,1);
1N/A $char =~ /[\040-\176]/;
1N/A }
1N/A
1N/A sub is_delete {
1N/A my $char = substr(shift,0,1);
1N/A $char eq "\177";
1N/A }
1N/A
1N/A sub is_c1 {
1N/A my $char = substr(shift,0,1);
1N/A $char =~ /[\200-\237]/;
1N/A }
1N/A
1N/A sub is_latin_1 {
1N/A my $char = substr(shift,0,1);
1N/A $char =~ /[\240-\377]/;
1N/A }
1N/A
1N/AThe above would be adequate if the concern was only with numeric code points.
1N/AHowever, the concern may be with characters rather than code points
1N/Aand on an EBCDIC machine it may be desirable for constructs such as
1N/AC<if (is_print_ascii("A")) {print "A is a printable character\n";}> to print
1N/Aout the expected message. One way to represent the above collection
1N/Aof character classification subs that is capable of working across the
1N/Afour coded character sets discussed in this document is as follows:
1N/A
1N/A sub Is_c0 {
1N/A my $char = substr(shift,0,1);
1N/A if (ord('^')==94) { # ascii
1N/A return $char =~ /[\000-\037]/;
1N/A }
1N/A if (ord('^')==176) { # 37
1N/A return $char =~ /[\000-\003\067\055-\057\026\005\045\013-\023\074\075\062\046\030\031\077\047\034-\037]/;
1N/A }
1N/A if (ord('^')==95 || ord('^')==106) { # 1047 || posix-bc
1N/A return $char =~ /[\000-\003\067\055-\057\026\005\025\013-\023\074\075\062\046\030\031\077\047\034-\037]/;
1N/A }
1N/A }
1N/A
1N/A sub Is_print_ascii {
1N/A my $char = substr(shift,0,1);
1N/A $char =~ /[ !"\#\$%&'()*+,\-.\/0-9:;<=>?\@A-Z[\\\]^_`a-z{|}~]/;
1N/A }
1N/A
1N/A sub Is_delete {
1N/A my $char = substr(shift,0,1);
1N/A if (ord('^')==94) { # ascii
1N/A return $char eq "\177";
1N/A }
1N/A else { # ebcdic
1N/A return $char eq "\007";
1N/A }
1N/A }
1N/A
1N/A sub Is_c1 {
1N/A my $char = substr(shift,0,1);
1N/A if (ord('^')==94) { # ascii
1N/A return $char =~ /[\200-\237]/;
1N/A }
1N/A if (ord('^')==176) { # 37
1N/A return $char =~ /[\040-\044\025\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\004\024\076\377]/;
1N/A }
1N/A if (ord('^')==95) { # 1047
1N/A return $char =~ /[\040-\045\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\004\024\076\377]/;
1N/A }
1N/A if (ord('^')==106) { # posix-bc
1N/A return $char =~
1N/A /[\040-\045\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\004\024\076\137]/;
1N/A }
1N/A }
1N/A
1N/A sub Is_latin_1 {
1N/A my $char = substr(shift,0,1);
1N/A if (ord('^')==94) { # ascii
1N/A return $char =~ /[\240-\377]/;
1N/A }
1N/A if (ord('^')==176) { # 37
1N/A return $char =~
1N/A /[\101\252\112\261\237\262\152\265\275\264\232\212\137\312\257\274\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\375\376\373\374\255\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\335\336\333\334\215\216\337]/;
1N/A }
1N/A if (ord('^')==95) { # 1047
1N/A return $char =~
1N/A /[\101\252\112\261\237\262\152\265\273\264\232\212\260\312\257\274\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\375\376\373\374\272\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\335\336\333\334\215\216\337]/;
1N/A }
1N/A if (ord('^')==106) { # posix-bc
1N/A return $char =~
1N/A /[\101\252\260\261\237\262\320\265\171\264\232\212\272\312\257\241\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\340\376\335\374\255\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\300\336\333\334\215\216\337]/;
1N/A }
1N/A }
1N/A
1N/ANote however that only the C<Is_print_ascii()> sub is really independent
1N/Aof coded character set. Another way to write C<Is_latin_1()> would be
1N/Ato use the characters in the range explicitly:
1N/A
1N/A sub Is_latin_1 {
1N/A my $char = substr(shift,0,1);
1N/A $char =~ /[ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]/;
1N/A }
1N/A
1N/AThat form, however, may run into trouble in network transit (due to the
1N/Apresence of 8 bit characters) or on non ISO-Latin character sets.
1N/A
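Each of the C<Is_*> subs above repeats the C<ord('^')> test. A minimal sketch of factoring that test into one place follows. C<native_charset()> is a hypothetical helper name, not part of Perl, and it recognizes only the four coded character sets discussed in this document:

```perl
use strict;
use warnings;

# Hypothetical helper: name the coded character set from ord('^'),
# using the same four values that the Is_* subs test for.
sub native_charset {
    my %by_caret = (
         94 => 'ascii',
        176 => '0037',
         95 => '1047',
        106 => 'posix-bc',
    );
    return $by_caret{ ord('^') } // 'unknown';
}

my $charset = native_charset();
print "running under: $charset\n";
```

A classification sub could then look up its pattern in a hash keyed by that name rather than walking a chain of C<if> tests.
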
1N/A=head1 SOCKETS
1N/A
1N/AMost socket programming assumes ASCII character encodings in network
1N/Abyte order. Exceptions can include CGI script writing under a
1N/Ahost web server where the server may take care of translation for you.
1N/AMost host web servers convert EBCDIC data to ISO-8859-1 or Unicode on
1N/Aoutput.
1N/A
1N/A=head1 SORTING
1N/A
1N/AOne big difference between ASCII based character sets and EBCDIC ones
1N/Ais the relative position of upper and lower case letters, and of the
1N/Aletters compared to the digits. If sorted on an ASCII based machine the
1N/Atwo letter abbreviation for a physician comes before the two letter
1N/Aabbreviation for drive, that is:
1N/A
1N/A @sorted = sort(qw(Dr. dr.)); # @sorted holds ('Dr.','dr.') on ASCII,
1N/A # but ('dr.','Dr.') on EBCDIC
1N/A
1N/AThe property of lower case before uppercase letters in EBCDIC is
1N/Aeven carried to the Latin 1 EBCDIC pages such as 0037 and 1047.
1N/AAn example would be that E<Euml> C<E WITH DIAERESIS> (203) comes
1N/Abefore E<euml> C<e WITH DIAERESIS> (235) on an ASCII machine, but
1N/Athe latter (83) comes before the former (115) on an EBCDIC machine.
1N/A(Astute readers will note that the upper case version of E<szlig>
1N/AC<SMALL LETTER SHARP S> is simply "SS" and that the upper case version of
1N/AE<yuml> C<y WITH DIAERESIS> is not in the 0..255 range but it is
1N/Aat U+0178 in Unicode, or C<"\x{178}"> in a Unicode enabled Perl).
1N/A
1N/AThe sort order will cause differences between results obtained on
1N/AASCII machines versus EBCDIC machines. What follows are some suggestions
1N/Aon how to deal with these differences.
1N/A
1N/A=head2 Ignore ASCII vs. EBCDIC sort differences.
1N/A
1N/AThis is the least computationally expensive strategy. It may require
1N/Asome user education.
1N/A
1N/A=head2 MONO CASE then sort data.
1N/A
1N/AIn order to minimize the expense of mono casing mixed case text, try to
1N/AC<tr///> towards the character set case most employed within the data.
1N/AIf the data are primarily UPPERCASE non Latin 1 then apply tr/a-z/A-Z/
1N/Athen sort(). If the data are primarily lowercase non Latin 1 then
1N/Aapply tr/A-Z/a-z/ before sorting. If the data are primarily UPPERCASE
1N/Aand include Latin-1 characters then apply:
1N/A
1N/A tr/a-z/A-Z/;
1N/A tr/àáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ/ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ/;
1N/A s/ß/SS/g;
1N/A
1N/Athen sort(). Do note however that such Latin-1 manipulation does not
1N/Aaddress the E<yuml> C<y WITH DIAERESIS> character that will remain at
1N/Acode point 255 on ASCII machines, but 223 on most EBCDIC machines
1N/Awhere it will sort to a place less than the EBCDIC numerals. With a
1N/AUnicode enabled Perl you might try:
1N/A
1N/A tr/ÿ/\x{178}/;
1N/A
1N/AThe strategy of mono casing data before sorting does not preserve the case
1N/Aof the data and may not be acceptable for that reason.
1N/A
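Where the original case must survive, one alternative is to compare mono cased copies while sorting the original strings. This sketch is a different technique from the C<tr///> approach above: it relies on lc(), so it covers only the letters that lc() handles on the platform in question, and the sample data are made up:

```perl
use strict;
use warnings;

my @names = qw(cherry Banana apple Date);

# Compare lower-cased copies, but sort (and return) the original
# strings, so the case of the data is preserved.
my @sorted = sort { lc($a) cmp lc($b) } @names;

print "@sorted\n";
```

Strings that differ only in case compare as equal here, so their relative order is left to sort().
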
1N/A=head2 Convert, sort data, then re convert.
1N/A
1N/AThis is the most expensive proposition that does not employ a network
1N/Aconnection.
1N/A
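A sketch of this strategy follows, assuming an C<@e2a> array that maps native code points to ASCII ones (an identity map on ASCII machines, something like the C<@e2a_1047> array in the URL encoding section of this document on a code page 1047 machine). Rather than converting the data in place and converting it back, it builds a converted sort key beside each string and sorts on that key. The sub name and sample data are illustrative:

```perl
use strict;
use warnings;

# Assumption: @e2a maps native code points to ASCII ones.  On an ASCII
# machine that is the identity map; on EBCDIC it must be filled in for
# the code page in use.
my @e2a = ord('A') == 65 ? (0 .. 255)
        : die "fill in \@e2a for this EBCDIC code page\n";

# Build the ASCII-ordered key for one native string.
sub ascii_key {
    my ($string) = @_;
    return join '', map { chr($e2a[ord($_)]) } split //, $string;
}

# Decorate each string with its key, sort on the key, then undecorate.
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map  { [ ascii_key($_), $_ ] } qw(dr. Dr. 7up);

print "@sorted\n";
```

Each key is computed once per string, which keeps the conversion cost linear in the amount of data even though the comparison runs many times.
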
1N/A=head2 Perform sorting on one type of machine only.
1N/A
1N/AThis strategy can employ a network connection. As such
1N/Ait would be computationally expensive.
1N/A
1N/A=head1 TRANSFORMATION FORMATS
1N/A
1N/AThere are a variety of ways of transforming data with an intra character set
1N/Amapping that serve a variety of purposes. Sorting was discussed in the
1N/Aprevious section and a few of the other more popular mapping techniques are
1N/Adiscussed next.
1N/A
1N/A=head2 URL decoding and encoding
1N/A
1N/ANote that some URLs have hexadecimal ASCII code points in them in an
1N/Aattempt to overcome character or protocol limitation issues. For example
1N/Athe tilde character is not on every keyboard hence a URL of the form:
1N/A
1N/A http://www.pvhp.com/~pvhp/
1N/A
1N/Amay also be expressed as either of:
1N/A
1N/A http://www.pvhp.com/%7Epvhp/
1N/A
1N/A http://www.pvhp.com/%7epvhp/
1N/A
1N/Awhere 7E is the hexadecimal ASCII code point for '~'. Here is an example
1N/Aof decoding such a URL under CCSID 1047:
1N/A
1N/A $url = 'http://www.pvhp.com/%7Epvhp/';
1N/A # this array assumes code page 1047
1N/A my @a2e_1047 = (
1N/A 0, 1, 2, 3, 55, 45, 46, 47, 22, 5, 21, 11, 12, 13, 14, 15,
1N/A 16, 17, 18, 19, 60, 61, 50, 38, 24, 25, 63, 39, 28, 29, 30, 31,
1N/A 64, 90,127,123, 91,108, 80,125, 77, 93, 92, 78,107, 96, 75, 97,
1N/A 240,241,242,243,244,245,246,247,248,249,122, 94, 76,126,110,111,
1N/A 124,193,194,195,196,197,198,199,200,201,209,210,211,212,213,214,
1N/A 215,216,217,226,227,228,229,230,231,232,233,173,224,189, 95,109,
1N/A 121,129,130,131,132,133,134,135,136,137,145,146,147,148,149,150,
1N/A 151,152,153,162,163,164,165,166,167,168,169,192, 79,208,161, 7,
1N/A 32, 33, 34, 35, 36, 37, 6, 23, 40, 41, 42, 43, 44, 9, 10, 27,
1N/A 48, 49, 26, 51, 52, 53, 54, 8, 56, 57, 58, 59, 4, 20, 62,255,
1N/A 65,170, 74,177,159,178,106,181,187,180,154,138,176,202,175,188,
1N/A 144,143,234,250,190,160,182,179,157,218,155,139,183,184,185,171,
1N/A 100,101, 98,102, 99,103,158,104,116,113,114,115,120,117,118,119,
1N/A 172,105,237,238,235,239,236,191,128,253,254,251,252,186,174, 89,
1N/A 68, 69, 66, 70, 67, 71,156, 72, 84, 81, 82, 83, 88, 85, 86, 87,
1N/A 140, 73,205,206,203,207,204,225,112,221,222,219,220,141,142,223
1N/A );
1N/A $url =~ s/%([0-9a-fA-F]{2})/pack("C",$a2e_1047[hex($1)])/ge;
1N/A
1N/AConversely, here is a partial solution for the task of encoding such
1N/Aa URL under the 1047 code page:
1N/A
1N/A $url = 'http://www.pvhp.com/~pvhp/';
1N/A # this array assumes code page 1047
1N/A my @e2a_1047 = (
1N/A 0, 1, 2, 3,156, 9,134,127,151,141,142, 11, 12, 13, 14, 15,
1N/A 16, 17, 18, 19,157, 10, 8,135, 24, 25,146,143, 28, 29, 30, 31,
1N/A 128,129,130,131,132,133, 23, 27,136,137,138,139,140, 5, 6, 7,
1N/A 144,145, 22,147,148,149,150, 4,152,153,154,155, 20, 21,158, 26,
1N/A 32,160,226,228,224,225,227,229,231,241,162, 46, 60, 40, 43,124,
1N/A 38,233,234,235,232,237,238,239,236,223, 33, 36, 42, 41, 59, 94,
1N/A 45, 47,194,196,192,193,195,197,199,209,166, 44, 37, 95, 62, 63,
1N/A 248,201,202,203,200,205,206,207,204, 96, 58, 35, 64, 39, 61, 34,
1N/A 216, 97, 98, 99,100,101,102,103,104,105,171,187,240,253,254,177,
1N/A 176,106,107,108,109,110,111,112,113,114,170,186,230,184,198,164,
1N/A 181,126,115,116,117,118,119,120,121,122,161,191,208, 91,222,174,
1N/A 172,163,165,183,169,167,182,188,189,190,221,168,175, 93,180,215,
1N/A 123, 65, 66, 67, 68, 69, 70, 71, 72, 73,173,244,246,242,243,245,
1N/A 125, 74, 75, 76, 77, 78, 79, 80, 81, 82,185,251,252,249,250,255,
1N/A 92,247, 83, 84, 85, 86, 87, 88, 89, 90,178,212,214,210,211,213,
1N/A 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,179,219,220,217,218,159
1N/A );
1N/A # The following regular expression does not address the
1N/A # mappings for: ('.' => '%2E', '/' => '%2F', ':' => '%3A')
1N/A $url =~ s/([\t "#%&\(\),;<=>\?\@\[\\\]^`{|}~])/sprintf("%%%02X",$e2a_1047[ord($1)])/ge;
1N/A
1N/Awhere a more complete solution would split the URL into components
1N/Aand apply a full s/// substitution only to the appropriate parts.
1N/A
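As a rough sketch of that idea, the following encodes only the path segments of a URL and leaves the scheme, the host and the '/' separators alone. The sub name and the simple pattern used to split the URL are illustrative assumptions, not a full URL parser. C<@e2a> is an identity map here, so the code as written is correct only on ASCII machines; on code page 1047 the C<@e2a_1047> array above would take its place:

```perl
use strict;
use warnings;

# Assumption: native-to-ASCII map; identity on ASCII machines only.
my @e2a = (0 .. 255);

# Hypothetical helper: percent-encode one path segment, keeping only
# letters, digits, '_', '.' and '-' as they are.
sub encode_segment {
    my ($segment) = @_;
    $segment =~ s/([^A-Za-z0-9_.\-])/sprintf("%%%02X", $e2a[ord($1)])/ge;
    return $segment;
}

my $url = 'http://www.pvhp.com/~pvhp/my file.html';

# Split into "scheme://host" and the path, then encode each segment.
my ($prefix, $path) = $url =~ m{\A([a-z]+://[^/]*)(.*)\z}s;
my $encoded = $prefix
            . join('/', map { encode_segment($_) } split(m{/}, $path, -1));

print "$encoded\n";
```

Because the '/' separators never reach encode_segment(), a segment can be encoded with a much broader character class than the whole URL could.
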
1N/AIn the remaining examples a @e2a or @a2e array may be employed
1N/Abut the assignment will not be shown explicitly. For code page 1047
1N/Ayou could use the @a2e_1047 or @e2a_1047 arrays just shown.
1N/A
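Because each table is a permutation of 0..255, either array can be derived from the other with an array slice. The following sketch inverts the C<@e2a_1047> array shown above and spot-checks the result against the C<@a2e_1047> values given earlier. It uses only numeric operations, so it gives the same answer on ASCII and EBCDIC machines:

```perl
use strict;
use warnings;

# EBCDIC (code page 1047) to ASCII map, copied from the text above.
my @e2a_1047 = (
      0,  1,  2,  3,156,  9,134,127,151,141,142, 11, 12, 13, 14, 15,
     16, 17, 18, 19,157, 10,  8,135, 24, 25,146,143, 28, 29, 30, 31,
    128,129,130,131,132,133, 23, 27,136,137,138,139,140,  5,  6,  7,
    144,145, 22,147,148,149,150,  4,152,153,154,155, 20, 21,158, 26,
     32,160,226,228,224,225,227,229,231,241,162, 46, 60, 40, 43,124,
     38,233,234,235,232,237,238,239,236,223, 33, 36, 42, 41, 59, 94,
     45, 47,194,196,192,193,195,197,199,209,166, 44, 37, 95, 62, 63,
    248,201,202,203,200,205,206,207,204, 96, 58, 35, 64, 39, 61, 34,
    216, 97, 98, 99,100,101,102,103,104,105,171,187,240,253,254,177,
    176,106,107,108,109,110,111,112,113,114,170,186,230,184,198,164,
    181,126,115,116,117,118,119,120,121,122,161,191,208, 91,222,174,
    172,163,165,183,169,167,182,188,189,190,221,168,175, 93,180,215,
    123, 65, 66, 67, 68, 69, 70, 71, 72, 73,173,244,246,242,243,245,
    125, 74, 75, 76, 77, 78, 79, 80, 81, 82,185,251,252,249,250,255,
     92,247, 83, 84, 85, 86, 87, 88, 89, 90,178,212,214,210,211,213,
     48, 49, 50, 51, 52, 53, 54, 55, 56, 57,179,219,220,217,218,159
);

# Invert the permutation: if EBCDIC $e maps to ASCII $a,
# then ASCII $a maps back to EBCDIC $e.
my @a2e_1047;
@a2e_1047[@e2a_1047] = (0 .. 255);

# Spot checks: ASCII 0x7E ('~') and ASCII 65 ('A').
printf "~ => %d, A => %d\n", $a2e_1047[0x7E], $a2e_1047[65];

# Every code point must survive the round trip.
my @broken = grep { $e2a_1047[ $a2e_1047[$_] ] != $_ } 0 .. 255;
print "round trip ok\n" unless @broken;
```

Deriving one table from the other in this way avoids keeping two hand-typed arrays in step.
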
1N/A=head2 uu encoding and decoding
1N/A
1N/AThe C<u> template to pack() or unpack() will render EBCDIC data in EBCDIC
1N/Acharacters equivalent to their ASCII counterparts. For example, the
1N/Afollowing will print "Yes indeed\n" on either an ASCII or EBCDIC computer:
1N/A
1N/A $all_byte_chrs = '';
1N/A for (0..255) { $all_byte_chrs .= chr($_); }
1N/A $uuencode_byte_chrs = pack('u', $all_byte_chrs);
1N/A ($uu = <<'ENDOFHEREDOC') =~ s/^\s*//gm;
1N/A M``$"`P0%!@<("0H+#`T.#Q`1$A,4%187&!D:&QP='A\@(2(C)"4F)R@I*BLL
1N/A M+2XO,#$R,S0U-C<X.3H[/#T^/T!!0D-$149'2$E*2TQ-3D]045)35%565UA9
1N/A M6EM<75Y?8&%B8V1E9F=H:6IK;&UN;W!Q<G-T=79W>'EZ>WQ]?G^`@8*#A(6&
1N/A MAXB)BHN,C8Z/D)&2DY25EI>8F9J;G)V>GZ"AHJ.DI::GJ*FJJZRMKJ^PL;*S
1N/A MM+6VM[BYNKN\O;Z_P,'"P\3%QL?(R<K+S,W.S]#1TM/4U=;7V-G:V]S=WM_@
1N/A ?X>+CY.7FY^CIZNOL[>[O\/'R\_3U]O?X^?K[_/W^_P``
1N/A ENDOFHEREDOC
1N/A if ($uuencode_byte_chrs eq $uu) {
1N/A print "Yes ";
1N/A }
1N/A $uudecode_byte_chrs = unpack('u', $uuencode_byte_chrs);
1N/A if ($uudecode_byte_chrs eq $all_byte_chrs) {
1N/A print "indeed\n";
1N/A }
1N/A
1N/AHere is a very spartan uudecoder that will work on EBCDIC provided
1N/Athat the @e2a array is filled in appropriately:
1N/A
1N/A #!/usr/local/bin/perl
1N/A @e2a = ( # this must be filled in
1N/A );
1N/A $_ = <> until ($mode,$file) = /^begin\s*(\d*)\s*(\S*)/;
1N/A open(OUT, "> $file") if $file ne "";
1N/A while(<>) {
1N/A last if /^end/;
1N/A next if /[a-z]/;
1N/A next unless int(((($e2a[ord()] - 32 ) & 077) + 2) / 3) ==
1N/A int(length() / 4);
1N/A print OUT unpack("u", $_);
1N/A }
1N/A close(OUT);
1N/A chmod oct($mode), $file;
1N/A
1N/A
1N/A=head2 Quoted-Printable encoding and decoding
1N/A
1N/AOn ASCII encoded machines it is possible to encode characters outside of
1N/Athe printable set using:
1N/A
1N/A # This QP encoder works on ASCII only
1N/A $qp_string =~ s/([=\x00-\x1F\x80-\xFF])/sprintf("=%02X",ord($1))/ge;
1N/A
1N/AWhereas a QP encoder that works on both ASCII and EBCDIC machines
1N/Awould look somewhat like the following (where the EBCDIC branch @e2a
1N/Aarray is omitted for brevity):
1N/A
1N/A if (ord('A') == 65) { # ASCII
1N/A $delete = "\x7F"; # ASCII
1N/A @e2a = (0 .. 255) # ASCII to ASCII identity map
1N/A }
1N/A else { # EBCDIC
1N/A $delete = "\x07"; # EBCDIC
1N/A @e2a = # EBCDIC to ASCII map (as shown above)
1N/A }
1N/A $qp_string =~
1N/A s/([^ !"\#\$%&'()*+,\-.\/0-9:;<>?\@A-Z[\\\]^_`a-z{|}~$delete])/sprintf("=%02X",$e2a[ord($1)])/ge;
1N/A
1N/A(although in production code the substitutions might be done
1N/Ain the EBCDIC branch with the @e2a array and separately in the
1N/AASCII branch without the expense of the identity map).
1N/A
1N/ASuch QP strings can be decoded with:
1N/A
1N/A # This QP decoder is limited to ASCII only
1N/A $string =~ s/=([0-9A-Fa-f][0-9A-Fa-f])/chr hex $1/ge;
1N/A $string =~ s/=[\n\r]+$//;
1N/A
1N/AWhereas a QP decoder that works on both ASCII and EBCDIC machines
1N/Awould look somewhat like the following (where the @a2e array is
1N/Aomitted for brevity):
1N/A
1N/A $string =~ s/=([0-9A-Fa-f][0-9A-Fa-f])/chr $a2e[hex $1]/ge;
1N/A $string =~ s/=[\n\r]+$//;
1N/A
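Putting the two halves together, here is a sketch of an encoder and decoder pair wrapped in subs, with a round trip check. The sub names are illustrative. To stay short the sketch fills in C<@e2a> and C<@a2e> only on ASCII machines, where both are identity maps; on EBCDIC they would have to be supplied for the code page in use:

```perl
use strict;
use warnings;

my (@e2a, @a2e, $delete);
if (ord('A') == 65) {              # ASCII: both maps are the identity
    @e2a = @a2e = (0 .. 255);
    $delete = "\x7F";
}
else {                             # EBCDIC: tables must be supplied
    die "fill in \@e2a and \@a2e for this code page\n";
}

# Encode everything outside the printable set (and '=') as =XX,
# where XX is the ASCII code point in hexadecimal.
sub qp_encode {
    my ($string) = @_;
    $string =~
      s/([^ !"\#\$%&'()*+,\-.\/0-9:;<>?\@A-Z[\\\]^_`a-z{|}~$delete])/sprintf("=%02X",$e2a[ord($1)])/ge;
    return $string;
}

# Turn each =XX back into the native character for ASCII code XX.
sub qp_decode {
    my ($string) = @_;
    $string =~ s/=([0-9A-Fa-f][0-9A-Fa-f])/chr $a2e[hex $1]/ge;
    return $string;
}

my $plain   = "1 + 1 = 2\ttab";
my $encoded = qp_encode($plain);
print "$encoded\n";
print "round trip ok\n" if qp_decode($encoded) eq $plain;
```

Like the fragments above, this handles only the character escapes and not the soft line breaks of Quoted-Printable.
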
1N/A=head2 Caesarian ciphers
1N/A
1N/AThe practice of shifting an alphabet one or more characters for encipherment
1N/Adates back thousands of years and was explicitly detailed by Gaius Julius
1N/ACaesar in his B<Gallic Wars> text. A single alphabet shift is sometimes
1N/Areferred to as a rotation and the shift amount is given as a number $n after
1N/Athe string 'rot' or "rot$n". Rot0 and rot26 would designate identity maps
1N/Aon the 26 letter English version of the Latin alphabet. Rot13 has the
1N/Ainteresting property that alternate subsequent invocations are identity maps
1N/A(thus rot13 is its own non-trivial inverse in the group of 26 alphabet
1N/Arotations). Hence the following is a rot13 encoder and decoder that will
1N/Awork on ASCII and EBCDIC machines:
1N/A
1N/A #!/usr/local/bin/perl
1N/A
1N/A while(<>){
1N/A tr/n-za-mN-ZA-M/a-zA-Z/;
1N/A print;
1N/A }
1N/A
1N/AIn one-liner form:
1N/A
1N/A perl -ne 'tr/n-za-mN-ZA-M/a-zA-Z/;print'
1N/A
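The self-inverse property can be checked directly. This sketch wraps the same C<tr///> in a sub, here called rot13() for illustration, and uses the C</r> modifier, which assumes perl 5.14 or later:

```perl
use strict;
use warnings;

# rot13 as a sub; /r returns the translated copy instead of
# modifying the argument in place.
sub rot13 {
    my ($text) = @_;
    return $text =~ tr/n-za-mN-ZA-M/a-zA-Z/r;
}

my $plain  = "Hello, World!";
my $cipher = rot13($plain);

print "$cipher\n";
print "rot13 is its own inverse\n" if rot13($cipher) eq $plain;
```

Characters outside the 26 letter alphabet, such as the comma and the exclamation mark, pass through unchanged.
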
1N/A
1N/A=head1 HASHING ORDER AND CHECKSUMS
1N/A
1N/ATo the extent that it is possible to write code that depends on
1N/Ahashing order there may be differences between hashes as stored
1N/Aon an ASCII based machine and hashes stored on an EBCDIC based machine.
1N/AXXX
1N/A
1N/A=head1 I18N AND L10N
1N/A
1N/AInternationalization (I18N) and localization (L10N) are supported at least
1N/Ain principle even on EBCDIC machines. The details are system dependent
1N/Aand discussed under the L<perlebcdic/OS ISSUES> section below.
1N/A
1N/A=head1 MULTI OCTET CHARACTER SETS
1N/A
1N/APerl may work with an internal UTF-EBCDIC encoding form for wide characters
1N/Aon EBCDIC platforms in a manner analogous to the way that it works with
1N/Athe UTF-8 internal encoding form on ASCII based platforms.
1N/A
1N/ALegacy multi byte EBCDIC code pages XXX.
1N/A
1N/A=head1 OS ISSUES
1N/A
1N/AThere may be a few system dependent issues
1N/Aof concern to EBCDIC Perl programmers.
1N/A
1N/A=head2 OS/400
1N/A
1N/A=over 8
1N/A
1N/A=item PASE
1N/A
1N/AThe PASE environment is a runtime environment for OS/400 that can run
1N/Aexecutables built for PowerPC AIX in OS/400, see L<perlos400>. PASE
1N/Ais ASCII-based, unlike the ILE, which is EBCDIC-based.
1N/A
1N/A=item IFS access
1N/A
1N/AXXX.
1N/A
1N/A=back
1N/A
1N/A=head2 OS/390, z/OS
1N/A
1N/APerl runs under Unix System Services (USS).
1N/A
1N/A=over 8
1N/A
1N/A=item chcp
1N/A
1N/AB<chcp> is supported as a shell utility for displaying and changing
1N/Aone's code page. See also L<chcp>.
1N/A
1N/A=item dataset access
1N/A
1N/AFor sequential data set access try:
1N/A
1N/A my @ds_records = `cat //DSNAME`;
1N/A
1N/Aor:
1N/A
1N/A my @ds_records = `cat //'HLQ.DSNAME'`;
1N/A
1N/ASee also the OS390::Stdio module on CPAN.
1N/A
1N/A=item iconv
1N/A
1N/AB<iconv> is supported as both a shell utility and a C RTL routine.
1N/ASee also the iconv(1) and iconv(3) manual pages.
1N/A
1N/A=item locales
1N/A
1N/AOn OS/390 or z/OS see L<locale> for information on locales. The L10N files
1N/Aare in F</usr/nls/locale>. $Config{d_setlocale} is 'define' on OS/390
1N/Aor z/OS.
1N/A
1N/A=back
1N/A
1N/A=head2 VM/ESA?
1N/A
1N/AXXX.
1N/A
1N/A=head2 POSIX-BC?
1N/A
1N/AXXX.
1N/A
1N/A=head1 BUGS
1N/A
1N/AThis pod document contains literal Latin 1 characters and may encounter
1N/Atranslation difficulties. In particular one popular nroff implementation
1N/Awas known to strip accented characters to their unaccented counterparts
1N/Awhile attempting to view this document through the B<pod2man> program
1N/A(for example, you may see a plain C<y> rather than one with a diaeresis
1N/Aas in E<yuml>). Another nroff truncated the resultant manpage at
1N/Athe first occurrence of 8 bit characters.
1N/A
1N/ANot all shells will allow multiple C<-e> string arguments to perl to
1N/Abe concatenated together properly as recipes 0, 2, 4, 5, and 6 might
1N/Aseem to imply.
1N/A
1N/A=head1 SEE ALSO
1N/A
1N/AL<perllocale>, L<perlfunc>, L<perlunicode>, L<utf8>.
1N/A
1N/A=head1 REFERENCES
1N/A
1N/Ahttp://anubis.dkuug.dk/i18n/charmaps
1N/A
1N/Ahttp://www.unicode.org/
1N/A
1N/Ahttp://www.unicode.org/unicode/reports/tr16/
1N/A
1N/Ahttp://www.wps.com/texts/codes/
1N/AB<ASCII: American Standard Code for Information Infiltration> Tom Jennings,
1N/ASeptember 1999.
1N/A
1N/AB<The Unicode Standard, Version 3.0> The Unicode Consortium, Lisa Moore ed.,
1N/AISBN 0-201-61633-5, Addison Wesley Developers Press, February 2000.
1N/A
1N/AB<CDRA: IBM - Character Data Representation Architecture -
1N/AReference and Registry>, IBM SC09-2190-00, December 1996.
1N/A
1N/A"Demystifying Character Sets", Andrea Vine, Multilingual Computing
1N/A& Technology, B<#26 Vol. 10 Issue 4>, August/September 1999;
1N/AISSN 1523-0309; Multilingual Computing Inc. Sandpoint ID, USA.
1N/A
1N/AB<Codes, Ciphers, and Other Cryptic and Clandestine Communication>
1N/AFred B. Wrixon, ISBN 1-57912-040-7, Black Dog & Leventhal Publishers,
1N/A1998.
1N/A
1N/Ahttp://www.bobbemer.com/P-BIT.HTM
1N/AB<IBM - EBCDIC and the P-bit; The biggest Computer Goof Ever> Robert Bemer.
1N/A
1N/A=head1 HISTORY
1N/A
1N/A15 April 2001: added UTF-8 and UTF-EBCDIC to main table, pvhp.
1N/A
1N/A=head1 AUTHOR
1N/A
1N/APeter Prymmer pvhp@best.com wrote this in 1999 and 2000
1N/Awith CCSID 0819 and 0037 help from Chris Leach and
1N/AAndrE<eacute> Pirard A.Pirard@ulg.ac.be as well as POSIX-BC
1N/Ahelp from Thomas Dorner Thomas.Dorner@start.de.
1N/AThanks also to Vickie Cooper, Philip Newton, William Raffloer, and
1N/AJoe Smith. Trademarks, registered trademarks, service marks and
1N/Aregistered service marks used in this document are the property of
1N/Atheir respective owners.
1N/A
1N/A