 | One of the key challenges in heterogeneous environments is to be able to deal
with different coded graphic character sets and code pages in a consistent manner.
Differences exist for reasons such as the origins of operating systems, the
provision of national language support in different countries, or an application's
requirements.
Migration to interoperable character sets and code pages for different countries
and groups of countries will minimize, but not eliminate, the differences to
be dealt with. Applications will continue to face this challenge, but now with
assistance from CDRA.
Four functions, CDRCVRT, CDRMSCI, CDRMSCP, and CDRMSCC, were defined earlier
(in "Chapter 5. CDRA Interface Definitions")
in support of difference management. The present chapter describes the concepts
behind difference management, the principles and criteria in designing the contents
of conversion tables, and some aspects of managing the selection of the required
conversion methods and tables. |
|
|
 | | Concepts
Difference management is a process by which differences in coded graphic character
representation of data are recognized and dealt with. Difference-detection mechanisms
must be placed in appropriate locations within a system in order to determine
if such differences do exist.
Differences in the data representation and the processing capabilities of an
application are used to trigger a difference management action. The choices
may be to convert the graphic character data, to leave it as it is, or to terminate
the current function altogether. CDRA provides assistance when a choice to convert
the data has been made. Conversion is viewed as a tool in difference management.
The results of difference management within CDRA will be dependent upon the
conversion tables and the conversion methods chosen.
A general view of the difference management process is shown in Figure
19.
The query function (see "Querying Tag
Values") assists in finding the relevant tag values to decide if a difference
exists.
Figure 19. Difference Management Process Flow
Do Not Convert
Potential reasons for not converting the data are as follows:
- The data can be processed as-is. Processing logic or a resource that can
process the data correctly is available and can be selected.
- The data is required to remain in its original form (retaining its associated
tag value). For example, when the context of use is simply store and retrieve,
no processing is to be done in the storage environment; thus, conversion is
not necessary. Note that this may not always be feasible, in that a receiving
component may have restrictions on the CCSID(s) for the data handled by that
component.
- The required conversion cannot be performed, as a necessary resource is
not available. Here, the "do not convert" decision is made after the conversion
function is called.
- The conversion results will be unsatisfactory. For example, the data loss
or integrity loss could be unacceptable.
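The choice among the three actions can be sketched as a small decision function. This is an illustration only: CDRA does not define such a function, all names and parameters here are hypothetical, and the choice to terminate when conversion is needed but unavailable or unacceptable is an assumption of this sketch.

```python
from enum import Enum


class Action(Enum):
    CONVERT = "convert"
    LEAVE_AS_IS = "leave as is"
    TERMINATE = "terminate"


def choose_action(data_ccsid, processable_ccsids, *, store_only=False,
                  storable_ccsids=None, converter_available=True,
                  loss_acceptable=True):
    """Pick a difference management action (hypothetical helper).

    processable_ccsids: CCSIDs the receiving logic can process as-is.
    storable_ccsids: CCSIDs a store-and-retrieve component accepts;
                     None means the component has no restriction.
    """
    if data_ccsid in processable_ccsids:
        # The data can be processed as-is: no difference to manage.
        return Action.LEAVE_AS_IS
    if store_only and (storable_ccsids is None or data_ccsid in storable_ccsids):
        # Store and retrieve only: keep the original form and tag.
        return Action.LEAVE_AS_IS
    if converter_available and loss_acceptable:
        return Action.CONVERT
    # Assumption of this sketch: with no usable conversion, stop the function.
    return Action.TERMINATE
```

For example, `choose_action(850, {500})` selects conversion, while the same call with `store_only=True` leaves the data as it is.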
Convert
The simplest form of conversion is the case where the input and output character
sets are equivalent, but the code point assignments for the characters are different.
Here, all the matched characters will only need to have their code points mapped.
The more general form of conversion must deal with input and output character
sets within which only a subset of the characters are equivalent.
During a conversion, only the common set of coded graphic characters can be
preserved. Management of the remaining unmatched characters depends on the nature
and context of the data. Conversion with mismatches can generate converted data
that may not have an assigned graphic character meaning in the output. Such
results of conversion may not be acceptable to an application.
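The split between the common set and the unmatched characters can be illustrated with a short sketch. It assumes that Python's built-in `cp500` and `cp850` codecs are acceptable stand-ins for CCSIDs 00500 and 00850, and that two code points decoding to the same Unicode character is a reasonable proxy for having the same GCGID. The function name is hypothetical, and the result is not a CDRA-defined conversion table.

```python
def split_code_points(src_codec, dst_codec):
    """Classify the 256 input code points of a single-byte codec.

    Returns (mapping, unmatched):
      mapping   - input code point -> output code point for characters
                  that exist in both code pages (the common set)
      unmatched - input code points with no counterpart in the output
    """
    mapping, unmatched = {}, []
    for value in range(256):
        try:
            char = bytes([value]).decode(src_codec)
            mapping[value] = char.encode(dst_codec)[0]
        except UnicodeError:
            # Either unassigned in the input or absent from the output.
            unmatched.append(value)
    return mapping, unmatched


mapping, unmatched = split_code_points("cp850", "cp500")
print(hex(mapping[0x41]))   # code point of "A" in the output code page
print(len(unmatched))       # number of code points needing mismatch handling
```

Only the code points in `mapping` keep their meaning. How the code points in `unmatched` are handled depends on the mismatch management criterion in use.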
CDRA has defined criteria for dealing with mismatches during the conversion
process. The specific criterion to be used is reflected in the content of the
conversion tables (and the logic that uses them) that are used in the conversion
process. A set of default conversion tables has been defined to map between
specific pairs of CCSIDs according to the most appropriate criterion, and are
defined with consistency as the goal. The use of such tables enhances the consistency
among implementing products when performing coded graphic character data conversion.
One of CDRA's goals is to minimize the loss of coded graphic character data
during conversion. The interoperable character sets and associated CDRA-defined
conversion tables help to address this goal by maximizing graphic character
integrity within a character-set group or subgroup. |
|
|
 | |
Once the decision to convert has been made, a generic data conversion process can be used.
A generic data conversion process contains many elements, one of which is the graphic character data conversion process. Figure 20 shows the elements of a typical data conversion procedure.
The different elements and their functions are:
- A parser, which is selected based on the architecture of the input data stream or on the description of the caller’s view (input view) of the data. The parser is responsible for separating the input data into different classifications or substrings, such as graphic character data, control characters/functions, and floating-point numbers. These substrings, along with their characteristics, are passed on to the converter.
- The converter is responsible for sending each substring to the appropriate mapping module. Once mapped, each module returns the converted substring to the converter. These substrings are then passed on to the output generator.
- The mapping modules accept substrings from the converter, perform the appropriate mappings, and return the new substrings to the converter.
- The output generator receives the converted data substrings from the converter and puts them together as an output data stream made up of substrings of different classes.
Figure 20. Generic Data Conversion Process
CDRA defines only the graphic character data conversion part of the overall data conversion process. A limited number of control characters are addressed as part of handling different string types (see "Types of Strings") and as part of control character mappings (see "Pairings of Code Points"). Other control characters are treated as bytes, and are dealt with according to mismatch management criteria.
For correct results, the caller of the CDRA conversion function should ensure that the input string does not contain characters other than graphic character data.
Each of these conversion modules may permit direct access by an application. In that case, the application assumes responsibility for the functions of parsing and output generation. For example, when an application creates a sequential file on the PC, only the application knows where the string of bytes is broken into logical substrings and which of these substrings represent graphic character data. Conventions for organizing a file, such as using CR, LF to mark the end of a record, must be known to and handled by the parsing logic and the output generator. Handling of the data organization for output is not performed by the graphic character conversion function.
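The division of labor among parser, converter, mapping modules, and output generator can be sketched as follows. This is a minimal illustration, not a CDRA interface. All function names are hypothetical, Python's `cp850` and `cp500` codecs stand in for the PC and host CCSIDs, and mapping the CR, LF record separator to the EBCDIC NL control (X'15') is an assumption made only for the example.

```python
import re

CRLF = b"\r\n"


def parse(stream):
    """Parser: split a PC byte stream into classified substrings."""
    for piece in re.split(b"(\r\n)", stream):
        if piece == CRLF:
            yield ("record_end", piece)
        elif piece:
            yield ("graphic", piece)


def map_graphic(data):
    """Mapping module for graphic character data.

    Raises UnicodeError for characters outside the common set; a real
    module would apply a mismatch management criterion instead.
    """
    return data.decode("cp850").encode("cp500")


def map_record_end(data):
    """Mapping module for the record separator (assumed target: NL)."""
    return b"\x15"


MAPPERS = {"graphic": map_graphic, "record_end": map_record_end}


def convert(substrings):
    """Converter: route each substring to its mapping module."""
    for kind, data in substrings:
        yield MAPPERS[kind](data)


def generate_output(converted):
    """Output generator: reassemble the converted substrings."""
    return b"".join(converted)


def convert_stream(stream):
    return generate_output(convert(parse(stream)))


print(convert_stream(b"AB\r\nC").hex())
```

Only `map_graphic` corresponds to the part of the process that CDRA defines. The other pieces remain the responsibility of the application or of the surrounding data conversion service.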
Misinterpretation of Data
If the separation of graphic character data from other classes of data is not done, the graphic character conversion function can find byte strings that may or may not have graphic character meanings. The criterion selected for mismatch management specifies how to convert such byte strings if they appear in the input string. However, the problem of possible misinterpretation cannot be entirely dealt with using the CDRA conversion criteria alone.
For example, suppose a data byte represents a counter value equal to 74, which has the same bit configuration as the code point X'4A' for a left square bracket in System/370 CCSID 00500. Conversion maps it to another code point (X'5B'), which represents the left square bracket on a PC using CCSID 00850. If this value is then interpreted as a count on the PC, the value is now 91. Neither the CDRA identifiers nor the graphic character conversion process can deal with this kind of misinterpretation.
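The counter example can be reproduced with a few lines of Python, assuming the built-in `cp500` and `cp850` codecs as stand-ins for CCSIDs 00500 and 00850:

```python
counter = 74                                      # binary count, bit pattern X'4A'
as_character = bytes([counter]).decode("cp500")   # read as CCSID 00500 text
converted = as_character.encode("cp850")          # converted to CCSID 00850

print(repr(as_character))   # '['
print(converted[0])         # 91
```

The conversion itself is correct for graphic character data. The error arises only because a binary value was passed through a function meant for characters.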
A graphic character string may have a number of characteristics or properties associated with it. Some of these characteristics or properties, such as the number of bytes per character, are inherited from the encoding scheme. Others, such as how a string is terminated, the orientation of the string, or whether the characters are shaped or unshaped, cannot be determined from the CCSID tag or encoding scheme alone. The following String Types are defined for use within the CDRA architecture.
String Type 0: CDRA Default
If no string type is specified in a CCSID definition or as a parameter on an API call, the string type is zero. A string type of 0 means that the character data string is semantically defined by the CCSID. All of the characteristics of the string can be determined from the CCSID definition alone. No additional information is needed.
String Type 1: Null-terminated string
A variable-length graphic character string, which is terminated by a character whose code point has a binary value of zero. The number of bits in the code point used to represent the terminating character (the null terminator) is the smallest number of bits allowed for code points in the encoding scheme used.
The above definition is used in the following examples to determine the null-termination character:
- If the ES associated with CCSID1 indicates a "pure single-byte" or "mixed single-byte and double-byte" encoding, X'00' is used to terminate the string. In a mixed-byte string a null-termination can occur only in the single-byte segment of the string.
- In a string encoded using ES X'1301' (host mixed), a null-termination can occur only outside the SO and SI that surround the double-byte coded substring. Thus, any double-byte code point that begins with X'00' (such as in 327x data stream, where some EBCDIC control characters are represented in the data stream with X'00' preceding their corresponding single-byte code points) or ends with X'00' must not be interpreted to be a null-termination character.
- In a string using ES X'3300', X'2300' or X'2305', a double-byte code point can never begin with or terminate with X'00'. This also implies that no data code point in the string can be X'00'.
- If the encoding scheme indicates "pure double-byte" encoding, the null-termination character is X'0000'. This implies that none of the data code points in the string can be X'0000'.
The above definitions reflect the current usage and definitions of a null-terminated string in the C programming language. A length value may additionally be provided for the string; however, the null terminator takes precedence over the length value.
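The rules above for locating the null terminator can be sketched as follows. The function name and the `scheme` labels are hypothetical. The sketch covers pure single-byte, pure double-byte, and host mixed (SO/SI delimited) strings, with SO and SI taken as the EBCDIC values X'0E' and X'0F'. PC mixed encodings, where double-byte characters are recognized by lead-byte ranges, are not covered.

```python
SO, SI = 0x0E, 0x0F  # EBCDIC shift-out and shift-in controls


def terminated_length(data, scheme):
    """Return the byte offset of the null terminator, or len(data) if none.

    scheme: "single"     - pure single-byte, terminator X'00'
            "double"     - pure double-byte, terminator X'0000'
            "host_mixed" - terminator X'00' outside SO ... SI only
    """
    if scheme == "single":
        pos = data.find(b"\x00")
        return len(data) if pos < 0 else pos
    if scheme == "double":
        # Inspect whole double-byte code points only.
        for i in range(0, len(data) - 1, 2):
            if data[i:i + 2] == b"\x00\x00":
                return i
        return len(data)
    if scheme == "host_mixed":
        i, in_dbcs = 0, False
        while i < len(data):
            byte = data[i]
            if in_dbcs:
                if byte == SI:
                    in_dbcs = False
                    i += 1
                else:
                    i += 2  # skip one double-byte code point, even X'00xx'
            elif byte == SO:
                in_dbcs = True
                i += 1
            elif byte == 0x00:
                return i
            else:
                i += 1
        return len(data)
    raise ValueError("unknown scheme: %r" % (scheme,))
```

In the host mixed case, a double-byte code point such as X'0015' inside the SO/SI brackets is skipped as data, so only a single-byte X'00' outside the brackets ends the string.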
A null-terminated string is given a string type identifier of 1 in CDRA function calls and in the Graphic Character Conversion Selection Table (GCCST).
String Type 2: Padded string
A graphic character string that is padded with one or more space characters. Padding is done only when there is unused storage space available in an area containing the unpadded string, and when it can be done without violating the semantics of the encoding scheme of the CCSID of the string. The resultant space padded string will be a well-formed string following the semantics of its encoding scheme.
|
Caution:
|
When space padding is done as part of graphic character conversion, it is not possible to distinguish (in the resultant output buffer) the space pad characters that are generated as a result of conversion maps from those generated by the padding process. If a subsequent string operation removes the space characters, there can be a potential loss of the converted pad characters. |
- If the encoding scheme of the string is either "pure single-byte" or "mixed single-byte and double-byte", when the string occupies less than the area allocated for it, the string is padded to fill the remaining area with SPACE characters. The definition of SPACE is to be taken from the CCSID resource definition. In a mixed string the padding must be done only in the single-byte segment of the string.
- If the encoding scheme of the string is "pure double-byte", the SPACE character will have a double-byte code point specified in the string's CCSID resource definition.
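A padding step along these lines might look like the sketch below. The function name is hypothetical, and the SPACE code point is passed in by the caller because it must come from the CCSID resource definition. The values X'40' (single-byte EBCDIC) and X'4040' (host double-byte) are used only as examples. For a mixed string, the caller is assumed to have verified that the string ends in a single-byte segment.

```python
def pad_string(data, area_size, space):
    """Pad data with SPACE code points to fill area_size bytes.

    space: the SPACE code point as bytes, one byte for single-byte and
           mixed strings, two bytes for pure double-byte strings.
    """
    free = area_size - len(data)
    if free < 0:
        raise ValueError("string is longer than the allocated area")
    count, leftover = divmod(free, len(space))
    if leftover:
        # Padding must leave a well-formed string: whole code points only.
        raise ValueError("area cannot be filled with whole SPACE code points")
    return data + space * count


print(pad_string(b"\xc1\xc2", 5, b"\x40").hex())          # single-byte EBCDIC
print(pad_string(b"\x42\x41", 6, b"\x40\x40").hex())      # pure double-byte
```

Rejecting an area that cannot be filled with whole SPACE code points is a design choice of this sketch. It reflects the requirement that padding must not violate the semantics of the encoding scheme.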
String Type 3: "Special Newline Nextline Handling"
String type 3 has special meaning in certain IBM products. If a character data string is defined as string type 3, then it is semantically defined by the CCSID with the additional property that any newline control characters in the string should be treated as linefeed control characters and, likewise, any linefeed control characters should be treated as newline control characters.
String Types 4 - 15: String Types for Bidirectional Languages
In the case of bidirectional languages, the string type is used to describe characteristics that are not implied by the CCSID or Encoding Scheme. The string characteristics which are defined for the bidirectional string types are:
- Text Type
- Numeric Shaping
- Orientation
- Text Shaping
- Symmetrical Swapping.
Following is a brief description of each of these characteristics and their possible values.
- Text Type
- The text type characteristic states what kind of algorithm is to be used when transforming the text layout. The text type can be visual (reading sequence), implicit (typing sequence), or explicit (includes directional control characters in the text segments explicitly). A visual algorithm copies entire lines of text as they appear without bothering about existing embedded directional segments. An implicit algorithm recognizes directional segments based on the natural directionality of the characters (i.e., right to left for Arabic characters and left to right for English characters) and performs segment inversions accordingly. An explicit algorithm recognizes directional segments and performs inversions based on special, explicit, directional controls embedded in the text.
- Example: [Figure: visual, shaped text compared with implicit, unshaped text]
- Numeric Shaping
- The numeric shaping characteristic states whether the numbers embedded in a text string will have the shapes that are used in English (called Arabic digits) or the national numerical shapes. Possible values for this characteristic are Arabic, Hindi, or passthrough. When passthrough is specified, numeric digits are left as they appear in the data string (no numeric shaping occurs).
- Orientation
- The orientation of a data string together with the text type, indicates the storage or display sequence of the Arabic and English characters. The possible values for this characteristic are left to right (LTR), right to left (RTL), Contextual LTR and Contextual RTL. The term contextual is used to indicate that the orientation should be taken from the context of the data. The data may contain "strong" characters that are either orientation left or orientation right. The term following contextual (LTR or RTL) specifies what should be the default orientation when the data is orientation-neutral (i.e. there are no strong characters).
- Text Shaping
- The text shaping characteristic of a bidirectional string type indicates whether text shaping is performed. This is relevant for the scripts of Arabic languages (including Farsi and Urdu), where characters assume different shapes (initial, medial, final, or isolated) according to their position in a word and the connectivity traits of the character and its surroundings.
- Symmetrical Swapping
- The symmetrical swapping characteristic states whether, in a right-to-left text phrase, some directional pairs of characters (such as left and right parentheses, greater-than and less-than signs, left and right brackets, left and right braces) will be interchanged in order to preserve the logical meaning of the inverted text.
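The effect of symmetrical swapping on an inverted segment can be shown with a small sketch. It works on Python text strings rather than on coded data, the function name is hypothetical, and the set of swapped pairs is limited to the examples named above.

```python
# Each directional character is exchanged with its mirror image.
SWAP = str.maketrans("()<>[]{}", ")(><][}{")


def invert_segment(text, symmetric_swapping=True):
    """Reverse a directional segment, optionally swapping paired symbols."""
    inverted = text[::-1]
    return inverted.translate(SWAP) if symmetric_swapping else inverted


print(invert_segment("(abc)"))          # (cba)
print(invert_segment("(abc)", False))   # )cba(
```

Without swapping, the inverted phrase opens with a closing parenthesis. With swapping, the parentheses still enclose the text as they did before the inversion.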
Each CCSID that is defined in support of a bidirectional language may have a default string type associated with it. If a string is tagged with a CCSID for a bidirectional language and no string type is explicitly specified, then the default string type is to be used. If no default string type has been specified, then the string type is defined to be 0.
The following table shows the specific characteristics of each bidirectional string type that have been defined to date.
| String Type | Text Type | Numeric Shaping | Orientation | Text Shaping | Symmetrical Swapping |
| 4 | Visual | Passthrough | LTR | Shaped | Off |
| 5 | Implicit | Arabic | LTR | Unshaped | On |
| 6 | Implicit | Arabic | RTL | Unshaped | On |
| 7 | Visual | Passthrough | Contextual | Unshaped-Lig | Off |
| 8 | Visual | Passthrough | RTL | Shaped | Off |
| 9 | Visual | Passthrough | RTL | Shaped | On |
| 10 | Implicit | Arabic | Contextual LTR | Unshaped | On |
| 11 | Implicit | Arabic | Contextual RTL | Unshaped | On |
| 12 | Implicit | Arabic | RTL | Shaped | Off |
| 13 | Visual | Hindi | LTR | Shaped | Off |
| 14 | Visual | Hindi | RTL | Shaped | Off |
| 15 | Visual | Hindi | RTL | Shaped | On |
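The bidirectional string types in the table above can be held in a simple lookup structure, together with the rule for choosing between an explicit string type, a CCSID default, and 0. The names `StringType`, `BIDI_STRING_TYPES`, and `effective_string_type` are hypothetical and are not part of any CDRA interface.

```python
from collections import namedtuple

StringType = namedtuple(
    "StringType",
    "text_type numeric_shaping orientation text_shaping symmetrical_swapping",
)

# Values transcribed from the table of bidirectional string types.
BIDI_STRING_TYPES = {
    4: StringType("Visual", "Passthrough", "LTR", "Shaped", False),
    5: StringType("Implicit", "Arabic", "LTR", "Unshaped", True),
    6: StringType("Implicit", "Arabic", "RTL", "Unshaped", True),
    7: StringType("Visual", "Passthrough", "Contextual", "Unshaped-Lig", False),
    8: StringType("Visual", "Passthrough", "RTL", "Shaped", False),
    9: StringType("Visual", "Passthrough", "RTL", "Shaped", True),
    10: StringType("Implicit", "Arabic", "Contextual LTR", "Unshaped", True),
    11: StringType("Implicit", "Arabic", "Contextual RTL", "Unshaped", True),
    12: StringType("Implicit", "Arabic", "RTL", "Shaped", False),
    13: StringType("Visual", "Hindi", "LTR", "Shaped", False),
    14: StringType("Visual", "Hindi", "RTL", "Shaped", False),
    15: StringType("Visual", "Hindi", "RTL", "Shaped", True),
}


def effective_string_type(explicit=None, ccsid_default=None):
    """Explicit string type first, then the CCSID default, then 0."""
    if explicit is not None:
        return explicit
    if ccsid_default is not None:
        return ccsid_default
    return 0


print(BIDI_STRING_TYPES[effective_string_type(None, 5)].orientation)  # LTR
```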
Such strings are often interchanged in heterogeneous (or distributed) environments between applications that can support these string types. If the data conversion methods used for graphic character mapping are enhanced to deal with the parsing and assembly aspects of converting between specific types of strings, a degree of efficiency in performance can be attained. With this in view, provisions are made in the graphic conversion functions of CDRA to allow string-type specifications to select conversion methods that can deal with various string types besides converting the graphic characters.
A generic graphic character conversion function (see "Conversion Functions") converts an input graphic character string represented in a CCSID (the input CCSID) to an output string according to the CCSID specified for the output (the output CCSID). The interpretation of the input character string and the generation of the code points of the output character string adhere to the definitions of CCSIDs (see "Tagging in CDRA").
The results of the conversion process will be the following:
- The meaning of all the graphic characters that are common (same GCGIDs) between the input CS and output CS will be preserved
- All other input graphic and non-graphic characters will be converted to output code points following the mismatch management criterion used. Their meaning cannot be preserved in the output CCSID, but they may be retrievable by mapping back to the input CCSID using an appropriate conversion table.
Conversion of strings between some CCSIDs cannot maintain the same byte-length between the input and output strings. For example, the coded representation of a string containing a mixture of Katakana characters (single-byte code points) and Japanese ideographic characters (double-byte code points):
- Will have embedded shift-in and shift-out control characters between the Katakana characters and the ideographic characters in a Japanese EBCDIC-based system
- Will not have any embedded shift-in and shift-out control characters in a Japanese PC-based system.
A function that converts the data between the two coding methods in this example will find a byte-length difference of at least two bytes. Provisions must be made to accommodate differences in byte lengths when developing and using conversion functions.
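The byte-length difference can be estimated before conversion. The sketch below assumes that the string has already been described as a list of segments, that each double-byte segment in the host form is bracketed by one shift-out and one shift-in byte, and that adjacent double-byte segments have been merged. The function name and the segment representation are hypothetical.

```python
def mixed_lengths(segments):
    """Return (host_bytes, pc_bytes) for a mixed string.

    segments: list of (kind, character_count) with kind "sbcs" or "dbcs".
    """
    host = pc = 0
    for kind, count in segments:
        if kind == "sbcs":
            host += count
            pc += count
        elif kind == "dbcs":
            host += 2 * count + 2   # SO + double-byte data + SI
            pc += 2 * count         # no shift controls on the PC
        else:
            raise ValueError("unknown segment kind: %r" % (kind,))
    return host, pc


# Three Katakana characters followed by two ideographic characters.
print(mixed_lengths([("sbcs", 3), ("dbcs", 2)]))   # (9, 7)
```

A conversion function can use such an estimate to size its output buffer, since the host form grows by two bytes for every double-byte segment.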
The designer of the conversion program can reference the CCSID elements and their definitions from CDRA documents. The logical steps in performing the conversion are:
- Select an appropriate conversion method (see Appendix B. "Conversion Methods") based on the encoding schemes associated with the input and the output.
- Select one or more conversion tables based on the CS and CP elements of the input and output CCSIDs. The following section describes the criteria that can be used for defining the contents of the conversion tables.
The various steps involved in selecting the conversion methods and the associated tables for different conversion criteria are described in "Graphic Character Conversion Selection Table (GCCST) Resource". |
|
|
 | |
The input and output CCSIDs identify the CS, CP pairs. The content of a conversion
table is determined by the input and output CS, CP pairs to be mapped. When
there is more than one set of CS, CP pairs in the input to be matched with more
than one set in the output, the principles described in "Pairings
of Code Points" are used to determine the mapping.
If the input CS, CP pair has some common graphic characters that are split
between two output CSs, then the corresponding support in the conversion method
and tables of the appropriate type (see Appendix
B. Conversion Methods) are needed.
After the particular characters and their code point assignments are examined,
they are categorized, and decisions are made about pairing the input and output
code points.
A code point can be placed into one of the following categories:
- SPACE: the code point is assigned to the SPACE character GCGID
SP010000
- Valid Graphic: the code point is assignable to a graphic character
in the encoding structure, and is assigned a graphic character in the identified
character set
- Code Extension: the code point is assignable to a control character,
and its assigned value is a valid code extension control character or the
first character of a multiple-character code extension control as determined
by the encoding scheme identified
- Invalid Graphic: the code point is assignable to a graphic
character in the encoding structure, but either it is not assigned any graphic
character or it is assigned one that is not in the character set identified
- Single Control: the code point is assignable to a control character,
and is assigned a permitted control character for the application
- Start of Control: the code point is assignable to a control
character, and is assigned a permitted start of control sequence for the application
- Invalid Control: the code point is assignable to a control
character but is not assigned any control character, or it is assigned a character
that is valid neither for the application nor as a code extension control
defined in the encoding scheme.
Pairings of Code Points
The following general principles are used in pairing the input and output code
points:
- Matched Graphic Characters and SPACE
The graphic characters in the Valid Graphic category and the SPACE character
from the input are compared with the Valid Graphic and the SPACE character
in the output. For each Valid Graphic and SPACE character that is found in
both sets, the code point in the input code page is mapped to its corresponding
code point in the output code page. This set of characters is known as the
"common character set". Graphic characters are defined to be matching if they
have the same Graphic Character Global Identifier (GCGID).
- Code Extension Controls
Some input or output control character code points may be used for code extension
purposes. It is the responsibility of the conversion functions to handle these
code points correctly. An example of this can be found in "Method
3 for EBCDIC Mixed to PC Mixed".
- Matched Control Mnemonics
Some of the nongraphic characters can be commonly used control characters,
such as Horizontal Tab (HT) or Carriage Return (CR). If these are found in
graphic-character conversion, they will be handled on a "best-can-do" basis.
The mnemonic names associated with them will be used as a guide to pair input
and output code points in the control areas. The mnemonics may or may not
have identical functional meanings in the input and the output environments.
- Remainder
The remaining code points are treated as bytes. Some of these
bytes may be graphic character code points outside the common character set,
control mnemonics that have no matching control in the output, or non-allocated
input code points. The character set mismatch management criterion is used
to specify how these remaining characters are mapped.
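The first and last of these principles can be sketched with code pages represented as dictionaries from code point to GCGID. The function name is hypothetical, the code page fragments are invented for illustration, and the identifiers beginning with `XX` are placeholders rather than real GCGIDs. Only SP010000 (SPACE) and LX020000 (uppercase X) are taken from this document.

```python
def pair_common_set(input_page, output_page):
    """Pair code points whose characters have the same GCGID.

    input_page, output_page: dict of code point -> GCGID.
    Returns (pairs, remainder): the common-set mapping and the input
    code points left for the mismatch management criterion.
    """
    output_by_gcgid = {gcgid: cp for cp, gcgid in output_page.items()}
    pairs, remainder = {}, []
    for code_point, gcgid in sorted(input_page.items()):
        if gcgid in output_by_gcgid:
            pairs[code_point] = output_by_gcgid[gcgid]
        else:
            remainder.append(code_point)
    return pairs, remainder


# Illustrative fragments only; the XX entries are placeholders.
ebcdic_fragment = {0x40: "SP010000", 0xE7: "LX020000", 0x70: "XX000001"}
pc_fragment = {0x20: "SP010000", 0x58: "LX020000", 0xB0: "XX000002"}

pairs, remainder = pair_common_set(ebcdic_fragment, pc_fragment)
print(pairs)       # {64: 32, 231: 88}
print(remainder)   # [112]
```

Control mnemonics can be paired in the same way by using the mnemonic name in place of the GCGID.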
|
|
|
 | |
Character set mismatch management (17) is
necessarily context- or application-sensitive: what is best for one application
may not be appropriate for another. Sometimes arbitrary decisions have to be made,
depending on the specific set of mismatched characters. Some general criteria
for mismatch management are:
- Round trip integrity, where each byte value is preserved when
data is returned from the target to the source
- Character replacement, or irreversible substitution of characters
with appearance or meaning retention
- Enforced subset match, or irreversible substitution with SUB
(substitute) control or an equivalent loss indicator character.
The application of these criteria results in different pairings of input and
output code points for mismatched characters in conversion tables.
The above criteria are discussed in the following sections.
Round Trip Integrity
The objective of this criterion is to send data from one system to another
one that has different representations of character data, and retrieve it without
loss. Often the "do not convert" choice is not available. For example, data
stored in a System/370 database is configured to have all its graphic character
data in one CCSID. If it acts as a remote repository for data from a PC application,
or from an application in another System/370 using a different CCSID, the data
must be converted to the configured CCSID. The data is intended to be retrieved
by the same application without loss when it is converted back for use in its
original CCSID.
Interpretation of Converted Data in the Output CCSID
The tag associated with the converted data will be the CCSID of the output.
The data will be interpreted -- possibly misinterpreted -- in the output environment.
In the absence of any validation or filtering services, data that has been converted
using the round trip criterion cannot be distinguished from data that has been
created locally in the system, or that has been converted from another CCSID
using the round trip criterion. Data conversion is only one of the possible
generators of code points that have no graphic meaning in a data object tagged
with a CCSID. An application that generates hexadecimal constants and stores
them along with other textual data is another possible generator.
Feasibility of Round Trip
Round trip mapping is always feasible for a common set of graphic characters
or for a set of control characters with the same mnemonics, assuming there are
no control sequences involved. The common sets of graphic and control characters
within the initial input and output CCSIDs can be preserved irrespective of
how many intermediate CCSIDs may be involved, provided that all the intermediate
CCSIDs contain the same common sets.
The round trip of all remaining code points from a particular input to an output
and back is feasible only under the following conditions:
- There are equal numbers of unmatched code points between the input and output.
- The bytes are mapped one-for-one from the input CCSID to the output CCSID.
- The same one-for-one relationship is used in the return path.
- If there are duplicate graphic characters in an input code (for example,
CP 850) and if the output has a matching graphic character in it, the conversion
preserves the byte value and not the graphic character meaning.
- If there are unequal numbers of input and output code points (such as between
the PC DBCS and host DBCS), round trip conversion is only possible from the
smaller of the two sets to the larger set and back. There is insufficient
coding space for all of the code points in the reverse direction. This situation
also exists between ISO-7 encodings (without code extensions) and any 8-bit
SBCS code.
When round trip mapping is not feasible or not desirable for a specific application,
other criteria must be used.
Pairing of Code Points Using Round Trip
In addition to the general principles described in "Pairings
of Code Points", the following principles are used when the round trip integrity
criterion is chosen:
- An input graphic code point outside the common set is mapped to an output
graphic code point outside the common set
- An input control code point is mapped to an output control code point outside
the mnemonic-based common set
- If the graphic encoding space of the source is larger than the graphic encoding
space of the target, some graphic code points will be mapped to control code
points, and vice versa.
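A pair of round trip tables can be built mechanically, as in the sketch below. It assumes Python's `cp500` and `cp850` codecs as stand-ins for two single-byte CCSIDs and uses Unicode identity as a proxy for matching GCGIDs and control mnemonics. The function name is hypothetical. The leftover code points are paired in ascending order, which satisfies the one-for-one conditions but does not attempt to keep graphics paired with graphics and controls with controls, so the result is not a CDRA default table.

```python
def build_round_trip_tables(src_codec, dst_codec):
    """Return (to_dst, to_src) as 256-byte translation tables."""

    def decode_all(codec):
        chars = {}
        for value in range(256):
            try:
                chars[value] = bytes([value]).decode(codec)
            except UnicodeDecodeError:
                pass  # unassigned code point: handled as an unmatched byte
        return chars

    src_chars = decode_all(src_codec)
    dst_by_char = {char: value for value, char in decode_all(dst_codec).items()}

    # Step 1: map the common set, never using an output code point twice.
    forward, used = {}, set()
    for value in range(256):
        target = dst_by_char.get(src_chars.get(value))
        if target is not None and target not in used:
            forward[value] = target
            used.add(target)

    # Step 2: pair the remaining bytes one-for-one.
    spare_in = [v for v in range(256) if v not in forward]
    spare_out = [v for v in range(256) if v not in used]
    forward.update(zip(spare_in, spare_out))

    # Step 3: the return path uses the same relationship, inverted.
    to_dst = bytes(forward[v] for v in range(256))
    to_src = bytearray(256)
    for source, target in enumerate(to_dst):
        to_src[target] = source
    return to_dst, bytes(to_src)


to_pc, to_host = build_round_trip_tables("cp500", "cp850")
data = bytes(range(256))
print(data.translate(to_pc).translate(to_host) == data)   # True
```

Because both tables are permutations of the 256 byte values, every byte value survives the round trip, including bytes that have no graphic meaning in the intermediate CCSID.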
Character Replacements
When round trip integrity is not feasible or desired, an alternative is to
permanently replace each mismatched character in the input character set with
its nearest equivalent in the output character set. The criterion for determining
the nearest equivalent depends on the context within which the converted data
will be used. For display and printing purposes, the nearest visual representation
may be chosen; for processing purposes, a character with the nearest meaning
may be selected. If neither criterion applies, an arbitrary character may be
chosen from the output character set.
Pairing of Code Points Using Character Replacements
In addition to the general principles described in "Pairings
of Code Points", the following additional principles are used when the character
replacement criterion is chosen:
- An input graphic code point is mapped to an output graphic code point outside
the common set with the nearest shape or meaning. Any remaining input graphic
code points -- those with no nearest equivalent based on the criterion being
used -- are mapped arbitrarily.
- An input control code point is mapped to an output control code point outside
the mnemonic-based common set. Any remaining input control code points are
arbitrarily mapped (folded) to other output control code points.
- Any round tripping achieved is incidental.
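A character replacement table can be sketched as follows. The function name, the `nearest` dictionary, and the particular substitutes chosen are hypothetical; what counts as the nearest equivalent depends on the context of use. Python's `cp850` and `cp500` codecs stand in for the input and output CCSIDs, and every unmatched character without a listed equivalent is folded to a single fallback character, which is a simplification of the arbitrary mapping described above.

```python
def build_replacement_table(src_codec, dst_codec, nearest, fallback):
    """Return a 256-byte table that replaces mismatched characters.

    nearest:  dict of input character -> nearest output character.
    fallback: character used when no nearest equivalent is listed.
    """
    table = bytearray(256)
    for value in range(256):
        try:
            char = bytes([value]).decode(src_codec)
        except UnicodeDecodeError:
            char = fallback          # unassigned input code point
        try:
            table[value] = char.encode(dst_codec)[0]
        except UnicodeEncodeError:
            # Not in the output character set: substitute irreversibly.
            table[value] = nearest.get(char, fallback).encode(dst_codec)[0]
    return bytes(table)


# Box-drawing characters replaced by visually similar ones (assumed choice).
nearest = {"╔": "+", "═": "=", "║": "|"}
table = build_replacement_table("cp850", "cp500", nearest, "?")

converted = "╔═╗".encode("cp850").translate(table)
print(converted.decode("cp500"))   # +=?
```

Several input code points can map to the same output code point, so the original data cannot be recovered from the converted data.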
Enforced Subset Match
The enforced subset match criterion guarantees the preservation of the subset
of characters that are common to both the input and output character sets. Any
character not in this common subset will be replaced with a unique character
that indicates that a substitution has occurred.
Wherever possible, CDRA recommends that the standardized control character
SUB (substitute) be used for this purpose. Alternatives for "substitution
character" may be declared as part of the CCSID resource definitions. The default
SUB definition for each CCSID is included as part of the CCSID definition found
in Appendix C. CCSID Repository.
In environments using the PC-Data or PC-Display encoding structures, X'7F'
is recommended as the default SUB. In single-byte EBCDIC environments, the defined
SUB is X'3F', and in ISO-7 and ISO-8 environments it is X'1A'.
Visualization of SUB Character
The SUB character should be visually represented by a uniquely distinguishable
character on presentation media. A warning flag should be returned to the caller
of the mapping service to show that a substitution has occurred.
Default SUB-Visualization Character
Some presentation devices and data streams specify a unique character to be
presented when a SUB code point is encountered in the presentation data. For
example: the 3270 Data Stream defines a "filled circle" as default; the PC displays
it as an "empty house symbol"; some printers print it as a "filled square".
When a presentation medium or a component interfacing to the presentation
medium is not capable of replacing the SUB character with a unique non-SPACE
visual character, the application sending data to be presented needs to convert
the SUB character to an appropriate graphic character. For consistency among
different implementations that do such a conversion, the Uppercase X
(LX020000) (or its equivalent) is defined as the CDRA-recommended default.
Products that perform such SUB character replacement should also provide a
means by which customers can select another graphic character of their choice
as an alternative.
Pairing of Code Points Using Enforced Subset Match
In addition to the general principles described in "Pairings
of Code Points", the following additional principle is used when the enforced
subset criterion is chosen:
All unmatched input graphic code points and mnemonically unmatched input control
code points are converted to the "substitution character" code point prescribed
for the output CCSID. |
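An enforced subset match conversion, including the warning flag for the caller, can be sketched as follows. The function name is hypothetical, Python's `cp850` and `cp500` codecs stand in for the input and output CCSIDs, and Unicode identity is used as a proxy for matching GCGIDs and control mnemonics. The SUB code points are the defaults stated above.

```python
SUB_EBCDIC = 0x3F   # single-byte EBCDIC environments
SUB_PC = 0x7F       # PC-Data and PC-Display encoding structures
SUB_ISO = 0x1A      # ISO-7 and ISO-8 environments


def convert_enforced_subset(data, src_codec, dst_codec, sub):
    """Convert single-byte data, replacing every mismatch with SUB.

    Returns (converted, substituted); substituted is the warning flag
    telling the caller that at least one substitution occurred.
    """
    out = bytearray()
    substituted = False
    for value in data:
        try:
            out += bytes([value]).decode(src_codec).encode(dst_codec)
        except UnicodeError:
            out.append(sub)
            substituted = True
    return bytes(out), substituted


converted, flag = convert_enforced_subset(
    "A╔".encode("cp850"), "cp850", "cp500", SUB_EBCDIC)
print(converted.hex(), flag)   # c13f True
```

The common character is preserved, while the box-drawing character, which has no counterpart in the output character set, becomes the SUB code point and raises the flag.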
|
|
 | |
Default conversion tables are available for specific pairs of CCSIDs
in different groups. For information on how to obtain these tables, see
Appendix J. CDRA Conversion Resources.
The pairs of CCSIDs are those that are required within each
character set group, and include both interoperable and
coexistence and migration sets.
Each table has its own difference management
criterion. Where possible, the round trip integrity criterion has
been used; in other instances, enforcement and character replacement
have been used. |
|
|
 | |
The following exceptions to the basic mapping principles exist in some of the
tables:
- In the SBCS-PC to EBCDIC tables for Group 2 countries, the SO and SI Code Extension
controls are replaced with the substitution code point (X'3F') to avoid the
potential risk of generating invalid SO-SI pairs.
- Some PC code pages such as 00850 and 00863 assign two code points for the
symbols GCGID SM240000 and SM250000, in both graphic and control code range
(PC-Data Encoding Scheme). Both symbols are included in Group 1 Interoperable
Character Set 00697 and in the associated EBCDIC code pages. The other PC
code pages derived from 00437 (for example, CP 00437, 00860) contain the symbols
in control code range without duplication.
- The following rules are applied to the default tables in Group-1 and Group-1A
Coexistence and Migration sets for SM240000 and SM250000:
  - For code pages such as 00850 and 00863:
    - The code points assigned to the symbols in the graphic code range (PC-Data
    Encoding Scheme) are treated as graphics. The code points in the control
    code range are treated as control code points, and are mapped based
    on the mnemonic names.
  - For other PC code pages containing SM240000 and SM250000 only in the control
  code range:
    - The code points are treated as valid graphics, and are mapped based
    on GCGID when the symbols are included in the common graphic set between
    the CCSID pair. Otherwise, they are mapped based on the control mnemonic
    names.
|
|
|
 | |
The default tables defined in CDRA are based on specified criteria for mismatch
management. These tables may not suit all application requirements; IBM products
have used different tables for data conversion based on the criteria most suited
to their customers. It may be necessary for the products to continue to support
such tables.
Customers may need to continue using existing conversion tables or
methods. Such methods or tables may produce conversion results that are different
from those obtained using the default conversion tables.
Based on individual product and customer requirements, the ability to select
alternative conversion methods or tables for a pair of CCSIDs may be supported
by products as an option. If a product supports custom modifications, its documentation
should describe the procedure for selecting the alternative method or table.
Guidelines to prevent undesirable effects caused by such modifications should
also be documented by the individual products. |
|
|
 | |
All the concepts described above can be incorporated into a collection of conversion
methods and related conversion tables. The management aspects can also be embodied
along with this collection. A single-step convert function and a three-part
multiple-step conversion are defined in "Chapter
5. CDRA Interface Definitions". |
|
|  | |
|