DICOM PS3.5 2020a - Data Structures and Encoding​

6 Value Encoding​

A Data Set is constructed by encoding the values of Attributes specified in the Information Object Definition (IOD) of a Real-World​ Object. The specific content and semantics of these Attributes are specified in Information Object Definitions (see PS3.3). The range​ of possible data types of these values and their encoding are specified in this section. The structure of a Data Set, which is composed​ of Data Elements containing these values, is specified in Section 7.​

Throughout this Part, as well as other parts of the DICOM Standard, Tags are used to identify both specific Attributes and their corresponding Data Elements.

6.1 Support of Character Repertoires​

Values that are text or character strings can be composed of Graphic and Control Characters. The Graphic Character set, independent of its encoding, is referred to as a Character Repertoire. Depending on the native language context in which Application Entities wish to exchange data using the DICOM Standard, different Character Repertoires will be used. The Character Repertoires supported by DICOM are:

•​ISO 8859​

•​JIS X 0201-1976 Code for Information Interchange​

•​JIS X 0208-1990 Code for the Japanese Graphic Character set for information interchange​

•​JIS X 0212-1990 Code of the supplementary Japanese Graphic Character set for information interchange​

•​KS X 1001 (registered as ISO-IR 149) for Korean Language​

•​TIS 620-2533 (1990) Thai Characters Code for Information Interchange​

•​ISO 10646-1, 10646-2, and their associated supplements and extensions for Unicode character set​

•​GB 18030​

•​GB2312​

•​GBK​

Note​

1.​The ISO 10646-1, 10646-2, and their associated supplements and extensions correspond to the Unicode version 3.2​ character set. The ISO IR 192 corresponds to the use of the UTF-8 encoding for this character set.​

2.​The GB 18030 character set is harmonized with the Unicode character set on a regular basis, to reflect updates from​ both the Chinese language and from Unicode extensions to support other languages.​

3.​The issue of font selection is not addressed by the DICOM Standard. Issues such as proper display of words like "bone" in Chinese or Japanese usage are managed through font selection. Similarly, other user interface issues like bidirectional character display and text orientation are not addressed by the DICOM Standard. The Unicode documents provide extensive documentation on these issues.

4.​The GBK character set is an extension of the GB 2312-1980 character set and supports the Chinese characters in GB 13000.1-93, which is the Chinese adaptation of Unicode 1.1. GBK is code-point backward compatible with GB 2312-1980. The GB 18030 character set is an extension of the GBK character set for support of Unicode 3.2, and provides backward code point compatibility.

6.1.1 Representation of Encoded Character Values​

As defined in the ISO Standards referenced in this section, byte values used for encoded representations of characters are represented in this section as two decimal numbers in the form column/row.

This means that the value can be calculated as (column * 16) + row, e.g., 01/11 corresponds to the value 27 (1BH).​
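The conversion can be sketched in Python as follows. The helper names `col_row_to_byte` and `byte_to_col_row` are illustrative only and do not come from the Standard.

```python
def col_row_to_byte(column: int, row: int) -> int:
    """Convert column/row notation to a byte value: (column * 16) + row."""
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError("column and row must each be in the range 0..15")
    return column * 16 + row


def byte_to_col_row(value: int) -> str:
    """Format a byte value in two-digit column/row notation."""
    if not 0 <= value <= 255:
        raise ValueError("value must be in the range 0..255")
    column, row = divmod(value, 16)
    return f"{column:02d}/{row:02d}"


print(col_row_to_byte(1, 11))   # 27, i.e. 1BH (ESC)
print(byte_to_col_row(0x5C))    # 05/12
```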

- Standard -​

Note​

Two digit hex notation will be used throughout the remainder of this Standard to represent character encoding. The column/row notation is used only within Section 6.1 to simplify any cross referencing with applicable ISO standards.

The byte encoding space is divided into four ranges of values:​

•​CL bytes from 00/00 to 01/15​

•​GL bytes from 02/00 to 07/15​

•​CR bytes from 08/00 to 09/15​

•​GR bytes from 10/00 to 15/15​
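A minimal Python sketch of the four ranges listed above, with the column/row bounds written as hexadecimal byte values. The function name `code_table_area` is a hypothetical helper, not a term from the Standard.

```python
def code_table_area(byte: int) -> str:
    """Return the area of the 8-bit code table that a byte value falls in."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    if byte <= 0x1F:      # 00/00 to 01/15
        return "CL"
    if byte <= 0x7F:      # 02/00 to 07/15
        return "GL"
    if byte <= 0x9F:      # 08/00 to 09/15
        return "CR"
    return "GR"           # 10/00 to 15/15


for b in (0x1B, 0x5C, 0x85, 0xFC):
    print(f"{b:02X}H -> {code_table_area(b)}")
```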

Note​

ISO 8859 does not differentiate between a code element, e.g., G0, and the area in the code table, e.g., GL, where it is invoked. The term "G0" specifies the code element as well as the area in the code table. In ISO/IEC 2022 there is a clear distinction between the code elements (G0, G1, G2, and G3) and the areas in which the code elements are invoked (GL or GR). In this Standard the nomenclature of ISO/IEC 2022 is used.

The Control Character set C0 shall be invoked in CL and the Graphic Character sets G0 and G1 in GL and GR respectively. Only​ some Control Characters from the C0 set are used in DICOM (see Section 6.1.3), and characters from the C1 set shall not be used.​

6.1.2 Graphic Characters​

A Character Repertoire, or character set, is a collection of Graphic Characters specified independently of their encoding.​

6.1.2.1 Default Character Repertoire​

The default repertoire for character strings in DICOM shall be the Basic G0 Set of the International Reference Version of ISO 646:1990 (ISO-IR 6). See Annex E for a table of the DICOM default repertoire and its encoding.

Note​

This Basic G0 Set is identical with the common character set of ISO 8859.​

6.1.2.2 Extension or Replacement of the Default Character Repertoire​

DICOM Application Entities (AEs) that extend or replace the default repertoire convey this information in the Specific Character Set​ (0008,0005) Attribute.​

Note​

The Attribute Specific Character Set (0008,0005) is encoded using a subset of characters from ISO-IR 6. See the definition​ for the Value Representation (VR) of Code String (CS) in Table 6.2-1.​

For Data Elements with Value Representations of SH (Short String), LO (Long String), UC (Unlimited Characters), ST (Short Text), LT (Long Text), UT (Unlimited Text) or PN (Person Name) the Default Character Repertoire may be extended or replaced (these Value Representations are described in more detail in Section 6.2). If such an extension or replacement is used, the relevant "Specific Character Set" shall be defined as an attribute of the SOP Common Module (0008,0005) (see PS3.3) and shall be stated in the Conformance Statement. PS3.2 gives conformance guidelines.

Note​

1.​Preferred repertoires as defined in ENV 41 503 and ENV 41 508 for the use in Western and Eastern Europe, respectively, are: ISO-IR 100, ISO-IR 101, ISO-IR 144, ISO-IR 126. See Section 6.1.2.3.

2.​Information Object Definitions using different character sets cannot rely per se on lexical ordering or string comparison​ of data elements represented as character strings. These operations can only be carried out within a given character​ repertoire and not across repertoire boundaries.​

6.1.2.3 Encoding of Character Repertoires​

The 7-bit Default Character Repertoire can be replaced for use in Value Representations SH, LO, ST, LT, PN, UC and UT with one​ of the single-byte codes defined in PS3.3.​

Note​

This replacement character repertoire does not apply to other textual Value Representations (AE and CS).​

The replacement character repertoire shall be specified in value 1 of the Attribute Specific Character Set (0008,0005). Defined Terms​ for the Attribute Specific Character Set are specified in PS3.3.​

Note​

1.​The code table is split into the GL area, which supports a 94 character set only (bit combinations 02/01 to 07/14) plus​ SPACE in 02/00, and the GR area, which supports either a 94 or 96 character set (bit combinations 10/01 to 15/14 or​ 10/00 to 15/15). The default character set (ISO-IR 6) is always invoked in the GL area.​

2.​All character sets specified in ISO 8859 include ISO-IR 6. This set will always be invoked in the GL area of the code​ table and is the equivalent of ASCII (ANSI X3.4:1986), whereas the various extension repertoires are mapped onto the​ GR area of the code table.​

3.​The 8-bit code table of JIS X 0201 includes ISO-IR 14 (romaji alphanumeric characters) as the G0 code element and​ ISO-IR 13 (katakana phonetic characters) as the G1 code element. ISO-IR 14 is identical to ISO-IR 6, except that bit​ combination 05/12 represents a "¥" (YEN SIGN) and bit combination 07/14 represents an over-line.​

Two character codes of the single-byte character sets invoked in the GL area of the code table, 02/00 and 05/12, have special significance in the DICOM Standard. The character SPACE, represented by bit combination 02/00, shall be used for the padding of Data Element Values that are character strings. The Graphic Character represented by the bit combination 05/12, "\" (BACKSLASH) in the repertoire ISO-IR 6, shall only be used in character strings with Value Representations of UT, ST and LT (see Section 6.2). Otherwise the character code 05/12 is used as a separator for multi-valued Data Elements (see Section 6.4).

Note​

When the value of the Attribute Specific Character Set (0008,0005) is either "ISO_IR 13" or "ISO 2022 IR 13", the graphic​ character represented by the bit combination 05/12 is a "¥" (YEN SIGN) in the character set of ISO-IR 14.​

The character DELETE (bit combination 07/15) shall not be used in DICOM character strings.​
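The roles of 02/00, 05/12 and 07/15 can be illustrated with a short Python sketch. The function names are hypothetical, and padding to an even length is an assumption taken from the Data Element encoding rules elsewhere in the Standard rather than from this section.

```python
from typing import List

SPACE = "\x20"       # 02/00, used for padding
BACKSLASH = "\x5c"   # 05/12, value separator outside UT, ST and LT
DELETE = "\x7f"      # 07/15, not allowed in character strings
TEXT_VRS = {"UT", "ST", "LT"}  # VRs in which 05/12 is ordinary text


def split_values(value: str, vr: str) -> List[str]:
    """Split a decoded character string into its individual values."""
    if DELETE in value:
        raise ValueError("DELETE (07/15) shall not be used in character strings")
    if vr in TEXT_VRS:
        return [value]               # backslash is part of the text
    return value.split(BACKSLASH)    # backslash separates multiple values


def pad_with_space(encoded: bytes) -> bytes:
    """Append one SPACE if needed to reach an even length (assumed rule)."""
    return encoded + b"\x20" if len(encoded) % 2 else encoded


print(split_values("ORIGINAL\\PRIMARY", "LO"))   # two values
print(split_values("C:\\temp", "LT"))            # one value, backslash kept
print(pad_with_space(b"ABC"))
```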

The replacement Character Repertoire specified in value 1 of the Attribute Specific Character Set (0008,0005) (or the Default Character Repertoire if value 1 is empty) may be further extended with additional Coded Character Sets, if needed and permitted by the replacement Character Repertoire. The additional Coded Character Sets and extension mechanism shall be specified in additional values of the Attribute Specific Character Set. If Attribute Specific Character Set (0008,0005) has a single value, the DICOM SOP Instance supports only one code table and no Code Extension techniques. If Attribute Specific Character Set (0008,0005) has multiple values, the DICOM SOP Instance supports Code Extension techniques as described in ISO/IEC 2022:1994.

The Character Repertoires that prohibit extension are identified in Part 3.​

Note​

1.​Considerations on the Handling of Unsupported Character Sets:​

In DICOM, character sets are not negotiated between Application Entities but are indicated by a conditional attribute of​ the SOP Common Module. Therefore, implementations may be confronted with character sets that are unknown to​ them.​

The Unicode Standard includes a substantial discussion of the recommended means for display and print for characters that lack font support. These same recommendations may apply to the mechanisms for unsupported character sets.

The machine should print or display such characters by replacing all unknown characters with the four characters "\nnn", where "nnn" is the three digit octal representation of each byte.

An example of this for an ASCII based machine would be as follows:​

Character String: Günther​

Encoded representation: 04/07 15/12 06/14 07/04 06/08 06/05 07/02​

ASCII based machine: G\374nther​

Implementations may also encounter Control Characters that they have no means to print or display. The machine may​ print or display such Control Characters by replacing the Control Character with the four characters "\nnn", where "nnn"​ is the three digit octal representation of each byte.​

2.​Considerations for missing fonts​

The Unicode standard and the GB18030 standard define mechanisms for print and display of characters that are​ missing from the available fonts. If GBK is specified in Specific Character Set (0008,0005), the GB 18030 rules of print​ and display of characters shall apply. The DICOM Standard does not specify user interface behavior since it does not​ affect network or media data exchange.​

3.​The Unicode and GB18030 standards have distinct Yen symbol, backslash, and several forms of reverse solidus. The separator for multi-valued data elements in DICOM is the character valued 05/12 regardless of what glyph is used to enter or display this character. The other reverse solidus characters that have a very similar appearance are not separators. The choice of font can affect the appearance of 05/12 significantly. Multi-byte encoding systems, such as GB18030, GBK and ISO 2022, may generate encodings that contain a byte valued 05/12. Only the character that encodes as a single byte valued 05/12 is a delimiter.

For multi-valued Data Elements, existing implementations that are expecting only single-byte replacement character sets may misinterpret the Value Multiplicity of the Data Element as a consequence of interpreting 05/12 bytes in multi-byte characters or ISO 2022 escape sequences as delimiters, and this may affect the integrity of store-and-forward operations. Applications that do not explicitly state support for GB18030, GBK or ISO 2022 in their conformance statement, might exhibit such behavior.
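Two points from the notes above can be sketched in Python: the "\nnn" octal fallback for an ASCII based machine, and the risk of splitting multi-byte data on raw 05/12 bytes. The function names are hypothetical. The sketch assumes Python's built-in `latin-1` and `gbk` codecs as stand-ins for the corresponding DICOM character sets, and it uses the byte pair 81H 5CH as an example of a two-byte GBK character whose second byte is 05/12. Decoding before splitting is shown only for GBK; values that use ISO 2022 escape sequences need an escape-aware scan instead.

```python
from typing import List


def octal_fallback(encoded: bytes) -> str:
    """Render bytes on an ASCII-only display, replacing the rest with \\nnn."""
    out = []
    for b in encoded:
        if 0x20 <= b <= 0x7E:
            out.append(chr(b))           # printable ASCII is shown as-is
        else:
            out.append("\\%03o" % b)     # three digit octal for each other byte
    return "".join(out)


def split_decoded(encoded: bytes, codec: str) -> List[str]:
    """Decode first, then split on the character that encodes as single byte 05/12."""
    return encoded.decode(codec).split("\\")


print(octal_fallback("Günther".encode("latin-1")))   # G\374nther

# One two-byte GBK character (second byte is 5CH), a real delimiter, then "AB".
raw = b"\x81\x5c" + b"\\" + b"AB"
print(len(raw.split(b"\\")))             # naive byte split: 3 pieces (wrong)
print(len(split_decoded(raw, "gbk")))    # decode, then split: 2 values
```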

6.1.2.4 Code Extension Techniques​

For Data Elements with Value Representations of SH (Short String), LO (Long String), UC (Unlimited Characters), ST (Short Text),​ LT (Long Text), UT (Unlimited Text) or PN (Person Name), the Default Character Repertoire or the character repertoire specified by​ value 1 of Attribute Specific Character Set (0008,0005), may be extended using the Code Extension techniques specified by ISO/IEC​ 2022:1994.​

If such Code Extension techniques are used, the related Specific Character Set or Sets shall be specified by value 2 to value n of the​ Attribute Specific Character Set (0008,0005) of the SOP Common Module (see PS3.3), and shall be stated in the Conformance​ Statement.​

Note​

1.​Defined Terms for Specific Character Set (0008,0005) are defined in PS3.3.​

2.​Support for Japanese kanji (ideographic), hiragana (phonetic), katakana (phonetic), Korean (Hangul phonetic and Hanja ideographic) and Chinese characters is defined in PS3.3.

3.​The Chinese Character Set (GB18030) and Unicode (ISO 10646-1, 10646-2) do not allow the use of Code Extension​ Techniques. If either of these character sets is used, no other character set may be specified in the Specific Character​ Set (0008,0005) attribute, that is, it may have only one value.​
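A hedged sketch of how an application might check these rules on the values of Specific Character Set (0008,0005). The strings "ISO_IR 192" and "GB18030" are assumed to be the PS3.3 Defined Terms for Unicode in UTF-8 and for GB 18030, and the function names are hypothetical.

```python
from typing import List

# Assumed Defined Terms (see PS3.3) for character sets that prohibit Code Extension.
NO_CODE_EXTENSION = {"ISO_IR 192", "GB18030"}


def uses_code_extension(values: List[str]) -> bool:
    """Code Extension techniques apply only when the Attribute is multi-valued."""
    return len(values) > 1


def check_specific_character_set(values: List[str]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if uses_code_extension(values):
        for term in values:
            if term in NO_CODE_EXTENSION:
                problems.append(
                    f"{term!r} does not allow Code Extension and must be the only value"
                )
    return problems


print(check_specific_character_set(["ISO_IR 192"]))
print(check_specific_character_set(["ISO 2022 IR 13", "ISO 2022 IR 87"]))
print(check_specific_character_set(["ISO_IR 192", "ISO 2022 IR 87"]))
```

The check covers only the restriction stated in note 3; it does not validate the terms themselves against PS3.3.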

6.1.2.5 Usage of Code Extension​

DICOM supports Code Extension techniques if the Attribute Specific Character Set (0008,0005) is multi-valued. The method employed for Code Extension in DICOM is as described in ISO/IEC 2022:1994. The following assumptions shall be made and the following restrictions shall apply:

6.1.2.5.1 Assumed Initial States​

•​Code element G0 and code element G1 (in 8-bit mode only) are always invoked in the GL and GR areas of the code table respectively. Designated character sets for these code elements are immediately in use. Code elements G2 and G3 are not used.

•​The primary set of Control Characters shall always be designated as the C0 code element and this shall be invoked in the CL area​ of the code table. The C1 code element shall not be used.​

6.1.2.5.2 Restrictions for Code Extension​

•​As code elements G0 and G1 always have shift status, Locking Shifts (SI, SO) are not required and shall not be used.​

•​As code elements G2 and G3 are not used, Single Shifts (SS2 and SS3) cannot be used.​

•​Only the ESC sequences specified in PS3.3 shall be used to activate Code Elements.​

6.1.2.5.3 Requirements​

The character set specified by value 1 of the Attribute Specific Character Set (0008,0005), or the Default Character Repertoire if value 1 is missing, shall be active at the beginning of each textual Data Element value, and at the beginning of each line (i.e., after a CR and/or LF) or page (i.e., after an FF).

If within a textual value a character set other than the one specified in value 1 of the Attribute Specific Character Set (0008,0005), or​ the Default Character Repertoire if value 1 is missing, has been invoked, the character set specified in the value 1, or the Default​ Character Repertoire if value 1 is missing, shall be active in the following instances:​

•​before the end of line (i.e., before the CR and/or LF)​

•​before the end of a page (i.e., before the FF)​

•​before any other Control Character other than ESC (e.g., before any TAB)​

•​before the end of a Data Element value (e.g., before the 05/12 character code that separates multiple textual Data Element Values​ - 05/12 corresponds to "\" (BACKSLASH) in the case of default repertoire IR-6 or "¥" (YEN SIGN) in the case of IR-14).​

•​before the "^" and "=" delimiters separating name components and name component groups in Data Elements with a VR of PN.​

If within a textual value a character set other than the one specified in value 1 of the Attribute Specific Character Set (0008,0005), or​ the Default Character Repertoire if value 1 is missing, is used, the Escape Sequence of this character set must be inserted explicitly​ in the following instances:​

•​before the first use of the character set in the line​

•​before the first use of the character set in the page​

•​before the first use of the character set in the Data Element value​

•​before the first use of the character set in the name component and name component group in Data Element with a VR of PN​

Note​

These requirements allow an application to skip lines, values, or components in a textual data element and start the new​ line with a defined character set without the need to track the character set changes in the text skipped. A similar restriction​ appears in the RFCs describing the use of multi-byte character sets over the Internet. An Escape Sequence switching to​ the value 1 or default Specific Character Set is not needed within a line, value, or component if no Code Extensions are​ present. Nor is a switch needed to the value 1 or default Specific Character Set if this character set has only the G0 Code​ Element defined, and the G0 Code Element is still active.​
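The effect of these requirements can be illustrated with a rough Python sketch that splits an encoded PN value into components while tracking G0 escape sequences. It rests on several assumptions that are not taken from this section: the value 1 character set is a single-byte G0 set, only the three escape sequences listed in the table are handled, and Python's `iso2022_jp` codec stands in for the DICOM encoding. The function name is hypothetical.

```python
from typing import List

ESC = 0x1B

# Assumed subset of G0 designations: escape sequence -> bytes per character.
G0_WIDTH = {
    b"\x1b(B": 1,   # ISO-IR 6 (ASCII)
    b"\x1b(J": 1,   # ISO-IR 14 (JIS X 0201 romaji)
    b"\x1b$B": 2,   # ISO-IR 87 (JIS X 0208 kanji)
}


def split_pn_components(value: bytes, delimiters: bytes = b"^=") -> List[bytes]:
    """Split at "^" and "=" without misreading bytes inside two-byte characters."""
    parts, current, width = [], bytearray(), 1
    i = 0
    while i < len(value):
        if value[i] == ESC:
            seq = bytes(value[i:i + 3])
            if seq not in G0_WIDTH:
                raise ValueError(f"unsupported escape sequence {seq!r}")
            width = G0_WIDTH[seq]
            current += seq
            i += 3
        elif width == 1 and value[i] in delimiters:
            parts.append(bytes(current))   # delimiter seen in a single-byte set
            current = bytearray()
            i += 1
        else:
            current += value[i:i + width]  # copy one whole character
            i += width
    if width != 1:
        raise ValueError("value does not return to a single-byte set before its end")
    parts.append(bytes(current))
    return parts


encoded = "山田^太郎".encode("iso2022_jp")
components = split_pn_components(encoded)
# Each component carries its own escape sequences, so it decodes on its own.
print([c.decode("iso2022_jp") for c in components])
```

Because the encoder switches back to the single-byte set before the "^" delimiter and again at the end of the value, each component can be decoded without knowing what preceded it, which is the property the note describes.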

6.1.2.5.4 Levels of Implementation and Initial Designation​

a.​Attribute Specific Character Set (0008,0005) not present:​

•​7-bit code​
