SMPTE ST 429-2:2023-09
Revision of SMPTE ST 429-2:2020
SMPTE Standard

D-Cinema Packaging – DCP Operational Constraints

Approved - 2023-09-12

Table of contents

  Foreword
  1 Scope
  2 Conformance
  3 Normative references
  4 Terms and definitions
  5 Overview (Informative)
    5.1 General
    5.2 D-Cinema Package
    5.3 D-Cinema Composition
  6 DCP Constraints
    6.1 Minimum Contents
    6.2 UUID Generation
    6.3 XML Constraints
  7 Packing List Constraints
    7.1 General
    7.2 Asset Identity
    7.3 Unique Set of Assets
    7.4 Digital Signature
    7.5 Group ID
      7.5.1 Composition Packages
      7.5.2 Asset Packages
  8 Composition Constraints
    8.1 General
    8.2 Edit Rate
    8.3 Picture Essence Encoding
    8.4 Sound Essence Encoding
    8.5 Timed Text Essence Encoding
      8.5.1 General
      8.5.2 Fonts for Timed Text
      8.5.3 Text Color Interpretation
      8.5.4 Images for On-Screen Timed Text
      8.5.5 Maximum Rate of Occurrence for On-Screen Timed Text
      8.5.6 Constraints on Stereoscopic Control
      8.5.7 IntrinsicPictureResolution Attribute
    8.6 Sound and Picture Sample Rates
    8.7 Track File Edit Rates
    8.8 Homogenous Essence
  9 Composition Playlist Constraints
    9.1 Minimum Essence Requirement
    9.2 Composition Playlist Uniqueness
    9.3 ContentVersion Id
    9.4 Reel Duration
    9.5 Track Files
    9.6 Picture Tracks
      9.6.1 General
      9.6.2 Essence Characteristics
    9.7 Sound Tracks
      9.7.1 General
      9.7.2 Essence Characteristics
    9.8 Timed Text Tracks
    9.9 Marker Tracks
    9.10 Cryptographic Keys
    9.11 Hash Element
    9.12 Digital Signature
    9.13 Composition Metadata
  10 Track File Constraints
    10.1 General
    10.2 Encryption
    10.3 Picture Track Files
      10.3.1 General
      10.3.2 Operational Pattern
      10.3.3 Compression
      10.3.4 Wrapping
    10.4 Sound Track Files
      10.4.1 General
      10.4.2 Operational Pattern
      10.4.3 Wrapping
      10.4.4 Channel Assignment
    10.5 Timed Text Track Files
      10.5.1 General
      10.5.2 Timed Text Essence Format
      10.5.3 Track File Format
      10.5.4 Timed Text Essence Descriptor
  Annex A Audio Channel Assignment Label (Normative)
    A.1 General
    A.2 Static Container Channel Configurations
      A.2.1 General
      A.2.2 Channel Label Set ULs
      A.2.3 Channel Configuration Tables
    A.3 Configurations using MXF Multichannel Audio Framework
      A.3.1 General
      A.3.2 Configuration Channel Assignment Label
      A.3.3 AudioChannelLabelSubDescriptor
      A.3.4 Common D-Cinema Channels
      A.3.5 Extension Channels
      A.3.6 SoundfieldGroupLabelSubDescriptor
      A.3.7 Common D-Cinema Soundfield Groups
      A.3.8 Extension Soundfield Groups
  Annex B Additional Timed Text Essence Descriptor Items (Normative)
  Bibliography

Foreword

SMPTE (the Society of Motion Picture and Television Engineers) is an internationally-recognized standards developing organization. Headquartered and incorporated in the United States of America, SMPTE has members in over 80 countries on six continents. SMPTE's Engineering Documents, including Standards, Recommended Practices, and Engineering Guidelines, are prepared by SMPTE's Technology Committees. Participation in these Committees is open to all with a bona fide interest in their work. SMPTE cooperates closely with other standards-developing organizations, including ISO, IEC and ITU. SMPTE Engineering Documents are drafted in accordance with the rules given in its Standards Operations Manual.

At the time of publication no notice had been received by SMPTE claiming patent rights essential to the implementation of this Engineering Document. However, attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. SMPTE shall not be held responsible for identifying any or all such patent rights.

This document was prepared by Technology Committee 27C.

The following summarizes the changes from the previous edition of this document:

Copyright © 2024, Society of Motion Picture and Television Engineers. All rights reserved. No part of this material may be reproduced, by any means whatsoever, without the prior written permission of the Society of Motion Picture and Television Engineers.

1 Scope

This document specifies a D-Cinema Package (DCP), a collection of files containing D-Cinema essence and related metadata to be ingested and reproduced by a D-Cinema playback system.

2 Conformance

Normative text is text that describes elements of the design that are indispensable or contains the conformance language keywords: "shall", "should", or "may". Informative text is text that is potentially helpful to the user, but not indispensable, and can be removed, changed, or added editorially without affecting interoperability. Informative text does not contain any conformance keywords.

All text in this document is, by default, normative, except: the Introduction, any section explicitly labeled as "Informative", or individual paragraphs that start with "Note:".

The keywords "shall" and "shall not" indicate requirements strictly to be followed in order to conform to the document and from which no deviation is permitted.

The keywords "should" and "should not" indicate that, among several possibilities, one is recommended as particularly suitable, without mentioning or excluding others; or that a certain course of action is preferred but not necessarily required; or that (in the negative form) a certain possibility or course of action is deprecated but not prohibited.

The keywords "may" and "need not" indicate courses of action permissible within the limits of the document.

The keyword "reserved" indicates a provision that is not defined at this time, shall not be used, and may be defined in the future. The keyword "forbidden" indicates "reserved" and in addition indicates that the provision will never be defined in the future.

A conformant implementation according to this document is one that includes all mandatory provisions ("shall") and, if implemented, all recommended provisions ("should") as described. A conformant implementation need not implement optional provisions ("may") and need not implement them as described.

Unless otherwise specified, the order of precedence of the types of normative information in this document shall be as follows: Normative prose shall be the authoritative definition; Tables shall be next; then formal languages; then figures; and then any other language forms.

3 Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

4 Terms and definitions

For the purposes of this document, the terms and definitions given in the following documents and the additional terms and definitions apply:

Digital Cinema (D-Cinema)
D-Cinema Package (DCP)
International Standard Audiovisual Number (ISAN) [SOURCE: IETF RFC 4246]
Unique Material Identifier (UMID) [SOURCE: SMPTE ST 330]
Universally Unique Identifier (UUID) [SOURCE: IETF RFC 4122]
eXtensible Markup Language (XML) [SOURCE: W3C XML 1.0]

5 Overview (Informative)

5.1 General

D-Cinema content is composed of a number of distinct elements such as Composition Playlists and Track Files (D-Cinema assets). For delivery to D-Cinema systems, assets are combined into a logical D-Cinema Package (DCP). The syntax and semantics of these assets and the DCP are described by the family of D-Cinema specifications depicted in Figure 1. To promote modularity and layering, each document has a limited scope and often defines a single structure or format.

This specification describes operational constraints applicable to the complete DCP. While structure-specific constraints are addressed in the document that defines a particular structure, this document defines constraints that apply to the combined set of structures that comprise a DCP. For instance, constraints specific to the Composition Playlist, such as those related to content markers, are defined in the Composition Playlist (CPL) specification, whereas constraints that apply to the DCP as a whole, such as composition edit rate, are defined in this document.

Figure 1 – DCP Family of Specifications.

5.2 D-Cinema Package

A D-Cinema Package (DCP) is a set of files consisting of one (1) Packing List (SMPTE ST 429-8) and each of the files referenced by that Packing List. Figure 2 illustrates this structure. The figure shows a Packing List with ten asset references. Each asset reference points to one of the nine track files or the Composition Playlist. A Packing List may reference any combination of Track Files and Composition Playlists, however the set of referenced files must contain no duplicates.

A DCP may contain one or more complete Compositions, or it may contain components of compositions destined to complete, augment or replace previously distributed material.

Figure 2 – A D-Cinema Package consists of a Packing List and the files to which it refers.

5.3 D-Cinema Composition

A Composition is a set of files consisting of one (1) Composition Playlist document (SMPTE ST 429-7) and each of the Track Files (see Clause 10 below) referred to from within that Composition Playlist. Figure 3 illustrates this structure for a composition having three reels of image, sound and subtitles.

Figure 3 – A Composition consists of a Composition Playlist and the Track Files to which it refers.

6 DCP Constraints

6.1 Minimum Contents

A DCP shall consist of one Packing List and one or more assets (i.e., Composition Playlists and/or Track Files), referenced by the Packing List.

6.2 UUID Generation

UUID values are used throughout the DCP to uniquely identify assets and data structures. All UUID values in a DCP shall be generated as specified in IETF RFC 4122. UUID values which identify assets or encryption keys shall be generated using a truly-random or pseudo-random number source, and shall have a Version field value of 4 (0100b) (as specified in IETF RFC 4122).

NOTE – The b suffix on this value indicates a binary encoding, most significant bit (MSB) first.
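As an illustration of the provision above, the following Python sketch generates a random UUID and checks the Version field of an existing value. The function names are hypothetical and are not part of this document; the sketch relies only on the uuid module of the Python standard library.

```python
import uuid


def new_asset_uuid() -> str:
    """Return a randomly generated (Version 4) UUID in URN form."""
    return uuid.uuid4().urn


def is_version_4(value: str) -> bool:
    """Return True if value parses as an RFC 4122 UUID whose Version field is 4."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # The Version field occupies bits 76..79 of the 128-bit value (0100b for Version 4).
    version_bits = (parsed.int >> 76) & 0xF
    return parsed.variant == uuid.RFC_4122 and version_bits == 0b0100


if __name__ == "__main__":
    candidate = new_asset_uuid()
    print(candidate, is_version_4(candidate))
```

A checker of this kind can only confirm the Version field; it cannot establish that the value came from a suitable random number source.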

6.3 XML Constraints

XML documents (SMPTE ST 428-7, SMPTE ST 429-7, SMPTE ST 429-8, SMPTE ST 429-10, SMPTE ST 429-12) in a DCP shall be encoded using the UTF-8 character encoding (ISO/IEC 10646) and shall comply with SMPTE ST 429-17.

7 Packing List Constraints

7.1 General

The Packing List document which defines the DCP contents shall be created as specified in SMPTE ST 429-8. Note that the specification requires that each Packing List document must have a unique UUID value in the top-level Id element. A Packing List may reference assets which are referenced by other Packing Lists.

7.2 Asset Identity

The value of the Id element within each Asset element shall be extracted from the referenced asset per the specification for the asset (see SMPTE ST 429-3 and SMPTE ST 429-7.)

7.3 Unique Set of Assets

Each Asset element shall contain an Id element value that is unique within the Packing List.
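The uniqueness rule can be checked mechanically. The Python sketch below is one possible approach, not a normative procedure: it matches elements by local name so that it does not depend on a particular namespace URI, and the sample document, its namespace (urn:example:pkl) and its UUID values are invented for illustration and are not schema-valid Packing Lists.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def duplicate_asset_ids(pkl_xml: str) -> list:
    """Return the Asset/Id values that occur more than once in a Packing List."""
    root = ET.fromstring(pkl_xml)
    ids = []
    for element in root.iter():
        if local_name(element.tag) != "Asset":
            continue
        for child in element:
            if local_name(child.tag) == "Id" and child.text:
                # UUID URNs are compared case-insensitively (an assumption of this sketch).
                ids.append(child.text.strip().lower())
    return sorted(value for value, count in Counter(ids).items() if count > 1)


# Hypothetical fragment: the namespace and identifiers are placeholders.
SAMPLE_PKL = """<PackingList xmlns="urn:example:pkl">
  <Id>urn:uuid:00000000-0000-4000-8000-000000000000</Id>
  <AssetList>
    <Asset><Id>urn:uuid:11111111-1111-4111-8111-111111111111</Id></Asset>
    <Asset><Id>urn:uuid:22222222-2222-4222-8222-222222222222</Id></Asset>
    <Asset><Id>urn:uuid:11111111-1111-4111-8111-111111111111</Id></Asset>
  </AssetList>
</PackingList>"""

if __name__ == "__main__":
    print(duplicate_asset_ids(SAMPLE_PKL))
```

An empty result indicates that every Asset element carries a distinct Id value.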

7.4 Digital Signature

When a Packing List document is digitally signed as specified in SMPTE ST 429-8, digital certificates in the signer's certificate chain shall conform to the provisions of SMPTE ST 430-2.

7.5 Group ID

7.5.1 Composition Packages

A Composition Package is a DCP containing only the complete set of assets comprising one or more compositions. The GroupId element shall not be present in the Packing List of a Composition Package.

7.5.2 Asset Packages

An Asset Package is a DCP containing Track Files and/or Composition Playlists comprising one or more incomplete compositions (i.e., some assets needed to complete the composition are not present in the package.) Asset Packages shall be identified by the presence of the GroupId element in the Packing List. An Asset Package should contain only related assets (i.e., partial sets of assets from two unrelated compositions should be listed in separate Packing Lists using different GroupId values.) When two or more Asset Packages contain related assets, the Packing Lists should have the same GroupId value.

8 Composition Constraints

8.1 General

A Composition (i.e., a Composition Playlist and referenced Track Files) may be delivered in a single DCP or it may be spread across several DCPs. Regardless of the number of DCPs used to convey a Composition, a Composition shall conform to the following constraints.

8.2 Edit Rate

The composition shall have an Edit Rate of 24/1, 25/1, 30/1, 48/1, 50/1 or 60/1.

8.3 Picture Essence Encoding

Picture essence tracks shall be encoded as specified in SMPTE ST 428-1. The pixel array size and frame rate shall be one of the formats listed in Table 1. Monoscopic picture essence tracks shall have matching frame rate and edit rate. Stereoscopic picture essence tracks shall be limited to the 2K formats, and shall have a frame rate of 48/1 and an edit rate equal to half the frame rate (r_e = r_f/2). (See SMPTE ST 429-10 for an explanation.)

Source images having an aspect ratio not listed in Table 1 should be encoded so that the image fills either the horizontal or vertical dimension of the desired Full pixel array (2K or 4K). To fill the pixel array in the opposite dimension, the image should be padded with an equal number of black pixels on each side, i.e., "letter-box" (top side, bottom side) or "pillar-box" (left side, right side).
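The padding arithmetic in the recommendation above can be sketched as follows. This is an illustrative Python example under stated assumptions: the function name is hypothetical, scaling uses simple rounding, and, so that the padding is the same on both sides, the scaled dimension is reduced by one pixel when the remaining space would otherwise be odd. None of these choices is specified by this document.

```python
def fit_into_full_array(src_width, src_height, full_width=2048, full_height=1080):
    """Fit a source image into a Full pixel array.

    Returns (scaled_width, scaled_height, pad_x, pad_y), where pad_x and pad_y
    are the number of black pixels added on EACH side.
    """
    # Compare aspect ratios by cross-multiplication to avoid floating-point error.
    if src_width * full_height >= full_width * src_height:
        # Source is at least as wide as the container: fill horizontally (letter-box).
        scaled_width = full_width
        scaled_height = round(src_height * full_width / src_width)
    else:
        # Source is narrower than the container: fill vertically (pillar-box).
        scaled_height = full_height
        scaled_width = round(src_width * full_height / src_height)
    # Assumption of this sketch: shrink by one pixel if needed so padding is symmetric.
    if (full_width - scaled_width) % 2:
        scaled_width -= 1
    if (full_height - scaled_height) % 2:
        scaled_height -= 1
    pad_x = (full_width - scaled_width) // 2
    pad_y = (full_height - scaled_height) // 2
    return scaled_width, scaled_height, pad_x, pad_y


if __name__ == "__main__":
    # A 16:9 source in a 2K Full array is pillar-boxed with 64 pixels per side.
    print(fit_into_full_array(1920, 1080))
    # A 2.00:1 source in a 2K Full array is letter-boxed with 28 pixels per side.
    print(fit_into_full_array(4000, 2000))
```

For a 4K target, the same function can be called with full_width=4096 and full_height=2160.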

Table 1 – Pixel Array Dimensions
Format            | Horizontal Pixels | Vertical Pixels | Frame Rate
2K Scope (2.39:1) | 2048              | 858             | 24/1, 25/1, 30/1, 48/1, 50/1 or 60/1
2K Flat (1.85:1)  | 1998              | 1080            | 24/1, 25/1, 30/1, 48/1, 50/1 or 60/1
2K Full (1.90:1)  | 2048              | 1080            | 24/1, 25/1, 30/1, 48/1, 50/1 or 60/1
4K Scope (2.39:1) | 4096              | 1716            | 24/1, 25/1 or 30/1
4K Flat (1.85:1)  | 3996              | 2160            | 24/1, 25/1 or 30/1
4K Full (1.90:1)  | 4096              | 2160            | 24/1, 25/1 or 30/1
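Table 1 lends itself to a simple lookup. The Python sketch below transcribes the table into a dictionary and reports the format name for a given pixel array size and frame rate. The names TABLE_1 and picture_format are hypothetical, and the additional constraints on stereoscopic essence stated in this subclause are not checked.

```python
from fractions import Fraction

RATES_2K = frozenset(Fraction(n) for n in (24, 25, 30, 48, 50, 60))
RATES_4K = frozenset(Fraction(n) for n in (24, 25, 30))

# (horizontal pixels, vertical pixels) -> (format name, permitted frame rates)
TABLE_1 = {
    (2048, 858): ("2K Scope", RATES_2K),
    (1998, 1080): ("2K Flat", RATES_2K),
    (2048, 1080): ("2K Full", RATES_2K),
    (4096, 1716): ("4K Scope", RATES_4K),
    (3996, 2160): ("4K Flat", RATES_4K),
    (4096, 2160): ("4K Full", RATES_4K),
}


def picture_format(width, height, frame_rate):
    """Return the Table 1 format name, or None if the combination is not listed.

    frame_rate may be an int, a Fraction, or a string such as "24/1".
    """
    entry = TABLE_1.get((width, height))
    if entry is None:
        return None
    name, rates = entry
    return name if Fraction(frame_rate) in rates else None


if __name__ == "__main__":
    print(picture_format(2048, 858, "24/1"))   # listed combination
    print(picture_format(4096, 2160, 48))      # 4K at 48/1 is not listed
```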

8.4 Sound Essence Encoding

Sound essence tracks shall be encoded as specified in SMPTE ST 428-2. 10.4.4 and Annex A specify means of identifying the content of these essence tracks.

8.5 Timed Text Essence Encoding

8.5.1 General

Timed Text essence shall be encoded as XML data as specified in SMPTE ST 428-7, and may be constrained per SMPTE ST 428-10. Sub-pictures shall be encoded as Portable Network Graphics (PNG) images as specified in ISO/IEC 15948.

8.5.2 Fonts for Timed Text

When Text elements are present in the Timed Text essence, one (1) LoadFont element shall be present. Timed Text essence shall not contain more than one (1) LoadFont element.

Within the scope of any given Subtitle element, all Font elements shall have the same EffectSize attribute value.

The font resource should not be larger than 10 MB.

NOTE 1 – Legacy implementations might not be able to support font resources larger than 640 KB.

NOTE 2 – Operational testing has determined that a Font size smaller than 8 pt might be difficult to read, and that, depending on the length of the subtitle, a very large Font size might take too long to appear and might go beyond the dimension of the Primary Picture.

8.5.3 Text Color Interpretation

Color values encoded in the Timed Text essence (in the Color and EffectColor attributes of the Font element) shall be encoded as sRGB values (IEC 61966-2-1).

8.5.4 Images for On-Screen Timed Text

PNG image resources used per SMPTE ST 428-7 shall have three (3) 8-bit color components (R, G, and B). An alpha channel may be present. If an alpha channel is present, the decoder shall use it when creating the composite image. PNG image resources shall contain the sRGB chunk per ISO/IEC 15948.

The width and height of a subpicture shall be equal to or less than the width and height, respectively, of the associated main picture.

8.5.5 Maximum Rate of Occurrence for On-Screen Timed Text

Up to two (2) subtitle instances may be visible on screen at any time. The visibility period of an instance shall include fade-in and fade-out times. A subtitle instance shall contain no more than six (6) Text elements or three (3) Image elements.

8.5.6 Constraints on Stereoscopic Control

All Text and Image elements to be displayed at the same time shall have the same depth information specified through Zvalue within VariableZ and/or Zposition attributes.

8.5.7 IntrinsicPictureResolution Attribute

When present, the value of the IntrinsicPictureResolution attribute of the SubtitleReel element (see SMPTE ST 428-7) shall be one of the values listed in Table 2 below.

Table 2 – IntrinsicPictureResolution Attribute Values
Attribute Value
2K Scope
2K Flat
2K Full
4K Scope
4K Flat
4K Full

NOTE – The IntrinsicPictureResolution attribute is intended to guide the mastering operator to select the appropriate subtitle resources for the Primary Picture content.

8.6 Sound and Picture Sample Rates

The sound sample rate and Composition edit rate of a Composition shall be one of the combinations listed in Table 3.

Table 3 – Sample Rate Constraints
Sound Sample Rate | Composition Edit Rate | Samples per Edit Unit
48 kHz            | 24/1                  | 2000
48 kHz            | 25/1                  | 1920
48 kHz            | 30/1                  | 1600
48 kHz            | 48/1                  | 1000
48 kHz            | 50/1                  | 960
48 kHz            | 60/1                  | 800
96 kHz            | 24/1                  | 4000
96 kHz            | 25/1                  | 3840
96 kHz            | 30/1                  | 3200
96 kHz            | 48/1                  | 2000
96 kHz            | 50/1                  | 1920
96 kHz            | 60/1                  | 1600
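Each value in the last column of Table 3 is the sound sample rate divided by the Composition edit rate, for example 48000 / 24 = 2000. The Python sketch below reproduces that arithmetic with exact rational numbers. The function name is hypothetical, and rejecting combinations that do not yield a whole number of samples is a choice of this sketch rather than a provision of this document.

```python
from fractions import Fraction


def samples_per_edit_unit(sample_rate_hz, edit_rate):
    """Return the number of sound samples in one edit unit.

    sample_rate_hz is in hertz (48000 or 96000); edit_rate may be an int,
    a Fraction, or a string such as "24/1".
    """
    samples = Fraction(sample_rate_hz) / Fraction(edit_rate)
    if samples.denominator != 1:
        raise ValueError("edit unit does not contain a whole number of samples")
    return samples.numerator


if __name__ == "__main__":
    for rate in (48000, 96000):
        for edit_rate in (24, 25, 30, 48, 50, 60):
            print(rate, f"{edit_rate}/1", samples_per_edit_unit(rate, edit_rate))
```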

8.7 Track File Edit Rates

All essence tracks in a Composition shall have an identical Edit Rate.

8.8 Homogenous Essence

Essence tracks in a Composition shall have homogenous encoding parameter values throughout the Composition. Picture essence shall have constant frame rate and pixel array size. Sound essence shall have constant sample rate, language, channel count, and channel assignment parameters.

9 Composition Playlist Constraints

9.1 Minimum Essence Requirement

A Composition Playlist shall have one picture essence track and one sound essence track in each Reel element.

9.2 Composition Playlist Uniqueness

Two Composition Playlist documents having different contents shall have different values in the top-level Id element.

9.3 ContentVersion Id

The Id element within the ContentVersion element shall contain a URI value conforming to one of the following types:

NOTE – The Id element of the ContentVersion element is intended to remain constant across multiple Composition Playlist instances referencing the same underlying content. For instance, both a pre-release and a final version of a Composition Playlist associated with the same feature can have the same ContentVersion/Id, while their Id elements are different. In a typical application, ContentVersion/Id can be used as a reference to an internal booking system.

9.4 Reel Duration

The Duration element shall be present within every Asset element that refers to an external track file. The value of all Duration elements in a reel, with the exception of timed text elements, shall be equal. The Duration of the Reel shall be determined by the MainPicture element, per the provisions of SMPTE ST 429-7, or the MainStereoscopicPicture element, whichever is present.

9.5 Track Files

Track files referenced by a Composition Playlist shall conform to the provisions of Clause 10 of this document.

9.6 Picture Tracks

9.6.1 General

Each Reel element in a Composition Playlist document shall contain one (1) MainPicture element (SMPTE ST 429-7) or one (1) MainStereoscopicPicture element (SMPTE ST 429-10). This element shall refer to a Picture Track File as defined by SMPTE ST 429-3. If the element name is MainStereoscopicPicture, the referenced Track File shall also conform to SMPTE ST 429-10.

9.6.2 Essence Characteristics

All picture assets in a Composition Playlist shall have identical values for the following metadata items:

  • element name (i.e., MainPicture or MainStereoscopicPicture)
  • EditRate element
  • FrameRate element
  • ScreenAspectRatio element

9.7 Sound Tracks

9.7.1 General

The MainSound element (SMPTE ST 429-7) in each Reel shall refer to a Sound Track File as defined by SMPTE ST 429-3.

9.7.2 Essence Characteristics

All sound assets in a Composition Playlist shall have identical values for the following metadata items:

  • EditRate element
  • Language element

9.8 Timed Text Tracks

A timed text track is established by the presence of a timed text asset (e.g. MainSubtitle, MainCaption, ClosedSubtitle, or ClosedCaption) in at least one Reel of a Composition. Once a timed text asset appears in one Reel, the established track shall be assumed to exist for the entire Composition, even if related timed text Asset elements are not present in all Reels.

Each Reel element in a Composition Playlist document may contain one on-screen text track, either MainSubtitle as defined by SMPTE ST 429-7 or MainCaption as defined by SMPTE ST 429-12. When present, the MainSubtitle element shall refer to a Timed Text Track File as defined by SMPTE ST 429-5, containing an XML resource conforming to SMPTE ST 428-7. When present, the MainCaption element shall refer to a Timed Text Track File as defined by SMPTE ST 429-5, containing an XML resource conforming to SMPTE ST 428-10. A Composition Playlist shall contain no more than one on-screen text track type (MainSubtitle or MainCaption).

Each Reel element in a Composition Playlist document may contain up to six (6) off-screen (closed) text tracks, using any combination of ClosedSubtitle and ClosedCaption elements as defined by SMPTE ST 429-12. When present, an off-screen text element shall refer to a Timed Text Track File as defined by SMPTE ST 429-5, containing an XML resource conforming to SMPTE ST 428-10. When more than one off-screen text track asset of the same type (ClosedSubtitle or ClosedCaption) is present, the Language attribute shall be used. The Language attribute value of each off-screen text track shall be unique among the set of similarly-typed off-screen text tracks. The value of the Language attribute shall be used to identify material of the same off-screen text track from Reel to Reel for each Asset type instance.

The maximum number of timed text tracks in a Composition Playlist document is seven (7); one (1) on-screen text track plus six (6) off-screen text tracks. Each off-screen text track with a unique combination of element name and Language shall be considered a distinct off-screen text track.

In order to illustrate the concepts in this section, the example diagram in Figure 4 shows a collection of Composition assets on the left, and a Composition with tracks on the right. Each reel shown on the left contains a number of off-screen timed text assets that appears to be within the specified limit of this standard. However, in the example, the number of off-screen text tracks possible is seven, which is more than that allowed by this standard. The Composition on the right is correctly constrained. Note that each timed text track exists for the duration of the Composition, even though it might not be represented by an asset in every reel.

Figure 4 – Example of allocating timed text assets to timed text tracks.
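The track-counting rule of this subclause can be sketched in Python as shown below. The data model is an assumption made for illustration: each reel is a list of (element name, Language) pairs, Language values are compared case-insensitively, and the function name and message texts are hypothetical. The example reproduces the situation described for Figure 4, where every reel stays within six off-screen assets but the Composition as a whole implies seven off-screen tracks.

```python
ON_SCREEN = {"MainSubtitle", "MainCaption"}
OFF_SCREEN = {"ClosedSubtitle", "ClosedCaption"}
MAX_OFF_SCREEN_TRACKS = 6


def timed_text_problems(reels):
    """Return a list of problems found in the timed text assets of a Composition.

    reels is a list of reels; each reel is a list of (element_name, language) pairs.
    """
    problems = []
    on_screen_types = set()
    off_screen_tracks = set()
    for number, reel in enumerate(reels, start=1):
        on_screen = [name for name, _ in reel if name in ON_SCREEN]
        off_screen = [(name, (language or "").lower())
                      for name, language in reel if name in OFF_SCREEN]
        if len(on_screen) > 1:
            problems.append(f"reel {number}: more than one on-screen text asset")
        if len(off_screen) > MAX_OFF_SCREEN_TRACKS:
            problems.append(f"reel {number}: more than six off-screen text assets")
        if len(set(off_screen)) != len(off_screen):
            problems.append(f"reel {number}: repeated element name and Language")
        # A track established in any reel exists for the entire Composition.
        on_screen_types.update(on_screen)
        off_screen_tracks.update(off_screen)
    if len(on_screen_types) > 1:
        problems.append("composition: both MainSubtitle and MainCaption are used")
    if len(off_screen_tracks) > MAX_OFF_SCREEN_TRACKS:
        problems.append(f"composition: {len(off_screen_tracks)} off-screen text tracks")
    return problems


if __name__ == "__main__":
    reel_1 = [("MainSubtitle", "en")] + [
        ("ClosedCaption", lang) for lang in ("en", "fr", "de", "es", "it", "ja")]
    reel_2 = [("ClosedCaption", lang) for lang in ("en", "fr", "de", "es", "it", "pt")]
    print(timed_text_problems([reel_1, reel_2]))
```

Replacing the "pt" asset in the second reel with a "ja" asset keeps the Composition at six off-screen tracks, and the function then reports no problems.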

9.9 Marker Tracks

When present, a MainMarkers element shall not contain either:

NOTE – As specified in SMPTE ST 429-7, a MainMarkers element contains neither an EntryPoint element nor a Duration element since it does not reference a Track File.

9.10 Cryptographic Keys

No more than 256 distinct cryptographic keys, as uniquely identified by their Key ID, shall be used to encrypt the assets referenced by a Composition Playlist.

9.11 Hash Element

The Hash element shall be present in an asset when the KeyId element is present (i.e., when the referenced Track File is encrypted).

9.12 Digital Signature

When a Composition Playlist document is digitally signed as specified in SMPTE ST 429-7, digital certificates in the signer's certificate chain shall conform to the provisions of SMPTE ST 430-2.

9.13 Composition Metadata

The CompositionMetadataAsset element defined in SMPTE ST 429-16 should be present.

10 Track File Constraints

10.1 General

Essence data shall be contained in MXF files (SMPTE ST 377-1) constrained according to SMPTE ST 429-20.

10.2 Encryption

When cryptographic protection is required, Track Files shall use KLV encryption per SMPTE ST 429-6. Each encrypted Track File shall be encrypted with exactly one (1) 128-bit symmetric key, which is the Cipher Key of the Track File.

The Essence Container Label urn:smpte:ul:060e2b34.04010107.0d010301.020b0100 shall be used for both frame- and clip-wrapped essence.

NOTE 1 – SMPTE ST 429-6 deprecates the Essence Container Label urn:smpte:ul:060e2b34.04010107.0d010301.020b0100 for clip-wrapped essence outside of D-Cinema applications.

If the Encrypted Track File contains MIC items, the MIC Key used to generate the MIC items shall be derived from the Cipher Key of the Track File using the Legacy MIC Key derivation algorithm specified in SMPTE ST 429-6.

NOTE 2 – SMPTE ST 429-6 no longer specifies a MIC Key derivation method as part of its Reference Decryption Processing Model. This method however remains in use when generating MIC items during Encrypted Track File authoring. The generated MIC Key is carried in the KDM as specified at SMPTE ST 430-1.

10.3 Picture Track Files

10.3.1 General

In addition to the essence encoding constraints specified in Clause 8, Picture Track Files shall have the following properties.

10.3.2 Operational Pattern

Picture Track Files shall conform to the provisions of SMPTE ST 429-3.

10.3.3 Compression

Picture essence shall consist of a sequence of codestreams that conform either to the 2K digital cinema profile or the 4K digital cinema profile specified at Rec. ITU-T T.800 | ISO/IEC 15444-1.

There shall be 5 wavelet transform levels for 2K picture essence.

There shall be 6 wavelet transform levels for 4K picture essence.

10.3.4 Wrapping

Picture essence shall be frame wrapped according to SMPTE ST 422 and SMPTE ST 429-4. Stereoscopic picture essence shall also conform to SMPTE ST 429-10.

10.4 Sound Track Files

10.4.1 General

In addition to the essence encoding constraints specified in Clause 8 above, Sound Track Files shall have the following properties.

10.4.2 Operational Pattern

Sound Track Files shall conform to the provisions of SMPTE ST 429-3.

10.4.3 Wrapping

Sound essence shall be frame wrapped per SMPTE ST 382. Sound essence shall be contained in KLV packets labeled with the Wave Frame Wrapped Element UL. A Wave Audio Essence Descriptor shall be present in the Top-Level File Package.

10.4.4 Channel Assignment

Channel assignment defines what reproduction channel is carried in each channel of the distributed track. Sound Track File channel assignment shall be indicated by a UL value in the Channel Assignment property of the Wave Audio Essence Descriptor. The UL may indicate a fixed channel assignment. Annex A defines a set of channel assignments and respective UL values based on this method. The UL may also indicate a channel assignment scheme defined in another specification. In this case, additional details regarding channel assignment shall be provided by the specification that defines the UL.

If the Channel Assignment property is not present, Channel Configuration 1 (Table A.3) shall be assumed by the decoder. Routing of the container channel to the system audio output is not in the scope of this document.

10.5 Timed Text Track Files

10.5.1 General

In addition to the essence encoding constraints specified in Clause 8 above, Timed Text Track Files shall have the following properties.

10.5.2 Timed Text Essence Format

Timed Text essence shall be encoded as XML data as specified in SMPTE ST 428-7, and may be constrained per SMPTE ST 428-10. See 8.5 and 9.8 above.

10.5.3 Track File Format

Timed Text Track Files shall be created according to SMPTE ST 429-5.

10.5.4 Timed Text Essence Descriptor

If the DCDM Subtitle file contains the IntrinsicPictureResolution attribute (see SMPTE ST 428-7), then the Intrinsic Picture Resolution property of the Timed Text Essence Descriptor, defined in Annex B, should be present in the Timed Text Track File and, when present, shall represent the same value.

If the DCDM Subtitle file contains the DisplayType element (see SMPTE ST 428-7), then the Display Type property of the Timed Text Essence Descriptor, defined in Annex B, should be present in the Timed Text Track File and, when present, shall represent the same value.

If the Timed Text Essence Descriptor property RFC 5646 Language Tag List is present, it shall contain at least the language code specified in the DCDM Subtitle file.

If at least one subtitle instance of the DCDM Subtitle file contains a Zposition attribute (as defined in SMPTE ST 428-7), the Z-Position In Use property of the Timed Text Essence Descriptor shall be non-zero.

Annex A
Audio Channel Assignment Label (Normative)

A.1 General

NOTE – Implementation behavior is undefined when a Sound Track File fails to adhere to the normative provisions specified herein.

SMPTE ST 382 carries multi-channel PCM sound samples by using sample interleave on a channel basis. Each sample position can be thought of as a channel within the container specified at SMPTE ST 382.

The number of channels within the Sound Track File shall be an even number. The inclusion of a channel of silence may be required to achieve this.

Clause A.2 and Clause A.3 each specify a method for unambiguously identifying the channels present in Sound Track Files and indicating their intended reproduction location in the theater. Each method uses the ChannelAssignment property of the WaveAudioEssence Descriptor in a Sound Track File, as specified in 10.4.4 above.

Compliant playback devices shall use the ChannelAssignment property to identify the sound channels being used.

A.2 Static Container Channel Configurations

A.2.1 General

Each table in this Annex defines a container channel configuration that has a corresponding Universal Label (UL) for use as a value of the ChannelAssignment property. Container channels are numbered in sample packing order. The first sample is carried in container channel 1, the second in container channel 2 and so on.

The number of channels contained in a Sound Track file shall be less than or equal to the number of channels defined by the table associated with the ChannelAssignment property. However, if a given container channel is present, it shall be used according to the table. The WaveAudioEssence Descriptor ChannelCount property may be used in combination with the ChannelAssignment property to determine actual channel usage. For instance, a ChannelAssignment label indicating Channel Configuration 1 may accompany a container with a ChannelCount value of 6, indicating that channels 7 and 8 (Hearing Impaired and Visually Impaired-Narrative) are not present.
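The relationship between ChannelCount and a static configuration table can be sketched as follows. The Python example is illustrative only: the dictionary transcribes Channel Configurations 1 to 3 from this Annex, the function name is hypothetical, and the check for an even channel count reflects the requirement in A.1.

```python
# Container channel names in sample packing order (container channel 1 first).
CHANNEL_CONFIGURATIONS = {
    1: ["Left", "Right", "Center", "LFE", "Left Surround", "Right Surround",
        "Hearing Impaired", "Visually Impaired-Narrative"],
    2: ["Left", "Right", "Center", "LFE", "Left Surround", "Right Surround",
        "Center Surround", "Not Used", "Hearing Impaired",
        "Visually Impaired-Narrative"],
    3: ["Left", "Right", "Center", "LFE", "Left Surround", "Right Surround",
        "Left Center", "Right Center", "Hearing Impaired",
        "Visually Impaired-Narrative"],
}


def describe_container(configuration, channel_count):
    """Split a configuration's channels into (present, absent) for a ChannelCount."""
    names = CHANNEL_CONFIGURATIONS[configuration]
    if channel_count % 2:
        raise ValueError("the number of channels shall be an even number")
    if not 0 < channel_count <= len(names):
        raise ValueError("ChannelCount exceeds the channels defined by the table")
    return names[:channel_count], names[channel_count:]


if __name__ == "__main__":
    present, absent = describe_container(1, 6)
    print("present:", present)
    print("absent:", absent)
```

With Channel Configuration 1 and a ChannelCount of 6, the absent channels are Hearing Impaired and Visually Impaired-Narrative, matching the example in the text.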

The special case of no specified channel configuration is also provided for (see Table A.6). The label associated with this table shall mean no configuration specified. This may be used for test or experimental purposes.

NOTE – For the purpose of setting appropriate transport flags, implementations should not assume that all audio channels in Channel Configuration 4 contain linear PCM audio samples suitable for direct conversion to an analog audio signal.

A.2.2 Channel Label Set ULs

Table A.1 – Specification of the Channel Assignment Label when Static Container Channel Configurations are used

| Byte No. | Description | Value (hex) | Meaning |
|---|---|---|---|
| 1-7 | Registry Designator | See register | |
| 8 | Registry Version Number | 0bh | Version of the register in which this label first appears |
| 9 | Parametric | 04h | Node used to define parametric data |
| 10 | Sound Essence | 02h | Identifies sound essence coding |
| 11 | Sound Coding Characteristics | 02h | Identifies sound coding characteristics |
| 12 | Sound Channel Labeling | 10h | Identifies sound channel labeling |
| 13 | Sound Channel Labeling SMPTE ST 429-2 | 03h | Identifies sound channel labeling as defined in this document (SMPTE ST 429-2) |
| 14 | Channel Label Sets | 01h | Identifies Static Sound Channel Label Sets |
| 15 | Channel Configuration | See Table A.2 | Identifies sound Channel Configuration |
| 16 | Reserved | 00h | Reserved |

Table A.2 – Values for Table A.1, Byte 15

| Channel Configuration | Byte 15 Value |
|---|---|
| Channel Configuration 1 (Table A.3) | 01h |
| Channel Configuration 2 (Table A.4) | 02h |
| Channel Configuration 3 (Table A.5) | 03h |
| Channel Configuration 4 (Table A.6) | 04h |
| Channel Configuration 5 (Table A.7) | 05h |
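
As an illustration of Tables A.1 and A.2, the Python sketch below extracts the Channel Configuration number from a 16-byte ChannelAssignment label. The function and constant names are hypothetical. The sketch deliberately does not check bytes 1 to 8 (registry designator and version) or byte 16, which is a simplifying assumption rather than a statement about how labels are to be compared.

```python
# Bytes 9-14 of the label as listed in Table A.1.
STATIC_LABEL_BYTES_9_TO_14 = bytes([0x04, 0x02, 0x02, 0x10, 0x03, 0x01])

# Byte 15 values from Table A.2, mapped to the table defining the configuration.
CONFIG_TABLES = {
    0x01: "Table A.3",
    0x02: "Table A.4",
    0x03: "Table A.5",
    0x04: "Table A.6",
    0x05: "Table A.7",
}


def static_channel_configuration(ul):
    """Return the Channel Configuration number (1-5), or None.

    None means the label is not a static channel configuration label
    known to this sketch. Bytes 1-8 and byte 16 are not examined.
    """
    if len(ul) != 16:
        raise ValueError("a UL is 16 bytes long")
    # Python indexes from 0, so bytes 9-14 are ul[8:14] and byte 15 is ul[14].
    if ul[8:14] != STATIC_LABEL_BYTES_9_TO_14:
        return None
    return ul[14] if ul[14] in CONFIG_TABLES else None
```

A caller could then use `CONFIG_TABLES` to look up which table governs the container channel layout.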

A.2.3 Channel Configuration Tables

Table A.3 – Channel Configuration 1

| Container Channel | SMPTE ST 428-12 Name |
|---|---|
| 1 | Left |
| 2 | Right |
| 3 | Center |
| 4 | LFE |
| 5 | Left Surround |
| 6 | Right Surround |
| 7 | Hearing Impaired |
| 8 | Visually Impaired-Narrative |

Table A.4 – Channel Configuration 2

| Container Channel | SMPTE ST 428-12 Name |
|---|---|
| 1 | Left |
| 2 | Right |
| 3 | Center |
| 4 | LFE |
| 5 | Left Surround |
| 6 | Right Surround |
| 7 | Center Surround |
| 8 | Not Used |
| 9 | Hearing Impaired |
| 10 | Visually Impaired-Narrative |

Table A.5 – Channel Configuration 3

| Container Channel | SMPTE ST 428-12 Name |
|---|---|
| 1 | Left |
| 2 | Right |
| 3 | Center |
| 4 | LFE |
| 5 | Left Surround |
| 6 | Right Surround |
| 7 | Left Center |
| 8 | Right Center |
| 9 | Hearing Impaired |
| 10 | Visually Impaired-Narrative |

Table A.6 – Channel Configuration 4

| Container Channel | Name |
|---|---|
| 1 | CH01 |
| 2 | CH02 |
| 3 | CH03 |
| 4 | CH04 |
| 5 | CH05 |
| 6 | CH06 |
| 7 | CH07 |
| 8 | CH08 |
| 9 | CH09 |
| 10 | CH10 |
| 11 | CH11 |
| 12 | CH12 |
| 13 | CH13 |
| 14 | CH14 |
| 15 | CH15 |
| 16 | CH16 |

Table A.7 – Channel Configuration 5

| Container Channel | SMPTE ST 428-12 Name |
|---|---|
| 1 | Left |
| 2 | Right |
| 3 | Center |
| 4 | LFE |
| 5 | Left Side Surround |
| 6 | Right Side Surround |
| 7 | Left Rear Surround |
| 8 | Right Rear Surround |
| 9 | Hearing Impaired |
| 10 | Visually Impaired-Narrative |

NOTE —⁠ Earlier revisions of this specification used terminology from SMPTE ST 428-3, instead of SMPTE ST 428-12, to define the mappings from container channels to audio channels. Although the mappings remain unchanged, the terms used to refer to a few of the audio channels have changed. For instance, SMPTE ST 428-12 differentiates Side Surrounds (Lss/Rss) from Left and Right surrounds (Ls/Rs) and uses Lrs to refer to the Left Rear Surround channel, whereas SMPTE ST 428-3 uses Rls.

A.3 Configurations using MXF Multichannel Audio Framework

A.3.1 General

When the ChannelAssignment of the WaveAudioEssence Descriptor in a Sound Track File contains the UL defined in Table A.8, the framework specified in SMPTE ST 377-4 shall be used in conjunction with the constraints defined in A.3.2 and A.3.3 to unambiguously identify the audio channels and soundfield group carried in the Sound Track File.

NOTE —⁠ Items defined in SMPTE ST 377-4 that are not specified in this section can nevertheless be present in the Sound Track File and describe particular aspects of an audio channel or soundfield group. Implementations can safely ignore these items.

The MXF Multichannel Audio Framework (MCA Framework) associates audio channels and soundfield groups contained within a D-Cinema Sound Track File with an MXF SubDescriptor that contains metadata, including a unique identifier. This enables D-Cinema implementations to properly route and process audio channels, e.g. the Hearing Impaired and Left channels may be handled by different devices. It also enables straightforward extensibility for the purpose of both experimentation and widespread use: new standalone audio channels can be defined without impacting existing soundfield groups and new soundfield groups can be introduced with minimal effort.

Figure A.1 illustrates the use of the audio channel and soundfield group information contained in a Sound Track File, as specified here.

Figure A.1 –⁠ Illustrative use of AudioChannelLabelSubDescriptor and SoundfieldGroupLabelSubDescriptor for a Sound Track File containing 10 audio channels consisting of a 5.1 soundfield group and associated Hearing Impaired and Visually Impaired-Narrative channels (Informative). The audio channel labeling method defined in this section is not limited to this specific channel count or soundfield configuration.

A.3.2 Configuration Channel Assignment Label

Table A.8 – Specification of the Channel Assignment Label when the MCA Framework is used

| Byte No. | Description | Value (hex) | Meaning |
|---|---|---|---|
| 1-7 | Registry Designator | See register | |
| 8 | Registry Version Number | 0Dh | Version of the register in which this label first appears |
| 9 | Parametric | 04h | Node used to define parametric data |
| 10 | Sound Essence | 02h | Identifies sound essence coding |
| 11 | Sound Coding Characteristics | 02h | Identifies sound coding characteristics |
| 12 | Sound Channel Labeling | 10h | Identifies sound channel labeling |
| 13 | Sound Channel Labeling SMPTE ST 429-2 | 03h | Identifies sound channel labeling as defined in this document (SMPTE ST 429-2) |
| 14 | D-Cinema Application of the MXF Multichannel Audio Framework | 02h | Indicates that the D-Cinema Application of the MXF Multichannel Audio Framework is used |
| 15 | Reserved | 00h | Reserved |
| 16 | Reserved | 00h | Reserved |

A.3.3 AudioChannelLabelSubDescriptor

Each audio channel contained in the Sound Track File shall be associated with zero or one AudioChannelLabelSubDescriptor instance, and each AudioChannelLabelSubDescriptor instance shall be associated with an audio channel.

Implementations shall ignore audio channels not associated with an AudioChannelLabelSubDescriptor instance. These channels should contain silence.

NOTE —⁠ The ChannelCount property of the Wave Audio Essence Descriptor reflects the number of channels in the Sound Track File and not the number of AudioChannelLabelSubDescriptor instances.

In addition to the items required by SMPTE ST 377-4, the following items shall be present in every AudioChannelLabelSubDescriptor instance:

  • MCA Channel ID
  • MCA Tag Name
  • RFC 5646 Spoken Language
  • SoundfieldGroupLinkID, if and only if the audio channel referenced by the AudioChannelLabelSubDescriptor instance belongs to a soundfield group associated with a SoundfieldGroupLabelSubDescriptor instance. If present, SoundfieldGroupLinkID shall contain the MCA Link ID value of the associated SoundfieldGroupLabelSubDescriptor instance.

Not all audio channels present in a Sound Track File need to be associated with a soundfield group. For example, Hearing Impaired and Visually Impaired-Narrative channels, if present, do not belong to a soundfield group and, hence, their respective AudioChannelLabelSubDescriptor instances do not reference a SoundfieldGroupLabelSubDescriptor instance.

If an audio channel is associated with a soundfield group, then the value of their respective RFC 5646 Spoken Language items shall be equal.
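
The constraints of this subclause can be sketched in Python as a simple consistency check. The classes `ChannelLabel` and `SoundfieldGroup` and the function `check_channel_label` are hypothetical stand-ins for parsed subdescriptor instances; they are not part of any MXF library. The sketch can only verify the link in one direction (a SoundfieldGroupLinkID that is present has to match), since group membership is itself expressed by that link.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChannelLabel:
    """Hypothetical stand-in for an AudioChannelLabelSubDescriptor instance."""
    mca_channel_id: Optional[int]
    mca_tag_name: Optional[str]
    spoken_language: Optional[str]  # RFC 5646 Spoken Language
    soundfield_group_link_id: Optional[str] = None


@dataclass
class SoundfieldGroup:
    """Hypothetical stand-in for a SoundfieldGroupLabelSubDescriptor instance."""
    mca_link_id: str
    spoken_language: Optional[str]


def check_channel_label(label, group):
    """Return a list of problems found in one channel label (empty if none)."""
    problems = []
    # Items required in every AudioChannelLabelSubDescriptor instance.
    for name in ("mca_channel_id", "mca_tag_name", "spoken_language"):
        if getattr(label, name) is None:
            problems.append(f"missing {name}")
    link = label.soundfield_group_link_id
    if link is not None:
        if group is None or link != group.mca_link_id:
            problems.append("SoundfieldGroupLinkID does not match the MCA Link ID")
        elif label.spoken_language != group.spoken_language:
            problems.append("language differs from the soundfield group")
    return problems
```

A channel such as Hearing Impaired, which carries no SoundfieldGroupLinkID, passes the check as long as its three required items are present.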

A.3.4 Common D-Cinema Channels

Implementations shall recognize the common D-Cinema audio channels defined in SMPTE ST 428-12.

The presence of such an audio channel shall be indicated by an AudioChannelLabelSubDescriptor instance whose MCA Label Dictionary ID value is equal to the UL of the audio channel as specified at SMPTE ST 428-12.

The MCA Tag Name of such an AudioChannelLabelSubDescriptor instance shall be equal to the Name (as specified in SMPTE ST 428-12) of the audio channel associated with the UL value.

The MCA Tag Symbol item of such an AudioChannelLabelSubDescriptor instance shall be constructed by prepending the string ch to the Symbol (as specified in SMPTE ST 428-12) of the audio channel associated with the UL value.

No audio channel listed at SMPTE ST 428-12 shall appear more than once in a given Sound Track File with the exception of Hearing Impaired and Visually Impaired-Narrative channels. If there are multiple Hearing Impaired or Visually Impaired-Narrative channels in a Sound Track File, they shall be distinguished by the value of their RFC 5646 Spoken Language item.

Furthermore, the RFC 5646 Spoken Language item shall not have the same value in two or more audio channels labeled Hearing Impaired, and the RFC 5646 Spoken Language item shall not have the same value in two or more audio channels labeled Visually Impaired-Narrative.
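
A minimal Python sketch of the uniqueness rule follows. It assumes, for simplicity, that each channel is represented by a (name, language) pair and that channels are identified by name; an actual implementation would compare MCA Label Dictionary ID values. The function name `duplicate_channels` is hypothetical. Language tags are compared case-insensitively, as RFC 5646 tags are not case-sensitive.

```python
from collections import Counter

# Channels allowed to repeat, provided each occurrence has a distinct language.
REPEATABLE = {"Hearing Impaired", "Visually Impaired-Narrative"}


def duplicate_channels(labels):
    """Return a sorted list describing channels that violate the uniqueness rule.

    labels: iterable of (channel name, RFC 5646 Spoken Language) pairs.
    """
    keys = []
    for name, language in labels:
        if name in REPEATABLE:
            # Repeats are distinguished by language.
            keys.append(f"{name} [{language.lower()}]")
        else:
            # Any repeat is a violation, whatever the language.
            keys.append(name)
    counts = Counter(keys)
    return sorted(key for key, count in counts.items() if count > 1)
```

An empty result indicates that no channel appears more often than permitted.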

A.3.5 Extension Channels

For extensibility, channels not defined at SMPTE ST 428-12 may be present.

Implementations shall not automatically pre-assign an audio channel with an AudioChannelLabelSubDescriptor instance having an MCA Label Dictionary ID that the implementation does not recognize and, for the purpose of setting appropriate transport flags, should not assume that such an audio channel contains linear PCM audio samples suitable for direct conversion to an analog audio signal.

Implementations may display to the user channels associated with an MCA Label Dictionary ID they do not recognize and offer the user the option to take action on such a channel based on the MCA Tag Name, MCA Tag Symbol and RFC 5646 Spoken Language of the AudioChannelLabelSubDescriptor instance that references it.

A.3.6 SoundfieldGroupLabelSubDescriptor

There shall be one and only one SoundfieldGroupLabelSubDescriptor instance in the Sound Track File.

In addition to the items required by SMPTE ST 377-4, the following items shall be present in the SoundfieldGroupLabelSubDescriptor instance:

  • MCA Tag Name
  • RFC 5646 Spoken Language

A.3.7 Common D-Cinema Soundfield Groups

Implementations shall recognize the common D-Cinema soundfield groups specified at SMPTE ST 428-12.

The presence of such a soundfield group shall be indicated by a SoundfieldGroupLabelSubDescriptor instance whose MCA Label Dictionary ID value is equal to one of the ULs specified at SMPTE ST 428-12.

The MCA Tag Name of such a SoundfieldGroupLabelSubDescriptor instance shall match the value of the Name of the soundfield group (as specified in SMPTE ST 428-12) associated with the UL value.

The MCA Tag Symbol item of such a SoundfieldGroupLabelSubDescriptor instance shall be constructed by prepending the string sg to the Symbol of the soundfield group (as specified in SMPTE ST 428-12) associated with the UL value.
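
The construction of MCA Tag Symbol values for channels (prefix ch) and soundfield groups (prefix sg) can be sketched as follows. The function name and the `kind` argument are hypothetical, and the symbols used in the example calls are illustrative inputs; the authoritative Symbol values are those listed in SMPTE ST 428-12.

```python
# Prefixes prepended to the ST 428-12 Symbol to form the MCA Tag Symbol.
PREFIXES = {"channel": "ch", "soundfield_group": "sg"}


def mca_tag_symbol(symbol, kind):
    """Build an MCA Tag Symbol from a Symbol and the kind of label."""
    return PREFIXES[kind] + symbol


channel_symbol = mca_tag_symbol("L", "channel")
group_symbol = mca_tag_symbol("51", "soundfield_group")
```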

Not all channels listed in the Audio Channels column of a given soundfield group in SMPTE ST 428-12 need to be present in the Sound Track File, but only those channels listed in the Audio Channels column for a given soundfield group may reference that SoundfieldGroupLabelSubDescriptor instance. Furthermore, if a channel is listed in the Audio Channels column of a given soundfield group but absent in the Sound Track File, then implementations shall assume the channel was not intended for reproduction by the content provider.

NOTE —⁠ Implementations may indicate to the user if a channel listed in the Audio Channels column for a given soundfield group is not present.
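
The membership rule above can be illustrated with the following Python sketch. The function name `check_group_membership` is hypothetical, and the channel list passed in the example is an illustrative input; the actual Audio Channels column for each soundfield group is given in SMPTE ST 428-12.

```python
def check_group_membership(group_channels, referencing_channels):
    """Compare the channels referencing a soundfield group with its definition.

    group_channels: symbols listed in the Audio Channels column for the group.
    referencing_channels: symbols of the channels whose SoundfieldGroupLinkID
        references the group's SoundfieldGroupLabelSubDescriptor instance.

    Returns (not_allowed, absent):
      not_allowed: channels referencing the group without being listed for it.
      absent: listed channels that are not present; these are treated as not
          intended for reproduction, and can be indicated to the user.
    """
    referencing = set(referencing_channels)
    not_allowed = sorted(referencing - set(group_channels))
    absent = [c for c in group_channels if c not in referencing]
    return not_allowed, absent


# Illustrative 5.1-style channel list; consult SMPTE ST 428-12 for real values.
not_allowed, absent = check_group_membership(
    ["L", "R", "C", "LFE", "Ls", "Rs"],
    ["L", "R", "C", "HI"],
)
```

In this example, `not_allowed` contains the channel that is not listed for the group, and `absent` lists the three channels of the group that are not in the file.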

A.3.8 Extension Soundfield Groups

For extensibility, soundfield groups not defined at SMPTE ST 428-12 may be present. However, implementations shall take no action with a SoundfieldGroupLabelSubDescriptor instance having an MCA Label Dictionary ID that the implementation does not recognize or if a channel that is not listed in the Audio Channels column for a given soundfield group references that SoundfieldGroupLabelSubDescriptor instance.

NOTE —⁠ Implementations can use the SoundfieldGroupLabelSubDescriptor instance for display to the user and to appropriately configure the B-Chain for the intended soundfield reproduction.

Annex B
Additional Timed Text Essence Descriptor Items (Normative)

The items listed below are additional properties that may be used in the Timed Text Essence Descriptor when creating MXF Timed Text Track Files as defined by SMPTE ST 429-5. The usage of these items is detailed in the main body of this document.

Table B.1 – Timed Text Essence Descriptor - Additional Properties

| Item Name | Type | Len | UL Designator | Req ? | Meaning | Default |
|---|---|---|---|---|---|---|
| Display Type | UTF16 String | var | 06.0E.2B.34 01.01.01.0E 06.01.01.02 04.00.00.00 | Opt | A text string giving an application specific means to indicate the intended use of the content of the XML document | none |
| Intrinsic Picture Resolution | UTF16 String | var | 06.0E.2B.34 01.01.01.0E 06.01.01.02 05.00.00.00 | Opt | Indicates the resolution of the primary picture on which Sub-Picture Ancillary Resources are to be rendered | none |
| RFC 5646 Language Tag List | UTF16 String | var | 06.0E.2B.34 01.01.01.0E 03.01.01.02 02.16.00.00 | Opt | A comma-separated list of language tags, each as specified at IETF RFC 5646. | empty |
| Z-Position In Use | UInt8 | 1 | 06.0E.2B.34 01.01.01.0E 06.01.01.02 06.00.00.00 | Opt | When non-zero, indicates that one or more subtitle instances in the enclosed XML resource make use of stereoscopic positioning features. | 00h |
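
As an illustration of two of the items above, the Python sketch below splits an RFC 5646 Language Tag List value into individual tags and interprets the Z-Position In Use value. Both function names are hypothetical. Tolerating whitespace around the commas is an assumption of this sketch, not something stated in the table.

```python
def parse_language_tag_list(value):
    """Split a comma-separated language tag list into individual tags.

    An empty string (the default) yields an empty list. Surrounding
    whitespace is stripped, which is an assumption of this sketch.
    """
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def z_position_in_use(value):
    """Interpret the UInt8 item: any non-zero value means 'in use'."""
    return value != 0


tags = parse_language_tag_list("en-US,fr-CA")
```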

Bibliography