SMPTE (the Society of Motion Picture and Television Engineers) is an internationally-recognized standards developing organization. Headquartered and incorporated in the United States of America, SMPTE has members in over 80 countries on six continents. SMPTE's Engineering Documents, including Standards, Recommended Practices, and Engineering Guidelines, are prepared by SMPTE's Technology Committees. Participation in these Committees is open to all with a bona fide interest in their work. SMPTE cooperates closely with other standards-developing organizations, including ISO, IEC and ITU. SMPTE Engineering Documents are drafted in accordance with the rules given in its Standards Operations Manual.
At the time of publication no notice had been received by SMPTE claiming patent rights essential to the implementation of this Engineering Document. However, attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. SMPTE shall not be held responsible for identifying any or all such patent rights.
This document was prepared by Technology Committee 27C.
The following summarizes the changes from the previous edition of this document:
Copyright © 2024, Society of Motion Picture and Television Engineers. All rights reserved. No part of this material may be reproduced, by any means whatsoever, without the prior written permission of the Society of Motion Picture and Television Engineers.
This document specifies a D-Cinema Package (DCP), a collection of files containing D-Cinema essence and related metadata to be ingested and reproduced by a D-Cinema playback system.
Normative text is text that describes elements of the design that are indispensable or contains the conformance language keywords: "shall", "should", or "may". Informative text is text that is potentially helpful to the user, but not indispensable, and can be removed, changed, or added editorially without affecting interoperability. Informative text does not contain any conformance keywords.
All text in this document is, by default, normative, except: the Introduction, any section explicitly labeled as "Informative", or individual paragraphs that start with "Note:".
The keywords "shall" and "shall not" indicate requirements strictly to be followed in order to conform to the document and from which no deviation is permitted.
The keywords "should" and "should not" indicate that, among several possibilities, one is recommended as particularly suitable, without mentioning or excluding others; or that a certain course of action is preferred but not necessarily required; or that (in the negative form) a certain possibility or course of action is deprecated but not prohibited.
The keywords "may" and "need not" indicate courses of action permissible within the limits of the document.
The keyword "reserved" indicates a provision that is not defined at this time, shall not be used, and may be defined in the future. The keyword "forbidden" indicates "reserved" and in addition indicates that the provision will never be defined in the future.
A conformant implementation according to this document is one that includes all mandatory provisions ("shall") and, if implemented, all recommended provisions ("should") as described. A conformant implementation need not implement optional provisions ("may") and need not implement them as described.
Unless otherwise specified, the order of precedence of the types of normative information in this document shall be as follows: Normative prose shall be the authoritative definition; Tables shall be next; then formal languages; then figures; and then any other language forms.
The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
For the purposes of this document, the terms and definitions given in the following documents and the additional terms and definitions apply:
D-Cinema content is composed of a number of distinct elements such as Composition Playlists and Track Files (D-Cinema assets). For delivery to D-Cinema systems, assets are combined into a logical D-Cinema Package (DCP). The syntax and semantics of these assets and the DCP are described by the family of D-Cinema specifications depicted in Figure 1. To promote modularity and layering, each document has a limited scope and often defines a single structure or format.
This specification describes operational constraints applicable to the complete DCP. While structure-specific constraints are addressed in the document that defines a particular structure, this document defines constraints that apply to the combined set of structures that comprise a DCP. For instance, constraints specific to the Composition Playlist, such as those related to content markers, are defined in the Composition Playlist (CPL) specification, whereas constraints that apply to the DCP as a whole, such as composition edit rate, are defined in this document.
A D-Cinema Package (DCP) is a set of files consisting of one (1) Packing List (SMPTE ST 429-8) and each of the files referenced by that Packing List. Figure 2 illustrates this structure. The figure shows a Packing List with ten asset references. Each asset reference points to one of the nine track files or the Composition Playlist. A Packing List may reference any combination of Track Files and Composition Playlists; however, the set of referenced files must contain no duplicates.
A DCP may contain one or more complete Compositions, or it may contain components of compositions destined to complete, augment or replace previously distributed material.
A Composition is a set of files consisting of one (1) Composition Playlist document (SMPTE ST 429-7) and each of the Track Files (see Clause 10 below) referred to from within that Composition Playlist. Figure 3 illustrates this structure for a composition having three reels of image, sound and subtitles.
A DCP shall consist of one Packing List and one or more assets (i.e., Composition Playlists and/or Track Files), referenced by the Packing List.
UUID values are used throughout the DCP to uniquely identify assets and
data structures. All UUID values in a DCP shall be generated as specified
in IETF RFC 4122. UUID values which identify assets or
encryption keys shall be generated using a truly-random or pseudo-random
number source, and shall have a Version field value of 4
(0100b) (as specified in IETF RFC 4122).
NOTE: The b suffix on this value indicates a binary encoding, most significant bit (MSB) first.
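As an informative illustration, the following Python sketch generates a random UUID and checks the Version field of an existing value. It relies on the standard library `uuid` module; the function names `new_asset_uuid` and `is_version_4` are hypothetical and are not defined by any SMPTE document.

```python
import uuid


def new_asset_uuid() -> uuid.UUID:
    """Return a random UUID; uuid.uuid4() sets the Version field to 4."""
    return uuid.uuid4()


def is_version_4(value: str) -> bool:
    """True if value parses as an RFC 4122 UUID whose Version field is 4 (0100b)."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    return parsed.variant == uuid.RFC_4122 and parsed.version == 4
```

A value such as `6ba7b810-9dad-11d1-80b4-00c04fd430c8` (a version 1 UUID) is rejected by `is_version_4`, whereas any value returned by `new_asset_uuid` is accepted.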
XML documents (SMPTE ST 428-7, SMPTE ST 429-7, SMPTE ST 429-8, SMPTE ST 429-10, SMPTE ST 429-12) in a DCP shall be encoded using the UTF-8 character encoding (ISO/IEC 10646) and shall comply with SMPTE ST 429-17.
The Packing List document which defines the DCP contents shall be created as specified in SMPTE ST 429-8. Note that the specification requires that each Packing List document must have a unique UUID value in the top-level Id element. A Packing List may reference assets which are referenced by other Packing Lists.
The value of the Id element within each Asset
element shall be extracted from the referenced asset per the
specification for the asset (see SMPTE ST 429-3 and SMPTE ST 429-7.)
Each Asset element shall contain an Id element value that
is unique within the Packing List.
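The uniqueness constraint above can be checked with a sketch along the following lines. It assumes the `Id` values have already been extracted from the Packing List as strings; the function name `duplicate_asset_ids` is hypothetical, and comparing the values case-insensitively is an assumption of this sketch rather than a provision of this document.

```python
from collections import Counter


def duplicate_asset_ids(asset_ids):
    """Return the Id values that occur more than once in a Packing List.

    Values are compared case-insensitively (an assumption of this sketch).
    """
    counts = Counter(asset_id.strip().lower() for asset_id in asset_ids)
    return sorted(value for value, n in counts.items() if n > 1)
```

An empty result indicates that every `Asset` element carries a distinct `Id` value.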
When a Packing List document is digitally signed as specified in SMPTE ST 429-8, digital certificates in the signer's certificate chain shall conform to the provisions of SMPTE ST 430-2.
A Composition Package is a DCP containing only the
complete set of assets comprising one or more compositions. The
GroupId element shall not be present in the Packing List
of a Composition Package.
An Asset Package is a DCP containing Track Files and/or
Composition Playlists comprising one or more incomplete compositions
(i.e., some assets needed to complete the composition are not present in
the package.) Asset Packages shall be identified by the presence of the
GroupId element in the Packing List. An Asset Package
should contain only related assets (i.e., partial sets of assets from
two unrelated compositions should be listed in separate Packing Lists
using different GroupId values.) When two or more Asset
Packages contain related assets, the Packing Lists should have the
same
GroupId value.
A Composition (i.e., a Composition Playlist and referenced Track Files) may be delivered in a single DCP or it may be spread across several DCPs. Regardless of the number of DCPs used to convey a Composition, a Composition shall conform to the following constraints.
The composition shall have an Edit Rate of 24/1, 25/1, 30/1, 48/1, 50/1 or 60/1.
Picture essence tracks shall be encoded as specified in SMPTE ST 428-1. The pixel array size and frame rate shall be one of the formats listed in Table 1. Monoscopic picture essence tracks shall have matching frame rate and edit rate. Stereoscopic picture essence tracks shall be limited to the 2K formats, and shall have a frame rate of 48/1 and an edit rate equal to half the frame rate (24/1). (See SMPTE ST 429-10 for an explanation.)
Source images having an aspect ratio not listed in Table 1 should be encoded so that the image fills either the horizontal or vertical dimension of the desired Full pixel array (2K or 4K). To fill the pixel array in the opposite dimension, the image should be padded with an equal number of black pixels on each side, i.e., "letter-box" (top side, bottom side) or "pillar-box" (left side, right side).
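The padding arithmetic described above can be sketched as follows. This is an informative example only: the function name `fit_to_full` and its return layout are hypothetical, and forcing the active image size so that the padding is the same on both sides is a choice made for this sketch.

```python
from fractions import Fraction


def fit_to_full(src_w, src_h, full_w=2048, full_h=1080):
    """Fit a source image into a Full pixel array (2K Full by default).

    Returns (active_w, active_h, pad_x, pad_y), where pad_x is the number of
    black columns on each of the left and right sides (pillar-box) and pad_y
    is the number of black rows on each of the top and bottom sides (letter-box).
    """
    if Fraction(src_w, src_h) >= Fraction(full_w, full_h):
        # Source is at least as wide as the container: fill horizontally.
        ideal_h = round(Fraction(full_w * src_h, src_w))
        pad = (full_h - ideal_h) // 2
        return full_w, full_h - 2 * pad, 0, pad
    # Source is narrower than the container: fill vertically.
    ideal_w = round(Fraction(full_h * src_w, src_h))
    pad = (full_w - ideal_w) // 2
    return full_w - 2 * pad, full_h, pad, 0
```

For instance, a 1920 x 1080 source placed in the 2K Full array yields an active image of 1920 x 1080 with 64 black columns on each side.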
| Format | Horizontal Pixels | Vertical Pixels | Frame Rate |
|---|---|---|---|
| 2K Scope (2.39:1) | 2048 | 858 | 24/1, 25/1, 30/1, 48/1, 50/1 or 60/1 |
| 2K Flat (1.85:1) | 1998 | 1080 | 24/1, 25/1, 30/1, 48/1, 50/1 or 60/1 |
| 2K Full (1.90:1) | 2048 | 1080 | 24/1, 25/1, 30/1, 48/1, 50/1 or 60/1 |
| 4K Scope (2.39:1) | 4096 | 1716 | 24/1, 25/1 or 30/1 |
| 4K Flat (1.85:1) | 3996 | 2160 | 24/1, 25/1 or 30/1 |
| 4K Full (1.90:1) | 4096 | 2160 | 24/1, 25/1 or 30/1 |
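As an informative illustration, the constraints of Table 1 and of the preceding picture essence paragraph can be expressed as a small lookup. Rates are given as integers n, standing for n/1; the names `FORMATS`, `FRAME_RATES` and `picture_problems` are hypothetical.

```python
# Pixel array sizes from Table 1, mapped to (format name, family).
FORMATS = {
    (2048, 858): ("2K Scope", "2K"),
    (1998, 1080): ("2K Flat", "2K"),
    (2048, 1080): ("2K Full", "2K"),
    (4096, 1716): ("4K Scope", "4K"),
    (3996, 2160): ("4K Flat", "4K"),
    (4096, 2160): ("4K Full", "4K"),
}
FRAME_RATES = {"2K": {24, 25, 30, 48, 50, 60}, "4K": {24, 25, 30}}


def picture_problems(width, height, frame_rate, edit_rate, stereoscopic=False):
    """Return a list of human-readable problems; an empty list means none found."""
    entry = FORMATS.get((width, height))
    if entry is None:
        return [f"{width}x{height} is not listed in Table 1"]
    name, family = entry
    problems = []
    if frame_rate not in FRAME_RATES[family]:
        problems.append(f"{name} does not allow a frame rate of {frame_rate}/1")
    if stereoscopic:
        if family != "2K":
            problems.append("stereoscopic essence is limited to the 2K formats")
        if frame_rate != 48:
            problems.append("stereoscopic essence requires a frame rate of 48/1")
        if edit_rate * 2 != frame_rate:
            problems.append("stereoscopic edit rate is not half the frame rate")
    elif frame_rate != edit_rate:
        problems.append("monoscopic frame rate and edit rate differ")
    return problems
```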
Sound essence tracks shall be encoded as specified in SMPTE ST 428-2. 10.3.4 and Annex A specify means of identifying the content of these essence tracks.
Timed Text essence shall be encoded as XML data as specified in SMPTE ST 428-7, and may be constrained per SMPTE ST 428-10. Sub-pictures shall be encoded as Portable Network Graphics (PNG) images as specified in ISO/IEC 15948.
When Text elements are present in the Timed Text essence, one (1)
LoadFont element shall be present. Timed Text essence
shall not contain more than one (1) LoadFont element.
Within the scope of any given Subtitle element, all Font
elements shall have the same EffectSize attribute
value.
The font resource should not be larger than 10 MB.
NOTE 1: Legacy implementations might not be able to support font resources larger than 640 KB.
NOTE 2: Operational testing has determined that a Font size smaller than 8 pt might be difficult to read, and that, depending on the length of the subtitle, a very large Font size might take too long to appear and might go beyond the dimension of the Primary Picture.
Color values encoded in the Timed Text essence (in the
Color and
EffectColor attributes of the Font element)
shall be encoded as sRGB values (IEC 61966-2-1).
PNG image resources used per SMPTE ST 428-7 shall have three (3) 8-bit color components (R, G, and B). An alpha channel may be present. If an alpha channel is present, the decoder shall use it when creating the composite image. PNG image resources shall contain the sRGB chunk per ISO/IEC 15948.
The width and height of a subpicture shall be equal to or less than the width and height, respectively, of the associated main picture.
Up to two (2) subtitle instances may be visible on screen at any
time. The visibility period of an instance shall include fade-in and
fade-out times. A subtitle instance shall contain no more than six (6)
Text elements or three (3) Image elements.
All Text and Image elements to be displayed
at the same time shall have the same depth information specified through
Zvalue within VariableZ and/or
Zposition attributes.
When present, the value of the
IntrinsicPictureResolution attribute of the
SubtitleReel element (see SMPTE ST 428-7)
shall be one of the values listed in Table 2 below.
| Attribute Value |
|---|
| 2K Scope |
| 2K Flat |
| 2K Full |
| 4K Scope |
| 4K Flat |
| 4K Full |
NOTE: The IntrinsicPictureResolution attribute is
intended to guide the mastering operator to select the appropriate
subtitle resources for the Primary Picture content.
The sample rate of sound essence in a Composition shall be one of the combinations listed in Table 3.
| Sound Sample Rate | Composition Edit Rate | Samples per Edit Unit |
|---|---|---|
| 48 kHz | 24/1 | 2000 |
| 48 kHz | 25/1 | 1920 |
| 48 kHz | 30/1 | 1600 |
| 48 kHz | 48/1 | 1000 |
| 48 kHz | 50/1 | 960 |
| 48 kHz | 60/1 | 800 |
| 96 kHz | 24/1 | 4000 |
| 96 kHz | 25/1 | 3840 |
| 96 kHz | 30/1 | 3200 |
| 96 kHz | 48/1 | 2000 |
| 96 kHz | 50/1 | 1920 |
| 96 kHz | 60/1 | 1600 |
All essence tracks in a Composition shall have an identical Edit Rate.
Essence tracks in a Composition shall have homogeneous encoding parameter values throughout the Composition. Picture essence shall have constant frame rate and pixel array size. Sound essence shall have constant sample rate, language, channel count, and channel assignment parameters.
A Composition Playlist shall have one picture essence track and one
sound essence track in each Reel element.
Two Composition Playlist documents having different contents shall have
different values in the top-level Id element.
The Id element within the ContentVersion
element shall contain a URI value conforming to one of the following
types:
NOTE: The Id element of the ContentVersion element
is intended to remain constant across multiple Composition Playlist
instances referencing the same underlying content. For instance, both a
pre-release and a final version of a Composition Playlist associated with
the same feature can have the same ContentVersion/Id, while
their Id elements are different. In a typical application,
ContentVersion/Id can be used as a reference to an internal
booking system.
The Duration element shall be present within every
Asset element that refers to an external track file. The
value of all Duration elements in a reel, with the
exception of timed text elements, shall be equal. The Duration of the
Reel shall be determined by the MainPicture element, per
the provisions of
SMPTE ST 429-7, or the
MainStereoscopicPicture
element, whichever is present.
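The Duration rule above can be illustrated with the following informative sketch. It assumes each Reel has been reduced to a list of (element name, Duration) pairs, with Duration expressed in edit units; this data model and the function name `reel_duration` are hypothetical.

```python
TIMED_TEXT_ELEMENTS = {"MainSubtitle", "MainCaption", "ClosedSubtitle", "ClosedCaption"}
PICTURE_ELEMENTS = {"MainPicture", "MainStereoscopicPicture"}


def reel_duration(assets):
    """Return the Reel duration, taken from the picture asset.

    assets is a list of (element_name, duration) pairs. A ValueError is
    raised when there is not exactly one picture asset, or when a
    non-timed-text asset has a Duration different from the picture asset.
    """
    picture = [d for name, d in assets if name in PICTURE_ELEMENTS]
    if len(picture) != 1:
        raise ValueError("expected exactly one picture asset in the Reel")
    for name, duration in assets:
        if name not in TIMED_TEXT_ELEMENTS and duration != picture[0]:
            raise ValueError(f"{name} Duration {duration} differs from {picture[0]}")
    return picture[0]
```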
Track files referenced by a Composition Playlist shall conform to the provisions of Clause 10 of this document.
Each Reel element in a Composition Playlist document
shall contain one (1) MainPicture element (SMPTE ST 429-7) or one (1)
MainStereoscopicPicture element (SMPTE ST 429-10). This element shall refer to a Picture
Track File as defined by SMPTE ST 429-3. If the element
name is
MainStereoscopicPicture, the referenced Track File shall
also conform to
SMPTE ST 429-10.
All picture assets in a Composition Playlist shall have identical values for the following metadata items:
- MainPicture or MainStereoscopicPicture)
- EditRate element
- FrameRate element
- ScreenAspectRatio element
This element shall refer to a Sound Track File as defined by SMPTE ST 429-3.
All sound assets in a Composition Playlist shall have identical values for the following metadata items:
- EditRate element
- Language element
A timed text track is established by the presence of a timed text asset
(e.g. MainSubtitle, MainCaption,
ClosedSubtitle, or ClosedCaption) in at least
one Reel of a Composition. Once a timed text asset appears
in one Reel, the established track shall be assumed to exist for the
entire Composition, even if related timed text Asset
elements are not present in all Reels.
Each Reel element in a Composition Playlist document may
contain one on-screen text track, either MainSubtitle as
defined by SMPTE ST 429-7 or MainCaption as
defined by SMPTE ST 429-12. When present, the
MainSubtitle element shall refer to a Timed Text Track File
as defined by SMPTE ST 429-5, containing an XML resource
conforming to SMPTE ST 428-7. When present, the
MainCaption element shall refer to a Timed Text Track File
as defined by SMPTE ST 429-5, containing an XML resource
conforming to
SMPTE ST 428-10. A Composition Playlist shall contain no
more than one on-screen text track type (MainSubtitle or
MainCaption).
Each Reel element in a Composition Playlist document may
contain up to six (6) off-screen (closed) text tracks, using any
combination of ClosedSubtitle and ClosedCaption
elements as defined by SMPTE ST 429-12. When present, an
off-screen text element shall refer to a Timed Text Track File as
defined by SMPTE ST 429-5, containing an XML resource
conforming to
SMPTE ST 428-10. When more than one off-screen text track
asset of the same type (ClosedSubtitle or
ClosedCaption) is present, the Language
attribute shall be used. The Language attribute value of
each off-screen text track shall be unique among the set of
similarly-typed off-screen text tracks. The value of the
Language attribute shall be used to identify material of
the same off-screen text track from Reel to Reel for each
Asset type instance.
The maximum number of timed text tracks in a Composition Playlist
document is seven (7); one (1) on-screen text track plus six (6)
off-screen text tracks. Each off-screen text track with a unique
combination of element name and Language shall be considered
a distinct off-screen text track.
In order to illustrate the concepts in this section, the example diagram in Figure 4 shows a collection of Composition assets on the left, and a Composition with tracks on the right. Each reel shown on the left contains a number of off-screen timed text assets that appears to be within the specified limit of this standard. However, in the example, the number of off-screen text tracks possible is seven, which is more than that allowed by this standard. The Composition on the right is correctly constrained. Note that each timed text track exists for the duration of the Composition, even though it might not be represented by an asset in every reel.
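The counting rule behind this example can be sketched as follows (informative). Each Reel is modeled as a list of (element name, Language) pairs, a hypothetical data model; the sample values are invented for illustration and do not reproduce Figure 4. Language values are compared as exact strings, which is an assumption of this sketch.

```python
OFF_SCREEN_ELEMENTS = {"ClosedSubtitle", "ClosedCaption"}
MAX_OFF_SCREEN_TRACKS = 6


def off_screen_tracks(reels):
    """Return the set of distinct off-screen text tracks across all Reels.

    A track is identified by its (element name, Language) combination and
    is counted once even if it has no asset in some Reels.
    """
    return {
        (name, language)
        for reel in reels
        for name, language in reel
        if name in OFF_SCREEN_ELEMENTS
    }


# Illustrative data: each Reel holds six off-screen assets, yet the
# Composition as a whole establishes seven distinct tracks.
reel_1 = [("ClosedCaption", lang) for lang in ("en", "fr", "de", "es")]
reel_1 += [("ClosedSubtitle", lang) for lang in ("en", "fr")]
reel_2 = [("ClosedCaption", lang) for lang in ("en", "fr", "de", "it")]
reel_2 += [("ClosedSubtitle", lang) for lang in ("en", "fr")]
tracks = off_screen_tracks([reel_1, reel_2])
too_many = len(tracks) > MAX_OFF_SCREEN_TRACKS
```

Here `too_many` is true: although neither Reel exceeds six off-screen assets, the union of tracks across the Composition does.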
When present, a MainMarkers element shall not contain
either:
- a Marker element with an Offset value that exceeds the duration of the parent Reel; or
- an IntrinsicDuration value that exceeds the duration of the parent Reel.
NOTE: As specified in SMPTE ST 429-7, a MainMarkers element contains neither an EntryPoint element nor a Duration element since it does not reference a Track File.
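The two MainMarkers conditions above can be checked with a sketch such as the following (informative). All values are in edit units; the function name `main_markers_problems` and its parameters are hypothetical.

```python
def main_markers_problems(reel_duration, marker_offsets, intrinsic_duration=None):
    """Return a list of problems found in a MainMarkers element.

    reel_duration is the duration of the parent Reel, marker_offsets holds
    the Offset value of each Marker, and intrinsic_duration is the
    IntrinsicDuration value if one is present.
    """
    problems = [
        f"Marker Offset {offset} exceeds Reel duration {reel_duration}"
        for offset in marker_offsets
        if offset > reel_duration
    ]
    if intrinsic_duration is not None and intrinsic_duration > reel_duration:
        problems.append(
            f"IntrinsicDuration {intrinsic_duration} exceeds Reel duration {reel_duration}"
        )
    return problems
```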
No more than 256 distinct cryptographic keys, as uniquely identified by their Key ID, shall be used to encrypt the assets referenced by a Composition Playlist.
The Hash element shall be present in an asset when the
KeyId element is present (i.e., when the referenced Track
File is encrypted).
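The key-count limit and the Hash requirement can be illustrated together (informative). Each asset reference is modeled as a dictionary with an `Id` entry and optional `KeyId` and `Hash` entries, which is a hypothetical data model; comparing Key ID values case-insensitively is an assumption of this sketch.

```python
MAX_KEYS_PER_CPL = 256


def encryption_problems(assets):
    """Return a list of encryption-related problems for one Composition Playlist."""
    problems = []
    key_ids = set()
    for asset in assets:
        key_id = asset.get("KeyId")
        if key_id is None:
            continue  # asset is not encrypted
        key_ids.add(key_id.lower())
        if not asset.get("Hash"):
            problems.append(f"asset {asset.get('Id')} has a KeyId but no Hash")
    if len(key_ids) > MAX_KEYS_PER_CPL:
        problems.append(f"{len(key_ids)} distinct keys exceed the limit of {MAX_KEYS_PER_CPL}")
    return problems
```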
When a Composition Playlist document is digitally signed as specified in SMPTE ST 429-7, digital certificates in the signer's certificate chain shall conform to the provisions of SMPTE ST 430-2.
The CompositionMetadataAsset element defined in SMPTE ST 429-16 should be present.
Essence data shall be contained in MXF files (SMPTE ST 377-1) constrained according to SMPTE ST 429-20.
When cryptographic protection is required, Track Files shall use KLV encryption per SMPTE ST 429-6. Each encrypted Track File shall be encrypted with exactly one (1) 128-bit symmetric key, which is the Cipher Key of the Track File.
The Essence Container Label
urn:smpte:ul:060e2b34.04010107.0d010301.020b0100 shall be
used for both frame- and clip-wrapped essence.
NOTE 1: SMPTE ST 429-6 deprecates the
Essence Container Label
urn:smpte:ul:060e2b34.04010107.0d010301.020b0100 for
clip-wrapped essence outside of D-Cinema applications.
If the Encrypted Track File contains MIC items, the MIC Key used to generate the MIC items shall be derived from the Cipher Key of the Track File using the Legacy MIC Key derivation algorithm specified at SMPTE ST 429-6.
NOTE 2: SMPTE ST 429-6 no longer specifies a MIC Key derivation method as part of its Reference Decryption Processing Model. This method, however, remains in use when generating MIC items during Encrypted Track File authoring. The generated MIC Key is carried in the KDM as specified at SMPTE ST 430-1.
In addition to the essence encoding constraints specified in Clause 8, Picture Track Files shall have the following properties.
Picture Track Files shall conform to the provisions of SMPTE ST 429-3.
Picture essence shall consist of a sequence of codestreams that conform either to the 2K digital cinema profile or the 4K digital cinema profile specified at Rec. ITU-T T.800 | ISO/IEC 15444-1.
There shall be 5 wavelet transform levels for 2K picture essence.
There shall be 6 wavelet transform levels for 4K picture essence.
Picture essence shall be frame wrapped according to SMPTE ST 422 and SMPTE ST 429-4. Stereoscopic picture essence shall also conform to SMPTE ST 429-10.
In addition to the essence encoding constraints specified in Clause 8 above, Sound Track Files shall have the following properties.
Sound Track Files shall conform to the provisions of SMPTE ST 429-3.
Sound essence shall be frame wrapped per SMPTE ST 382. Sound essence shall be contained in KLV packets labeled with the Wave Frame Wrapped Element UL. A Wave Audio Essence Descriptor shall be present in the Top-Level File Package.
Channel assignment defines what reproduction channel is carried in each channel of the distributed track. Sound Track File channel assignment shall be indicated by a UL value in the Channel Assignment property of the Wave Audio Essence Descriptor. The UL may indicate a fixed channel assignment. Annex A defines a set of channel assignments and respective UL values based on this method. The UL may also indicate a channel assignment scheme defined in another specification. In this case, additional details regarding channel assignment shall be provided by the specification that defines the UL.
If the Channel Assignment property is not present, Channel Configuration 1 (Table A.3) shall be assumed by the decoder. Routing of the container channel to the system audio output is not in the scope of this document.
In addition to the essence encoding constraints specified in Clause 8 above, Timed Text Track Files shall have the following properties.
Timed Text essence shall be encoded as XML data as specified in SMPTE ST 428-7, and may be constrained per SMPTE ST 428-10. See 8.4 and 9.8 above.
Timed Text Track Files shall be created according to SMPTE ST 429-5.
If the DCDM Subtitle file contains the IntrinsicPictureResolution attribute (see SMPTE ST 428-7), then the Intrinsic Picture Resolution property of the Timed Text Essence Descriptor, defined in Annex B, should be present in the Timed Text Track File and, when present, shall represent the same value.
If the DCDM Subtitle file contains the DisplayType
element (see SMPTE ST 428-7), then the Display
Type
property of the Timed Text Essence Descriptor, defined in Annex B, should be present in the Timed Text Track
File and, when present, shall represent the same value.
If the Timed Text Essence Descriptor property RFC 5646 Language Tag List is present, it shall contain at least the language code specified in the DCDM Subtitle file.
If at least one subtitle instance of the DCDM Subtitle file contains
a Zposition attribute (as defined in SMPTE ST 428-7), the
Z-Position In Use property of the Timed Text Essence Descriptor
shall be non-zero.
NOTE: Implementation behavior is undefined when a Sound Track File fails to adhere to the normative provisions specified herein.
SMPTE ST 382 carries multi-channel PCM sound samples by using sample interleave on a channel basis. Each sample position can be thought of as a channel within the container specified at SMPTE ST 382.
The number of channels within the Sound Track File shall be an even number. The inclusion of a channel of silence may be required to achieve this.
Clause A.1 and Clause A.2 each specify a method for unambiguously identifying the channels present in Sound Track Files and indicating their intended reproduction location in the theater. Each method uses the ChannelAssignment property of the WaveAudioEssence Descriptor in a Sound Track File, as specified in 10.4.4 above.
Compliant playback devices shall use the ChannelAssignment property to identify the sound channels being used.
Each table in this Annex defines a container channel configuration that has a corresponding Universal Label (UL) for use as a value of the ChannelAssignment property. Container channels are numbered in sample packing order. The first sample is carried in container channel 1, the second in container channel 2 and so on.
The number of channels contained in a Sound Track file shall be less than or equal to the number of channels defined by the table associated with the ChannelAssignment property. However, if a given container channel is present, it shall be used according to the table. The WaveAudioEssence Descriptor ChannelCount property may be used in combination with the ChannelAssignment property to determine actual channel usage. For instance, a ChannelAssignment label indicating Channel Configuration 1 may accompany a container with a ChannelCount value of 6, indicating that channels 7 and 8 (Hearing Impaired and Visually Impaired-Narrative) are not present.
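As an informative illustration of the ChannelCount example above, the following sketch maps the container channels that are present to their names. The list reproduces Channel Configuration 1 (Table A.3); the names `CHANNEL_CONFIGURATION_1` and `channels_in_use` are hypothetical.

```python
# Channel Configuration 1, in container channel order (Table A.3).
CHANNEL_CONFIGURATION_1 = [
    "Left",
    "Right",
    "Center",
    "LFE",
    "Left Surround",
    "Right Surround",
    "Hearing Impaired",
    "Visually Impaired-Narrative",
]


def channels_in_use(channel_count, table=CHANNEL_CONFIGURATION_1):
    """Map container channel numbers (starting at 1) to channel names.

    Raises ValueError for an odd channel count or for a count larger than
    the number of channels defined by the table.
    """
    if channel_count % 2 != 0:
        raise ValueError("the number of channels is not even")
    if channel_count > len(table):
        raise ValueError("more channels than the configuration defines")
    return {number: name for number, name in enumerate(table[:channel_count], start=1)}
```

With a ChannelCount of 6, the result covers container channels 1 through 6 and omits Hearing Impaired and Visually Impaired-Narrative.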
The special case of no specified channel configuration is also provided for (see Table A.6). The label associated with this table shall mean no configuration specified. This may be used for test or experimental purposes.
NOTE: For the purpose of setting appropriate transport flags, implementations should not assume that all audio channels in Channel Configuration 4 contain linear PCM audio samples suitable for direct conversion to an analog audio signal.
| Byte No. | Description | Value (hex) | Meaning |
|---|---|---|---|
| 1-7 | Registry Designator | See register | |
| 8 | Registry Version Number | 0bh | Version of the register in which this label first appears |
| 9 | Parametric | 04h | Node used to define parametric data |
| 10 | Sound Essence | 02h | Identifies sound essence coding |
| 11 | Sound Coding Characteristics | 02h | Identifies sound coding characteristics |
| 12 | Sound Channel Labeling | 10h | Identifies sound channel labeling |
| 13 | Sound Channel Labeling SMPTE ST 429-2 | 03h | Identifies sound channel labeling as defined in this document (SMPTE ST 429-2) |
| 14 | Channel Label Sets | 01h | Identifies Static Sound Channel Label Sets |
| 15 | Channel Configuration | See Table A.2 | Identifies sound Channel Configuration |
| 16 | Reserved | 00h | Reserved |
| Channel Configuration | Byte 15 Value |
|---|---|
| Channel Configuration 1 (Table A.3) | 01h |
| Channel Configuration 2 (Table A.4) | 02h |
| Channel Configuration 3 (Table A.5) | 03h |
| Channel Configuration 4 (Table A.6) | 04h |
| Channel Configuration 5 (Table A.7) | 05h |
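The byte layout in the two tables above can be assembled into a complete label as sketched below (informative). Bytes 8 to 16 are taken from the tables. Bytes 1 to 7 are listed only as "See register", so the value used here is an assumption of this sketch, copied from the leading bytes of the Essence Container Label quoted elsewhere in this document; consult the register for the authoritative value. The function name `channel_assignment_ul` is hypothetical.

```python
# ASSUMPTION: bytes 1-7 (Registry Designator) as seen at the start of the
# Essence Container Label urn:smpte:ul:060e2b34.04010107...; not given in the table.
REGISTRY_DESIGNATOR = bytes.fromhex("060e2b34040101")


def channel_assignment_ul(configuration):
    """Build the ChannelAssignment label for Channel Configuration 1 to 5."""
    if configuration not in range(1, 6):
        raise ValueError("Table A.2 defines Channel Configurations 1 to 5 only")
    # Bytes 8-14 from the byte table, byte 15 from Table A.2, byte 16 reserved.
    tail = bytes([0x0B, 0x04, 0x02, 0x02, 0x10, 0x03, 0x01, configuration, 0x00])
    digits = (REGISTRY_DESIGNATOR + tail).hex()
    return "urn:smpte:ul:" + ".".join(digits[i:i + 8] for i in range(0, 32, 8))
```

Under that assumption, Channel Configuration 1 corresponds to `urn:smpte:ul:060e2b34.0401010b.04020210.03010100`.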
| Container Channel | SMPTE ST 428-12 Name |
|---|---|
| 1 | Left |
| 2 | Right |
| 3 | Center |
| 4 | LFE |
| 5 | Left Surround |
| 6 | Right Surround |
| 7 | Hearing Impaired |
| 8 | Visually Impaired-Narrative |
| Container Channel | SMPTE ST 428-12 Name |
|---|---|
| 1 | Left |
| 2 | Right |
| 3 | Center |
| 4 | LFE |
| 5 | Left Surround |
| 6 | Right Surround |
| 7 | Center Surround |
| 8 | Not Used |
| 9 | Hearing Impaired |
| 10 | Visually Impaired-Narrative |
| Container Channel | SMPTE ST 428-12 Name |
|---|---|
| 1 | Left |
| 2 | Right |
| 3 | Center |
| 4 | LFE |
| 5 | Left Surround |
| 6 | Right Surround |
| 7 | Left Center |
| 8 | Right Center |
| 9 | Hearing Impaired |
| 10 | Visually Impaired-Narrative |
| Container Channel | Name |
|---|---|
| 1 | CH01 |
| 2 | CH02 |
| 3 | CH03 |
| 4 | CH04 |
| 5 | CH05 |
| 6 | CH06 |
| 7 | CH07 |
| 8 | CH08 |
| 9 | CH09 |
| 10 | CH10 |
| 11 | CH11 |
| 12 | CH12 |
| 13 | CH13 |
| 14 | CH14 |
| 15 | CH15 |
| 16 | CH16 |
| Container Channel | SMPTE ST 428-12 Name |
|---|---|
| 1 | Left |
| 2 | Right |
| 3 | Center |
| 4 | LFE |
| 5 | Left Side Surround |
| 6 | Right Side Surround |
| 7 | Left Rear Surround |
| 8 | Right Rear Surround |
| 9 | Hearing Impaired |
| 10 | Visually Impaired-Narrative |
NOTE: Earlier revisions of this specification used terminology from SMPTE ST 428-3, instead of SMPTE ST 428-12, to define the mappings from container channels to audio channels. Although the mappings remain unchanged, the terms used to refer to a few of the audio channels have changed. For instance, SMPTE ST 428-12 differentiates Side Surrounds (Lss/Rss) from Left and Right surrounds (Ls/Rs) and uses Lrs to refer to the Left Rear Surround channel, whereas SMPTE ST 428-3 uses Rls.
When the ChannelAssignment of the WaveAudioEssence Descriptor in a Sound Track File contains the UL defined in Table A.8, the framework specified in SMPTE ST 377-4 shall be used in conjunction with the constraints defined in A.3.2 and A.3.3 to unambiguously identify the audio channels and soundfield group carried in the Sound Track File.
NOTE: Items defined in SMPTE ST 377-4 that are not specified in this section can nevertheless be present in the Sound Track File and describe particular aspects of an audio channel or soundfield group. Implementations can safely ignore these items.
The MXF Multichannel Audio Framework (MCA Framework) associates audio channels and soundfield groups contained within a D-Cinema Sound Track File with an MXF SubDescriptor that contains metadata, including a unique identifier. This enables D-Cinema implementations to properly route and process audio channels, e.g. the Hearing Impaired and Left channels may be handled by different devices. It also enables straightforward extensibility for the purpose of both experimentation and widespread use: new standalone audio channels can be defined without impacting existing soundfield groups and new soundfield groups can be introduced with minimal effort.
Figure A.1 illustrates the use of the audio channel and soundfield group information contained in a Sound Track File, as specified here.
| Byte No. | Description | Value (hex) | Meaning |
|---|---|---|---|
| 1-7 | Registry Designator | See register | |
| 8 | Registry Version Number | 0Dh | Version of the register in which this label first appears |
| 9 | Parametric | 04h | Node used to define parametric data |
| 10 | Sound Essence | 02h | Identifies sound essence coding |
| 11 | Sound Coding Characteristics | 02h | Identifies sound coding characteristics |
| 12 | Sound Channel Labeling | 10h | Identifies sound channel labeling |
| 13 | Sound Channel Labeling SMPTE ST 429-2 | 03h | Identifies sound channel labeling as defined in this document (SMPTE ST 429-2) |
| 14 | D-Cinema Application of the MXF Multichannel Audio Framework | 02h | Indicates that the D-Cinema Application of the MXF Multichannel Audio Framework is used |
| 15 | Reserved | 00h | Reserved |
| 16 | Reserved | 00h | Reserved |
Each audio channel contained in the Sound Track File shall be associated with zero or one AudioChannelLabelSubDescriptor instance, and each AudioChannelLabelSubDescriptor instance shall be associated with an audio channel.
Implementations shall ignore audio channels not associated with an AudioChannelLabelSubDescriptor instance. These channels should contain silence.
NOTE: The ChannelCount property of the Wave Audio Essence Descriptor reflects the number of channels in the Sound Track File and not the number of AudioChannelLabelSubDescriptor instances.
In addition to the items required by SMPTE ST 377-4, the following items shall be present in every AudioChannelLabelSubDescriptor instance:
Not all audio channels present in a Sound Track File need to be associated with a soundfield group. For example, Hearing Impaired and Visually Impaired-Narrative channels, if present, do not belong to a soundfield group and, hence, their respective AudioChannelLabelSubDescriptor instances do not reference a SoundfieldGroupLabelSubDescriptor instance.
If an audio channel is associated with a soundfield group, then the value of their respective RFC 5646 Spoken Language items shall be equal.
Implementations shall recognize the common D-Cinema audio channels defined in SMPTE ST 428-12.
The presence of such an audio channel shall be indicated by an AudioChannelLabelSubDescriptor instance whose MCA Label Dictionary ID value is equal to the UL of the audio channel as specified at SMPTE ST 428-12.
The MCA Tag Name of such an AudioChannelLabelSubDescriptor instance shall be equal to the Name (as specified in SMPTE ST 428-12) of the audio channel associated with the UL value.
The MCA Tag Symbol item of such an AudioChannelLabelSubDescriptor instance shall be constructed by prepending the string "ch" to the Symbol (as specified in SMPTE ST 428-12) of the audio channel associated with the UL value.
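As a small illustration of the construction rule above, the sketch below builds an MCA Tag Symbol from a channel Symbol. The function name is hypothetical, and the Symbol "L" is used only as an assumed example; the authoritative Symbol values are those in SMPTE ST 428-12.

```python
def channel_tag_symbol(symbol: str) -> str:
    """Build an MCA Tag Symbol for an audio channel by prepending "ch"."""
    return "ch" + symbol


# "L" is an assumed example Symbol, not taken from this document.
print(channel_tag_symbol("L"))  # → chL
```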
No audio channel listed at SMPTE ST 428-12 shall appear more than once in a given Sound Track File with the exception of Hearing Impaired and Visually Impaired-Narrative channels. If there are multiple Hearing Impaired or Visually Impaired-Narrative channels in a Sound Track File, they shall be distinguished by the value of their RFC 5646 Spoken Language item.
Furthermore, the RFC 5646 Spoken Language item shall not have the same value in two or more audio channels labeled Hearing Impaired, and the RFC 5646 Spoken Language item shall not have the same value in two or more audio channels labeled Visually Impaired-Narrative.
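The uniqueness rules above can be sketched as a simple check. This is a hedged example: the function name is hypothetical, the symbols "HI" and "VIN" stand in for the Hearing Impaired and Visually Impaired-Narrative channels and are assumptions, and comparing language tags case-insensitively is also an assumption (RFC 5646 tags are case-insensitive, but this section only requires that the values differ).

```python
from collections import Counter

# Assumed symbols for the two channel kinds that are allowed to repeat.
REPEATABLE = {"HI", "VIN"}


def check_channel_uniqueness(channels):
    """Return a list of rule violations for (symbol, language) pairs.

    channels: list of (symbol, language) tuples, one per labeled channel.
    """
    problems = []
    symbol_counts = Counter(symbol for symbol, _ in channels)
    for symbol, count in symbol_counts.items():
        if count > 1 and symbol not in REPEATABLE:
            problems.append(f"{symbol} appears {count} times")
    # Repeatable channels must differ in their spoken language.
    pair_counts = Counter(
        (symbol, (language or "").lower())
        for symbol, language in channels
        if symbol in REPEATABLE
    )
    for (symbol, language), count in pair_counts.items():
        if count > 1:
            problems.append(f"{symbol} repeats language {language!r}")
    return problems


print(check_channel_uniqueness([("L", "en"), ("HI", "en"), ("HI", "fr")]))  # → []
```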
For extensibility, channels not defined at SMPTE ST 428-12 may be present.
Implementations shall not automatically pre-assign an audio channel with an AudioChannelLabelSubDescriptor instance having an MCA Label Dictionary ID that the implementation does not recognize and, for the purpose of setting appropriate transport flags, should not assume that such an audio channel contains linear PCM audio samples suitable for direct conversion to an analog audio signal.
Implementations may display to the user channels associated with an MCA Label Dictionary ID they do not recognize and offer the user the option to take action on such a channel based on the MCA Tag Name, MCA Tag Symbol and RFC 5646 Spoken Language of the AudioChannelLabelSubDescriptor instance that references it.
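One way to picture the handling of recognized and unrecognized channels is a three-way decision. The sketch below is illustrative only: the `Action` names, the function name, and the placeholder UL strings are all hypothetical, and offering an unrecognized channel to the user is optional behavior under the text above.

```python
from enum import Enum


class Action(Enum):
    AUTO_ASSIGN = "auto-assign"      # recognized label
    OFFER_TO_USER = "offer-to-user"  # unrecognized label, optional display
    IGNORE = "ignore"                # channel has no label at all


def plan_channel(label_ul, recognized_uls):
    """Decide how to treat one channel.

    label_ul: the MCA Label Dictionary ID of the channel's label,
    or None if the channel has no AudioChannelLabelSubDescriptor.
    recognized_uls: the set of ULs this implementation understands.
    """
    if label_ul is None:
        return Action.IGNORE
    if label_ul in recognized_uls:
        return Action.AUTO_ASSIGN
    # Never pre-assigned automatically; the user may choose an action.
    return Action.OFFER_TO_USER


# Placeholder strings stand in for real ULs.
known = {"UL-A", "UL-B"}
print(plan_channel("UL-X", known))  # → Action.OFFER_TO_USER
```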
There shall be one and only one SoundfieldGroupLabelSubDescriptor instance in the Sound Track File.
In addition to the items required by SMPTE ST 377-4, the following items shall be present in the SoundfieldGroupLabelSubDescriptor instance:
Implementations shall recognize the common D-Cinema soundfield groups specified at SMPTE ST 428-12.
The presence of such a soundfield group shall be indicated by a SoundfieldGroupLabelSubDescriptor instance whose MCA Label Dictionary ID value is equal to one of the ULs specified at SMPTE ST 428-12.
The MCA Tag Name of such a SoundfieldGroupLabelSubDescriptor instance shall match the value of the Name of the soundfield group (as specified in SMPTE ST 428-12) associated with the UL value.
The MCA Tag Symbol item of such a SoundfieldGroupLabelSubDescriptor instance shall be constructed by prepending the string "sg" to the Symbol of the soundfield group (as specified in SMPTE ST 428-12) associated with the UL value.
Not all channels listed in the Audio Channels column of a given soundfield group in SMPTE ST 428-12 need to be present in the Sound Track File, but only those channels listed in the Audio Channels column for a given soundfield group may reference that SoundfieldGroupLabelSubDescriptor instance. Furthermore, if a channel is listed in the Audio Channels column of a given soundfield group but absent in the Sound Track File, then implementations shall assume the channel was not intended for reproduction by the content provider.
NOTE: Implementations may indicate to the user if a channel listed in the Audio Channels column for a given soundfield group is not present.
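The membership rule for a soundfield group can be sketched as a comparison between the channels listed for the group and the channels that actually reference it. This is a minimal sketch: the function name is hypothetical, and the six-channel list in the usage example is an assumed illustration of an Audio Channels column, to be checked against SMPTE ST 428-12.

```python
def check_group_membership(allowed, referencing):
    """Compare a group's listed channels with the channels referencing it.

    allowed: channel symbols listed for the soundfield group, in order.
    referencing: symbols of channels that reference the group's
    SoundfieldGroupLabelSubDescriptor instance.
    Returns (not_allowed, absent): channels that reference the group
    without being listed, and listed channels that are absent.
    """
    present = set(referencing)
    not_allowed = sorted(present - set(allowed))
    absent = [symbol for symbol in allowed if symbol not in present]
    return not_allowed, absent


# Assumed example channel list for one group.
group_channels = ["L", "R", "C", "LFE", "Ls", "Rs"]
print(check_group_membership(group_channels, ["L", "R", "C"]))
# → ([], ['LFE', 'Ls', 'Rs'])
```

Absent channels are treated as not intended for reproduction and can optionally be reported to the user. A non-empty `not_allowed` list indicates that an unlisted channel references the group.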
For extensibility, soundfield groups not defined at SMPTE ST 428-12 may be present. However, implementations shall take no action with a SoundfieldGroupLabelSubDescriptor instance having an MCA Label Dictionary ID that the implementation does not recognize or if a channel that is not listed in the Audio Channels column for a given soundfield group references that SoundfieldGroupLabelSubDescriptor instance.
NOTE: Implementations can use the SoundfieldGroupLabelSubDescriptor instance for display to the user and to appropriately configure the B-Chain for the intended soundfield reproduction.
The items listed below are additional properties that may be used in the Timed Text Essence Descriptor when creating MXF Timed Text Track Files as defined by SMPTE ST 429-5. The usage of these items is detailed in the main body of this document.
| Item Name | Type | Len | UL Designator | Req? | Meaning | Default |
|---|---|---|---|---|---|---|
| Display Type | UTF16 String | var | 06.0E.2B.34 01.01.01.0E 06.01.01.02 04.00.00.00 | Opt | A text string giving an application specific means to indicate the intended use of the content of the XML document | none |
| Intrinsic Picture Resolution | UTF16 String | var | 06.0E.2B.34 01.01.01.0E 06.01.01.02 05.00.00.00 | Opt | Indicates the resolution of the primary picture on which Sub-Picture Ancillary Resources are to be rendered | none |
| RFC 5646 Language Tag List | UTF16 String | var | 06.0E.2B.34 01.01.01.0E 03.01.01.02 02.16.00.00 | Opt | A comma-separated list of language tags, each as specified at IETF RFC 5646. | empty |
| Z-Position In Use | UInt8 | 1 | 06.0E.2B.34 01.01.01.0E 06.01.01.02 06.00.00.00 | Opt | When non-zero, indicates that one or more subtitle instances in the enclosed XML resource make use of stereoscopic positioning features. | 00h |
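The RFC 5646 Language Tag List item is described above as a comma-separated list with an empty default. A minimal parsing sketch follows. The function name is hypothetical, and trimming whitespace around each tag is an assumption, since the table does not say whether spaces can follow the commas. The tags themselves are not validated against RFC 5646.

```python
def parse_language_tag_list(value):
    """Split an RFC 5646 Language Tag List string into individual tags.

    An empty or missing value yields an empty list (the stated default).
    Whitespace around tags is stripped (an assumption, see lead-in).
    """
    if not value:
        return []
    return [tag.strip() for tag in value.split(",") if tag.strip()]


print(parse_language_tag_list("en-US,fr-CA"))  # → ['en-US', 'fr-CA']
```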