SMPTE (the Society of Motion Picture and Television Engineers) is an internationally-recognized standards developing organization. Headquartered and incorporated in the United States of America, SMPTE has members in over 80 countries on six continents. SMPTE's Engineering Documents, including Standards, Recommended Practices, and Engineering Guidelines, are prepared by SMPTE's Technology Committees. Participation in these Committees is open to all with a bona fide interest in their work. SMPTE cooperates closely with other standards-developing organizations, including ISO, IEC and ITU. SMPTE Engineering Documents are drafted in accordance with the rules given in its Standards Operations Manual.
At the time of publication no notice had been received by SMPTE claiming patent rights essential to the implementation of this Engineering Document. However, attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. SMPTE shall not be held responsible for identifying any or all such patent rights.
This document was prepared by Technology Committee 27C.
This edition updates external references to their latest versions.
Copyright © 2024, Society of Motion Picture and Television Engineers. All rights reserved. No part of this material may be reproduced, by any means whatsoever, without the prior written permission of the Society of Motion Picture and Television Engineers.
This section is entirely informative and does not form an integral part of this Engineering Document.
This document specifies a method for carrying Immersive Audio data essence, as specified in SMPTE ST 2098-2, in an MXF file based on the MXF Generic Container for use in a D-Cinema Package. While this specification is written to allow synchronization of the Immersive Audio data track with picture or sound, the synchronization mechanism is outside the scope of this document.
This standard specifies the mapping of an SMPTE ST 2098-2 Immersive Audio Bitstream into an MXF file for use within a Digital Cinema Package. The MXF Generic Container (GC), as described in SMPTE ST 379-1, is used as a normative reference for this mapping, but this specification uses Class 14 ULs which prevents full compliance. The resultant file will be referred to as an Immersive Audio Track File.
This standard specifies the Key, the Length and the Value fields of the Immersive Audio Data Element. This standard also defines the Essence Container and the Essence Descriptors.
This standard defines the SMPTE ST 429-7 Composition Playlist asset type extension required for inclusion of an Immersive Audio track.
Normative text is text that describes elements of the design that are indispensable or contains the conformance language keywords: "shall", "should", or "may". Informative text is text that is potentially helpful to the user, but not indispensable, and can be removed, changed, or added editorially without affecting interoperability. Informative text does not contain any conformance keywords.
All text in this document is, by default, normative, except: the Introduction, any clause explicitly labeled as "Informative", or individual paragraphs that start with "Note:".
The keywords "shall" and "shall not" indicate requirements strictly to be followed in order to conform to the document and from which no deviation is permitted.
The keywords "should" and "should not" indicate that, among several possibilities, one is recommended as particularly suitable, without mentioning or excluding others; or that a certain course of action is preferred but not necessarily required; or that (in the negative form) a certain possibility or course of action is deprecated but not prohibited.
The keywords "may" and "need not" indicate courses of action permissible within the limits of the document.
The keyword "reserved" indicates a provision that is not defined at this time, shall not be used, and may be defined in the future. The keyword "forbidden" indicates "reserved" and in addition indicates that the provision will never be defined in the future.
A conformant implementation according to this document is one that includes all mandatory provisions ("shall") and, if implemented, all recommended provisions ("should") as described. A conformant implementation need not implement optional provisions ("may") and need not implement them as described.
Unless otherwise specified, the order of precedence of the types of normative information in this document shall be as follows: Normative prose shall be the authoritative definition; Tables shall be next; then formal languages; then figures; and then any other language forms.
The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
For the purposes of this document, the terms and definitions given in the following documents apply:
The "Frame Wrapping" method for data essence is illustrated in Figure 1. Frame wrapping shall be used for the Immersive Audio track file.
Figure 1 shows a series of data elements, each wrapped in a single Content Package Data Element with no other Generic Container Elements in the Container. Each Content Package has the duration of one edit unit.
The Frame Wrapping method enables frame by frame access by MXF applications which process at the KLV level. Sufficient information is provided to allow individual frames to be identified at the KLV level without an MXF decoder having to parse or decode the Essence Data. Each data frame shall be KLV wrapped using a GC Data Element Key.
Clip wrapping shall not be used in an Immersive Audio track file.
Custom wrapping shall not be used in an Immersive Audio track file.
Only Data Elements shall be carried in an Immersive Audio track file. Each Data Item shall contain only a single Data Element.
The Immersive Audio Data Element key shall be set as specified below:
06.0E.2B.34.01.02.01.05.0E.09.06.01.00.00.00.01
NOTE: The Immersive Audio Data Element key is not a valid SMPTE UL, is not listed in the SMPTE Metadata Registers, and does not conform to SMPTE ST 379-1.
The length field and its application shall comply with SMPTE ST 377-1.
The value field shall comprise one IABitstream frame as defined in SMPTE ST 2098-2.
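NOTE: The Key, Length and Value layout described above can be illustrated with a short Python sketch. It is informative only. The key bytes are taken from this document; the fixed four-byte long-form BER length and the helper names `ber_length`, `klv_wrap` and `iter_klv` are assumptions made for illustration, and SMPTE ST 377-1 remains the authority on length coding. The reader assumes a well-formed buffer and performs no truncation checks.

```python
# Key from this document, written without the dot separators.
IA_DATA_ELEMENT_KEY = bytes.fromhex("060E2B34010201050E09060100000001")


def ber_length(n: int, num_bytes: int = 3) -> bytes:
    """Long-form BER length: prefix byte 0x80|num_bytes, then big-endian value.

    The default (3 value bytes, 4 bytes in total) is an illustrative choice.
    """
    if n < 0 or n >= 1 << (8 * num_bytes):
        raise ValueError("length does not fit in the chosen BER size")
    return bytes([0x80 | num_bytes]) + n.to_bytes(num_bytes, "big")


def klv_wrap(frame: bytes) -> bytes:
    """Wrap one IABitstream frame as Key + Length + Value."""
    return IA_DATA_ELEMENT_KEY + ber_length(len(frame)) + frame


def iter_klv(buf: bytes):
    """Yield (key, value) pairs without inspecting the essence itself."""
    pos = 0
    while pos < len(buf):
        key = buf[pos:pos + 16]
        pos += 16
        first = buf[pos]
        pos += 1
        if first < 0x80:
            # Short form: the byte is the length.
            length = first
        else:
            # Long form: low 7 bits give the number of length bytes.
            count = first & 0x7F
            length = int.from_bytes(buf[pos:pos + count], "big")
            pos += count
        yield key, buf[pos:pos + length]
        pos += length
```

Because every frame is delimited at the KLV level, a reader such as `iter_klv` can locate individual frames by skipping over each value, which is the property Frame Wrapping is intended to provide.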
The Immersive Audio Track file shall conform to SMPTE ST 429-3.
All IAFrames in the bitstream shall have the same value for FrameRate, BitDepth and SampleRate.
Bitstreams shall not contain the AudioDataPCM element. Instead, AudioDataDLC shall be used for all audio essence.
The bit depth of audio samples shall be 24 bits.
Each individual frame of the Immersive Audio Bitstream shall be constrained to have a maximum size as listed in Table 1 below.
| Edit Rate | Maximum Size (bytes) |
|---|---|
| 24 | 781,250 |
| 25 | 750,000 |
| 30 | 625,000 |
| 48 | 390,625 |
| 50 | 375,000 |
| 60 | 312,500 |
| 96 | 195,313 |
| 100 | 187,500 |
| 120 | 156,250 |
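NOTE: The limits in Table 1 lend themselves to a simple lookup, sketched informatively below in Python. The names `MAX_FRAME_BYTES` and `frame_size_ok` are hypothetical. The final check records an observation about the table values (each equals 18,750,000 bytes divided by the edit rate, rounded up); this relationship is not stated in this document and is noted only as an aid to spotting transcription errors.

```python
# Maximum frame size in bytes, keyed by edit rate, copied from Table 1.
MAX_FRAME_BYTES = {
    24: 781_250,
    25: 750_000,
    30: 625_000,
    48: 390_625,
    50: 375_000,
    60: 312_500,
    96: 195_313,
    100: 187_500,
    120: 156_250,
}


def frame_size_ok(edit_rate: int, frame: bytes) -> bool:
    """Return True if the frame is within the Table 1 limit for edit_rate."""
    if edit_rate not in MAX_FRAME_BYTES:
        raise ValueError(f"edit rate {edit_rate} is not listed in Table 1")
    return len(frame) <= MAX_FRAME_BYTES[edit_rate]


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


# Observation only: every entry equals ceil(18,750,000 / edit rate).
table_matches_observation = all(
    limit == ceil_div(18_750_000, rate) for rate, limit in MAX_FRAME_BYTES.items()
)
```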
The UserData element as defined in SMPTE ST 2098-2 shall not be used.
UseCase codes between the values of 0x30 and 0xFE inclusive shall not be used.
ChannelID codes greater than 0x7F shall not be used.
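NOTE: The bitstream constraints above can be collected into a single check, sketched informatively below in Python. The `ParsedFrame` class and its fields are a hypothetical representation of an already-decoded IAFrame; parsing of the bitstream per SMPTE ST 2098-2 is outside this sketch, and the bit depth is assumed to be available as a plain number of bits.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ParsedFrame:
    """Hypothetical summary of one decoded IAFrame."""
    frame_rate: int
    bit_depth: int
    sample_rate: int
    element_names: Set[str] = field(default_factory=set)
    use_cases: List[int] = field(default_factory=list)
    channel_ids: List[int] = field(default_factory=list)


def violations(frames: List[ParsedFrame]) -> List[str]:
    """Return a human-readable list of constraint violations."""
    problems: List[str] = []
    if not frames:
        return problems
    first = frames[0]
    reference = (first.frame_rate, first.bit_depth, first.sample_rate)
    for i, f in enumerate(frames):
        if (f.frame_rate, f.bit_depth, f.sample_rate) != reference:
            problems.append(f"frame {i}: FrameRate/BitDepth/SampleRate differs from frame 0")
        if f.bit_depth != 24:
            problems.append(f"frame {i}: bit depth is {f.bit_depth}, expected 24")
        if "AudioDataPCM" in f.element_names:
            problems.append(f"frame {i}: AudioDataPCM element present")
        if "UserData" in f.element_names:
            problems.append(f"frame {i}: UserData element present")
        if any(0x30 <= u <= 0xFE for u in f.use_cases):
            problems.append(f"frame {i}: UseCase code in the range 0x30 to 0xFE")
        if any(c > 0x7F for c in f.channel_ids):
            problems.append(f"frame {i}: ChannelID code greater than 0x7F")
    return problems
```

Returning a list of messages rather than raising on the first problem is a design choice of this sketch; it lets a validation tool report every issue found in a track file in one pass.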
The Immersive Audio Data Essence Container UL is defined in Table 2:
| Kind | Leaf |
|---|---|
| Name | MXF-GC IAData Frame Wrapped |
| Symbol | MXF_GC_IAData_Frame_Wrapped |
| Description | Identifies Container for Frame Wrapped Immersive Audio Data |
| UL | urn:smpte:ul:060E2B34.04010105.0E090605.00000000 |
The Essence Container UL is used within a batch of ULs in Partition Packs and the Preface set and on its own in the Essence Descriptor.
NOTE: This UL is from a Class 14 node so does not comply with the UL construction defined in SMPTE ST 379-1.
The File Descriptor sets are those structural metadata sets in the Header Metadata that describe the essence and metadata elements defined in this document. The Immersive Audio Data Essence Descriptor shall be a sub-class of the MXF Generic Data Essence Descriptor (SMPTE ST 377-1). File Descriptor sets shall be present in the Header Metadata for each Essence Element. Implementations that carry specific data types may extend the Immersive Audio Data Essence Descriptor using a SubDescriptor. Implementations complying with this specification shall ignore unrecognized SubDescriptors.
| Item Name | Type | Len | Local Tag | UL Designator | Req ? | Meaning | Default |
|---|---|---|---|---|---|---|---|
| Immersive Audio Data Essence Descriptor | Set UL | 16 | dyn | 06.0E.2B.34.02.53.01.05.0E.09.06.03.00.00.00.00 | Req | Identifies the Immersive Audio Data Essence Descriptor Set (a collection of Parametric metadata) | |
| Length | BER Length | 4 | | | Req | Set length | |
| All items from the Data Essence Descriptor in SMPTE ST 377-1 to be included | | | | | | | |
The Data Essence Coding property shall be present in the Immersive Audio Data Essence Descriptor.
In the Primer Pack, the UID associated with Local Tag 0x3E01 shall be set to:
06.0E.2B.34.01.01.01.05.04.03.03.02.00.00.00.00
NOTE: The UL above corresponds to the Data Essence Coding Item UL with its Version Number (Byte 8) set to 0x05. The UL is not listed in the SMPTE Metadata Registers, and does not conform to SMPTE ST 377-1.
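NOTE: The following informative Python sketch converts the textual UL notations used in this document into 16 bytes and confirms that Byte 8 of the Data Essence Coding Item UL is 0x05. The helper name `parse_ul` and the dictionary standing in for a Primer Pack entry are hypothetical; actual Primer Pack encoding is defined in SMPTE ST 377-1.

```python
def parse_ul(text: str) -> bytes:
    """Convert a dotted or URN-style UL string into its 16 bytes."""
    value = text.strip()
    prefix = "urn:smpte:ul:"
    if value.lower().startswith(prefix):
        value = value[len(prefix):]
    # Drop separators; what remains is expected to be 32 hex digits.
    hex_digits = value.replace(".", "").replace(" ", "")
    raw = bytes.fromhex(hex_digits)
    if len(raw) != 16:
        raise ValueError("a UL is exactly 16 bytes long")
    return raw


DATA_ESSENCE_CODING_ITEM_UL = parse_ul(
    "06.0E.2B.34.01.01.01.05.04.03.03.02.00.00.00.00"
)

# Hypothetical in-memory view of one Primer Pack entry: Local Tag -> UL.
primer_entries = {0x3E01: DATA_ESSENCE_CODING_ITEM_UL}

# Byte 8 (1-based) is index 7 (0-based).
version_byte = primer_entries[0x3E01][7]
```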
The Data Essence Coding UL is used in the Immersive Audio Data Essence Descriptor. The UL and associated values are defined in Table 4.
| Kind | Leaf |
|---|---|
| Name | Immersive Audio Coding |
| Symbol | ImmersiveAudioCoding |
| Description | Identifies Immersive Audio Coding per SMPTE ST 2098-2 |
| UL | urn:smpte:ul:060E2B34.04010105.0E090604.00000000 |
The File Descriptor sets are those structural metadata sets in the Header Metadata that describe the essence and metadata elements defined in this document. File Descriptor sets shall be present in the Header Metadata for each Essence Element. The Immersive Audio Data Essence SubDescriptor is a supplementary Essence Descriptor that can be strongly referenced by any Data Essence Descriptor and shall be an instance of MXF SubDescriptor. In order that the strong reference can be made, the MXF Generic Descriptor (as defined in SMPTE ST 377-1) has an additional optional property.
The Local Tag value associated with this additional optional property (called "SubDescriptors") shall be dynamically allocated (dynamic) as defined in SMPTE ST 377-1. The translation from each dynamically allocated Local Tag value to its full UL value can be found using the Primer Pack mechanism defined in SMPTE ST 377-1.
Some of the values of the items defined in the Immersive Audio Data Essence SubDescriptor are derived from values used in the Immersive Audio frame. The specific items whose values are derived are as follows:
| Item Name | Type | Len | Local Tag | UL Designator | Req ? | Meaning |
|---|---|---|---|---|---|---|
| Immersive Audio Data Essence Sub Descriptor | Set UL | 16 | dyn | 06.0E.2B.34.02.53.01.05.0E.09.06.06.00.00.00.00 | Req | Defines the Immersive Audio Data Essence Descriptor Set (a collection of Parametric metadata) |
| Length | BER Length | 4 | | | Req | Set length |
| Immersive Audio Version | Uint8 | 1 | dyn | 06.0E.2B.34.01.01.01.05.0E.09.05.06.00.00.00.00 | Opt | Immersive Audio Coder version used to create the source Bitstream |
| Max Channel Count | Uint16 | 2 | dyn | 06.0E.2B.34.01.01.01.05.0E.09.05.07.00.00.00.00 | Opt | Maximum number of channels in the bitstream |
| Max Object Count | Uint16 | 2 | dyn | 06.0E.2B.34.01.01.01.05.0E.09.05.08.00.00.00.00 | Opt | Maximum Number of objects in the bitstream |
| Immersive Audio ID | UUID | 16 | dyn | 06.0E.2B.34.01.01.01.05.0E.09.05.09.00.00.00.00 | Opt | UUID of the Immersive Audio project |
| First Frame | UInt32 | 4 | dyn | 06.0E.2B.34.01.01.01.05.0E.09.05.0A.00.00.00.00 | Opt | Specifies an edit unit for alignment with the FFOA of the picture track |
| IAB Sample Rate | Rational | 8 | dyn | 060E2B34.0101010E.04020301.0F000000 | Opt | Sample Rate of the audio essence contained in the bitstream. |
Some of the values stored in the SubDescriptor are derived from the Immersive Audio Bitstream as indicated below. However, the values for Immersive Audio ID and First Frame are set during initial bitstream coding and should not be changed. Implementations should copy the values for Immersive Audio ID and First Frame from the Immersive Audio Data Essence SubDescriptor of the unmodified file in the event that the file must be re-wrapped.
This integer is derived from the Version field of a single IAFrame element. All IAFrame elements within a file shall have the same value.
Decoders shall ignore this item.
Decoders shall ignore this item.
This item is deprecated and should not be present.
This parameter shall be an integer indicating the edit unit that is to be synchronized to the beginning of the associated image sequence, also known as the FFOA.
The Sample Rate parameter shall indicate the rational representation of the IAFrame SampleRate field.
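NOTE: As an informative illustration of the re-wrapping guidance for Immersive Audio ID and First Frame, the Python sketch below carries those two items over from the original SubDescriptor while taking every other item from freshly derived values. The dictionary representation, the item names used as keys, and the function name `rewrap_subdescriptor` are all hypothetical.

```python
# Items that are set during initial bitstream coding and carried over unchanged.
PRESERVED_ITEMS = ("ImmersiveAudioID", "FirstFrame")


def rewrap_subdescriptor(original: dict, derived: dict) -> dict:
    """Build SubDescriptor values for a re-wrapped file.

    original: items read from the SubDescriptor of the unmodified file.
    derived:  items recomputed from the Immersive Audio Bitstream.
    """
    result = dict(derived)
    for name in PRESERVED_ITEMS:
        if name in original:
            # Copy the value exactly as it appeared in the unmodified file.
            result[name] = original[name]
        else:
            # Absent in the original, so it stays absent after re-wrapping.
            result.pop(name, None)
    return result
```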
The Edit Rate of the essence track shall be derived from the IAFrame:FrameRate code.
The Sample Rate value for the MXF Generic File Descriptor shall be derived from the IAFrame:FrameRate code.
Files created with this specification shall be encrypted using SMPTE ST 429-6, if encryption is necessary.
To reference the Immersive Audio Track in a Composition, the extension elements defined in this section shall be used to extend the Reel element of a Composition Playlist, as specified in SMPTE ST 429-7.
The AuxData extension element defined in this specification shall be associated with a unique XML namespace name that shall be the string value http://www.dolby.com/schemas/2012/AD. This namespace name conveys both structural and semantic version information, and serves the purpose of a traditional version number field.
XML namespace names used in this standard are identified in Table 6. Namespace names are represented as Uniform Resource Identifier (URI) values per IETF RFC 3986.
NOTE: Readers unfamiliar with URI values as XML namespace names should be aware that although a URI value begins with a "method" element ("http" in this case), the value is designed primarily to be a unique string and does not necessarily correspond to an actual on-line resource. Applications implementing this standard should not attempt to resolve URI values on-line.
| Qualifier | URI |
|---|---|
| cpl | http://www.smpte-ra.org/schemas/429-7/2006/CPL |
| ia | http://www.dolby.com/schemas/2012/AD |
URIs listed in Table 6 are normative, whereas the namespace qualifier values themselves (used in Table 6 and elsewhere in this standard) are not normative. Thus, namespace qualifier values may be replaced in instance documents by any arbitrary XML-compliant namespace qualifier, meaning that conformant implementations shall expect any XML-compliant namespace qualifier value that is associated with a URI from Table 6. Datatypes from other schemas that are used in this document will be prefixed with the appropriate namespace qualifier (e.g. xs:dateTime). See W3C XML Schema Part 2: Datatypes for further information about these types.
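NOTE: The distinction between normative URIs and non-normative qualifiers can be illustrated with the informative Python sketch below, which uses the standard library `xml.etree.ElementTree` parser. The parser expands each qualified name to `{URI}LocalName`, so the element is recognized by its namespace URI whatever qualifier the instance document uses. The sample fragments and the function name `is_aux_data` are hypothetical, and no attempt is made to resolve the URI on-line.

```python
import xml.etree.ElementTree as ET

# Namespace URI from Table 6.
IA_NS = "http://www.dolby.com/schemas/2012/AD"

# Three hypothetical fragments: two qualifiers and a default namespace.
WITH_IA_PREFIX = '<ia:AuxData xmlns:ia="http://www.dolby.com/schemas/2012/AD"/>'
WITH_OTHER_PREFIX = '<foo:AuxData xmlns:foo="http://www.dolby.com/schemas/2012/AD"/>'
WITH_DEFAULT_NS = '<AuxData xmlns="http://www.dolby.com/schemas/2012/AD"/>'

# Same qualifier as Table 6 but bound to a different URI.
WRONG_URI = '<ia:AuxData xmlns:ia="http://example.com/not-the-ia-namespace"/>'


def is_aux_data(xml_text: str) -> bool:
    """True if the root element is AuxData in the Table 6 'ia' namespace."""
    root = ET.fromstring(xml_text)
    return root.tag == f"{{{IA_NS}}}AuxData"
```

Matching on the expanded name rather than on the literal string `ia:AuxData` is what allows an implementation to accept any XML-compliant qualifier.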
The AuxData extension element defines the Immersive Audio asset intended for use with the composition. The actual data essence is contained in an external Immersive Audio Track File.
The AuxData element shall be an instance of the DataTrackFileAssetType element, which is derived from the TrackFileAssetType whose structure is defined in SMPTE ST 429-7.
The element defined below replicates values contained in the underlying track file and shall remain consistent with the content of the underlying track file at all times. It is included in the Composition Playlist to alleviate the need for theater management software to access and parse individual track files when scheduling content. In the event an inconsistency exists, the values contained in the underlying track file shall take precedence.
The DataType element is a UL that matches the value of the Data Essence Coding parameter of the Data Essence Descriptor in the Immersive Audio Data Track File. This allows identification of the type of data essence that is referenced by the Immersive Audio Data track. It shall be coded as type urn:smpte:ul as specified in SMPTE ST 2029. Only one value shall be valid, which is the UL specified for Immersive Audio essence in Clause 11.
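NOTE: A check of the DataType value is sketched informatively below in Python. It assumes that the UL referred to above is the Immersive Audio Coding UL listed in Table 4, and that comparing the hexadecimal digits without regard to case is acceptable; neither assumption is confirmed by this clause. The sample fragment is hypothetical and omits the items inherited from TrackFileAssetType. The DataType child is located by local name so that the sketch does not depend on which namespace the element is declared in.

```python
import xml.etree.ElementTree as ET

# Assumed to be the UL meant by this clause (value taken from Table 4).
IMMERSIVE_AUDIO_CODING_UL = "urn:smpte:ul:060E2B34.04010105.0E090604.00000000"

# Hypothetical, incomplete AuxData fragment for illustration only.
EXAMPLE_AUX_DATA = """
<ia:AuxData xmlns:ia="http://www.dolby.com/schemas/2012/AD">
  <ia:DataType>urn:smpte:ul:060e2b34.04010105.0e090604.00000000</ia:DataType>
</ia:AuxData>
"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' part that ElementTree prepends to tags."""
    return tag.rsplit("}", 1)[-1]


def data_type_matches(aux_data_xml: str) -> bool:
    """True if exactly one DataType child is present and carries the expected UL."""
    root = ET.fromstring(aux_data_xml.strip())
    values = [
        (child.text or "").strip()
        for child in root
        if local_name(child.tag) == "DataType"
    ]
    return (
        len(values) == 1
        and values[0].lower() == IMMERSIVE_AUDIO_CODING_UL.lower()
    )
```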
The XML Schema document at Element a, that conforms to W3C XML Schema Part 1: Structures, normatively defines the structure of the Composition Playlist extensions previously described using a machine-readable language. While this schema is intended to faithfully represent the structure presented in the normative prose portions of this specification, conflicts in definition may occur. In the event of such a conflict, the normative prose shall be the authoritative expression of the standard.
This annex lists non-prose elements of this document.