author | Ralph Giles <giles@thaumas.net> | 2016-10-04 08:41:07 -0700 |
---|---|---|
committer | Erik de Castro Lopo <erikd@mega-nerd.com> | 2016-10-05 03:30:21 +1100 |
commit | 4bbd73a854f47eaf0f776c01d7cf6d0c21639e74 (patch) | |
tree | ee17e097f5470745eb97f03c197002c408fe137c /doc | |
parent | a2420c140544e2ca132434c0be53e04076a85c4d (diff) | |
Flac-in-mp4 draft v0.0.0.
We've been working on a draft spec for encapsulation of FLAC
in the ISO Base Media File Format (mp4). This is the initial
draft created by Monty Montgomery based on Yusuke Nakamura's
Opus-in-mp4 draft.
More details at https://bugzilla.mozilla.org/show_bug.cgi?id=1286097
Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
Diffstat (limited to 'doc')
-rw-r--r-- | doc/isoflac.txt | 662 |
1 files changed, 662 insertions, 0 deletions
diff --git a/doc/isoflac.txt b/doc/isoflac.txt
new file mode 100644
index 00000000..a74d0f9b
--- /dev/null
+++ b/doc/isoflac.txt
@@ -0,0 +1,662 @@
+Encapsulation of FLAC in ISO Base Media File Format
+Version 0.0.0 (early draft)
+
+Table of Contents
+1 Scope
+2 Supporting Normative References
+3 Terms and Definitions
+4 Design Rules of Encapsulation
+    4.1 File Type Identification
+    4.2 Overview of Track Structure
+    4.3 Definition of a FLAC sample
+        4.3.1 Sample entry format
+        4.3.2 FLAC Specific Box
+        4.3.3 Sample format
+        4.3.4 Duration of a FLAC sample
+        4.3.5 Sub-sample
+        4.3.6 Random Access
+            4.3.6.1 Random Access Point
+    4.4 Basic Structure (informative)
+        4.4.1 Initial Movie
+    4.5 Example of Encapsulation (informative)
+5 Author's Address
+
+1 Scope
+
+    This document specifies the normative mapping for encapsulation of
+    FLAC coded audio bitstreams in ISO Base Media file format and its
+    derivatives. The encapsulation of FLAC coded bitstreams in
+    QuickTime file format is outside the scope of this specification.
+
+2 Supporting Normative References
+
+    [1] ISO/IEC 14496-12:2012 Corrected version
+
+        Information technology — Coding of audio-visual objects — Part
+        12: ISO base media file format
+
+    [2] ISO/IEC 14496-12:2012/Amd.1:2013
+
+        Information technology — Coding of audio-visual objects — Part
+        12: ISO base media file format AMENDMENT 1: Various
+        enhancements including support for large metadata
+
+    [3] FLAC format specification
+
+        https://xiph.org/flac/format.html
+
+        Definition of the FLAC Audio Codec stream format
+
+    [4] FLAC-in-Ogg mapping specification
+
+        https://xiph.org/flac/ogg_mapping.html
+
+        Ogg Encapsulation for the FLAC Audio Codec
+
+    [5] Matroska specification
+
+3 Terms and Definitions
+
+    3.1 active track
+
+        enabled track from the non-alternate group or selected track
+        from alternate group
+
+    3.2 edit
+
+        entry in the Edit List Box
+
+    3.3 sample-accurate
+
+        for any PCM sample, a timestamp exactly matching its sampling
+        timestamp is present in the media timeline.
+
+    3.4 native metadata
+
+
+4 Design Rules of Encapsulation
+
+    4.1 File Type Identification
+
+        This specification does not define any brand to declare files
+        are conformant to this specification. Files conformant to
+        this specification shall contain at least one brand which
+        supports the requirements and the requirements described in
+        this clause without contradiction in the compatible brands
+        list of the File Type Box. The minimal support of the
+        encapsulation of FLAC bitstreams in ISO Base Media file format
+        requires the 'isom' brand.
+
+    4.2 Overview of Track Structure
+
+        FLAC coded audio shall be encapsulated into the ISO Base
+        Media File Format as media data within an audio track.
+
+          + The handler_type field in the Handler Reference Box
+            shall be set to 'soun'.
+
+          + The Media Information Box shall contain the Sound Media
+            Header Box.
+
+          + The codingname of the sample entry is 'fLaC'.
+
+            This specification does not define any encapsulation
+            using MP4AudioSampleEntry with objectTypeIndication
+            specified by the MPEG-4 Registration Authority
+            (http://www.mp4ra.org/). See section 'Sample entry
+            format' for the definition of the sample entry.
+
+          + The 'dfLa' box is added to the sample entry to convey
+            initializing information for the decoder.
+
+            See section 'FLAC Specific Box' for the definition of
+            the box contents.
+
+          + A FLAC sample is exactly one FLAC packet. See section
+            'Sample format' for details of the packet contents.
+
+          + Every FLAC sample is a sync sample. No pre-roll or
+            lapping is required. See section 'Random Access' for
+            further details.
+
+        FLAC native metadata
+
+    4.3 Definition of a FLAC sample
+
+        4.3.1 Sample entry format
+
+            For any track containing one or more FLAC bitstreams, a
+            sample entry describing the corresponding FLAC bitstream
+            shall be present inside the Sample Table Box. This version
+            of the specification defines only one sample entry format
+            named FLACSampleEntry whose codingname is 'fLaC'. This
+            sample entry includes exactly one FLAC Specific Box
+            defined in section 'FLAC Specific Box' as a mandatory box
+            and indicates that FLAC samples described by this sample
+            entry are stored by the sample format described in section
+            'Sample format'.
+
+            The syntax and semantics of the FLACSampleEntry are shown
+            as follows. The data fields of this box and native
+            FLAC[3] structures encoded within FLAC blocks are both
+            stored in big-endian format, though for purposes of the
+            ISO BMFF container, FLAC native metadata and data blocks
+            are treated as unstructured octet streams.
+
+            class FLACSampleEntry() extends AudioSampleEntry ('fLaC'){
+                FLACSpecificBox();
+            }
+
+              + channelcount:
+
+                  The channelcount field shall be set equal to the
+                  channel count specified by the FLAC bitstream's native
+                  METADATA_BLOCK_STREAMINFO header as described in [3].
+                  Note that the FLAC FRAME_HEADER structure that begins
+                  each FLAC sample redundantly encodes channel count;
+                  the number of channels declared in each FRAME_HEADER
+                  MUST match the number of channels declared here and in
+                  the METADATA_BLOCK_STREAMINFO header.
+
+              + samplesize:
+
+                  The samplesize field shall be set equal to the bits
+                  per sample specified by the FLAC bitstream's native
+                  METADATA_BLOCK_STREAMINFO header as described in [3].
+                  Note that the FLAC FRAME_HEADER structure that begins
+                  each FLAC sample redundantly encodes the number of
+                  bits per sample; the bits per sample declared in each
+                  FRAME_HEADER MUST match the samplesize declared here
+                  and the bits per sample field declared in the
+                  METADATA_BLOCK_STREAMINFO header.
+
+              + samplerate:
+
+                  The samplerate field shall be set equal to the sample
+                  rate specified by the FLAC bitstream's native
+                  METADATA_BLOCK_STREAMINFO header as described in [3],
+                  left-shifted by 16 bits. Note that the FLAC
+                  FRAME_HEADER structure that begins each FLAC sample
+                  redundantly encodes the sample rate; the sample rate
+                  declared in each FRAME_HEADER MUST match the sample
+                  rate declared here and in the
+                  METADATA_BLOCK_STREAMINFO header.
+
+              + FLACSpecificBox:
+
+                  This box contains initializing information for the
+                  decoder as defined in section 'FLAC Specific Box'.
+
+        4.3.2 FLAC Specific Box
+
+            Exactly one FLAC Specific Box shall be present in each
+            FLACSampleEntry. The FLAC Specific Box contains the
+            Version field and this specification defines version 0 of
+            this box. If incompatible changes occur in the fields
+            after the Version field within the FLACSpecificBox in
+            future versions of this specification, another version
+            will be defined. The data fields of this box and native
+            FLAC[3] structures encoded within FLAC blocks are both
+            stored in big-endian format, though for purposes of the
+            ISO BMFF container, FLAC native metadata and data blocks
+            are treated as unstructured octet streams.
+
+            The syntax and semantics of the FLAC Specific Box are shown
+            as follows.
+
+            aligned(8) class FLACMetadataBlock {
+                unsigned int(1)  LastMetadataBlockFlag;
+                unsigned int(7)  BlockType;
+                unsigned int(24) Length;
+                unsigned int(8)  MetadataBlockData[Length];
+            }
+
+            aligned(8) class FLACSpecificBox extends Box('dfLa'){
+                unsigned int(8)  Version;
+                unsigned int(8)  MetadataBlocks;
+                for(i=0; i < MetadataBlocks; i++){
+                    FLACMetadataBlock();
+                }
+            }
+
+              + Version:
+
+                  The Version field shall be set to 0.
+
+                  In future versions of this specification, this
+                  field may be set to other values. Without support
+                  for those values, the reader shall not read the fields
+                  after this within the FLACSpecificBox.
+
+              + MetadataBlocks:
+
+                  The number of FLAC[3] native metadata blocks to
+                  follow. This value must be at least 1 as a native
+                  METADATA_BLOCK_STREAMINFO structure is required to
+                  decode FLAC audio data.
+
+            These fields are followed by a sequence of FLAC[3]
+            native-metadata block structures that fill the remainder
+            of the box length.
+
+              + LastMetadataBlockFlag:
+
+                  The LastMetadataBlockFlag field maps semantically to
+                  the FLAC[3] native METADATA_BLOCK_HEADER
+                  Last-metadata-block flag as defined in the FLAC[3]
+                  file specification.
+
+                  The LastMetadataBlockFlag is set to 1 if this
+                  MetadataBlock is the last metadata block in the
+                  FLACSpecificBox. It is set to 0 otherwise.
+
+              + BlockType:
+
+                  The BlockType field maps semantically to the FLAC[3]
+                  native METADATA_BLOCK_HEADER BLOCK_TYPE field as
+                  defined in the FLAC[3] file specification.
+
+                  The BlockType is set to a valid FLAC[3] BLOCK_TYPE
+                  value that identifies the type of this native metadata
+                  block. The BlockType of the first FLACMetadataBlock
+                  must be set to 0, signifying this is a FLAC[3] native
+                  METADATA_BLOCK_STREAMINFO block.
+
+              + Length:
+
+                  The Length field maps semantically to the FLAC[3]
+                  native METADATA_BLOCK_HEADER Length field as
+                  defined in the FLAC[3] file specification.
+
+                  The Length field specifies the number of bytes of
+                  MetadataBlockData to follow.
+
+              + MetadataBlockData:
+
+                  The MetadataBlockData field maps semantically to the
+                  FLAC[3] native METADATA_BLOCK_DATA following the
+                  METADATA_BLOCK_HEADER, as defined in the FLAC[3] file
+                  specification.
+
+            The FLACMetadataBlock structure consists of three fields
+            filling a total of four bytes that form a FLAC[3] native
+            METADATA_BLOCK_HEADER, followed by raw octet bytes that
+            comprise the FLAC[3] native METADATA_BLOCK_DATA. Taken
+            together, the bytes of the FLACMetadataBlock form a
+            complete FLAC[3] native METADATA_BLOCK structure.
+
+            Note that a minimum of a single FLACMetadataBlock,
+            consisting of a FLAC[3] native METADATA_BLOCK_STREAMINFO
+            structure, is required. Should the FLACSpecificBox
+            contain more than a single FLACMetadataBlock structure,
+            the FLACMetadataBlock containing the FLAC[3] native
+            METADATA_BLOCK_STREAMINFO must occur first in the list.
+
+            Other containers that package FLAC audio streams, such as
+            Ogg[4] and Matroska[5], wrap FLAC[3] native metadata without
+            modification, as this specification does. When
+            repackaging or remuxing FLAC[3] streams from another
+            format that contains FLAC[3] native metadata into an ISO
+            BMFF file, the complete FLAC[3] native metadata should be
+            preserved in the ISO BMFF stream as described above. It
+            is also allowed to parse this native metadata and include
+            contextually redundant ISO BMFF-native repackagings and/or
+            reparsings of FLAC[3] native metadata, so long as the
+            native metadata is also preserved.
+
+        4.3.3 Sample format
+
+            A FLAC sample is exactly one FLAC audio FRAME packet (as
+            defined in the FLAC[3] file specification) belonging to a
+            FLAC bitstream. The FLAC sample data begins with a
+            complete FLAC FRAME_HEADER, followed by one FLAC SUBFRAME
+            per channel, any necessary bit padding, and ends with the
+            usual FLAC FRAME_FOOTER.
+
+            Note that the FLAC native FRAME_HEADER structure that
+            begins each FLAC sample redundantly encodes channel count,
+            sample rate, and sample size. The values of these fields
+            must agree both with the values declared in the FLAC
+            METADATA_BLOCK_STREAMINFO structure as well as the
+            FLACSampleEntry box.
+
+        4.3.4 Duration of a FLAC sample
+
+            The duration of any given FLAC sample is determined by
+            dividing the decoded block size of a FLAC frame, as
+            encoded in the FLAC FRAME's FRAME_HEADER structure, by the
+            value of the timescale field in the Media Header Box.
+            FLAC samples are permitted to have variable durations
+            within a given audio stream. FLAC does not use padding
+            values.
+
+        4.3.5 Sub-sample
+
+            Sub-samples are not defined for FLAC samples in this
+            specification.
+
+        4.3.6 Random Access
+
+            This subclause describes the nature of the random access of FLAC samples.
+
+            4.3.6.1 Random Access Point
+
+                All FLAC samples can be independently decoded,
+                i.e. every FLAC sample is a sync sample. The Sync
+                Sample Box shall not be present as long as there are
+                no samples other than FLAC samples in the same
+                track. The sample_is_non_sync_sample field for FLAC
+                samples shall be set to 0.
+
+            4.3.6.2 Pre-roll
+
+                FLAC bitstreams do not require pre-roll or
+                multi-sample synchronization. All samples
+                independently decode directly to a complete set of
+                valid samples. Neither AudioRollRecoveryEntry nor AudioPreRollEntry
+                shall be used.
+
+    4.4 Basic Structure (informative)
+
+        4.4.1 Initial Movie
+
+            This subclause shows a basic structure of the Movie Box as follows:
+
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |moov|    |    |    |    |    |    |    | Movie Box                    |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |mvhd|    |    |    |    |    |    | Movie Header Box             |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |trak|    |    |    |    |    |    | Track Box                    |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |tkhd|    |    |    |    |    | Track Header Box             |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |edts|*   |    |    |    |    | Edit Box                     |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |    |elst|*   |    |    |    | Edit List Box                |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |mdia|    |    |    |    |    | Media Box                    |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |    |mdhd|    |    |    |    | Media Header Box             |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |    |hdlr|    |    |    |    | Handler Reference Box        |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |    |minf|    |    |    |    | Media Information Box        |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |    |    |smhd|    |    |    | Sound Media Header Box       |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |    |    |dinf|    |    |    | Data Information Box         |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |    |    |    |dref|    |    | Data Reference Box           |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |    |    |    |    |url |    | DataEntryUrlBox              |
+        +----+----+----+----+----+----+ or +----+------------------------------+
+        |    |    |    |    |    |    |urn |    | DataEntryUrnBox              |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |    |    |stbl|    |    |    | Sample Table                 |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |    |    |    |stsd|    |    | Sample Description Box       |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |    |    |    |    |fLaC|    | FLACSampleEntry              |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |    |    |    |    |    |dfLa| FLAC Specific Box            |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |    |    |    |stts|    |    | Decoding Time to Sample Box  |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |    |    |    |stsc|    |    | Sample To Chunk Box          |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |    |    |    |stsz|    |    | Sample Size Box              |
+        +----+----+----+----+----+ or +----+----+------------------------------+
+        |    |    |    |    |    |stz2|    |    | Compact Sample Size Box      |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |    |    |    |stco|    |    | Chunk Offset Box             |
+        +----+----+----+----+----+ or +----+----+------------------------------+
+        |    |    |    |    |    |co64|    |    | Chunk Large Offset Box       |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |mvex|*   |    |    |    |    |    | Movie Extends Box            |
+        +----+----+----+----+----+----+----+----+------------------------------+
+        |    |    |trex|*   |    |    |    |    | Track Extends Box            |
+        +----+----+----+----+----+----+----+----+------------------------------+
+
+                Figure 1 - Basic structure of Movie Box
+
+            It is strongly recommended that the order of boxes should
+            follow the above structure. Boxes marked with an asterisk
+            (*) may be present. For most boxes listed above, the
+            definition is as is defined in ISO/IEC 14496-12 [1]. The
+            additional boxes and the additional requirements,
+            restrictions and recommendations to the other boxes are
+            described in this specification.
+
+    4.5 Example of Encapsulation (informative)
+        [File]
+            size = 17790
+            [ftyp: File Type Box]
+                position = 0
+                size = 24
+                major_brand = mp42 : MP4 version 2
+                minor_version = 0
+                compatible_brands
+                    brand[0] = mp42 : MP4 version 2
+                    brand[1] = isom : ISO Base Media file format
+            [moov: Movie Box]
+                position = 24
+                size = 757
+                [mvhd: Movie Header Box]
+                    position = 32
+                    size = 108
+                    version = 0
+                    flags = 0x000000
+                    creation_time = UTC 2014/12/12, 18:41:19
+                    modification_time = UTC 2014/12/12, 18:41:19
+                    timescale = 48000
+                    duration = 33600 (00:00:00.700)
+                    rate = 1.000000
+                    volume = 1.000000
+                    reserved = 0x0000
+                    reserved = 0x00000000
+                    reserved = 0x00000000
+                    transformation matrix
+                        | a, b, u |   | 1.000000, 0.000000, 0.000000 |
+                        | c, d, v | = | 0.000000, 1.000000, 0.000000 |
+                        | x, y, w |   | 0.000000, 0.000000, 1.000000 |
+                    pre_defined = 0x00000000
+                    pre_defined = 0x00000000
+                    pre_defined = 0x00000000
+                    pre_defined = 0x00000000
+                    pre_defined = 0x00000000
+                    pre_defined = 0x00000000
+                    next_track_ID = 2
+                [iods: Object Descriptor Box]
+                    position = 140
+                    size = 33
+                    version = 0
+                    flags = 0x000000
+                    [tag = 0x10: MP4_IOD]
+                        expandableClassSize = 16
+                        ObjectDescriptorID = 1
+                        URL_Flag = 0
+                        includeInlineProfileLevelFlag = 0
+                        reserved = 0xf
+                        ODProfileLevelIndication = 0xff
+                        sceneProfileLevelIndication = 0xff
+                        audioProfileLevelIndication = 0xfe
+                        visualProfileLevelIndication = 0xff
+                        graphicsProfileLevelIndication = 0xff
+                        [tag = 0x0e: ES_ID_Inc]
+                            expandableClassSize = 4
+                            Track_ID = 1
+                [trak: Track Box]
+                    position = 173
+                    size = 608
+                    [tkhd: Track Header Box]
+                        position = 181
+                        size = 92
+                        version = 0
+                        flags = 0x000007
+                            Track enabled
+                            Track in movie
+                            Track in preview
+                        creation_time = UTC 2014/12/12, 18:41:19
+                        modification_time = UTC 2014/12/12, 18:41:19
+                        track_ID = 1
+                        reserved = 0x00000000
+                        duration = 33600 (00:00:00.700)
+                        reserved = 0x00000000
+                        reserved = 0x00000000
+                        layer = 0
+                        alternate_group = 0
+                        volume = 1.000000
+                        reserved = 0x0000
+                        transformation matrix
+                            | a, b, u |   | 1.000000, 0.000000, 0.000000 |
+                            | c, d, v | = | 0.000000, 1.000000, 0.000000 |
+                            | x, y, w |   | 0.000000, 0.000000, 1.000000 |
+                        width = 0.000000
+                        height = 0.000000
+                    [mdia: Media Box]
+                        position = 273
+                        size = 472
+                        [mdhd: Media Header Box]
+                            position = 281
+                            size = 32
+                            version = 0
+                            flags = 0x000000
+                            creation_time = UTC 2014/12/12, 18:41:19
+                            modification_time = UTC 2014/12/12, 18:41:19
+                            timescale = 48000
+                            duration = 34560 (00:00:00.720)
+                            language = und
+                            pre_defined = 0x0000
+                        [hdlr: Handler Reference Box]
+                            position = 313
+                            size = 51
+                            version = 0
+                            flags = 0x000000
+                            pre_defined = 0x00000000
+                            handler_type = soun
+                            reserved = 0x00000000
+                            reserved = 0x00000000
+                            reserved = 0x00000000
+                            name = Xiph Audio Handler
+                        [minf: Media Information Box]
+                            position = 364
+                            size = 381
+                            [smhd: Sound Media Header Box]
+                                position = 372
+                                size = 16
+                                version = 0
+                                flags = 0x000000
+                                balance = 0.000000
+                                reserved = 0x0000
+                            [dinf: Data Information Box]
+                                position = 388
+                                size = 36
+                                [dref: Data Reference Box]
+                                    position = 396
+                                    size = 28
+                                    version = 0
+                                    flags = 0x000000
+                                    entry_count = 1
+                                    [url : Data Entry Url Box]
+                                        position = 412
+                                        size = 12
+                                        version = 0
+                                        flags = 0x000001
+                                        location = in the same file
+                            [stbl: Sample Table Box]
+                                position = 424
+                                size = 321
+                                [stsd: Sample Description Box]
+                                    position = 432
+                                    size = 79
+                                    version = 0
+                                    flags = 0x000000
+                                    entry_count = 1
+                                    [fLaC: Audio Description]
+                                        position = 448
+                                        size = 63
+                                        reserved = 0x000000000000
+                                        data_reference_index = 1
+                                        reserved = 0x0000
+                                        reserved = 0x0000
+                                        reserved = 0x00000000
+                                        channelcount = 2
+                                        samplesize = 16
+                                        pre_defined = 0
+                                        reserved = 0
+                                        samplerate = 48000.000000
+                                        [dfLa: FLAC Specific Box]
+                                            position = 484
+                                            size = 48
+                                            Version = 0
+                                            MetadataBlocks = 1
+                                            LastMetadataBlockFlag = 1
+                                            BlockType = 0
+                                            Length = 34
+                                            MetadataBlockData[34];
+                                [stts: Decoding Time to Sample Box]
+                                    position = 490
+                                    size = 24
+                                    version = 0
+                                    flags = 0x000000
+                                    entry_count = 1
+                                    entry[0]
+                                        sample_count = 18
+                                        sample_delta = 1920
+                                [stsc: Sample To Chunk Box]
+                                    position = 514
+                                    size = 40
+                                    version = 0
+                                    flags = 0x000000
+                                    entry_count = 2
+                                    entry[0]
+                                        first_chunk = 1
+                                        samples_per_chunk = 13
+                                        sample_description_index = 1
+                                    entry[1]
+                                        first_chunk = 2
+                                        samples_per_chunk = 5
+                                        sample_description_index = 1
+                                [stsz: Sample Size Box]
+                                    position = 554
+                                    size = 92
+                                    version = 0
+                                    flags = 0x000000
+                                    sample_size = 0 (variable)
+                                    sample_count = 18
+                                    entry_size[0] = 977
+                                    entry_size[1] = 938
+                                    entry_size[2] = 939
+                                    entry_size[3] = 938
+                                    entry_size[4] = 934
+                                    entry_size[5] = 945
+                                    entry_size[6] = 948
+                                    entry_size[7] = 956
+                                    entry_size[8] = 955
+                                    entry_size[9] = 930
+                                    entry_size[10] = 933
+                                    entry_size[11] = 934
+                                    entry_size[12] = 972
+                                    entry_size[13] = 977
+                                    entry_size[14] = 958
+                                    entry_size[15] = 949
+                                    entry_size[16] = 962
+                                    entry_size[17] = 848
+                                [stco: Chunk Offset Box]
+                                    position = 646
+                                    size = 24
+                                    version = 0
+                                    flags = 0x000000
+                                    entry_count = 2
+                                    chunk_offset[0] = 686
+                                    chunk_offset[1] = 12985
+            [free: Free Space Box]
+                position = 670
+                size = 8
+            [mdat: Media Data Box]
+                position = 678
+                size = 17001
+
+5 Author's Address
+    Monty Montgomery <monty@xiph.org>
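
The 'dfLa' layout and the timing rules in the draft above can be illustrated with a short Python sketch. It is not part of the patch and is only a reading of draft v0.0.0 under stated assumptions: the input is the box payload with the 8-byte box header (size and 'dfLa' type) already stripped, and the STREAMINFO bit positions are taken from the FLAC format specification [3]. All function and class names here are hypothetical.

```python
from typing import List, NamedTuple


class MetadataBlock(NamedTuple):
    last: bool        # LastMetadataBlockFlag
    block_type: int   # BlockType (0 means STREAMINFO)
    data: bytes       # MetadataBlockData, kept as an opaque octet string


class StreamInfo(NamedTuple):
    sample_rate: int
    channels: int
    bits_per_sample: int
    total_samples: int


def parse_dfla_payload(payload: bytes) -> List[MetadataBlock]:
    """Parse a draft-v0.0.0 'dfLa' payload (bytes after the box header)."""
    if len(payload) < 2:
        raise ValueError("dfLa payload too short")
    version, count = payload[0], payload[1]
    if version != 0:
        # The draft says readers shall not read past an unknown Version.
        raise ValueError(f"unsupported dfLa version {version}")
    if count < 1:
        raise ValueError("at least one metadata block is required")
    blocks = []
    pos = 2
    for _ in range(count):
        if pos + 4 > len(payload):
            raise ValueError("truncated metadata block header")
        # 1-bit flag, 7-bit type, 24-bit length, all big-endian.
        header = int.from_bytes(payload[pos:pos + 4], "big")
        length = header & 0xFFFFFF
        pos += 4
        if pos + length > len(payload):
            raise ValueError("truncated metadata block data")
        blocks.append(MetadataBlock(bool(header >> 31),
                                    (header >> 24) & 0x7F,
                                    payload[pos:pos + length]))
        pos += length
    if blocks[0].block_type != 0:
        raise ValueError("first metadata block must be STREAMINFO")
    return blocks


def parse_streaminfo(data: bytes) -> StreamInfo:
    """Extract the fields that channelcount/samplesize/samplerate mirror."""
    if len(data) != 34:
        raise ValueError("STREAMINFO data must be 34 bytes")
    # Bytes 10..17 hold: 20-bit rate, 3-bit (channels - 1),
    # 5-bit (bits per sample - 1), 36-bit total sample count.
    packed = int.from_bytes(data[10:18], "big")
    return StreamInfo(packed >> 44,
                      ((packed >> 41) & 0x7) + 1,
                      ((packed >> 36) & 0x1F) + 1,
                      packed & ((1 << 36) - 1))


def sample_entry_samplerate(sample_rate: int) -> int:
    """Value of the samplerate field: the rate left-shifted by 16 bits."""
    return sample_rate << 16


def sample_duration_seconds(block_size: int, timescale: int) -> float:
    """Duration of one FLAC sample: decoded block size over the timescale."""
    return block_size / timescale
```

With the values from the example in section 4.5 (sample_delta = 1920, timescale = 48000), sample_duration_seconds(1920, 48000) gives 0.04 seconds per FLAC sample, and 18 such samples account for the mdhd duration of 34560.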