MDB Shard File Format Specification
A Shard is a serialized object containing file reconstruction information and xorb metadata for deduplication purposes.
The shard format is the vehicle for uploading file reconstructions and for communicating information about xorbs and chunks that clients can deduplicate their data against.
Overview
The MDB (Merkle Database) shard file format is a binary format used to store file metadata and content-addressable storage (CAS) information for efficient deduplication and retrieval. This document describes the binary layout and deserialization process for the shard format. Implementors of the xet protocol MUST use the shard format when implementing the upload protocol. The shard format is used on the shard upload (record files) and global deduplication APIs.
Use As API Request and Response Bodies
The shard format is used in the shard upload API as the request payload and in the global deduplication/chunk query API as the response payload.
Shard Upload
In this case the shard is a serialization format that allows clients to denote the files that they are uploading. Each file reconstruction maps to a File Info block in the File Info section. Additionally, every new xorb that the client created is mapped to an item (a CAS Info block) in the CAS Info section so that it may be deduplicated against in the future.
When uploading a shard the footer section MUST be omitted.
An example of a shard that can be used for file upload can be found in Xet reference files. A version of this shard that also contains the footer is available in Xet reference files too; see the README for the reference files dataset for more context.
Global Deduplication
Shards returned by the Global Deduplication API have an empty File Info section and contain relevant information only in the CAS Info section. The CAS Info section returned by this API describes xorbs, including a xorb that contains the chunk that was queried. Clients can deduplicate their content against any of the xorbs described in any CAS Info block of the returned shard. The other xorb descriptions in a returned shard may be more likely to reference content that the client has.
An example of a shard that can be returned for a global deduplication query can be found in Xet reference files.
File Structure
A shard file consists of the following sections in order:
┌─────────────────────┐
│ Header              │
├─────────────────────┤
│ File Info Section   │
├─────────────────────┤
│ CAS Info Section    │
├─────────────────────┤
│ Footer              │
└─────────────────────┘
Overall File Layout with Byte Offsets
Offset 0:
┌────────────────────────────────────────┐
│ Header (48 bytes)                      │ ← Fixed size
└────────────────────────────────────────┘
Offset footer.file_info_offset:
┌────────────────────────────────────────┐
│                                        │
│ File Info Section                      │ ← Variable size
│ (Multiple file blocks +                │
│  bookend entry)                        │
│                                        │
└────────────────────────────────────────┘
Offset footer.cas_info_offset:
┌────────────────────────────────────────┐
│                                        │
│ CAS Info Section                       │ ← Variable size
│ (Multiple CAS blocks +                 │
│  bookend entry)                        │
│                                        │
└────────────────────────────────────────┘
Offset footer.footer_offset:
┌────────────────────────────────────────┐
│ Footer (200 bytes, sometimes omitted)  │ ← Fixed size
└────────────────────────────────────────┘
Constants
- MDB_SHARD_HEADER_VERSION: 2
- MDB_SHARD_FOOTER_VERSION: 1
- MDB_FILE_INFO_ENTRY_SIZE: 48 bytes (size of each file info structure)
- MDB_CAS_INFO_ENTRY_SIZE: 48 bytes (size of each CAS info structure)
- MDB_SHARD_HEADER_TAG: 32-byte magic identifier
Data Types
All multi-byte integers are stored in little-endian format.
- u8: 8-bit unsigned integer
- u32: 32-bit unsigned integer
- u64: 64-bit unsigned integer
- Byte array types are denoted as in Rust, as [u8; N], where N is the number of bytes in the array.
- Hash: 32-byte hash value, a special [u8; 32]
1. Header (MDBShardFileHeader)
Location: Start of file (offset 0)
Size: 48 bytes
struct MDBShardFileHeader {
tag: [u8; 32], // Magic number identifier
version: u64, // Header version (must be 2)
footer_size: u64, // Size of footer in bytes, set to 0 if footer is omitted
}
Memory Layout:
┌──────────────────────────────────┬───────────┬───────────┐
│ tag (32 bytes)                   │ version   │ footer_sz │
│ Magic Number Identifier          │ (8 bytes) │ (8 bytes) │
└──────────────────────────────────┴───────────┴───────────┘
0                                  32          40          48
Deserialization steps:
- Read 32 bytes for the magic tag
- Verify tag matches MDB_SHARD_HEADER_TAG
- Read 8 bytes for version (u64)
- Verify version equals 2
- Read 8 bytes for footer_size (u64)
When serializing, footer_size MUST be the number of bytes that make up the footer, or 0 if the footer is omitted.
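The header steps above can be sketched in Rust as follows. This is a minimal illustration, not a reference implementation: the value of MDB_SHARD_HEADER_TAG is not reproduced in this document, so the sketch takes it as a caller-supplied `expected_tag` parameter, and the function names `parse_header` and `serialize_header` are hypothetical.

```rust
use std::convert::TryInto;

/// Header version required by this specification.
const MDB_SHARD_HEADER_VERSION: u64 = 2;
/// 32-byte tag + u64 version + u64 footer_size.
const HEADER_SIZE: usize = 48;

#[derive(Debug, PartialEq)]
struct MDBShardFileHeader {
    tag: [u8; 32],
    version: u64,
    footer_size: u64,
}

/// Parses and validates the 48-byte header at the start of `buf`.
/// `expected_tag` stands in for MDB_SHARD_HEADER_TAG (value not shown here).
fn parse_header(buf: &[u8], expected_tag: &[u8; 32]) -> Result<MDBShardFileHeader, String> {
    if buf.len() < HEADER_SIZE {
        return Err(format!("need {} bytes, got {}", HEADER_SIZE, buf.len()));
    }
    let tag: [u8; 32] = buf[0..32].try_into().unwrap();
    if &tag != expected_tag {
        return Err("magic tag mismatch".to_string());
    }
    // All multi-byte integers are little-endian.
    let version = u64::from_le_bytes(buf[32..40].try_into().unwrap());
    if version != MDB_SHARD_HEADER_VERSION {
        return Err(format!("unsupported header version {}", version));
    }
    let footer_size = u64::from_le_bytes(buf[40..48].try_into().unwrap());
    Ok(MDBShardFileHeader { tag, version, footer_size })
}

/// Serializes a header; pass `footer_size = 0` when the footer is omitted
/// (as required for shard uploads).
fn serialize_header(tag: &[u8; 32], footer_size: u64) -> [u8; HEADER_SIZE] {
    let mut out = [0u8; HEADER_SIZE];
    out[..32].copy_from_slice(tag);
    out[32..40].copy_from_slice(&MDB_SHARD_HEADER_VERSION.to_le_bytes());
    out[40..48].copy_from_slice(&footer_size.to_le_bytes());
    out
}
```

Taking the tag as a parameter keeps the sketch independent of the actual magic value; a real client would compare against the constant defined by the protocol.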
2. File Info Section
Location: footer.file_info_offset to footer.cas_info_offset, or directly after the header
This section contains a sequence of zero or more file information (File Info) blocks. Each block consists of a header and at least one data sequence entry, followed by OPTIONAL verification entries and an OPTIONAL metadata extension. The File Info section ends when reaching the bookend entry.
Each File Info block within the overall section is a serialization of a file reconstruction into a binary format.
For each file, there is a FileDataSequenceHeader and, for each term, a FileDataSequenceEntry, OPTIONALLY with a matching FileVerificationEntry and, also OPTIONALLY, a FileMetadataExt at the end.
A shard's File Info section can contain more than one File Info block in series; after all the content for one file description has been read, the next one begins immediately. If a reader encounters the bookend entry when reading the header of the next block, the File Info section is over: the last file description in this shard has been read.
File Info Section Layout
Without Optional Components:
┌─────────────────────┐
│ FileDataSeqHeader   │ ← File 1
├─────────────────────┤
│ FileDataSeqEntry    │
├─────────────────────┤
│ FileDataSeqEntry    │
├─────────────────────┤
│ ...                 │
├─────────────────────┤
│ FileDataSeqHeader   │ ← File 2
├─────────────────────┤
│ FileDataSeqEntry    │
├─────────────────────┤
│ ...                 │
├─────────────────────┤
│ Bookend Entry       │ ← All 0xFF hash + zeros
└─────────────────────┘
With All Optional Components:
┌─────────────────────┐
│ FileDataSeqHeader   │ ← File 1 (flags indicate verification + metadata)
├─────────────────────┤
│ FileDataSeqEntry    │
├─────────────────────┤
│ FileDataSeqEntry    │
├─────────────────────┤
│ ...                 │
├─────────────────────┤
│ FileVerifyEntry     │ ← One per FileDataSeqEntry
├─────────────────────┤
│ FileVerifyEntry     │
├─────────────────────┤
│ ...                 │
├─────────────────────┤
│ FileMetadataExt     │ ← One per file (if flag set)
├─────────────────────┤
│ FileDataSeqHeader   │ ← File 2
├─────────────────────┤
│ ...                 │
├─────────────────────┤
│ Bookend Entry       │ ← All 0xFF hash + zeros
└─────────────────────┘
FileDataSequenceHeader
struct FileDataSequenceHeader {
file_hash: Hash, // 32-byte file hash
file_flags: u32, // Flags indicating conditional sections that follow
num_entries: u32, // Number of FileDataSequenceEntry structures
_unused: [u8; 8], // Reserved space 8 bytes
}
File Flags:
- MDB_FILE_FLAG_WITH_VERIFICATION (0x80000000 or 1 << 31): Has verification entries
- MDB_FILE_FLAG_WITH_METADATA_EXT (0x40000000 or 1 << 30): Has metadata extension
A flag is set when file_data_sequence_header.file_flags & MASK (bitwise AND) is non-zero, where MASK is one of the flag values above.
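As a small illustration of the flag test described above, the following Rust sketch checks each flag with a bitwise AND. The constant values come from this section; the helper function names are hypothetical.

```rust
/// Bit 31: the File Info block contains verification entries.
const MDB_FILE_FLAG_WITH_VERIFICATION: u32 = 1 << 31;
/// Bit 30: the File Info block ends with a FileMetadataExt.
const MDB_FILE_FLAG_WITH_METADATA_EXT: u32 = 1 << 30;

/// True when the verification flag is set in `file_flags`.
fn has_verification(file_flags: u32) -> bool {
    (file_flags & MDB_FILE_FLAG_WITH_VERIFICATION) != 0
}

/// True when the metadata extension flag is set in `file_flags`.
fn has_metadata_ext(file_flags: u32) -> bool {
    (file_flags & MDB_FILE_FLAG_WITH_METADATA_EXT) != 0
}
```

For example, a `file_flags` value of 0xC0000000 has both flags set, while 0 has neither.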
Memory Layout:
┌──────────────────────────────────┬────────────┬─────────────┬───────────┐
│ file_hash (32 bytes)             │ file_flags │ num_entries │ _unused   │
│ File Hash Value                  │ (4 bytes)  │ (4 bytes)   │ (8 bytes) │
└──────────────────────────────────┴────────────┴─────────────┴───────────┘
0                                  32           36            40          48
FileDataSequenceEntry
Each FileDataSequenceEntry describes one term; it is essentially the binary serialization of a file reconstruction term.
struct FileDataSequenceEntry {
cas_hash: Hash, // 32-byte Xorb hash in the term
cas_flags: u32, // CAS flags (reserved for future, set to 0)
unpacked_segment_bytes: u32, // Term size when unpacked
chunk_index_start: u32, // Start chunk index within the Xorb for the term
chunk_index_end: u32, // End chunk index (exclusive) within the Xorb for the term
}
Note that when describing a chunk range in a FileDataSequenceEntry, use ranges that are start-inclusive but end-exclusive, i.e. [chunk_index_start, chunk_index_end).
Memory Layout:
┌──────────────────────────────────┬───────────┬───────────┬───────────┬───────────┐
│ cas_hash (32 bytes)              │ cas_flags │ unpacked  │ chunk_idx │ chunk_idx │
│ CAS Block Hash                   │ (4 bytes) │ seg_bytes │ start     │ end       │
│                                  │           │ (4 bytes) │ (4 bytes) │ (4 bytes) │
└──────────────────────────────────┴───────────┴───────────┴───────────┴───────────┘
0                                  32          36          40          44          48
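A FileDataSequenceEntry can be decoded from its 48 bytes as in the following Rust sketch, which follows the field order and offsets of the layout above. The parsing function and the `chunk_count` helper are hypothetical names introduced for illustration; `chunk_count` relies on the end-exclusive range convention.

```rust
use std::convert::TryInto;

#[derive(Debug, PartialEq)]
struct FileDataSequenceEntry {
    cas_hash: [u8; 32],
    cas_flags: u32,
    unpacked_segment_bytes: u32,
    chunk_index_start: u32,
    chunk_index_end: u32,
}

/// Reads a little-endian u32 at `offset`.
fn read_u32_le(buf: &[u8], offset: usize) -> u32 {
    u32::from_le_bytes(buf[offset..offset + 4].try_into().unwrap())
}

/// Decodes one 48-byte entry using the offsets from the memory layout.
fn parse_file_data_sequence_entry(buf: &[u8; 48]) -> FileDataSequenceEntry {
    FileDataSequenceEntry {
        cas_hash: buf[0..32].try_into().unwrap(),
        cas_flags: read_u32_le(buf, 32),
        unpacked_segment_bytes: read_u32_le(buf, 36),
        chunk_index_start: read_u32_le(buf, 40),
        chunk_index_end: read_u32_le(buf, 44),
    }
}

impl FileDataSequenceEntry {
    /// Number of chunks in the half-open range [start, end).
    /// Returns None if end < start, which indicates a malformed entry.
    fn chunk_count(&self) -> Option<u32> {
        self.chunk_index_end.checked_sub(self.chunk_index_start)
    }
}
```

With `chunk_index_start = 2` and `chunk_index_end = 5`, the term covers chunks 2, 3, and 4 of the xorb, so `chunk_count` is 3.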
FileVerificationEntry (OPTIONAL)
Verification Entries MUST be set for shard uploads.
To generate verification hashes for shard upload read the section about Verification Hashes.
struct FileVerificationEntry {
range_hash: Hash, // 32-byte verification hash
_unused: [u8; 16], // Reserved (16 bytes)
}
Memory Layout:
┌──────────────────────────────────┬────────────────────┐
│ range_hash (32 bytes)            │ _unused (16 bytes) │
│ Verification Hash                │ Reserved Space     │
└──────────────────────────────────┴────────────────────┘
0                                  32                   48
When a shard has verification entries, every File Info block in it MUST have verification entries.
If only a subset of the files in the shard have verification entries, the shard is considered invalid.
In this case every FileDataSequenceEntry has a matching FileVerificationEntry whose range_hash is computed from the chunk hashes for that range of chunks.
For any file, the nth FileVerificationEntry corresponds to the nth FileDataSequenceEntry. When verification entries are present, there are exactly file_data_sequence_header.num_entries of them, and they follow the num_entries data sequence entries.
FileMetadataExt (OPTIONAL)
This section is REQUIRED per file for shards uploaded through the shard upload API.
There is only one FileMetadataExt instance per File Info block, and it is the last component of that block when present.
The sha256 field is the 32-byte SHA-256 digest of the contents of the file described.
struct FileMetadataExt {
sha256: Hash, // 32-byte SHA256 hash
_unused: [u8; 16], // Reserved (16 bytes)
}
Memory Layout:
┌──────────────────────────────────┬────────────────────┐
│ sha256 (32 bytes)                │ _unused (16 bytes) │
│ SHA256 Hash                      │ Reserved Space     │
└──────────────────────────────────┴────────────────────┘
0                                  32                   48
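Since every structure in a File Info block is 48 bytes, the byte length of a block follows from its header alone. The Rust sketch below works through that arithmetic; the function name is hypothetical and the constants are the ones defined in this document.

```rust
const MDB_FILE_INFO_ENTRY_SIZE: u64 = 48;
const MDB_FILE_FLAG_WITH_VERIFICATION: u32 = 1 << 31;
const MDB_FILE_FLAG_WITH_METADATA_EXT: u32 = 1 << 30;

/// Total size in bytes of one File Info block, derived from the
/// `file_flags` and `num_entries` fields of its FileDataSequenceHeader.
fn file_info_block_size(file_flags: u32, num_entries: u32) -> u64 {
    let n = u64::from(num_entries);
    // One header plus one FileDataSequenceEntry per term.
    let mut structs = 1 + n;
    if file_flags & MDB_FILE_FLAG_WITH_VERIFICATION != 0 {
        // One FileVerificationEntry per FileDataSequenceEntry.
        structs += n;
    }
    if file_flags & MDB_FILE_FLAG_WITH_METADATA_EXT != 0 {
        // A single trailing FileMetadataExt.
        structs += 1;
    }
    structs * MDB_FILE_INFO_ENTRY_SIZE
}
```

For a file with 3 terms and both flags set, the block holds 1 + 3 + 3 + 1 = 8 structures, or 384 bytes.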
File Info Bookend
The end of the File Info section is marked by a bookend entry.
The bookend entry is 48 bytes long: the first 32 bytes are all 0xFF, followed by 16 bytes of all 0x00.
If you attempt to deserialize a FileDataSequenceHeader and its file hash is all 1 bits, then this entry is the bookend entry and the bytes that follow start the next section.
Since the File Info section immediately follows the header, a client MAY skip deserializing the footer to find where this section starts. The File Info section begins right after the header and ends when the bookend is reached.
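The bookend rule can be expressed in a few lines of Rust. This is a sketch with hypothetical function names: detection looks only at the 32-byte hash field, as described above, while construction also writes the 16 trailing zero bytes.

```rust
/// True when the 48-byte structure is a bookend, i.e. its hash field
/// (the first 32 bytes) is all 0xFF.
fn is_bookend(entry: &[u8; 48]) -> bool {
    entry[..32].iter().all(|&b| b == 0xFF)
}

/// Builds a bookend entry: 32 bytes of 0xFF followed by 16 bytes of 0x00.
fn bookend_entry() -> [u8; 48] {
    let mut entry = [0u8; 48];
    entry[..32].fill(0xFF);
    entry
}
```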
Deserialization steps:
1. Seek to footer.file_info_offset
2. Read FileDataSequenceHeader
3. Check if file_hash is all 0xFF (bookend marker); if so, stop
4. Read file_data_sequence_header.num_entries × FileDataSequenceEntry structures
5. If file_flags & MDB_FILE_FLAG_WITH_VERIFICATION != 0: read file_data_sequence_header.num_entries × FileVerificationEntry
6. If file_flags & MDB_FILE_FLAG_WITH_METADATA_EXT != 0: read 1 × FileMetadataExt
7. Repeat from step 2 until bookend found
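The loop above can be sketched in Rust over an in-memory byte slice. To keep the example short it skips over the data sequence and verification entries instead of decoding them, and records only a per-file summary; the `FileInfoSummary` type and the function name are hypothetical, and the starting offset is passed in by the caller (48 when the section directly follows the header).

```rust
use std::convert::TryInto;

const ENTRY_SIZE: usize = 48; // MDB_FILE_INFO_ENTRY_SIZE
const MDB_FILE_FLAG_WITH_VERIFICATION: u32 = 1 << 31;
const MDB_FILE_FLAG_WITH_METADATA_EXT: u32 = 1 << 30;

/// Per-file summary collected by this sketch (hypothetical type).
#[derive(Debug, PartialEq)]
struct FileInfoSummary {
    file_hash: [u8; 32],
    num_entries: u32,
    has_verification: bool,
    sha256: Option<[u8; 32]>,
}

/// Walks the File Info section starting at `pos`. Returns the summaries
/// and the offset just past the bookend (where the CAS Info section starts).
fn scan_file_info_section(
    buf: &[u8],
    mut pos: usize,
) -> Result<(Vec<FileInfoSummary>, usize), String> {
    let mut files = Vec::new();
    loop {
        let header = buf
            .get(pos..pos + ENTRY_SIZE)
            .ok_or("truncated before bookend")?;
        pos += ENTRY_SIZE;
        let file_hash: [u8; 32] = header[..32].try_into().unwrap();
        if file_hash == [0xFF; 32] {
            return Ok((files, pos)); // bookend reached
        }
        let flags = u32::from_le_bytes(header[32..36].try_into().unwrap());
        let num_entries = u32::from_le_bytes(header[36..40].try_into().unwrap());
        let has_verification = flags & MDB_FILE_FLAG_WITH_VERIFICATION != 0;
        // Skip the data sequence entries, then the verification entries if flagged.
        let n = num_entries as usize;
        pos += n * ENTRY_SIZE;
        if has_verification {
            pos += n * ENTRY_SIZE;
        }
        let sha256 = if flags & MDB_FILE_FLAG_WITH_METADATA_EXT != 0 {
            let ext = buf
                .get(pos..pos + ENTRY_SIZE)
                .ok_or("truncated metadata extension")?;
            pos += ENTRY_SIZE;
            let digest: [u8; 32] = ext[..32].try_into().unwrap();
            Some(digest)
        } else {
            None
        };
        files.push(FileInfoSummary { file_hash, num_entries, has_verification, sha256 });
    }
}
```

If the entries run past the end of the buffer, the next header read fails and the function returns an error, so a missing bookend is reported rather than silently accepted.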
3. CAS Info Section
Location: footer.cas_info_offset to footer.footer_offset, or directly after the File Info section bookend
This section contains CAS (Content Addressable Storage) block information. Each CAS Info block represents a xorb and starts with a CASChunkSequenceHeader, which contains the number of CASChunkSequenceEntries that follow to make up the block. The CAS Info section ends when reaching the bookend entry.
CAS Info Section Layout
┌─────────────────────┐
│ CASChunkSeqHeader   │ ← CAS Block 1
├─────────────────────┤
│ CASChunkSeqEntry    │
├─────────────────────┤
│ CASChunkSeqEntry    │
├─────────────────────┤
│ ...                 │
├─────────────────────┤
│ CASChunkSeqHeader   │ ← CAS Block 2
├─────────────────────┤
│ CASChunkSeqEntry    │
├─────────────────────┤
│ ...                 │
├─────────────────────┤
│ Bookend Entry       │ ← All 0xFF hash + zeros
└─────────────────────┘
Deserialization steps:
1. Seek to footer.cas_info_offset
2. Read CASChunkSequenceHeader
3. Check if cas_hash is all 0xFF (bookend marker); if so, stop
4. Read cas_chunk_sequence_header.num_entries × CASChunkSequenceEntry structures
5. Repeat from step 2 until bookend found
CASChunkSequenceHeader
struct CASChunkSequenceHeader {
cas_hash: Hash, // 32-byte Xorb hash
cas_flags: u32, // CAS flags (reserved for later, set to 0)
num_entries: u32, // Number of chunks in this Xorb
num_bytes_in_cas: u32, // Total size of all raw chunk bytes in this Xorb
num_bytes_on_disk: u32, // Length of the xorb after serialization, as uploaded
}
Memory Layout:
┌──────────────────────────────────┬───────────┬───────────┬───────────┬───────────┐
│ cas_hash (32 bytes)              │ cas_flags │ num_      │ num_bytes │ num_bytes │
│ CAS Block Hash                   │ (4 bytes) │ entries   │ in_cas    │ on_disk   │
│                                  │           │ (4 bytes) │ (4 bytes) │ (4 bytes) │
└──────────────────────────────────┴───────────┴───────────┴───────────┴───────────┘
0                                  32          36          40          44          48
CASChunkSequenceEntry
Every CASChunkSequenceHeader has a num_entries field.
This is the number of CASChunkSequenceEntry items to deserialize for the xorb described by this CAS Info block.
struct CASChunkSequenceEntry {
chunk_hash: Hash, // 32-byte chunk hash
chunk_byte_range_start: u32, // Start position in CAS block
unpacked_segment_bytes: u32, // Size when unpacked
_unused: [u8; 8], // Reserved space 8 bytes
}
Memory Layout:
┌──────────────────────────────────┬───────────┬───────────┬───────────┐
│ chunk_hash (32 bytes)            │ chunk_    │ unpacked_ │ _unused   │
│ Chunk Hash                       │ byte_     │ segment_  │ (8 bytes) │
│                                  │ range_    │ bytes     │           │
│                                  │ start     │ (4 bytes) │           │
│                                  │ (4 bytes) │           │           │
└──────────────────────────────────┴───────────┴───────────┴───────────┘
0                                  32          36          40          48
CAS Info Bookend
The end of the CAS Info section is marked by a bookend entry.
The bookend entry is 48 bytes long: the first 32 bytes are all 0xFF, followed by 16 bytes of all 0x00.
If you attempt to deserialize a CASChunkSequenceHeader and its hash is all 1 bits, then this entry is the bookend entry and the bytes that follow start the next section.
Since the CAS Info section immediately follows the File Info section bookend, a client MAY skip deserializing the footer to find where the CAS Info section starts. The section begins right after the File Info section bookend and ends when the next bookend is reached.
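A reader of the CAS Info section typically wants to answer "which xorb holds this chunk?". The Rust sketch below walks the section and builds a lookup table from chunk hash to xorb hash and chunk index. The `ChunkLocation` type, the function name, and the choice to keep the first occurrence of a repeated chunk hash are assumptions made for illustration. The stored hashes are indexed exactly as they appear in the shard, so they may be HMAC-protected when the footer carries a non-zero key.

```rust
use std::collections::HashMap;
use std::convert::TryInto;

const ENTRY_SIZE: usize = 48; // MDB_CAS_INFO_ENTRY_SIZE

/// Where a chunk lives: which xorb, and at which chunk index (hypothetical type).
#[derive(Debug, PartialEq, Clone, Copy)]
struct ChunkLocation {
    xorb_hash: [u8; 32],
    chunk_index: u32,
}

/// Walks the CAS Info section starting at `pos` and indexes every chunk hash.
/// Returns the index and the offset just past the bookend.
fn index_cas_info_section(
    buf: &[u8],
    mut pos: usize,
) -> Result<(HashMap<[u8; 32], ChunkLocation>, usize), String> {
    let mut index = HashMap::new();
    loop {
        let header = buf
            .get(pos..pos + ENTRY_SIZE)
            .ok_or("truncated before bookend")?;
        pos += ENTRY_SIZE;
        let xorb_hash: [u8; 32] = header[..32].try_into().unwrap();
        if xorb_hash == [0xFF; 32] {
            return Ok((index, pos)); // bookend reached
        }
        let num_entries = u32::from_le_bytes(header[36..40].try_into().unwrap());
        for chunk_index in 0..num_entries {
            let entry = buf
                .get(pos..pos + ENTRY_SIZE)
                .ok_or("truncated chunk entry")?;
            pos += ENTRY_SIZE;
            let chunk_hash: [u8; 32] = entry[..32].try_into().unwrap();
            // Keep the first location seen for a given chunk hash.
            index
                .entry(chunk_hash)
                .or_insert(ChunkLocation { xorb_hash, chunk_index });
        }
    }
}
```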
4. Footer (MDBShardFileFooter)
Clients MUST NOT include the footer when serializing the shard as the body for the shard upload API.
Location: End of file minus footer_size
Size: 200 bytes
struct MDBShardFileFooter {
version: u64, // Footer version (must be 1)
file_info_offset: u64, // Offset to file info section
cas_info_offset: u64, // Offset to CAS info section
_buffer: [u8; 48], // Reserved space (48 bytes)
chunk_hash_hmac_key: Hash, // HMAC key for chunk hashes (32 bytes)
shard_creation_timestamp: u64, // Creation time (seconds since epoch)
shard_key_expiry: u64, // Expiry time (seconds since epoch)
_buffer2: [u8; 72], // Reserved space (72 bytes)
footer_offset: u64, // Offset where footer starts
}
Memory Layout:
Fields are not exactly to scale
┌───────────┬───────────┬───────────┬────────────────────┬─────────────────────┐
│ version   │ file_info │ cas_info  │ _buffer (reserved) │ chunk_hash_hmac_key │
│ (8 bytes) │ offset    │ offset    │ (48 bytes)         │ (32 bytes)          │
│           │ (8 bytes) │ (8 bytes) │                    │                     │
└───────────┴───────────┴───────────┴────────────────────┴─────────────────────┘
0           8           16          24                   72                    104
┌───────────┬────────────┬─────────────────────┬───────────┐
│ creation  │ shard_     │ _buffer2 (reserved) │ footer_   │
│ timestamp │ key_expiry │ (72 bytes)          │ offset    │
│ (8 bytes) │ (8 bytes)  │                     │ (8 bytes) │
└───────────┴────────────┴─────────────────────┴───────────┘
104         112          120                   192         200
Deserialization steps:
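- Seek to file_size - footer_size
- Read all fields sequentially in struct order; integer fields are little-endian u64 values, while _buffer, chunk_hash_hmac_key, and _buffer2 are raw byte arrays
- Verify version equals 1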
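The footer steps can be sketched in Rust as below, using the byte offsets from the memory layout (reserved regions at 24..72 and 120..192 are skipped). The function name and the decision to reject any `footer_size` other than 200 are assumptions of this sketch, not requirements stated by the format.

```rust
use std::convert::TryInto;

const MDB_SHARD_FOOTER_VERSION: u64 = 1;
const FOOTER_SIZE: usize = 200;

#[derive(Debug, PartialEq)]
struct MDBShardFileFooter {
    version: u64,
    file_info_offset: u64,
    cas_info_offset: u64,
    chunk_hash_hmac_key: [u8; 32],
    shard_creation_timestamp: u64,
    shard_key_expiry: u64,
    footer_offset: u64,
}

/// Reads a little-endian u64 at `offset`.
fn u64_at(buf: &[u8], offset: usize) -> u64 {
    u64::from_le_bytes(buf[offset..offset + 8].try_into().unwrap())
}

/// Parses the footer from the last 200 bytes of `shard`.
/// `footer_size` is the value read from the shard header.
fn parse_footer(shard: &[u8], footer_size: u64) -> Result<MDBShardFileFooter, String> {
    if footer_size != FOOTER_SIZE as u64 {
        return Err(format!("unexpected footer_size {}", footer_size));
    }
    let start = shard
        .len()
        .checked_sub(FOOTER_SIZE)
        .ok_or("shard is shorter than the footer")?;
    let f = &shard[start..];
    let version = u64_at(f, 0);
    if version != MDB_SHARD_FOOTER_VERSION {
        return Err(format!("unsupported footer version {}", version));
    }
    Ok(MDBShardFileFooter {
        version,
        file_info_offset: u64_at(f, 8),
        cas_info_offset: u64_at(f, 16),
        // bytes 24..72: _buffer (reserved)
        chunk_hash_hmac_key: f[72..104].try_into().unwrap(),
        shard_creation_timestamp: u64_at(f, 104),
        shard_key_expiry: u64_at(f, 112),
        // bytes 120..192: _buffer2 (reserved)
        footer_offset: u64_at(f, 192),
    })
}
```

A `footer_size` of 0 means the footer was omitted, in which case a caller would skip this step and read the sections linearly instead.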
Use of Footer Fields
file_info_offset and cas_info_offset
These offsets allow you to seek into the shard data buffer to reach these sections without deserializing linearly.
HMAC Key Protection
If footer.chunk_hash_hmac_key is non-zero (as in a response shard from the global dedupe API), chunk hashes in the CAS Info section are protected with HMAC:
- The stored chunk hashes are HMAC(original_hash, footer.chunk_hash_hmac_key)
- To check whether a chunk of data that you have matches a chunk listed in the shard, compute HMAC(chunk_hash, footer.chunk_hash_hmac_key) for your chunk hash and search for the result among the shard's chunk hashes. If you find a match (matched_chunk), then your chunk and matched_chunk have the same original chunk hash, and you can deduplicate your chunk by referencing the xorb that matched_chunk belongs to.
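The matching procedure can be sketched in Rust without committing to a particular HMAC construction, which this section does not define. In the sketch below the keyed hash is passed in as a closure `keyed_hash(chunk_hash, key)`; that parameter, the function name, and the use of a `HashSet` of stored hashes are all assumptions for illustration.

```rust
use std::collections::HashSet;

/// Returns the indices of `local_hashes` whose (possibly keyed) hash appears
/// among the chunk hashes stored in the shard.
///
/// `keyed_hash` is a stand-in for HMAC(chunk_hash, key); the concrete
/// function must be the one the protocol specifies.
fn find_dedupable_chunks<F>(
    local_hashes: &[[u8; 32]],
    shard_chunk_hashes: &HashSet<[u8; 32]>,
    hmac_key: &[u8; 32],
    keyed_hash: F,
) -> Vec<usize>
where
    F: Fn(&[u8; 32], &[u8; 32]) -> [u8; 32],
{
    // An all-zero key means the stored hashes are not HMAC-protected.
    let key_is_zero = hmac_key.iter().all(|&b| b == 0);
    let mut matches = Vec::new();
    for (i, hash) in local_hashes.iter().enumerate() {
        let probe = if key_is_zero {
            *hash
        } else {
            keyed_hash(hash, hmac_key)
        };
        if shard_chunk_hashes.contains(&probe) {
            matches.push(i);
        }
    }
    matches
}
```

Each returned index identifies a local chunk that can be deduplicated by referencing the xorb that lists the matching stored hash.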
Shard Key Expiry
The shard key expiry is a 64-bit Unix timestamp of when the received shard is to be considered expired (usually on the order of days or weeks after the shard was sent back).
After this expiry time has passed, clients SHOULD consider the shard expired and SHOULD NOT use it to deduplicate data. Uploads that reference xorbs that were referenced by this shard can be rejected at the server's discretion.
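A client-side expiry check might look like the following Rust sketch. The function names are hypothetical, and treating the shard as expired only once the current time is strictly greater than `shard_key_expiry` is an interpretation of "after this expiry time has passed".

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// True once `now_unix_secs` is past the footer's `shard_key_expiry`.
fn is_shard_expired(shard_key_expiry: u64, now_unix_secs: u64) -> bool {
    now_unix_secs > shard_key_expiry
}

/// Current time as seconds since the Unix epoch
/// (0 if the system clock reports a time before the epoch).
fn now_unix_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}
```

Passing the current time as an argument keeps the check deterministic and easy to test; production code would call `is_shard_expired(footer.shard_key_expiry, now_unix_secs())` before using a cached shard.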
Complete Deserialization Algorithm
// ** option 1, read linearly, streaming **
// assume shard is a read-able file-like object and the reader position is at start of shard

// 1. Read and validate header
header = read_header(shard)

// 2. Read file info section
file_info = read_file_info_section(shard) // read through file info bookend

// 3. Read CAS info section
cas_info = read_cas_info_section(shard) // read through cas info bookend

// 4. Read footer
footer = read_footer(shard)
// shard reader should now be at EOF

// ** option 2, read footer and seek **
// assume shard is a read-able seek-able file-like object

// 1. Read and validate header
seek(start of shard)
header = read_header(shard)

// 2. Read and validate footer (needed for offsets)
seek(end of shard minus header.footer_size)
footer = read_footer(shard)

// 3. Read file info section
seek(footer.file_info_offset)
file_info = read_file_info_section(shard) // until footer.cas_info_offset

// 4. Read CAS info section
seek(footer.cas_info_offset)
cas_info = read_cas_info_section(shard) // until footer.footer_offset
Version Compatibility
- Header version 2: Current format
- Footer version 1: Current format
- Shards with different versions will be rejected
Error Handling
- Always verify magic numbers and versions
- Check that offsets are within file bounds
- Verify that bookend markers are present where expected
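The bounds checks listed above could be implemented along the lines of this Rust sketch. The specific rules it enforces are inferred from the layout described in this document rather than stated as requirements: the footer starts at `file_len - footer_size`, the File Info section cannot start before the end of the 48-byte header, and each section is at least 48 bytes long because it always ends with a bookend. The function name is hypothetical.

```rust
const HEADER_SIZE: u64 = 48;
const ENTRY_SIZE: u64 = 48;

/// Sanity-checks footer offsets against the file length.
fn validate_offsets(
    file_len: u64,
    footer_size: u64,
    file_info_offset: u64,
    cas_info_offset: u64,
    footer_offset: u64,
) -> Result<(), String> {
    let expected_footer_start = file_len
        .checked_sub(footer_size)
        .ok_or("footer_size exceeds file length")?;
    if footer_offset != expected_footer_start {
        return Err("footer_offset does not match file_len - footer_size".to_string());
    }
    if file_info_offset < HEADER_SIZE {
        return Err("file_info_offset overlaps the header".to_string());
    }
    // Each section contains at least its 48-byte bookend.
    if cas_info_offset < file_info_offset.saturating_add(ENTRY_SIZE) {
        return Err("cas_info_offset leaves no room for the file info bookend".to_string());
    }
    if footer_offset < cas_info_offset.saturating_add(ENTRY_SIZE) {
        return Err("footer_offset leaves no room for the CAS info bookend".to_string());
    }
    Ok(())
}
```

The smallest shard with a footer is 48 + 48 + 48 + 200 = 344 bytes (header, two bookends, footer), with offsets 48, 96, and 144.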