We propose a binary format for Compound Merkle Paths (CMP hereon) optimized for minimal data bandwidth during transmission.
Explainer Video
Copyright
This BRC is licensed under the Open BSV license.
Motivation
Current format standards do not cover merkle paths for multiple txids within the same block. This would help reduce the overall size of data needed to express any set of paths from the same block. The larger the set the bigger the space saving.
Specification
For each level of the merkle path the opposite hash from the one which can be calculated is provided.
Data Types
Field
Description
Size
height
The height of the tree up to a max 64
1 byte
nLeaves
VarInt number of leaves at this height
1-9 bytes
offset
VarInt offset from left hand side within tree
1-9 bytes
leaf
Each leaf is a 32 byte hash
32 bytes
Formatting Syntax
offset and leaf are repeated for nLeaves at each height
height does not need to be repeated, the inference is that height starts as the max height of the tree and is decremented by one each time we reach the end of the current set of leaves. Once height === -1 we stop parsing.
nLeaves is repeated for each height, followed by the corresponding offset and leaf for each.
Example
Important Note
We must include the txid and offset within a block at height 0. This is the most efficient way to store the index data required to pull out individual paths when given only a txid. In the example below we encode txids at indices 0 and 3:
02// height = 201// nLeafs at this height VarInt// ----------------------01// offset VarIntcd73c0c6bb645581816fa960fd2f1636062fcbf23cb57981074ab8d708a76e3b // 32 byte hash// ----------------------// implied end of leaves at this height// height of next leaves is therefore 102// nLeafs at this height VarInt// ----------------------00// offset VarInt3470d882cf556a4b943639eba15dc795dffdbebdc98b9a98e3637fda96e3811e // 32 byte hash// ----------------------01// offset VarIntc58e40f22b9e9fcd05a09689a9b19e6e62dbfd3335c5253d09a7a7cd755d9a3c // 32 byte hash// ----------------------// implied end of leaves at this height// height of next leaves is therefore 004// nLeafs at this height VarInt// ----------------------00// offset VarInt8c00bb9360e93fb822c84b2e579fa4ce75c8378ae87f67730a49552f73c56ee8 // 32 byte hash (this is the txid at index 0)// ----------------------01// offset VarIntda256f78ae0ad74bbf539662cdb9122aa02ba9a9d883f1d52468d96290515adb // 32 byte hash// ----------------------02// offset VarIntb4c8d919190a090e77b73ffcd52b85babaaeeb62da000473102aca7f070facef // 32 byte hash// ----------------------03// offset VarInte5b331f4961d764373f3a4e2751954e75489fb17902aad583eedbb41dc165a3b // 32 byte hash (this is the txid at index 3)// ----------------------// implied end of data because new height would be -1
Implementation
Let's start by dumping this format as hex into a Buffer and parsing it into an object with a Buffer Reader. Then we construct an object
If we JSON encode the leaves we get the following. Height is encoded as the position of the leaf object within the outermost array.
[ // index within the outer array corresponds to the height { // within each height there must be one or more hashes with their corresponding offsets { [hash]: offset }"e86ec5732f55490a73677fe88a37c875cea49f572e4bc822b83fe96093bb008c":0,// txid at index 0"db5a519062d96824d5f183d8a9a92ba02a12b9cd629653bf4bd70aae786f25da":1,"efac0f077fca2a10730400da62ebaebaba852bd5fc3fb7770e090a1919d9c8b4":2,"3b5a16dc41bbed3e58ad2a9017fb8954e7541975e2a4f37343761d96f431b3e5":3// txid at index 3 }, {"1e81e396da7f63e3989a8bc9bdbefddf95c75da1eb3936944b6a55cf82d87034":0,"3c9a5d75cda7a7093d25c53533fddb626e9eb1a98996a005cd9f9e2bf2408ec5":1 }, {"3b6ea708d7b84a078179b53cf2cb2f0636162ffd60a96f81815564bbc6c073cd":1 }]
You can derive individual paths for particular transaction indices as necessary using the following algorithm:
Reading index 3 from the Compound Merkle Path.
// let's say we want to derive the path for txid 3b5a16dc41bbed3e58ad2a9017fb8954e7541975e2a4f37343761d96f431b3e5 // first we determine from the 0th array in the compound merkle path that the block index associated is 3.constexample= { index:3, path: []} compoundPath.map((leaves, height) => {constindexOffset=example.index >> height ^1for (consthashin leaves) {if (leaves[hash] === indexOffset) {example.path.push(hash)returntrue } }returnError(`We do not have a hash for this index at height: ${height}`)})
Important to understand that we only kept the paths for indices 0 and 3 for this particular example. If you attempt to run the algo above for any other indices, an error would be thrown. This allowed us to keep 7 hashes out of 14 total. Keeping each path separately we'd have to keep minimum 6 hashes, and if we added another index then the compound method would only require one more hash, whereas saving individual paths would require another 3. The total bandwidth saving would be significant if we had thousands of transactions all in the same block.
Merkle Proof
We use this to prove inclusion in a block by running a Merkle Proof algorithm on the txid, index, and path. We arrive at a Merkle root hash. This can then be used as the key in a Block Header lookup to determine whether the txid is included within a block which is part of the longest chain of work.