Compound Merkle Path Format

Deggen (deggen@kschw.com) Damian Orzepowski (damian.orzepowski@4chain.studio)

Abstract

We propose a binary format for Compound Merkle Paths (CMP hereon) optimized for minimal data bandwidth during transmission.

Explainer Video

This BRC is licensed under the Open BSV license.

Motivation

Current format standards do not cover merkle paths for multiple txids within the same block. This would help reduce the overall size of data needed to express any set of paths from the same block. The larger the set the bigger the space saving.

Specification

For each level of the merkle path the opposite hash from the one which can be calculated is provided.

Data Types

FieldDescriptionSize

height

The height of the tree up to a max 64

1 byte

nLeaves

VarInt number of leaves at this height

1-9 bytes

offset

VarInt offset from left hand side within tree

1-9 bytes

leaf

Each leaf is a 32 byte hash

32 bytes

Formatting Syntax

  1. offset and leaf are repeated for nLeaves at each height

  2. height does not need to be repeated, the inference is that height starts as the max height of the tree and is decremented by one each time we reach the end of the current set of leaves. Once height === -1 we stop parsing.

  3. nLeaves is repeated for each height, followed by the corresponding offset and leaf for each.

Example

Important Note

We must include the txid and offset within a block at height 0. This is the most efficient way to store the index data required to pull out individual paths when given only a txid. In the example below we encode txids at indices 0 and 3:

indextxid

0

e86ec5732f55490a73677fe88a37c875cea49f572e4bc822b83fe96093bb008c

3

3b5a16dc41bbed3e58ad2a9017fb8954e7541975e2a4f37343761d96f431b3e5

By convention we reverse the bytes of a txid hex string so these sequences will be seen in their inverse endian form below.

Hex

020101cd73c0c6bb645581816fa960fd2f1636062fcbf23cb57981074ab8d708a76e3b02003470d882cf556a4b943639eba15dc795dffdbebdc98b9a98e3637fda96e3811e01c58e40f22b9e9fcd05a09689a9b19e6e62dbfd3335c5253d09a7a7cd755d9a3c04008c00bb9360e93fb822c84b2e579fa4ce75c8378ae87f67730a49552f73c56ee801da256f78ae0ad74bbf539662cdb9122aa02ba9a9d883f1d52468d96290515adb02b4c8d919190a090e77b73ffcd52b85babaaeeb62da000473102aca7f070facef03e5b331f4961d764373f3a4e2751954e75489fb17902aad583eedbb41dc165a3b

Bytewise Breakdown

02 // height = 2
01 // nLeafs at this height VarInt
// ----------------------
01 // offset VarInt
cd73c0c6bb645581816fa960fd2f1636062fcbf23cb57981074ab8d708a76e3b // 32 byte hash
// ----------------------
// implied end of leaves at this height
// height of next leaves is therefore 1
02 // nLeafs at this height VarInt
// ----------------------
00 // offset VarInt
3470d882cf556a4b943639eba15dc795dffdbebdc98b9a98e3637fda96e3811e // 32 byte hash
// ----------------------
01 // offset VarInt
c58e40f22b9e9fcd05a09689a9b19e6e62dbfd3335c5253d09a7a7cd755d9a3c // 32 byte hash
// ----------------------
// implied end of leaves at this height
// height of next leaves is therefore 0
04 // nLeafs at this height VarInt
// ----------------------
00 // offset VarInt
8c00bb9360e93fb822c84b2e579fa4ce75c8378ae87f67730a49552f73c56ee8 // 32 byte hash (this is the txid at index 0)
// ----------------------
01 // offset VarInt
da256f78ae0ad74bbf539662cdb9122aa02ba9a9d883f1d52468d96290515adb // 32 byte hash
// ----------------------
02 // offset VarInt
b4c8d919190a090e77b73ffcd52b85babaaeeb62da000473102aca7f070facef // 32 byte hash
// ----------------------
03 // offset VarInt
e5b331f4961d764373f3a4e2751954e75489fb17902aad583eedbb41dc165a3b // 32 byte hash (this is the txid at index 3)
// ----------------------
// implied end of data because new height would be -1

Implementation

Let's start by dumping this format as hex into a Buffer and parsing it into an object with a Buffer Reader. Then we construct an object

const { Br } = require('openspv')
const reader = new Br()
reader.buf = Buffer.from('020101cd73c0c6bb645581816fa960fd2f1636062fcbf23cb57981074ab8d708a76e3b02003470d882cf556a4b943639eba15dc795dffdbebdc98b9a98e3637fda96e3811e01c58e40f22b9e9fcd05a09689a9b19e6e62dbfd3335c5253d09a7a7cd755d9a3c04008c00bb9360e93fb822c84b2e579fa4ce75c8378ae87f67730a49552f73c56ee801da256f78ae0ad74bbf539662cdb9122aa02ba9a9d883f1d52468d96290515adb02b4c8d919190a090e77b73ffcd52b85babaaeeb62da000473102aca7f070facef03e5b331f4961d764373f3a4e2751954e75489fb17902aad583eedbb41dc165a3b', 'hex')

let maxHeight = parseInt(reader.read(1).toString('hex'), 16)
let compoundPath = Array(maxHeight + 1).fill(0).map(() => ({}))
let height = maxHeight
let x = 0
let nLeavesAtThisHeight = reader.readVarIntNum()
let startOfNextHeight = nLeavesAtThisHeight
while (height >= 0) {
  offset = reader.readVarIntNum()
  hash = reader.read(32).reverse().toString('hex')
  compoundPath[height][hash] = offset
  x++
  if (x == startOfNextHeight) {
    height--
    if (height < 0) break
    nLeavesAtThisHeight = reader.readVarIntNum()
    startOfNextHeight = nLeavesAtThisHeight + x
  }
}

console.log({ compoundPath })

JSON Encoding of a Compound Merkle Path

If we JSON encode the leaves we get the following. Height is encoded as the position of the leaf object within the outermost array.

[ // index within the outer array corresponds to the height
	{ // within each height there must be one or more hashes with their corresponding offsets { [hash]: offset }
		"e86ec5732f55490a73677fe88a37c875cea49f572e4bc822b83fe96093bb008c": 0, // txid at index 0
		"db5a519062d96824d5f183d8a9a92ba02a12b9cd629653bf4bd70aae786f25da": 1,
		"efac0f077fca2a10730400da62ebaebaba852bd5fc3fb7770e090a1919d9c8b4": 2,
		"3b5a16dc41bbed3e58ad2a9017fb8954e7541975e2a4f37343761d96f431b3e5": 3 // txid at index 3
	},
    {
		"1e81e396da7f63e3989a8bc9bdbefddf95c75da1eb3936944b6a55cf82d87034": 0,
		"3c9a5d75cda7a7093d25c53533fddb626e9eb1a98996a005cd9f9e2bf2408ec5": 1
	},
	{
		"3b6ea708d7b84a078179b53cf2cb2f0636162ffd60a96f81815564bbc6c073cd": 1
	}
]

You can derive individual paths for particular transaction indices as necessary using the following algorithm:

Reading index 3 from the Compound Merkle Path.


// let's say we want to derive the path for txid 3b5a16dc41bbed3e58ad2a9017fb8954e7541975e2a4f37343761d96f431b3e5 
// first we determine from the 0th array in the compound merkle path that the block index associated is 3.

const example = {
  index: 3,
  path: []
} 

compoundPath.map((leaves, height) => {
  const indexOffset = example.index >> height ^ 1
  for (const hash in leaves) {
    if (leaves[hash] === indexOffset) {
      example.path.push(hash)
      return true
    }
  }
  return Error(`We do not have a hash for this index at height: ${height}`)
})

Which yields:

{
  index: 3,
  path: [
    '3b6ea708d7b84a078179b53cf2cb2f0636162ffd60a96f81815564bbc6c073cd',
    '1e81e396da7f63e3989a8bc9bdbefddf95c75da1eb3936944b6a55cf82d87034',
    'efac0f077fca2a10730400da62ebaebaba852bd5fc3fb7770e090a1919d9c8b4'
  ]
}

Important to understand that we only kept the paths for indices 0 and 3 for this particular example. If you attempt to run the algo above for any other indices, an error would be thrown. This allowed us to keep 7 hashes out of 14 total. Keeping each path separately we'd have to keep minimum 6 hashes, and if we added another index then the compound method would only require one more hash, whereas saving individual paths would require another 3. The total bandwidth saving would be significant if we had thousands of transactions all in the same block.

Merkle Proof

We use this to prove inclusion in a block by running a Merkle Proof algorithm on the txid, index, and path. We arrive at a Merkle root hash. This can then be used as the key in a Block Header lookup to determine whether the txid is included within a block which is part of the longest chain of work.

// pseudocode
function merkleProof(txid, index, path) {
  try {
    const root = deriveRootFromPath(txid, index, path)
    const blockHeader = await lookupHeaderByRoot(root)
    if (blockHeader.state === 'LONGEST_CHAIN') return true
    else return false
  } catch (error) {
    console.log({ error })
    return false
  }
}

Last updated