BSV Unified Merkle Path (BUMP) Format

Darren Kellenschwiler (Deggen), Tone Engel (TonesNotes), Ty Everett, Damian Orzepowski, Arkadiusz Osowski

Abstract

We propose the BSV Unified Merkle Path format, in both binary and JSON encodings, optimized for generation by transaction processors and also convenient for proof-validating clients.

At a high level, the format encodes a number of txids that all exist within a single block, along with their Merkle paths and the blockHeight.

The blockHeight is encoded first, followed by level 0 of the Merkle tree, which contains the txids of interest and their corresponding siblings. We then encode each subsequent level of the tree, including only the nodes required to calculate the Merkle root from the txids of interest. For example, if we have only one txid of interest, we include it and its sibling, followed by one leaf per level of the tree.
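
To make the rule concrete, the minimal JavaScript sketch below lists which offsets a BUMP would carry at each level for a given set of txid positions. The helper `requiredOffsets` is hypothetical (it is not part of any SDK), and it assumes a full binary tree: it does not decide whether a right-edge sibling is a real hash or a duplicate (flag 01), since the offset is the same either way.

```javascript
// Hypothetical helper (not part of any SDK): list the offsets a BUMP must
// carry at each level for the given txid offsets, assuming a full tree.
function requiredOffsets(txOffsets, treeHeight) {
  const levels = [];
  // Offsets whose hash is known (level 0) or computable (higher levels).
  let known = new Set(txOffsets);
  for (let height = 0; height < treeHeight; height++) {
    // Level 0 carries the txids themselves; higher levels carry only siblings.
    const leaves = new Set(height === 0 ? known : []);
    for (const offset of known) {
      // An even offset's sibling is to its right, an odd offset's to its left.
      const sibling = offset % 2 === 0 ? offset + 1 : offset - 1;
      if (!known.has(sibling)) leaves.add(sibling);
    }
    levels.push([...leaves].sort((a, b) => a - b));
    // Each known node makes its parent computable one level up.
    known = new Set([...known].map((offset) => Math.floor(offset / 2)));
  }
  return levels;
}

// One txid at offset 5 in a tree of height 3 (8 leaves):
console.log(requiredOffsets([5], 3)); // [ [ 4, 5 ], [ 3 ], [ 0 ] ]
```

With two txids at offsets 0 and 3 of a four-leaf tree, the sketch returns `[[0, 1, 2, 3], []]`: both level-1 nodes can be computed from level 0, so nothing needs to be encoded at that level.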

This BRC is licensed under the Open BSV license.

Visualization

The BUMP Showcase can help build an understanding of how BUMP encodes all the necessary data.

Motivation

Several formats have improved on the original format returned by a Bitcoin node via the JSON-RPC method getmerkleproof.

Improvements include:

  • BRC-10: a TSC creation, which was subsequently returned by the node's JSON-RPC method getmerkleproof2.

  • BRC-11: removal of the need to specify targets, replacing them with the block height to improve validation speed.

  • BRC-58: removal of all extraneous data to minimize data size.

  • BRC-61: introduction of a compound path encoding, which allows multiple paths within the same block to be represented.

The purpose of this new specification is to bring these incremental improvements together under one spec that keeps the advantages of each and removes their drawbacks. This new spec should allow:

  • Inclusion of the block height, which makes lookup of the corresponding block header extremely fast while adding at most 9 bytes to the data size.

  • Multiple paths can be expressed in the same data model.

  • One format for everything, so that there is no need to convert between single and compound paths.

  • Size optimization, allowing us to skip encoding the far-right leaves when duplicating the working hash would suffice.

Binary Encoding

Global

The top level encoding specifies a block height and a tree height.

Field | Description | Size
block height | VarInt block height in which the transactions are encapsulated | 1-9 bytes
tree height | The height of the Merkle tree in this block, max 64 | 1 byte
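
VarInt is assumed here to be Bitcoin's standard variable-length integer (CompactSize): values below 0xfd take one byte, otherwise a 0xfd, 0xfe or 0xff prefix is followed by a 2-, 4- or 8-byte little-endian integer, which gives the 1-9 byte range in the table. A minimal Node.js sketch of the Global header under that assumption follows; `writeVarInt`, `readVarInt` and `encodeHeader` are illustrative names, not an SDK API.

```javascript
// Assumed Bitcoin-style VarInt (CompactSize): 1, 3, 5 or 9 bytes, little-endian.
function writeVarInt(n) {
  if (n < 0xfd) return Buffer.from([n]);
  if (n <= 0xffff) {
    const b = Buffer.alloc(3);
    b[0] = 0xfd;
    b.writeUInt16LE(n, 1);
    return b;
  }
  if (n <= 0xffffffff) {
    const b = Buffer.alloc(5);
    b[0] = 0xfe;
    b.writeUInt32LE(n, 1);
    return b;
  }
  const b = Buffer.alloc(9);
  b[0] = 0xff;
  b.writeBigUInt64LE(BigInt(n), 1);
  return b;
}

// Read a VarInt at `pos`; returns its value and encoded size in bytes.
// Values above Number.MAX_SAFE_INTEGER would lose precision here.
function readVarInt(buf, pos) {
  const first = buf[pos];
  if (first < 0xfd) return { value: first, size: 1 };
  if (first === 0xfd) return { value: buf.readUInt16LE(pos + 1), size: 3 };
  if (first === 0xfe) return { value: buf.readUInt32LE(pos + 1), size: 5 };
  return { value: Number(buf.readBigUInt64LE(pos + 1)), size: 9 };
}

// Hypothetical helper for the Global header: VarInt block height, then one
// tree-height byte (max 64).
function encodeHeader(blockHeight, treeHeight) {
  if (treeHeight > 64) throw new Error("tree height must be at most 64");
  return Buffer.concat([writeVarInt(blockHeight), Buffer.from([treeHeight])]);
}

console.log(encodeHeader(813706, 12).toString("hex")); // fe8a6a0c000c
```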

Level

Next, the number of leaves at the current height is specified, starting at level 0, and the leaves for this height follow.

Field | Description | Size
nLeaves | VarInt number of leaves at this height | 1-9 bytes
leaves | Each leaf encoded in the Leaf format below | sum of leaf sizes

Once all leaves at this height have been specified, an implied increment of the height in the tree occurs and we specify the number of leaves at the next level up, and so on, until we have specified the leaves at level (treeHeight - 1), at which point we stop. We do not need to encode the root hash, as it can always be calculated.

Leaf

Field | Description | Size
offset | VarInt offset from the left-hand side of the tree at this level | 1-9 bytes
flags | Flags byte: 00, 01 or 02, with meanings given in the Flags table below | 1 byte
hash | A hash representing a txid, a sibling hash, or a branch | 0 or 32 bytes

Flags

The first flag (bit value 01) indicates whether to duplicate the working hash instead of reading a hash from the data that follows. The second flag (bit value 02) indicates whether the hash is a relevant txid or just a sibling hash.

Bits | Byte | Meaning
0000 0000 | 00 | data follows, not a client txid
0000 0001 | 01 | nothing follows, duplicate working hash
0000 0010 | 02 | data follows, and is a client txid
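
Read as a bit field, the flags byte can be interpreted as in the minimal sketch below. `describeFlags` is a hypothetical helper; since the table defines only 00, 01 and 02, the sketch rejects every other value, including 03.

```javascript
// Hypothetical helper: interpret a BUMP leaf flags byte.
// Bit value 0x01: no hash follows, duplicate the working hash.
// Bit value 0x02: the hash that follows is a client txid.
function describeFlags(flags) {
  // Only 00, 01 and 02 are defined by the Flags table.
  if (flags !== 0x00 && flags !== 0x01 && flags !== 0x02) {
    throw new Error(`undefined flags byte 0x${flags.toString(16)}`);
  }
  const duplicate = (flags & 0x01) !== 0;
  const txid = (flags & 0x02) !== 0;
  // A 32-byte hash follows unless the duplicate bit is set.
  return { duplicate, txid, hashBytes: duplicate ? 0 : 32 };
}

for (const f of [0x00, 0x01, 0x02]) console.log(f, describeFlags(f));
```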

Hex String
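
As an illustration, this minimal sketch serializes a small hypothetical BUMP to a hex string by following the Global, Level and Leaf tables. The in-memory shape `{ blockHeight, path }`, where `path[h]` is the array of leaves at level h, and the name `encodeBump` are assumptions; the hashes are placeholder bytes, not real block data, and are written in binary byte order as given. VarInt is assumed to be Bitcoin's CompactSize.

```javascript
// Assumed Bitcoin-style VarInt (CompactSize), little-endian.
function writeVarInt(n) {
  if (n < 0xfd) return Buffer.from([n]);
  let b;
  if (n <= 0xffff) {
    b = Buffer.alloc(3);
    b[0] = 0xfd;
    b.writeUInt16LE(n, 1);
  } else if (n <= 0xffffffff) {
    b = Buffer.alloc(5);
    b[0] = 0xfe;
    b.writeUInt32LE(n, 1);
  } else {
    b = Buffer.alloc(9);
    b[0] = 0xff;
    b.writeBigUInt64LE(BigInt(n), 1);
  }
  return b;
}

// Hypothetical serializer for the binary encoding.
function encodeBump(bump) {
  const treeHeight = bump.path.length; // levels 0 .. treeHeight - 1
  if (treeHeight > 64) throw new Error("tree height must be at most 64");
  const parts = [writeVarInt(bump.blockHeight), Buffer.from([treeHeight])];
  for (const level of bump.path) {
    parts.push(writeVarInt(level.length)); // nLeaves
    for (const leaf of level) {
      parts.push(writeVarInt(leaf.offset));
      if (leaf.duplicate) {
        parts.push(Buffer.from([0x01])); // flag 01: no hash follows
      } else {
        if (leaf.hash.length !== 32) throw new Error("hash must be 32 bytes");
        parts.push(Buffer.from([leaf.txid ? 0x02 : 0x00]), leaf.hash);
      }
    }
  }
  return Buffer.concat(parts);
}

// Placeholder hashes (not real block data) for a tree of height 2.
const bump = {
  blockHeight: 813706,
  path: [
    [
      { offset: 2, hash: Buffer.alloc(32, 0xaa), txid: true },
      { offset: 3, hash: Buffer.alloc(32, 0xbb) },
    ],
    [{ offset: 0, hash: Buffer.alloc(32, 0xcc) }],
  ],
};
console.log(encodeBump(bump).toString("hex"));
```

For this input the output is 110 bytes: 6 for the header, 69 for level 0 (a one-byte count plus two 34-byte leaves) and 35 for level 1.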

Bytewise Breakdown
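
A hypothetical parser can walk the same bytes field by field and record what each run of bytes means. This minimal sketch assumes the same `{ blockHeight, path }` shape and Bitcoin-style VarInts; `parseBump` and its breakdown format are illustrative only. It rejects undefined flag values, truncated input and trailing bytes.

```javascript
// Hypothetical parser: decode a binary BUMP into { blockHeight, path } and
// record one breakdown line per field (hex bytes, then their meaning).
function parseBump(buf) {
  let pos = 0;
  const lines = [];
  const hexOf = (start) => buf.toString("hex", start, pos);

  // Assumed Bitcoin-style VarInt; values above 2^53 would lose precision.
  function readVarInt() {
    const first = buf[pos];
    const size = first < 0xfd ? 1 : first === 0xfd ? 3 : first === 0xfe ? 5 : 9;
    if (pos + size > buf.length) throw new Error("truncated VarInt");
    const value =
      size === 1 ? first
      : size === 3 ? buf.readUInt16LE(pos + 1)
      : size === 5 ? buf.readUInt32LE(pos + 1)
      : Number(buf.readBigUInt64LE(pos + 1));
    pos += size;
    return value;
  }

  let start = pos;
  const blockHeight = readVarInt();
  lines.push(`${hexOf(start)}  block height ${blockHeight}`);
  start = pos;
  const treeHeight = buf[pos++];
  lines.push(`${hexOf(start)}  tree height ${treeHeight}`);

  const path = [];
  for (let height = 0; height < treeHeight; height++) {
    start = pos;
    const nLeaves = readVarInt();
    lines.push(`${hexOf(start)}  level ${height}: ${nLeaves} leaves`);
    const level = [];
    for (let i = 0; i < nLeaves; i++) {
      start = pos;
      const leaf = { offset: readVarInt() };
      const flags = buf[pos++];
      let meaning;
      if (flags === 0x01) {
        leaf.duplicate = true;
        meaning = "duplicate working hash";
      } else if (flags === 0x00 || flags === 0x02) {
        if (pos + 32 > buf.length) throw new Error("truncated hash");
        leaf.hash = buf.subarray(pos, pos + 32); // binary byte order
        pos += 32;
        if (flags === 0x02) leaf.txid = true;
        meaning = flags === 0x02 ? "client txid" : "sibling hash";
      } else {
        throw new Error(`invalid flags ${flags}`);
      }
      lines.push(`${hexOf(start)}  offset ${leaf.offset}, ${meaning}`);
      level.push(leaf);
    }
    path.push(level);
  }
  if (pos !== buf.length) throw new Error("unexpected trailing bytes");
  return { blockHeight, path, lines };
}

const hex =
  "fe8a6a0c00" + "02" + // block height 813706, tree height 2
  "02" + "0202" + "aa".repeat(32) + "0301" + // level 0: txid at 2, duplicate at 3
  "01" + "0000" + "bb".repeat(32); // level 1: sibling hash at offset 0
const { lines } = parseBump(Buffer.from(hex, "hex"));
console.log(lines.join("\n"));
```

The sample input describes a three-transaction block: the txid of interest is the last one, at offset 2, so its level-0 sibling is encoded with the duplicate flag.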

JSON Encoding

In the JSON encoding, we start with the height of the block in which the transactions in the BUMP are mined. The index into the path array corresponds to the height within the Merkle tree, so we start with level 0, which includes all of the txids of interest to a client in this block together with the txids of the additional transactions required to begin the Merkle root computation. Each array element is itself an array of leaves, each specified as a leaf object.

Each leaf has an offset, the only required field, along with the optional fields hash, txid and duplicate. The hash is a hex string encoding the reversed bytes of the hash at this position in the Merkle tree. duplicate is a boolean; when true, it represents "no data" for this position, which occurs at the right-hand edge of the Merkle tree when a level has an odd number of nodes. A parser is expected to duplicate the working hash in this case, so no further data is required. txid is a boolean included only when true, to indicate that the hash is a relevant txid for the receiving party rather than just a sibling hash needed to calculate the root.

JSON Example
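
A minimal sketch of producing the JSON encoding from an in-memory BUMP whose hashes are Buffers in binary byte order, as read from the binary encoding. The `bumpToJson` name and the in-memory shape are assumptions for illustration; the leaf field names `offset`, `hash`, `txid` and `duplicate` follow the description above, while the top-level keys `blockHeight` and `path` are assumed.

```javascript
// Hypothetical converter: in-memory BUMP (hashes as Buffers in binary byte
// order) -> JSON encoding with reversed-hex hashes.
function bumpToJson(bump) {
  return {
    blockHeight: bump.blockHeight,
    path: bump.path.map((level) =>
      level.map((leaf) => {
        const out = { offset: leaf.offset }; // the only required field
        if (leaf.duplicate) {
          out.duplicate = true; // no hash: duplicate the working hash
        } else {
          // Copy before reversing so the input Buffer is not mutated.
          out.hash = Buffer.from(leaf.hash).reverse().toString("hex");
          if (leaf.txid) out.txid = true; // included only when true
        }
        return out;
      })
    ),
  };
}

// Placeholder hash, not real block data.
const example = {
  blockHeight: 813706,
  path: [
    [
      { offset: 2, hash: Buffer.from("00".repeat(31) + "01", "hex"), txid: true },
      { offset: 3, duplicate: true },
    ],
  ],
};
console.log(JSON.stringify(bumpToJson(example), null, 2));
```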

Calculating the Merkle Root from a BUMP

Let's start by loading the hex-encoded format into a Buffer in JavaScript and parsing it into an object with a buffer reader. We can then calculate the Merkle root from any of the included txids.
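
A minimal sketch of the root calculation, assuming the BUMP has already been parsed into `{ blockHeight, path }` with hashes as Buffers in binary byte order; the object shape and the names `findOrCompute` and `computeRoot` are hypothetical, not an SDK API. At each level the working hash is double-SHA-256 hashed together with its sibling: the sibling goes on the right when the working offset is even and on the left when it is odd, a duplicate leaf means the working hash is paired with itself, and the offset is halved on the way up. Siblings that a compound BUMP omits are computed from the level below.

```javascript
const { createHash } = require("crypto");

// Double SHA-256, as used for txids and Merkle tree nodes.
const sha256d = (data) =>
  createHash("sha256").update(createHash("sha256").update(data).digest()).digest();

// Hypothetical helper: return the node at (height, offset), either stored in
// the BUMP or computed from its two children when a compound BUMP omits it.
function findOrCompute(path, height, offset) {
  const leaf = path[height].find((l) => l.offset === offset);
  if (leaf) return leaf;
  if (height === 0) return null;
  const left = findOrCompute(path, height - 1, offset * 2);
  const right = findOrCompute(path, height - 1, offset * 2 + 1);
  if (!left || !right) return null;
  const rightHash = right.duplicate ? left.hash : right.hash;
  return { offset, hash: sha256d(Buffer.concat([left.hash, rightHash])) };
}

// Hypothetical: Merkle root (binary byte order) from one level-0 txid.
function computeRoot(bump, txid) {
  const start = bump.path[0].find((l) => l.hash && l.hash.equals(txid));
  if (!start) throw new Error("txid not found at level 0");
  let working = start.hash;
  let index = start.offset;
  for (let height = 0; height < bump.path.length; height++) {
    const isLeft = index % 2 === 0;
    const siblingOffset = isLeft ? index + 1 : index - 1;
    const sibling = findOrCompute(bump.path, height, siblingOffset);
    if (!sibling) throw new Error(`missing sibling at height ${height}`);
    // Flag 01 (duplicate): the sibling is a copy of the working hash.
    const siblingHash = sibling.duplicate ? working : sibling.hash;
    const pair = isLeft ? [working, siblingHash] : [siblingHash, working];
    working = sha256d(Buffer.concat(pair));
    index = Math.floor(index / 2);
  }
  return working;
}

// Usage with placeholder txids (not real transactions) for a 3-tx block.
const [t0, t1, t2] = ["tx0", "tx1", "tx2"].map((s) => sha256d(Buffer.from(s)));
const n01 = sha256d(Buffer.concat([t0, t1]));
const bump = {
  blockHeight: 1,
  path: [
    [{ offset: 2, hash: t2, txid: true }, { offset: 3, duplicate: true }],
    [{ offset: 0, hash: n01 }],
  ],
};
// Hashes are conventionally displayed with their bytes reversed.
console.log(Buffer.from(computeRoot(bump, t2)).reverse().toString("hex"));
```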

Merging

When combining multiple BUMPs into one, the first check should always be the blockHeight: ensure it matches. The second check is the Merkle root: calculate the root of each BUMP, and if the roots do not match, the BUMPs cannot be combined. If they match, merging is a simple inclusion of all leaves at each level, dropping duplicates.
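
A minimal sketch of those two checks and the leaf union, with hypothetical names. To keep it short, the root calculation is passed in as a caller-supplied `rootOf(bump)` function returning a Buffer, for example one built on the procedure in the previous section. When the same offset appears in both inputs, the sketch keeps one copy and marks it as a txid if either copy was (an assumption the text does not spell out), and it sorts each level by offset for determinism.

```javascript
// Hypothetical merge of two BUMPs for the same block. `rootOf(bump)` is a
// caller-supplied function returning the BUMP's Merkle root as a Buffer.
function mergeBumps(a, b, rootOf) {
  // First check: both BUMPs must refer to the same block height.
  if (a.blockHeight !== b.blockHeight) throw new Error("block heights differ");
  // Second check: both must calculate the same Merkle root.
  if (!rootOf(a).equals(rootOf(b))) throw new Error("Merkle roots differ");
  if (a.path.length !== b.path.length) throw new Error("tree heights differ");
  const path = a.path.map((levelA, height) => {
    const byOffset = new Map();
    for (const leaf of [...levelA, ...b.path[height]]) {
      const seen = byOffset.get(leaf.offset);
      if (!seen) {
        byOffset.set(leaf.offset, { ...leaf }); // copy; inputs stay unchanged
      } else if (leaf.txid) {
        seen.txid = true; // assumption: keep the txid marker from either copy
      }
    }
    return [...byOffset.values()].sort((x, y) => x.offset - y.offset);
  });
  return { blockHeight: a.blockHeight, path };
}
```

The sketch keeps every leaf; an implementation might also drop higher-level leaves that have become computable from the merged level below, which this sketch does not attempt.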

Implementations

  • TypeScript: ts-sdk

  • Golang: go-sdk

  • Python: py-sdk
