Background Evaluation Extended Format (BEEF) Transactions
Deggen (deggen@kschw.com) Damian Orzepowski (damian.orzepowski@4chain.studio) Wojciech Regulski (wojciech.regulski@4chain.studio) Arkadiusz Osowski (arkadiusz.osowski@4chain.studio)
Abstract
We propose a binary format for sending Transactions between peers to allow Simple Payment Verification (SPV). The format is optimized for minimal bandwidth while maintaining data required to independently validate the transaction in full.
Assumption: Every user has an independent source of Block Headers indexed by Merkle Root.
The simplest form is two transactions, one confirmed in a block which has a corresponding Merkle path and includes an output which is being spent in the second. The second is a newly signed transaction which constitutes the actual payment.
In cases where one or more inputs are not yet mined, each ancestral transaction is included. We step back through the Transaction DAG until every input has a corresponding parent transaction with a Merkle path.
This format is aligned with Dr. Craig Wright's explanation of SPV from this article. He makes reference to a new paradigm, this format is an attempt to bring forth that paradigm.
Users can adopt this format to transmit the data required to independently verify that a transaction is valid and is spending a real utxo from an existing block. The control mechanism to ensure no previous spend of the utxo is the economics and law. The format is intended for micropayments, so the risk is low. It is also the case that there is a cost to faking the data, since the attacker would really have to have previously owned an actual utxo. Any malfeasant would be signing incriminating evidence and sending it directly to the plaintiff if they were to defraud someone using this format. Without Merkle paths the data would be easier to fake, and there is no skin in the game required since the data could be randomly generated. Considering all these factors, the validation process is detailed in a later section.
Copyright
This BRC is licensed under the Open BSV license.
Motivation
Simplified Payment Verification formats for transmitting transactions between peers has yet to see wide adoption. This proposal advocates for complete ecosystem adoption of the principles of SPV; acknowledges that the system is secured by economic incentives, and law; and lays out a binary format for transmitting the data between parties optimizing for minimal bandwidth.
Three prior formats should be mentioned:
Extended Format BRC-30 incorporates the utxo script for script evaluation and satoshis for checking amounts. This format would still work when sending to nodes as they have a utxo lookup which is indexed by a hash of the extended data, meaning that invalid EF data would be detected immediately.
Tx Ancestors which was created for use within the Lite Client Toolbox, this uses an array of rawtxs, Merkle proofs, and Mapi responses to transport the data required for SPV.
Everett-style Transaction Envelopes BRC-8 uses a recursive JSON format which is otherwise similar to the above format originally designed for use within DPP. One of the ideas which this proposal highlights is that the target of merkle proofs could be height rather than blockhash or root, to save on bytes. Thinking along these lines, we propose that no target be provided at all, and the root simply be calculated at the receieving end as needed. Using the root to look up headers also avoids any implementation complexity associated with competing blocks. For example, when a competing block encapsulates the transactions it may not have the same height, if all transactions are being stored with paths at a height rather than blockhash or merkle root (which are unique to specific versions of a block) then updating a set of the transactions with merkle paths from the new block will be difficult to sort out. If they're indexed by blockhash then it would be trivial to set them all as "in need of updated paths" without the need for disambiguation.
Mapi is to be deprecated in the near future, so any new recommended formats should not include Mapi Responses as part of the solution.
The array of rawtxs within the Tx Ancestors spec makes some sense in that there are strange cases where the same rawtx has two outputs which are spent in different transactions, both within the ancestry of the tx we are validating. You don't want to have embedded a copy of the same proof twice, hence a hash map would make more sense than just including the merkle path bytes in the tx itself. Everett-style Transaction Envelope deals with this well even given its recursive object design. Txids are used as pointers to existing transactions when complex dependency graphs occur. This proposal uses the same thinking, but for a binary format lists are used rather than recursive objects.
The ordering of data has streaming validation in mind - an idea raised in EF BRC-30 - such that a receiver can start processing the validation immediately on receipt of the first few bytes, and any later bytes rely on previous bytes for their validation to even begin.
Merkle Paths
Oldest Tx Anchored by Path
Newer Txs depending on Oldest parent
Newest Tx
As soon as the Merkle Paths are receieved we can calculate the roots and lookup their blockheaders. If they're not valid then validation can stop there - rejected. Then we look at the oldest ancestor - if it's valid then its children can be validated, and so on until we reach the most recent tx.
Specification
BEEF combines thinking from several formats into one binary stream, prefixed with a very specific version number for disambiguation.
The encoding version number 4022206465 is chosen such that when seen in hex encoded as 4 bytes little endian it reads: 0100BEEF
. This is to allow multiple versions to be defined in future while keeping the data minimal and leaving an obvious sequence which developers can eyeball. Often Bitcoin developers will see a sequence 0100000000... for rawtx. When they instead see 0100BEEF... they will know this data is BEEF format when debugging and so on. If there are future improvements then the next version would be 0200BEEF
for example, this marker will remain "BEEF" for tens of thousands of versions until we reach 4022271999 which is obviously much more than would ever be required.
Version no
Version number starts at 4022206465, encoded Uint32LE => 0100BEEF
4 bytes
nBUMPs
VarInt number of BSV Unified Merkle Paths which follow
1-9 bytes
BUMP data
All of the BUMPs required to prove inclusion of inputs in longest chain of blocks BRC-74
many bytes x nBUMPs
nTransactions
VarInt number of transactions which follow
1-9 bytes
Has BUMP
01
if so, followed by the BUMP index; 00
if not, followed by nothing.
1 byte
BUMP index
VarInt index number - indicating the BUMP to which the prior tx belongs if there is one.
1-9 bytes
Ordering
Order is important - we must ensure that we end with the tx being evaluated, and its inputs are above, and their inputs are above that. Khan's algorithm is a well known solution to the problem in Graph Theory. Running this on the transaction DAG subset is recommended for complex transaction chains where order is not clear. Example below to demonstrate with extraneous data removed for clarity.
BEEF Example
Hex
Bytewise Breakdown
Validation Process
We parse the BUMPs storing each in an array so that we can address them by index later.
We parse each transaction.
RawTx bytes double sha256 to get the txid.
Store in hashmap from txid => parsedTx
If there is a Merkle path after the tx then we lookup the BUMP using the BUMP index number.
Lookup the txid within level 0 leaves of the BUMP to get the index of the txid within a block.
Calculate the Merkle root with the index, txid, and BUMP data.
Add the merkle root to an array which will be used in a request to our local header service once all transactions have been parsed.
Mark the tx as valid or not as soon as the header service responds.
Otherwise we run local validation on the transaction, stopping at the soonest failure.
Check the txid exists in memory, and is marked as valid.
Check that all scripts evaluate to TRUE.
Check that the sum of satoshis in > satoshis out + fee
Mark the tx as valid.
We make a request to the local header service as soon as we have all merkle roots calculated.
This involves sending a list of Merkle roots to the Pulse service which will validate that the merkle roots provided are all part of headers within the longest chain.
If any of the roots are not part of the longest chain, the response is negative and the whole validation has failed.
We await the final transaction being marked as valid since it depends on all other processes.
Implementation
The format will be built into BUX to begin real world testing. Thereafter common libraries such as bsv and go-bt for easy incorporation into existing stacks.
Last updated