Advanced Transactions - Adrian Huber

As we learned earlier, P2PKH is the most common type of transaction script. Now we look at advanced scripts, such as multisignature scripts and Pay-to-Script-Hash.

Multisignature Scripts

These scripts set a condition, where N public keys are recorded in the script and at least M of those must provide signatures to unlock the funds (M-of-N scheme).

Standard multisignature scripts are limited to at most 3 listed public keys (N up to 3). However, multisignature scripts wrapped in a Pay-to-Script-Hash (P2SH) script are limited to 15 keys. This information may be outdated; check the isStandard() function for the current limits.

Normally, a locking script of a multisignature transaction would look like this:

M <Public Key 1> <Public Key 2> ... <Public Key N> N CHECKMULTISIG

and the unlocking script would look like this:

<Signature 1>...<Signature M>

But there is a bug in CHECKMULTISIG’s execution that requires a slight workaround. When CHECKMULTISIG executes, it should consume M+N+2 items from the stack as parameters. However, due to the bug, CHECKMULTISIG pops one value more than expected. This extra value doesn’t affect the result of CHECKMULTISIG, but it needs to be there. Otherwise, the script fails with a stack error.

Because this bug became part of the consensus rules, it must now be replicated forever. Therefore, the correct unlocking script looks like this:

0 <Signature 1>...<Signature M>

This extra value can be anything, normally set to 0.
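The workaround can be sketched in a few lines of Python. This is only an illustration that assembles the two scripts as token lists; the function names and placeholder strings are hypothetical, and nothing is signed or verified.

```python
from typing import List, Union

Token = Union[int, str]


def multisig_locking_script(m: int, pubkeys: List[str]) -> List[Token]:
    """Build 'M <keys...> N CHECKMULTISIG' as a list of tokens."""
    n = len(pubkeys)
    if not 1 <= m <= n:
        raise ValueError("need 1 <= M <= N")
    return [m, *pubkeys, n, "CHECKMULTISIG"]


def multisig_unlocking_script(signatures: List[str]) -> List[Token]:
    """Prefix the signatures with the dummy element that the bug consumes."""
    return [0, *signatures]


def items_popped(m: int, n: int) -> int:
    """M signatures + N keys + the two counters, plus one extra item (the bug)."""
    return (m + n + 2) + 1


locking = multisig_locking_script(
    2, ["<Public Key 1>", "<Public Key 2>", "<Public Key 3>"]
)
unlocking = multisig_unlocking_script(["<Signature 1>", "<Signature 2>"])
```

For a 2-of-3 script, items_popped(2, 3) counts the seven expected parameters plus the dummy element.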

Pay-to-Script-Hash (P2SH)

Although multisignature scripts are a powerful feature, they are cumbersome to use. Let’s say there are multiple shop owners who want to use a multisignature script for the payments that customers make. Now every customer would have to understand how to create a transaction using custom scripts. Since the transaction is much larger, the fees would also be higher.

P2SH was developed to resolve these practical difficulties. With P2SH payments, the complex locking script is replaced with its digital fingerprint, a cryptographic hash:

Locking Script	2 PubKey1 PubKey2 PubKey3 PubKey4 PubKey5 5 CHECKMULTISIG
Unlocking Script	0 Sig1 Sig2

Complex script without (above) and as P2SH (below)

Redeem Script	2 PubKey1 PubKey2 PubKey3 PubKey4 PubKey5 5 CHECKMULTISIG
Locking Script	HASH160 <20-byte hash of redeem script> EQUAL
Unlocking Script	0 Sig1 Sig2 <redeem script>

The two scripts are combined in two stages. First, the redeem script is checked against the locking script to make sure the hash matches. If the redeem script hash matches, the unlocking script is executed on its own, to unlock the redeem script.

This workaround shifts the fees and complexity from the sender (who creates the transaction) to the receiver (who unlocks and spends the transaction).

Furthermore, the heavy data storage switches from the output (blockchain + UTXO) to the input (blockchain).

Another important part of the P2SH feature is the ability to encode a script hash as an address, as defined in BIP-13. P2SH addresses use the version prefix “5,” which results in Base58Check-encoded addresses that start with a “3”.
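To make the address encoding concrete, here is a minimal Base58Check sketch in Python. The function name base58check is made up for this example, and the 20-byte script hash is simply the P2SH example hash used later in this text; version prefix 5 is the one mentioned above.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check(version: int, payload: bytes) -> str:
    """Encode version byte + payload + 4-byte double-SHA256 checksum in Base58."""
    data = bytes([version]) + payload
    checksum = hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    raw = data + checksum

    # Convert the byte string to a big integer and then to base 58.
    number = int.from_bytes(raw, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = ALPHABET[remainder] + encoded

    # Every leading zero byte is represented by the character "1".
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


script_hash = bytes.fromhex("54c557e07dde5bb6cb791c7a540e0a4796f5e97e")
p2sh_address = base58check(5, script_hash)
```

With version prefix 5 and a 20-byte payload, the encoded address begins with “3”, independent of the hash.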

Redeem Script and Validation

As of version 0.9.2 of the Bitcoin Core client, P2SH transactions can contain any valid script. But you are not able to put a P2SH inside a P2SH redeem script, because the P2SH specification is not recursive.

Note that the redeem script is not presented to the network until you attempt to spend a P2SH output. Therefore, a payment to the hash of an invalid redeem script will still be processed and the UTXO will be locked successfully. This creates the risk of locking bitcoin that can never be spent.

Data Recording Output (RETURN)

Bitcoin’s distributed and timestamped ledger, the blockchain, has potential uses far beyond payments. Many developers have tried to use the scripting language for other use cases, such as recording data. Others object to the unspendable outputs this creates, because they bloat the UTXO set.

This is where the RETURN operator comes in. RETURN allows developers to add 80 bytes of nonpayment data to a transaction output. It creates an explicitly, provably unspendable output, which doesn’t need to be stored in the UTXO set. Instead, the data is stored only in the blockchain, so it doesn’t bloat the UTXO set or burden full nodes with the cost of more expensive RAM.

RETURN <data>

Notes:

limited to 80 bytes, it’s often represented by a hash
many applications put a prefix in front of the data to help identify the application, like Proof of Existence
a standard transaction can have only one RETURN output, but a single RETURN output can be combined in a transaction with outputs of any other type

Bitcoin Core 0.10 also introduced two new command-line options:

datacarrier controls relay and mining of RETURN transactions, with the default set to “1” to allow them
datacarriersize takes a numeric argument specifying the maximum size in bytes of the RETURN script, 83 bytes by default, which allows for a maximum of 80 bytes of RETURN data plus one byte of RETURN opcode and two bytes of PUSHDATA opcode
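The 83-byte figure can be checked with a small Python sketch that serializes a RETURN script. The function name return_script is hypothetical; the opcode values (0x6a for RETURN, 0x4c for PUSHDATA1) and the rule that pushes of up to 75 bytes use a single length byte are standard Script encoding. The 80-byte cap is hard-coded here to mirror the default described above.

```python
OP_RETURN = 0x6A
OP_PUSHDATA1 = 0x4C
MAX_DATA_BYTES = 80  # default limit described in the text


def return_script(data: bytes) -> bytes:
    """Serialize 'RETURN <data>' as raw script bytes."""
    if len(data) > MAX_DATA_BYTES:
        raise ValueError("RETURN data is limited to 80 bytes")
    if len(data) <= 75:
        # Short pushes: the opcode itself is the length of the data.
        push = bytes([len(data)])
    else:
        # 76 to 80 bytes: PUSHDATA1 followed by a one-byte length.
        push = bytes([OP_PUSHDATA1, len(data)])
    return bytes([OP_RETURN]) + push + data


largest = return_script(b"\x00" * 80)
```

The largest allowed payload therefore produces 1 + 2 + 80 = 83 bytes, matching the datacarriersize default.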

Timelocks

Timelocks are restrictions on transactions or outputs that only allow spending after a point in time. At the transaction level, they are implemented by the nLocktime field. Timelocks are useful for postdating transactions and locking funds to a date in the future. More importantly, timelocks extend bitcoin scripting into the dimension of time, opening the door for complex multistep smart contracts.

nLocktime is almost always set to zero. If it is nonzero and below 500 million, it is interpreted as a block height, meaning the transaction is not valid and is not relayed or included in the blockchain prior to the specified block height. If nLocktime is greater than or equal to 500 million, it is interpreted as a Unix Epoch timestamp (seconds since Jan-1-1970).
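The interpretation rule fits in a few lines of Python. The function name and the returned labels are invented for this sketch; only the 500 million threshold comes from the text.

```python
from typing import Tuple

LOCKTIME_THRESHOLD = 500_000_000


def interpret_locktime(n_locktime: int) -> Tuple[str, int]:
    """Classify an nLocktime value as disabled, a block height, or a Unix time."""
    if n_locktime == 0:
        return ("disabled", 0)
    if n_locktime < LOCKTIME_THRESHOLD:
        return ("block height", n_locktime)
    return ("unix time", n_locktime)
```

For example, interpret_locktime(650000) is treated as a block height, while interpret_locktime(1700000000) is treated as a timestamp.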

However, nLocktime can’t stop the sender from spending the same inputs before the time is reached.

To achieve such a guarantee, the timelock restriction must be placed on the UTXO itself and be part of the locking script, rather than on the transaction. Two new timelock features were introduced in late 2015 and mid-2016 that offer UTXO-level timelocks:

CHECKLOCKTIMEVERIFY
CHECKSEQUENCEVERIFY

Check Lock Time Verify (CLTV)

Based on a specification in BIP-65, a new script operator called CHECKLOCKTIMEVERIFY (CLTV) was added to the scripting language. While nLocktime is a transaction-level timelock, CLTV is an output-based timelock. CLTV doesn’t replace nLocktime, but rather restricts specific UTXO such that they can only be spent in a future transaction with nLocktime set to a greater or equal value.

A P2SH transaction with a redeem script would look like this:

<now + 3 months> CHECKLOCKTIMEVERIFY DROP DUP HASH160 <Public Key Hash> EQUALVERIFY CHECKSIG

When the receiver tries to spend this UTXO, he constructs a transaction that references the UTXO as an input. He uses his signature and public key in the unlocking script of that input and sets the transaction nLocktime to be equal to or greater than the timelock in the CHECKLOCKTIMEVERIFY the sender set. The receiver then broadcasts the transaction on the bitcoin network.

CHECKLOCKTIMEVERIFY fails and halts execution, marking the transaction invalid if:

the stack is empty
the top item on the stack is less than 0
the timelock type (height versus timestamp) of the top stack item and the nLocktime field are not the same
the top stack item is greater than the transaction’s nLocktime field
the nSequence field of the input is 0xffffffff

After a successful CLTV check, the time parameter is still on the stack and may need to be removed before the rest of the script runs. This is done by the DROP operator.
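The five failure conditions listed above can be written as a simple check. This is a sketch, not the Bitcoin Core implementation: the function name cltv_passes and its parameters are hypothetical, and the stack is reduced to its top item (None stands for an empty stack).

```python
from typing import Optional

LOCKTIME_THRESHOLD = 500_000_000
SEQUENCE_FINAL = 0xFFFFFFFF


def cltv_passes(stack_top: Optional[int], tx_locktime: int, input_sequence: int) -> bool:
    """Return True if none of the CLTV failure conditions applies."""
    if stack_top is None:  # the stack is empty
        return False
    if stack_top < 0:  # the top item is less than 0
        return False
    # Both values must be block heights, or both must be timestamps.
    top_is_height = stack_top < LOCKTIME_THRESHOLD
    tx_is_height = tx_locktime < LOCKTIME_THRESHOLD
    if top_is_height != tx_is_height:
        return False
    if stack_top > tx_locktime:  # the timelock has not been reached yet
        return False
    if input_sequence == SEQUENCE_FINAL:  # nLocktime is disabled for this input
        return False
    return True
```

A spending transaction with nLocktime 600000 satisfies a CLTV parameter of 600000, but not one of 600001.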

Relative Timelocks

Until now, we have only discussed absolute timelocks, which specify a fixed point in time. Relative timelocks instead specify an amount of time that must elapse after the output being spent was confirmed in the blockchain. This functionality is especially useful in bidirectional state channels and the Lightning Network.
Like absolute timelocks, they are implemented with both a transaction-level feature and a script-level opcode. The transaction-level relative timelock is implemented as a consensus rule on the value of nSequence, a transaction field that is set in every transaction input (BIP-68). Script-level relative timelocks are implemented with the CHECKSEQUENCEVERIFY (CSV) opcode (BIP-112).

Relative Timelocks with nSequence

Relative timelocks can be set on each input of a transaction, by setting the nSequence field in each input.

Originally, nSequence was intended to allow transactions to be modified in the mempool (Temporary storage for transactions that have been received by a node). In that use, a transaction containing inputs with nSequence value below 2³² – 1 (0xFFFFFFFF) indicated a transaction that was not yet “finalized.” Such a transaction would be held in the mempool until it was replaced by another transaction spending the same inputs with a higher nSequence value. Once a transaction was received whose inputs had an nSequence value of 0xFFFFFFFF it would be considered “finalized” and mined.

However, this feature was never properly implemented, and nSequence was customarily set to 0xFFFFFFFF.

Since BIP-68, new consensus rules apply to any transaction containing an input whose nSequence value is less than 2³¹ (bit 1<<31 is not set). Such a value signals a relative timelock: the transaction is only valid once the input has aged by the timelock amount.

The nSequence value is specified in either blocks or seconds, but in a slightly different format than is used in nLocktime. A type-flag, the 23rd least-significant bit (i.e., value 1<<22), selects the unit. If the type-flag is set, the nSequence value is interpreted as a multiple of 512 seconds; if it is not set, the value is interpreted as a number of blocks.
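The bit layout can be illustrated with a short decoder. The function name decode_sequence is made up for this sketch. The two flag bits come from the text; the 16-bit value mask (0xFFFF) is taken from BIP-68 and is not stated above, so treat it as an added assumption.

```python
from typing import Optional, Tuple

DISABLE_FLAG = 1 << 31  # if set, no relative timelock applies
TYPE_FLAG = 1 << 22     # if set, the value counts units of 512 seconds
VALUE_MASK = 0x0000FFFF  # assumption from BIP-68: low 16 bits hold the value


def decode_sequence(n_sequence: int) -> Optional[Tuple[str, int]]:
    """Decode an nSequence value into (unit, amount), or None if no timelock."""
    if n_sequence & DISABLE_FLAG:
        return None
    value = n_sequence & VALUE_MASK
    if n_sequence & TYPE_FLAG:
        return ("seconds", value * 512)
    return ("blocks", value)
```

For instance, an nSequence of 144 means the input must be at least 144 blocks old, while TYPE_FLAG | 10 means at least 5120 seconds.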

Relative Timelocks with CSV

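The CHECKSEQUENCEVERIFY (CSV) opcode brings relative timelocks into scripts: it takes an nSequence-style value as its parameter and restricts the spending of the UTXO until that relative timelock has elapsed.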

As with CLTV, the value in CSV must match the format in the corresponding nSequence value. A transaction is only valid if the input nSequence value is greater than or equal to the CSV parameter.

Relative timelocks with CSV are especially useful when several (chained) transactions are created and signed, but not propagated, when they’re kept “off-chain”.

Median-Time-Past

Bitcoin is a decentralized network, which means that each participant has his or her own perspective of time. There is a difference between wall time and consensus time. Bitcoin reaches consensus every 10 minutes about the state of the ledger as it existed in the past.

Miners set the timestamps in the block headers. There is a certain degree of latitude allowed by the consensus rules to account for differences in clock accuracy between decentralized nodes. This creates an incentive for miners to lie about the time in a block in order to collect extra fees from timelocked transactions that are not yet mature.

BIP-113 solves this issue by introducing a new way to calculate time, called Median-Time-Past. It is calculated by taking the timestamps of the last 11 blocks and finding the median. That median time then becomes consensus time and is used for all timelock calculations.

Median-Time-Past changes the implementation of time calculations for nLocktime, CLTV, nSequence, and CSV. The consensus time calculated by Median-Time-Past is always approximately one hour behind wall clock time.
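The calculation itself is small. The following Python sketch assumes a list of block header timestamps in chain order; the function name is hypothetical.

```python
from typing import List


def median_time_past(block_timestamps: List[int]) -> int:
    """Median of the timestamps of the last 11 blocks (chain order, oldest first)."""
    window = sorted(block_timestamps[-11:])
    return window[len(window) // 2]


# Eleven blocks, ten minutes apart, starting at an arbitrary time of 1000.
timestamps = [1000 + 600 * i for i in range(11)]
mtp = median_time_past(timestamps)
```

Because the median is the sixth of eleven sorted values, a single miner who sets an exaggerated timestamp on the latest block does not move the result.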

Timelock Defence Against Fee-Sniping

Fee-sniping is a theoretical attack scenario, where miners attempting to rewrite past blocks “snipe” higher-fee transactions from future blocks to maximize their profitability.

They don’t have to mine the block again with the same transactions; they can include any higher-fee transactions from the current mempool. Because block rewards are currently much higher than transaction fees, this is not very lucrative today. But it might become a problem in the future.

To prevent “fee sniping,” when Bitcoin Core creates transactions, it uses nLocktime to limit them to the “next block,” by default. Under normal circumstances, this nLocktime has no effect. The nSequence on all the inputs is set to 0xFFFFFFFE to enable nLocktime.

Conditional Clauses

Conditional clauses in Bitcoin Script, also known as flow control, look a bit different than in other programming languages (IF, THEN, …).

At a basic level, they allow us to construct a redeem script that has two ways of being unlocked, depending on a TRUE/FALSE outcome of evaluating a logical condition. For example, if x is TRUE, the script executes path A; otherwise, it executes path B.

Bitcoin implements flow control using the IF, ELSE, ENDIF, and NOTIF opcodes. Additionally, conditional expressions can contain boolean operators such as BOOLAND, BOOLOR, and NOT. Conditional expressions can be nested; the depth is limited only by the consensus limits on script size.

Because Script is a stack language, the flow control is structured backwards: the condition comes before the IF opcode.

Normally:

if (condition):
  code to run when condition is true
else:
  code to run when condition is false
code to run in either case

in Script:

condition
IF
  code to run when condition is true
ELSE
  code to run when condition is false
ENDIF
code to run in either case

Another form of conditional in Bitcoin Script is any opcode that ends in VERIFY. The VERIFY suffix means that if the condition evaluated is not TRUE, the execution terminates immediately and the transaction is deemed invalid. It acts as a guard clause which only continues if a precondition is met.

Sidenote: An opcode such as EQUAL will push the result (TRUE/FALSE) onto the stack, leaving it there for evaluation by subsequent opcodes. In contrast EQUALVERIFY doesn’t leave anything on the stack.

Example 1

A very common use for flow control in Bitcoin Script is to construct a redeem script that offers multiple execution paths, each a different way of redeeming the UTXO.

In our scenario, we have two signers where both are able to redeem the UTXO. Normally, this would be done as a 1-of-2 multisig script, but we will use an IF clause.

The redeem script would look like this:

IF
 <Pubkey Person 1> CHECKSIG
ELSE
 <Pubkey Person 2> CHECKSIG
ENDIF

As you notice, the condition is missing. It’s included in the unlocking script:

<Sig Person 1> 1

The 1 at the end serves as the condition (TRUE) that makes the IF clause execute the first redemption path, which requires the signature of Person 1. To take the second path, Person 2 provides their own signature followed by a 0 (FALSE): <Sig Person 2> 0
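To see how the condition in the unlocking script selects the branch, here is a toy evaluator in Python. It is a sketch only: it supports just IF, ELSE, ENDIF, and a fake CHECKSIG in which a signature "sig:X" is accepted for the public key "pub:X". All names and the token format are invented for this illustration.

```python
from typing import List, Union

Token = Union[int, str]


def fake_checksig(signature: str, pubkey: str) -> bool:
    """Stand-in for real signature verification: 'sig:X' matches 'pub:X'."""
    return signature.split(":")[1] == pubkey.split(":")[1]


def run_script(tokens: List[Token]) -> bool:
    stack: List[Token] = []
    branches: List[bool] = []  # one entry per open IF: is its branch active?
    for token in tokens:
        executing = all(branches)
        if token == "IF":
            # The condition is popped only if the enclosing code is running.
            condition = executing and stack.pop() != 0
            branches.append(condition)
        elif token == "ELSE":
            branches[-1] = not branches[-1]
        elif token == "ENDIF":
            branches.pop()
        elif not executing:
            continue  # skip everything inside an inactive branch
        elif token == "CHECKSIG":
            pubkey = stack.pop()
            signature = stack.pop()
            stack.append(1 if fake_checksig(signature, pubkey) else 0)
        else:
            stack.append(token)  # data: push onto the stack
    return bool(stack) and stack[-1] != 0


redeem_script = ["IF", "pub:1", "CHECKSIG", "ELSE", "pub:2", "CHECKSIG", "ENDIF"]
person_1_spends = run_script(["sig:1", 1] + redeem_script)
person_2_spends = run_script(["sig:2", 0] + redeem_script)
wrong_branch = run_script(["sig:2", 1] + redeem_script)
```

The unlocking script is simply evaluated before the redeem script, so the condition is already on top of the stack when IF runs. Choosing the wrong branch sends the signature to the other person's public key, and the script fails.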

Example 2

Situation: Person A, Person B, and Person C own a business. In order to spend business capital, signatures from 2 of the 3 owners are required. If one of the owners has a problem with his keys, after 30 days their lawyer should also be able to sign transactions together with one owner. In case none of the business owners can sign transactions, the lawyer should be able to spend UTXOs by himself after 90 days.

The redeem script would look like this:

IF
  IF
    2
  ELSE
    <30 days> CHECKSEQUENCEVERIFY DROP
    <Pubkey Lawyer> CHECKSIGVERIFY
    1
  ENDIF
  <Pubkey Person A> <Pubkey Person B> <Pubkey Person C> 3 CHECKMULTISIG
ELSE
  <90 days> CHECKSEQUENCEVERIFY DROP
  <Pubkey Lawyer> CHECKSIG
ENDIF

If two business owners want to spend something, the first path is executed:

0 <Sig Person A> <Sig Person B> TRUE TRUE

After 30 days, the second path can be executed:

0 <Sig Person A> <Sig Lawyer> FALSE TRUE

Finally, there is the third path, which can only be executed after 90 days:

<Sig Lawyer> FALSE

Segregated Witness

Segregated Witness (segwit) is an upgrade to the bitcoin consensus rules and network protocol, as a BIP-9 soft-fork.

In cryptography, the term “witness” is used to describe a solution to a cryptographic puzzle. In Bitcoin, the witness is the unlocking script that satisfies the conditions placed on a UTXO.

Before segwit’s introduction, the witness data was embedded in the transaction as part of the input. Segregated Witness moves the witness data into a separate witness data structure that accompanies a transaction.

Segregated Witness is defined by the following BIPs:

BIP-141: The main definition of Segregated Witness.

BIP-143: Transaction Signature Verification for Version 0 Witness Program

BIP-144: Peer Services (new network messages and serialization formats)

BIP-145: getblocktemplate Updates for Segregated Witness (for mining)

BIP-173: Base32 address format for native v0-16 witness outputs

Segregated Witness has a lot of benefits:

Transaction Malleability: transaction hashes can no longer be changed by anyone other than the creator of the transaction, because the hash doesn’t include the witness data anymore
Script Versioning: With the introduction of Segregated Witness scripts, every locking script is preceded by a script version number. This allows the scripting language to be upgraded in a backward-compatible way (i.e., using soft fork upgrades) to introduce new script operands, syntax, or semantics.
Network and Storage Scaling: By moving the witness data outside the transaction data, Segregated Witness improves bitcoin’s scalability. Nodes can prune the witness data after validating the signatures, or ignore it altogether when doing simplified payment verification. The witness data doesn’t need to be stored or transmitted to all nodes.
Offline Signing Improvement: Previously, an offline signing device (e.g. hardware wallet) would have to verify the amount of each input before signing a transaction. This was usually accomplished by streaming a large amount of data about the previous transactions referenced as inputs. Since the amount is now part of the commitment hash that is signed, an offline device does not need the previous transactions.

How Segregated Witness Works

Segregated Witness changes how individual UTXO are spent and is therefore a per-output feature. A transaction can spend both segregated-witness and inline-witness outputs at once.

A Pay-to-Witness-Public-Key-Hash (P2WPKH) output script would look like this:

0 ab68025513c3dbd2f7b92a94e0581f5d50f654e7

It contains two parts. The “0” is interpreted as a version number (the witness version). The second part, 20 bytes long, is simply the hash of the public key, known as the witness program.

Here is a decoded transaction which shows the difference in the output being spent:

with the signature:

[...]
"vin": [
  {
    "txid": "0627052b6f28912f2703066a912ea577f2ce4da4caa5a5fbd8a57286c345c2f2",
    "vout": 0,
    "scriptSig": "<Person 1 scriptSig>"
  }
]
[...]

with the witness data:

[...]
"vin": [
  {
    "txid": "0627052b6f28912f2703066a912ea577f2ce4da4caa5a5fbd8a57286c345c2f2",
    "vout": 0,
    "scriptSig": ""
  }
]
[...]
"witness": "<Person 1 witness data>"
[...]

Notice how the second version has an empty scriptSig but includes the signature in the witness data (which is separated from the transaction data).

Sidenote: A P2WPKH output should only be created in response to a payment request from the receiver, because the sender can’t know if the receiving wallet has the ability to construct segwit transactions and spend P2WPKH outputs. Additionally, P2WPKH outputs must be constructed from the hash of a compressed public key.

Pay-to-Witness-Script-Hash (P2WSH)

The second type of witness program corresponds to a P2SH script.

Here is an example of the locking scripts for a redeem script that defines a 2-of-5 multisignature:

P2SH:

HASH160 54c557e07dde5bb6cb791c7a540e0a4796f5e97e EQUAL

P2WSH:

0 a9b7b38d972cabc7961dbfbcb841ad4508d133c47ba87457b4a0e8aae86dbb89

You see that the hash of the redeem script with the operators is replaced by a witness version and a 32-byte SHA256 hash of the redeem script.

The wallet will put an empty scriptSig in the transaction:

P2SH

[...]
"vin": [
  {
    "txid": "abcdef12345...",
    "vout": 0,
    "scriptSig": "0 <SigA> <SigB> <2 PubA PubB PubC PubD PubE 5 CHECKMULTISIG>"
  }
]

P2WSH

[...]
"vin": [
  {
    "txid": "abcdef12345...",
    "vout": 0,
    "scriptSig": ""
  }
]
[...]
"witness": "0 <SigA> <SigB> <2 PubA PubB PubC PubD PubE 5 CHECKMULTISIG>"
[...]

P2WSH is much simpler, because the redeem script and the signatures are outside of the transaction data.

Sidenote: While P2SH uses the 20-byte RIPEMD160(SHA256(script)) hash, the P2WSH witness program uses a 32-byte SHA256(script) hash. It is more secure and helps to differentiate between the two types of witness programs (P2WPKH and P2WSH).
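The length-based distinction can be sketched in Python. The helper names are hypothetical, and the one-byte script b"\x51" is only a stand-in for a real redeem script. The byte layout (version byte 0x00 followed by a push of 20 or 32 bytes) follows the two-part structure described above.

```python
import hashlib


def p2wpkh_script(pubkey_hash: bytes) -> bytes:
    """Version 0 followed by a push of the 20-byte public key hash."""
    if len(pubkey_hash) != 20:
        raise ValueError("P2WPKH needs a 20-byte hash")
    return bytes([0x00, 0x14]) + pubkey_hash


def p2wsh_script(redeem_script: bytes) -> bytes:
    """Version 0 followed by a push of the 32-byte SHA256 of the script."""
    program = hashlib.sha256(redeem_script).digest()
    return bytes([0x00, 0x20]) + program


def classify_v0(output_script: bytes) -> str:
    """Tell the two version 0 witness programs apart by program length."""
    if len(output_script) < 2 or output_script[0] != 0x00:
        return "unknown"
    if output_script[1] != len(output_script) - 2:
        return "unknown"
    return {20: "P2WPKH", 32: "P2WSH"}.get(output_script[1], "unknown")


key_output = p2wpkh_script(bytes.fromhex("ab68025513c3dbd2f7b92a94e0581f5d50f654e7"))
script_output = p2wsh_script(b"\x51")  # placeholder redeem script
```

Because a 20-byte program can only be P2WPKH and a 32-byte program can only be P2WSH, no extra type marker is needed in the output script.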

Upgrading to Segregated Witness

Segregated Witness is implemented as a backward-compatible upgrade, where old and new clients can coexist. The P2WPKH and P2WSH payment types are used when both sender and recipient are segwit-aware.

That leaves two scenarios:

Ability of a sender’s wallet that is not segwit-aware to make a payment to a recipient’s wallet that can process segwit transactions
Ability of a sender’s wallet that is segwit-aware to recognize and distinguish between recipients that are segwit-aware and ones that are not, by their addresses.

Embedding Segregated Witness inside P2SH

The receiver can construct a P2SH address that contains a segwit script inside it. This payment can be spent with a segwit transaction, but the sender sees it as a “normal” P2SH address. Both forms of witness scripts can be included in a P2SH address, noted as P2SH(P2WPKH) and P2SH(P2WSH).

P2SH(P2WPKH): The receiver’s wallet constructs a P2WPKH witness program with his public key. This witness program is then hashed and the resulting hash is encoded as a P2SH script. The P2SH script is converted to a bitcoin address, one that starts with a “3,” as we saw in the Pay-to-Script-Hash (P2SH) section.

witness version + 20-byte public key hash —> witness program

witness program —> SHA256 —> RIPEMD160 —> another 20-byte hash

20-byte hash —> btc address

P2SH(P2WSH): The receiver’s wallet hashes the redeem script with SHA256. Next, the hashed redeem script is turned into a P2WSH witness program. Then, the witness program itself is hashed with SHA256 and RIPEMD160, producing a new 20-byte hash, as used in traditional P2SH. Next, the wallet constructs a P2SH bitcoin address from this hash.

redeem script —> SHA256 —> + witness version —> witness program

witness program —> SHA256 —> RIPEMD160 —> 20-byte hash

20-byte hash —> btc address

The sender’s wallet would lock the output to the following script:

HASH160 <20-byte hash> EQUAL

However, once wallets are broadly supporting segwit, it makes sense to encode witness scripts directly in a native address format designed for segwit, rather than embed it in P2SH.

The native segwit address format is defined in BIP-173.

BIP-173

(Base32 address format for native v0-16 witness outputs)

BIP-173 defines a checksummed Base32 encoding, as compared to the Base58 encoding of a “traditional” bitcoin address. Addresses in this format are also called bech32 addresses.

Because it uses a lowercase-only alphanumeric character set, bech32 is easier to read and speak, and it is 45% more efficient to encode in QR codes.

The BCH error detection algorithm is a vast improvement over the previous checksum algorithm (from Base58Check), allowing not only detection but also correction of errors.

Here are some examples:

Mainnet P2WPKH	bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4
Testnet P2WPKH	tb1qw508d6qejxtdg4y5r3zarvary0c5xw7kxpjzsx
Mainnet P2WSH	bc1qrp33g0q5c5txsp9arysrx4k6zdkfs4nce4xj0gdcccefvpysxf3qccfmv3
Testnet P2WSH	tb1qrp33g0q5c5txsp9arysrx4k6zdkfs4nce4xj0gdcccefvpysxf3q0sl5k7

These addresses are up to 90 characters long and consist of three parts:

human readable part, identifying “bc” (mainnet) or “tb” (testnet)
the separator, the digit “1” is a separator and not part of the 32-character encoding set
the data part, a minimum of 6 alphanumeric characters, the checksum encoded witness script
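The three parts and the checksum can be examined with a small Python sketch. The character set, generator constants, and checksum routine follow the reference code published in BIP-173; the wrapper function split_address is hypothetical and only returns the human readable part and the witness version, without decoding the witness program.

```python
from typing import List, Tuple

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GENERATOR = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]


def polymod(values: List[int]) -> int:
    """BCH checksum computation over 5-bit values (per BIP-173)."""
    chk = 1
    for value in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ value
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GENERATOR[i]
    return chk


def hrp_expand(hrp: str) -> List[int]:
    """Spread the human readable part into 5-bit values for the checksum."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def split_address(address: str) -> Tuple[str, int]:
    """Return (human readable part, witness version) or raise ValueError."""
    pos = address.rfind("1")  # the last "1" is the separator
    if pos < 1 or pos + 7 > len(address) or len(address) > 90:
        raise ValueError("malformed address")
    hrp, data_part = address[:pos], address[pos + 1:]
    if any(c not in CHARSET for c in data_part):
        raise ValueError("invalid character in data part")
    data = [CHARSET.index(c) for c in data_part]
    if polymod(hrp_expand(hrp) + data) != 1:
        raise ValueError("checksum mismatch")
    return hrp, data[0]


parts = split_address("bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4")
```

The separator is found from the right because the digit “1” cannot occur in the data part, while a human readable part may in principle contain it.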

Transaction Identifiers

One of the greatest benefits of Segregated Witness is that it eliminates third-party transaction malleability.

Before segwit, transactions could have their signatures subtly modified by third parties, changing their transaction ID (hash) without changing any fundamental properties (inputs, outputs, amounts). This created opportunities for denial-of-service attacks as well as attacks against poorly written wallet software that assumed unconfirmed transaction hashes were immutable.

With the introduction of Segregated Witness, transactions have two identifiers, txid and wtxid. The traditional transaction ID txid is the double-SHA256 hash of the serialized transaction, without the witness data. A transaction wtxid is the double-SHA256 hash of the new serialization format of the transaction with witness data.

The traditional txid is calculated in exactly the same way as with a nonsegwit transaction. However, since a pure segwit transaction (a transaction that only contains segwit inputs) has empty scriptSigs in every input, there is no part of the transaction that can be modified by a third party.

The wtxid is like an “extended” ID, in that the hash also incorporates the witness data. If a transaction is transmitted without witness data, then the wtxid and txid are identical. Note that since the wtxid includes witness data (signatures) and since witness data may be malleable, the wtxid should be considered malleable until the transaction is confirmed. Only the txid of a pure segwit transaction can be considered immutable by third parties.
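The relationship between the two identifiers can be illustrated with a deliberately simplified Python sketch. It does not implement the real segwit serialization: the byte strings are placeholders, and appending the witness to the base data is an assumption made only to show which bytes each hash covers. The function names are hypothetical.

```python
import hashlib
from typing import Tuple


def sha256d(data: bytes) -> bytes:
    """Double SHA256, as used for transaction identifiers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def transaction_ids(base_data: bytes, witness_data: bytes) -> Tuple[str, str]:
    """Return (txid, wtxid) as hex strings for placeholder serializations."""
    txid = sha256d(base_data)
    if witness_data:
        # Simplification: the real format interleaves the witness fields.
        wtxid = sha256d(base_data + witness_data)
    else:
        wtxid = txid  # no witness data: both identifiers are identical
    return txid.hex(), wtxid.hex()


original = transaction_ids(b"inputs-and-outputs", b"signature-encoding-A")
tampered = transaction_ids(b"inputs-and-outputs", b"signature-encoding-B")
```

Changing only the witness bytes leaves the txid untouched and changes the wtxid, which is why only the txid of a pure segwit transaction can be relied on before confirmation.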

Segregated Witness’ New Signing Algorithm

As we learned earlier, signatures are applied to a commitment hash, which covers either certain parts of the transaction or the whole transaction.

Unfortunately, the way the commitment hash was calculated introduced the possibility that a node verifying the signature can be forced to perform a significant number of hash computations. An attacker could therefore create a transaction with a very large number of signature operations, causing the entire bitcoin network to have to perform hundreds or thousands of hash operations to verify the transaction.

Segwit represented an opportunity to address this problem by changing the way the commitment hash is calculated. For segwit version 0 witness programs, signature verification occurs using an improved commitment hash algorithm as specified in BIP-143.

This new algorithm achieves two goals:

the number of hash operations now grows only linearly, O(n), with the number of signature operations, instead of quadratically
the commitment hash now also includes the value (amount) of each input, so the wallet doesn’t need to fetch the previous transactions to check it

Economic Incentives for Segregated Witness

As the volume of bitcoin transactions increases, so does the cost of resources (CPU, network bandwidth, disk space, memory). Miners are compensated through transaction fees; nonmining full nodes are not.

Fees are calculated based on the transaction size. But from the perspective of full nodes and miners, some parts of a transaction carry much higher costs:

Disk Space: Every transaction is stored in the blockchain, adding to the total size of the blockchain. The blockchain is stored on disk, but the storage can be optimized by “pruning” (deleting) older transactions.
CPU: Every transaction must be validated, which requires CPU time.
Bandwidth: Every transaction is transmitted across the network at least once. Without any optimization in the block propagation protocol, transactions are transmitted again as part of a block, doubling the impact on network capacity.
Memory: Nodes that validate transactions keep the UTXO index or the entire UTXO set in memory to speed up validation. Because memory is at least one order of magnitude more expensive than disk, growth of the UTXO set contributes disproportionately to the cost of running a node.

The most expensive parts of a transaction are the newly created outputs, as they are added to the in-memory UTXO set. By comparison, signatures (aka witness data) add the least burden to the network and the cost of running a node, because witness data are only validated once and then never used again.