The Bitcoin Network - Adrian Huber

Bitcoin is structured as a peer-to-peer (P2P) network architecture on top of the internet. P2P means that all computers in the network are equal: there is no server or centralized service.

The term “bitcoin network” refers to the collection of nodes running the bitcoin P2P protocol. Other protocols, such as Stratum, are used for mining and for lightweight or mobile wallets. These protocols are provided by gateway routing servers that access the bitcoin network using the bitcoin P2P protocol and so extend the network to nodes running other protocols.

“Extended bitcoin network” refers to the overall network that includes any related protocols connecting the components of the bitcoin system.

Node Types and Roles

Although nodes in the bitcoin P2P network are equal, they may take on different roles depending on the functionality they are supporting. A bitcoin node is a collection of functions:

network (routing node)
blockchain database
mining
wallet services

Full nodes contain the complete blockchain and can autonomously verify any transaction without external reference.

Lightweight nodes (SPV nodes) maintain only a subset of the blockchain and verify transactions using a method called simplified payment verification (SPV).

But there are many different types of nodes on the extended bitcoin network.

Figure: The extended bitcoin network showing various node types, gateways, and protocols.

Bitcoin Relay Networks

While the bitcoin P2P network serves the general needs of a broad variety of node types, its network latency is too high for the specialized needs of bitcoin mining nodes.

A Bitcoin Relay Network is a network that attempts to minimize the latency in the transmission of blocks between miners. The original Bitcoin Relay Network consisted of several specialized nodes hosted on the Amazon Web Services infrastructure around the world and served to connect the majority of miners and mining pools.

The original Bitcoin Relay Network was replaced in 2016 with the introduction of the Fast Internet Bitcoin Relay Engine (FIBRE). FIBRE is a UDP-based relay network that relays blocks within a network of nodes. FIBRE implements compact block optimization to further reduce the amount of data transmitted and the network latency.

Relay networks are not replacements for bitcoin’s P2P network. Instead, they are overlay networks that provide additional connectivity between nodes with specialized needs, much as freeways supplement small roads rather than replace them.

Network Discovery

When a new node boots up, it must discover and connect to at least one other bitcoin node on the network in order to participate.

To connect to a known peer, nodes establish a TCP connection, usually to port 8333 (the port generally known as the one used by bitcoin), or an alternative port if one is provided. Upon establishing a connection, the node will start a “handshake” by transmitting a version message, which contains basic identifying information, including:

nVersion: The bitcoin P2P protocol version the client “speaks” (e.g., 70002)
nLocalServices: A list of local services supported by the node, currently just NODE_NETWORK
nTime: The current time
addrYou: The IP address of the remote node as seen from this node
addrMe: The IP address of the local node, as discovered by the local node
subver: A sub-version showing the type of software running on this node (e.g., /Satoshi:0.9.2.1/)
BestHeight: The block height of this node’s blockchain
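The fields above can be packed into bytes roughly as follows. This is a minimal sketch, assuming the commonly documented wire layout (little-endian integers, 26-byte address records, a 24-byte message header with a double-SHA256 checksum). The function names, the example IP addresses, and the reuse of the local service bits in both address records are illustrative assumptions, not part of the text above.

```python
import hashlib
import random
import struct
import time

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # message-start bytes on mainnet
NODE_NETWORK = 1                           # service bit advertised by a full node


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def net_addr(services: int, ipv4: str, port: int) -> bytes:
    """26-byte address record: services, IPv4-mapped IPv6 address, port."""
    ip = bytes(10) + b"\xff\xff" + bytes(int(octet) for octet in ipv4.split("."))
    return struct.pack("<Q", services) + ip + struct.pack(">H", port)


def version_payload(n_version, services, n_time, addr_you, addr_me,
                    nonce, subver, best_height):
    agent = subver.encode("ascii")
    if len(agent) >= 0xFD:
        raise ValueError("this sketch only handles one-byte length prefixes")
    return b"".join([
        struct.pack("<iQq", n_version, services, n_time),
        net_addr(services, *addr_you),   # simplification: local service bits reused
        net_addr(services, *addr_me),
        struct.pack("<Q", nonce),        # random value used to detect self-connections
        bytes([len(agent)]) + agent,     # subver as a length-prefixed string
        struct.pack("<i", best_height),
        b"\x01",                         # relay flag (protocol versions 70001 and later)
    ])


def wrap_message(command: str, payload: bytes) -> bytes:
    """Prefix a payload with magic, command name, length, and checksum."""
    header = (MAINNET_MAGIC
              + command.encode("ascii").ljust(12, b"\x00")
              + struct.pack("<I", len(payload))
              + double_sha256(payload)[:4])
    return header + payload


payload = version_payload(
    n_version=70002, services=NODE_NETWORK, n_time=int(time.time()),
    addr_you=("203.0.113.5", 8333), addr_me=("198.51.100.7", 8333),
    nonce=random.getrandbits(64), subver="/Satoshi:0.9.2.1/", best_height=0,
)
version_message = wrap_message("version", payload)
verack_message = wrap_message("verack", b"")  # verack carries no payload
```

The resulting bytes would be written to the TCP connection opened to the peer. The sketch stops before any networking so that it runs offline.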

The version message is always the first message sent by any peer to another peer. The local peer receiving a version message will examine the remote peer’s reported nVersion and decide if the remote peer is compatible. If the remote peer is compatible, the local peer will acknowledge the version message and establish a connection by sending a verack message.

Once one or more connections are established, the new node will send an addr message containing its own IP address to its neighbors. The neighbors will, in turn, forward the addr message to their neighbors, ensuring that the newly connected node becomes well known and better connected. Additionally, the newly connected node can send getaddr to the neighbors, asking them to return a list of IP addresses of other peers.

A node needs to connect to only a handful of peers; connecting to more is unnecessary and wastes network resources. After bootstrapping, a node will remember its most recent successful peer connections.

But how does a new node find peers?

The first method is to query DNS using a number of DNS seeds, which are DNS servers that provide a list of IP addresses of bitcoin nodes. Some of those seeds provide a static list of IP addresses from stable bitcoin listening nodes. Some of the DNS seeds are custom implementations of BIND (Berkeley Internet Name Domain). They return a random subset from a list of bitcoin node addresses collected by a crawler or a long-running bitcoin node. The Bitcoin Core client contains the names of nine different DNS seeds. The diversity of ownership and diversity of implementation of the different DNS seeds offers a high level of reliability for the initial bootstrapping process.
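A DNS seed lookup is an ordinary hostname resolution, so it can be sketched with the standard library. The seed hostnames below are examples only; the actual list ships with the client and changes over time. The `discover_peers` function and its injectable `resolver` parameter are assumptions made for illustration and for offline testing.

```python
import socket

# Example seed hostnames (assumption: the real list is maintained in the client).
DNS_SEEDS = ["seed.bitcoin.sipa.be", "dnsseed.bluematt.me"]


def discover_peers(seeds, port=8333, resolver=socket.getaddrinfo):
    """Resolve each seed and collect unique (ip, port) candidates."""
    peers = []
    for seed in seeds:
        try:
            results = resolver(seed, port, proto=socket.IPPROTO_TCP)
        except OSError:
            continue  # one unreachable seed should not stop bootstrapping
        for _family, _type, _proto, _canonname, sockaddr in results:
            candidate = (sockaddr[0], sockaddr[1])
            if candidate not in peers:
                peers.append(candidate)
    return peers

# Usage (performs real DNS queries):
# candidates = discover_peers(DNS_SEEDS)
```

Querying several independently operated seeds and skipping the ones that fail mirrors the reliability argument above: no single seed has to be available for bootstrapping to succeed.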

Alternatively, a bootstrapping node that knows nothing of the network must be given the IP address of at least one bitcoin node, after which it can establish connections through further introductions. The command-line argument -seednode can be used to connect to one node just for introductions using it as a seed. After the initial seed node is used to form introductions, the client will disconnect from it and use the newly discovered peers.

Full Nodes

The full blockchain node relies on the network to receive updates about new blocks of transactions, which it verifies and incorporates into its local copy of the blockchain. It provides independent verification of all transactions without the need to rely on, or trust, any other systems.

There are a few alternative implementations of full blockchain bitcoin clients, built using different programming languages and software architectures. However, the most common implementation is the reference client Bitcoin Core, also known as the Satoshi client.

When a brand-new bitcoin full node connects to peers, it tries to construct a complete blockchain. The peer’s version message contains the value “BestHeight”, which gives the peer’s current blockchain height. Peered nodes exchange a “getblocks” message that contains the hash (fingerprint) of the top block on their local blockchain. A peer that recognizes the received hash as belonging to a block that is not at the top of its own chain deduces that its own chain is longer. It then identifies the blocks the other node is missing and transmits their hashes using an inv (inventory) message. The node missing these blocks will then retrieve them by issuing a series of getdata messages requesting the full block data and identifying the requested blocks using the hashes from the inv message.
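The getblocks, inv, and getdata exchange can be illustrated with a toy simulation. This is a sketch under simplifying assumptions: blocks are plain strings, a block's hash is a single SHA-256 of its contents, both nodes already share the first block, and `ToyNode` with its method names is hypothetical. No real network messages are involved.

```python
import hashlib


def block_hash(data: str) -> str:
    return hashlib.sha256(data.encode()).hexdigest()


class ToyNode:
    def __init__(self, blocks):
        self.chain = [block_hash(b) for b in blocks]   # hashes in height order
        self.store = dict(zip(self.chain, blocks))     # hash -> block data

    def best_height(self):
        return len(self.chain) - 1

    def on_getblocks(self, top_hash, limit):
        """Answer getblocks with an inv: hashes that follow top_hash."""
        if top_hash not in self.store:
            return []
        start = self.chain.index(top_hash) + 1
        return self.chain[start:start + limit]

    def on_getdata(self, hashes):
        """Answer getdata with the full block data for each hash."""
        return [self.store[h] for h in hashes]

    def sync_from(self, peer, limit=500):
        while self.best_height() < peer.best_height():
            inv = peer.on_getblocks(self.chain[-1], limit)
            if not inv:
                break  # the peer does not recognize our top block
            for expected, data in zip(inv, peer.on_getdata(inv)):
                if block_hash(data) != expected:
                    raise ValueError("block data does not match announced hash")
                self.chain.append(expected)
                self.store[expected] = data


full_node = ToyNode([f"block {height}" for height in range(10)])
new_node = ToyNode(["block 0"])
new_node.sync_from(full_node, limit=3)  # small limit forces several rounds
```

The `limit` parameter caps how many hashes one inv announces, so a node that is far behind catches up in several rounds instead of one large transfer.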

Figure: Node synchronizing the blockchain by retrieving blocks from a peer.

Simplified Payment Verification (SPV) Nodes

SPV nodes download only the block headers and do not download the transactions included in each block. They cannot construct a full picture of all the UTXOs that are available for spending, because these nodes don’t know about all the transactions on the network.

SPV nodes verify transactions by reference to their depth in the blockchain instead of their height. A full blockchain node constructs a fully verified chain of thousands of blocks and transactions reaching down the blockchain (back in time), whereas an SPV node verifies the chain of all blocks (but not all transactions) and links that chain to the transaction of interest.

A full node is like a tourist in a city, equipped with a map of every street and address. By contrast, an SPV node is like a tourist in a city asking random strangers for turn-by-turn directions while knowing only one main avenue.

The SPV node establishes the existence of a transaction in a block by requesting a merkle path proof and by validating the Proof-of-Work in the chain of blocks. However, the existence of a transaction can be “hidden” from an SPV node. An SPV node can definitely prove that a transaction exists but cannot verify that a transaction, such as a double-spend of the same UTXO, doesn’t exist, because it doesn’t have a record of all transactions.

To defend against a double-spending attack, an SPV node needs to connect randomly to several nodes. This increases the probability that it is in contact with at least one honest node.

For most practical purposes, well-connected SPV nodes are secure enough. For infallible security, however, nothing beats running a full blockchain node.

SPV nodes use a getheaders message instead of getblocks to get the block headers.
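The merkle path check that an SPV node performs against a block header can be sketched as follows. The sketch assumes the usual construction (double SHA-256, pairwise hashing, last entry duplicated when a level has an odd count) and works on raw 32-byte hashes without the byte-order conventions used when hashes are displayed. The function names are illustrative.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def _next_level(level):
    return [double_sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = _next_level(level)
    return level[0]


def merkle_path(leaves, index):
    """What a full node would send: sibling hashes from leaf up to the root."""
    path, level = [], list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))  # (hash, sibling is on the left)
        level = _next_level(level)
        index //= 2
    return path


def verify_merkle_path(leaf, path, root):
    """What the SPV node does: rebuild the root from one leaf and its path."""
    current = leaf
    for sibling, sibling_is_left in path:
        pair = sibling + current if sibling_is_left else current + sibling
        current = double_sha256(pair)
    return current == root


txids = [double_sha256(bytes([i])) for i in range(5)]  # stand-ins for transaction hashes
root = merkle_root(txids)                              # would come from a block header
proof = merkle_path(txids, 3)
included = verify_merkle_path(txids[3], proof, root)
```

The proof grows with the logarithm of the number of transactions in the block, which is why an SPV node can check inclusion using only the block headers it already holds.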

Because SPV nodes need to retrieve specific transactions in order to selectively verify them, they also create a privacy risk. Unlike full blockchain nodes, which collect all transactions within each block, the SPV node’s requests for specific data can inadvertently reveal the addresses in their wallet.

Bitcoin developers added a feature called bloom filters to solve this issue.

Bloom Filters

Bloom filters allow SPV nodes to receive a subset of the transactions without revealing precisely which addresses they are interested in. This is done through a filtering mechanism that uses probabilities rather than fixed patterns.

Taking our previous analogy, if a tourist asks strangers for directions to a certain street, he inadvertently reveals his destination. A bloom filter is like asking: “Are there any streets in this neighbourhood whose name ends in R-C-H?”. The precision of the search can be varied. A more specific bloom filter produces more accurate results but offers less privacy, while a less specific filter returns more irrelevant results but protects privacy better.

How Bloom Filters Work

Bloom filters are implemented as a variable-size array of N binary digits (a bit field) and a variable number of M hash functions. The hash functions are designed to produce an output that is between 1 and N, corresponding to the array of binary digits. The hash functions are generated deterministically, so that any node will always use the same hash functions and get the same results for a specific input.

The choice of N and M determines the level of accuracy and therefore of privacy.

The input pattern can be a public key hash, a P2SH script, or a transaction hash. Applying the first hash function to the input results in a number between 1 and N. The corresponding bit in the array (indexed from 1 to N) is found and set to 1, thereby recording the output of the hash function.

Then, the next hash function is used to set another bit and so on. Once all M hash functions have been applied, the search pattern will be “recorded” in the bloom filter as M bits that have been changed from 0 to 1.

Multiple patterns are included in the same bloom filter, so the peer doesn’t know which pattern the SPV node is actually searching for. Furthermore, different patterns can set the same bits, only in a different order:

Pattern X: 12, 3, 8

Pattern Z: 3, 12, 8

The peer can’t tell the difference. It tests every address, transaction, or script it sees against the bloom filter and sends back information about those that match. The SPV node then discards any results it does not need.

The peer checks every possible pattern. In our example, pattern Y doesn’t match the bloom filter, so the peer won’t send information about it. Note that the bloom filter contains more than one pattern, because four bits have been set to 1, which is more than the three hash functions of a single pattern can set.
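The recording and matching steps can be sketched in a few lines. This is an illustration only: the M hash functions here are derived from SHA-256 with a one-byte index prefix, which is an assumption chosen for simplicity and not the hash construction used on the network. The class and method names are hypothetical, and bits are indexed from 0 to N-1 as is natural in code.

```python
import hashlib


class BloomFilter:
    def __init__(self, n_bits: int, m_hashes: int):
        self.n = n_bits
        self.m = m_hashes
        self.bits = [0] * n_bits

    def _positions(self, pattern: bytes):
        # Deterministic: every node derives the same positions for a pattern.
        for i in range(self.m):
            digest = hashlib.sha256(bytes([i]) + pattern).digest()
            yield int.from_bytes(digest, "big") % self.n

    def add(self, pattern: bytes) -> None:
        """Record a pattern by setting one bit per hash function."""
        for position in self._positions(pattern):
            self.bits[position] = 1

    def matches(self, pattern: bytes) -> bool:
        """True if every bit for this pattern is set (may be a false positive)."""
        return all(self.bits[position] for position in self._positions(pattern))


# The SPV node builds the filter from the patterns it cares about ...
wallet_filter = BloomFilter(n_bits=64, m_hashes=3)
wallet_filter.add(b"pattern A")
wallet_filter.add(b"pattern B")

# ... and the peer tests whatever it sees against the filter.
a_matches = wallet_filter.matches(b"pattern A")
```

A match is only probabilistic: an unrelated pattern matches whenever all of its bits happen to be set already. A smaller N or more recorded patterns raises that chance, which costs accuracy but gives the SPV node more cover.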

BIP-37 (Connection Bloom filtering) defines the network protocol and bloom filter mechanism for SPV nodes.

Encrypted and Authenticated Connections

The original implementation of bitcoin communicates entirely in the clear. While this is not a major privacy concern for full nodes, it is a big problem for SPV nodes.

To increase the privacy and security of the bitcoin P2P network, there are two solutions that provide encryption of the communications:

Tor Transport
P2P Authentication and Encryption with BIP-150/151

Tor Transport

Tor (The Onion Routing network) is a software project and network. It offers anonymity, untraceability and privacy through randomized network paths.

Bitcoin Core offers several configuration options that allow you to run a bitcoin node with its traffic transported over the Tor network. In addition, Bitcoin Core can also offer a Tor hidden service allowing other Tor nodes to connect directly over Tor.

Peer-to-Peer Authentication and Encryption

BIP-151 and BIP-150 define optional services for P2P authentication and encryption.

BIP-151: enables negotiated encryption for all communications between two nodes that support BIP-151.

BIP-150: offers optional peer authentication that allows nodes to authenticate each other’s identity using ECDSA and private keys (requires BIP-151).

They allow users to run SPV clients that connect to a trusted full node, using encryption and authentication to protect the privacy of the SPV client. Additionally, authentication can be used to create networks of trusted bitcoin nodes and prevent Man-in-the-Middle attacks.

Transaction Pools

Almost every node on the bitcoin network maintains a temporary list of unconfirmed transactions called the memory pool, mempool, or transaction pool.

Some node implementations also maintain a separate pool of orphaned transactions. A transaction’s input can refer to a transaction that is not yet known (a missing parent). Such an orphan is stored temporarily in the orphan pool until its parent transaction arrives.

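When a transaction is added to the transaction pool, the orphan pool is checked for any orphans that reference this transaction’s outputs (its children). Matching orphans are then validated, removed from the orphan pool, and added to the transaction pool.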
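The interplay between the two pools can be sketched as follows. This is a toy model under stated assumptions: a transaction is reduced to an identifier plus the identifiers of its parents, validation is omitted, and `TxPools` with its attributes is a hypothetical name not taken from any client.

```python
class TxPools:
    def __init__(self, confirmed_txids):
        self.confirmed = set(confirmed_txids)  # parents already in the blockchain
        self.mempool = {}                      # txid -> parent txids
        self.orphans = {}                      # txid -> parent txids

    def _is_known(self, txid):
        return txid in self.confirmed or txid in self.mempool

    def add_transaction(self, txid, parents):
        if not all(self._is_known(parent) for parent in parents):
            self.orphans[txid] = list(parents)  # wait for the missing parent
            return
        self.mempool[txid] = list(parents)
        # Re-examine orphans that were waiting for this transaction.
        waiting = [o for o, ps in self.orphans.items() if txid in ps]
        for orphan_id in waiting:
            orphan_parents = self.orphans.pop(orphan_id, None)
            if orphan_parents is not None:      # may already have been promoted
                self.add_transaction(orphan_id, orphan_parents)


pools = TxPools(confirmed_txids={"confirmed-tx"})
pools.add_transaction("child", ["parent"])          # parent unknown: becomes an orphan
orphans_before = sorted(pools.orphans)
pools.add_transaction("parent", ["confirmed-tx"])   # parent arrives: child is promoted
```

Re-adding a promoted orphan through the same entry point means a whole chain of waiting descendants is released once the first missing ancestor arrives.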

As we learned earlier, most nodes also maintain a UTXO database or pool, which is the set of all unspent outputs on the blockchain.