The Byzantine Generals Problem and Why it Matters for Blockchains

January 26, 2023 - 9 min read

How does a network consistently reach consensus without being disrupted or confused by malfunctions, or betrayed by colluding nodes in their midst?

Distributed Systems and Network Faults

Reliable computer networks, particularly in the case of distributed systems such as public blockchains, must be able to maintain their performance under adverse conditions. The point of it all is to offer open access to smart contract transactions and a shared, verifiable ledger without any meaningful interruptions or failures.

After all, blockchains would not be very useful if they were not extremely antifragile in their designs, lest they would continuously go offline, require intervention or maintenance (likely centralized), or split the chain with forks. They need to keep on going, even when the conditions become unfavorable, if they come under attack, or worse, if seemingly honest nodes collude to betray the whole network.

That is, the network must be robust enough to maintain optimal performance, even if one or more of its nodes fails, goes offline indefinitely, or worse, behaves in a malicious fashion. In blockchain settings, this could take the form of a rogue node participant trying to double-spend some crypto and sneak it into the next block. It could also emerge as a node validator lying about the state of the network in attempts to fake transactions, or a front-running attack in which nodes reorder transactions in ways which allow them to extract profits unfairly.

This gets us into the realm of game theory if we are going to understand and then try to predict what types of behavior could arise between participants of distributed networks like blockchains. Game theory is a framework for conceptualizing complex interactions between anonymous, and often competing actors.

By more closely understanding the nuances of game theory, one can optimize the incentive structures and security parameters in such a way as to scale a network and safeguard the fidelity of each new block. The goal is to figure out what voting majority is needed to achieve consensus before forming a new block and processing the next batch of transactions in the mempool.

The Byzantine Generals Problem

The Byzantine Generals Problem originates from a 1982 academic paper from author Leslie Lamport. It is a sort of narrative abstraction of the technical problem dealing with partial system failures of distributed networks.

The problem Lamport introduced is a thought-experiment based around computer science and game theory, and which was narrativized by the author due to his belief that presenting his problem in this fashion would attract more attention and thus accelerate the exploration of solutions to it. Ironically, he was right; here we are over 40 years later introducing the origin story of his works.

“The fundamental question of the Byzantine Generals Problem is this: what majority threshold is needed for the network to confirm the addition of new blocks to a blockchain?”

Though blockchains have come along several decades later, the analogy of this parable nevertheless pertains to the design of consensus algorithms to this day. That is to say, it illustrates the primary fashion in which security parameters are designed to tolerate irregular network activity before things break.

The entire problem can be boiled down to this: multiple generals need to lay siege to a city, and they have it encircled. However, they can only succeed if they make their assault in perfect unison, throwing all of their might into the siege simultaneously. Conversely, they will lose if their attack isn’t well-coordinated.

The same phenomena can be found mirrored in the production and addition of new blocks to a blockchain. In order to form a block of new transactions which can be added to a blockchain with the degree of precision needed to manifest the world’s financial infrastructure requires cooperation. However, a network cannot be 100% functional all the time, and it cannot completely collapse if some of its key participants and their access to validating the next block.

There are several types of Byzantine faults to consider:
Fail to respond/return request
Return erroneous data
Return erroneous data intentionally
Collude with others to manipulate outcomes

Moreover, malicious actors are tempted by the amount of wealth being stored and transferred on DeFi protocols, and will inevitably try their best to extract wealth by pretending to be honest yet behaving in the opposite manner. This is why the consideration of game theory is helpful in setting the parameters for what percentage of the network must be functional in order to maintain its performance in perpetuity. That is, what security measures, parameter settings, or failsafe protocols allow us to safely assume that the block data being validated can be trusted and subsequently, the settlement of transactions finalized?

This assumption includes the unlikely scenario in that all remaining actors were either dysfunctional or dishonest. The fundamental question of the Byzantine Generals Problem is this: what majority threshold is needed for the network to confirm the addition of new blocks to a blockchain?

The answer to this question for many projects is around 50%, while for others it hovers around 66%. That is to say that in some cases, only a simple majority of nodes must reach a consensus that the next block is valid, while in others a supermajority of ⅔ is the parameter used. We will distinguish these more clearly below. However, to analyze and compare the consensus mechanisms of each blockchain will be left for another article for the sake of brevity.

It should nevertheless be understood that the tradeoffs between adjusting parameters include things like centralization, network liveness and availability, time to settlement finality, block size, and a number of other important properties. Tweaking the parameters in just the right way is every developer’s challenge, but we may also be in need of technological breakthroughs which have yet to occur. For now, let’s stick with what we already have to work with, and proceed accordingly.

Blockchains and Fault Tolerances

With Bitcoin, a synchronous network, miners work to achieve consensus with a 49% fault tolerance. That is, messages broadcasted by honest nodes are designed to be received by all other honest nodes within some set window of time, forming new blocks in sequence.

Recall the Byzantine generals and their need to attack in unison. Bitcoin miners are similar to the generals within Satoshi’s blockchain. Miners are responsible for validating the next block of transactions, which are analogous to the messages sent between the generals surrounding the city they wanted to attack. In theory, malicious nodes could propose an erroneous block of transactions in order to double-spend or otherwise attack the network would be like enemy spies which try to disrupt communications between the attacking Byzantine generals.

The Byzantine Generals Problem is therefore addressed on the Bitcoin network by the miners, the work they put in to prove their honesty, and their validation of the block proposed by the round’s leader. Of course, becoming the leader is achieved by guessing the next block’s header via complex mathematical computations in successive rounds, and repeating every 10 minutes.

However, this makes the size of each block incredibly important, since the entire network performs a round of consensus in synchrony. Each block can only contain so much data. What if there were more transactions submitted in the mempool of a given block than could actually fit within its size limitations? It would slow the network’s ability to perform quick transactions and provide final settlement for participants.

“Roughly speaking, it has become standard to use 33% fault tolerant consensus mechanisms for asynchronous networks. In such cases, a ⅔ majority of the validators are typically used as a consensus threshold for confirming new blocks. “

Blockchains are thus obligated to make tradeoffs between the prioritization of consistency regarding nodes’ state knowledge and the availability of the network to settle transactions quickly. If a given project wants to prioritize network uptime and availability, transactions will be validated with little downtime. On the other hand, if prioritizing state-consistency across all of its nodes, some transactions could be halted or fail to execute during periods of extreme network congestion.

Asynchronous blockchains allow for their transactions to be validated in various batches concurrently as opposed to sequentially in the case of synchronous networks like Bitcoin. That is, nodes in asynchronous networks can process transactions independently of one another and without the need for universal synchrony in order to validate transactions in lockstep. Additionally, nodes prove their honesty by simply staking digital assets within smart contracts on the network as a sort of security escrow in the event of misbehavior.

Consequently, asynchronous blockchains enjoy much more liveness and scalability potential, with faster final settlement times than traditional, synchronous blockchains. That being said, this also invites some added risk of blockchain forks, pauses, reboots, or double-spend attacks.

This is one reason why the fault tolerance parameter is adjusted to properly protect large amounts of digital assets over time. Roughly speaking, it has become standard to use 33% fault tolerant consensus mechanisms for asynchronous networks. In such cases, a ⅔ majority of the validators are typically used as a consensus threshold for confirming new blocks.

Over time, various Byzantine Fault Tolerance (BFT) algorithms have become the gold standard in proof-of-stake protocols. The asynchronous consensus algorithm essentially introduces time-outs and history checks in order to reach network consensus and ensure liveness remains operational. In other words, with asynchronous networks, scalability is one of the top priorities, given that synchronous networks become bloated and unusable after growing past a certain threshold.

Conclusion

For blockchains, solving the Byzantine Generals Problem means reaching network consensus in spite of many bad actors trying to steal from you, as well as the potential for various parts of the network to be down or malfunction. Managing this task even for a short period of time is challenging, but achieving such a feat perpetually is a much more difficult undertaking.

Success is likely to depend on the fundamental vulnerabilities present in the consensus algorithms being tested, their ability to scale as transaction volumes increase, as well as their ability to integrate with other technologies like DeFi platforms or even legacy networks.

Overall, the future of Web3 will likely be shaped by the needs of early-to-adopt industries and organizations, as well as breakthroughs in our ability to solve the Byzantine Generals Problem at a massively large scale, but without sacrificing on network availability via consensus algorithm limitations.

That is, can we solve the problem at scale and maintain the sort of speeds needed to power the future of finance; and can we keep it self-sustainable in perpetuity? To such questions, only the test of time can truly reveal the answers.

Get news, insights, and more

Privacy • Terms of Use • Website Data Usage & Cookies • Bug Disclosure • Biometric Information Privacy Policy

The Byzantine Generals Problem and Why it Matters for Blockchains

How does a network consistently reach consensus without being disrupted or confused by malfunctions, or betrayed by colluding nodes in their midst?

Distributed Systems and Network Faults

The Byzantine Generals Problem

“The fundamental question of the Byzantine Generals Problem is this: what majority threshold is needed for the network to confirm the addition of new blocks to a blockchain?”

Blockchains and Fault Tolerances

“Roughly speaking, it has become standard to use 33% fault tolerant consensus mechanisms for asynchronous networks. In such cases, a ⅔ majority of the validators are typically used as a consensus threshold for confirming new blocks. “

Conclusion

Read Next

Get news, insights, and more