-
On Orchestrating Parallel Broadcasts for Distributed Ledgers
Authors:
Peiyao Sheng,
Chenyuan Wu,
Dahlia Malkhi,
Michael K. Reiter,
Chrysoula Stathakopoulou,
Michael Wei,
Maofan Yin
Abstract:
This paper introduces and develops the concept of ``ticketing'', through which atomic broadcasts are orchestrated by nodes in a distributed system. The paper studies different ticketing regimes that allow parallelism, yet prevent slow nodes from hampering overall progress. It introduces a hybrid scheme which combines managed and unmanaged ticketing regimes, striking a balance between adaptivity an…
▽ More
This paper introduces and develops the concept of ``ticketing'', through which atomic broadcasts are orchestrated by nodes in a distributed system. The paper studies different ticketing regimes that allow parallelism, yet prevent slow nodes from hampering overall progress. It introduces a hybrid scheme which combines managed and unmanaged ticketing regimes, striking a balance between adaptivity and resilience. The performance evaluation demonstrates how managed and unmanaged ticketing regimes benefit throughput in systems with heterogeneous resources both in static and dynamic scenarios, with the managed ticketing regime performing better among the two as it adapts better. Finally, it demonstrates how using the hybrid ticketing regime performance can enjoy both the adaptivity of the managed regime and the liveness guarantees of the unmanaged regime.
△ Less
Submitted 17 May, 2024;
originally announced July 2024.
-
Lumiere: Making Optimal BFT for Partial Synchrony Practical
Authors:
Andrew Lewis-Pye,
Dahlia Malkhi,
Oded Naor,
Kartik Nayak
Abstract:
The view synchronization problem lies at the heart of many Byzantine Fault Tolerant (BFT) State Machine Replication (SMR) protocols in the partial synchrony model, since these protocols are usually based on views. Liveness is guaranteed if honest processors spend a sufficiently long time in the same view during periods of synchrony, and if the leader of the view is honest. Ensuring that these cond…
▽ More
The view synchronization problem lies at the heart of many Byzantine Fault Tolerant (BFT) State Machine Replication (SMR) protocols in the partial synchrony model, since these protocols are usually based on views. Liveness is guaranteed if honest processors spend a sufficiently long time in the same view during periods of synchrony, and if the leader of the view is honest. Ensuring that these conditions occur, known as Byzantine View Synchronization (BVS), has turned out to be the performance bottleneck of many BFT SMR protocols.
A recent line of work has shown that, by using an appropriate view synchronization protocol, BFT SMR protocols can achieve $O(n^2)$ communication complexity in the worst case after GST, thereby finally matching the lower bound established by Dolev and Reischuk in 1985. However, these protocols suffer from two major issues:
(1) When implemented so as to be optimistically responsive, even a single Byzantine processor may infinitely often cause $Ω(nΔ)$ latency between consecutive consensus decisions.
(2) Even in the absence of Byzantine action, infinitely many views require honest processors to send $Ω(n^2)$ messages.
Here, we present Lumiere, an optimistically responsive BVS protocol which maintains optimal worst-case communication complexity while simultaneously addressing the two issues above: for the first time, Lumiere enables BFT consensus solutions in the partial synchrony setting that have $O(n^2)$ worst-case communication complexity, and that eventually always (i.e., except for a small constant number of "warmup" decisions) have communication complexity and latency which is linear in the number of actual faults in the execution.
△ Less
Submitted 16 March, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
-
BBCA-CHAIN: Low Latency, High Throughput BFT Consensus on a DAG
Authors:
Dahlia Malkhi,
Chrysoula Stathakopoulou,
Maofan Yin
Abstract:
This paper presents a partially synchronous BFT consensus protocol powered by BBCA, a lightly modified Byzantine Consistent Broadcast (BCB) primitive. BBCA provides a Complete-Adopt semantic through an added probing interface to allow either aborting the broadcast by correct nodes or exclusively, adopting the message consistently in case of a potential delivery. It does not introduce any extra typ…
▽ More
This paper presents a partially synchronous BFT consensus protocol powered by BBCA, a lightly modified Byzantine Consistent Broadcast (BCB) primitive. BBCA provides a Complete-Adopt semantic through an added probing interface to allow either aborting the broadcast by correct nodes or exclusively, adopting the message consistently in case of a potential delivery. It does not introduce any extra types of messages or additional communication costs to BCB.
BBCA is harnessed into BBCA-CHAIN to make direct commits on a chained backbone of a causally ordered graph of blocks, without any additional voting blocks or artificial layering. With the help of Complete-Adopt, the additional knowledge gained from the underlying BCB completely removes the voting latency in popular DAG-based protocols. At the same time, causal ordering allows nodes to propose blocks in parallel and achieve high throughput. BBCA-CHAIN thus closes up the gap between protocols built by consistent broadcasts (e.g., Bullshark) to those without such an abstraction (e.g., PBFT/HotStuff), emphasizing their shared fundamental principles.
Using a Bracha-style BCB as an example, we fully specify BBCA-CHAIN with simplicity, serving as a solid basis for high-performance replication systems (and blockchains).
△ Less
Submitted 24 May, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
-
BBCA-LEDGER: High Throughput Consensus meets Low Latency
Authors:
Chrysoula Stathakopoulou,
Michael Wei,
Maofan Yin,
Hongbo Zhang,
Dahlia Malkhi
Abstract:
This paper presents BBCA-LEDGER, a Byzantine log replication technology for partially synchronous networks enabling blocks to be broadcast in parallel, such that each broadcast is finalized independently and instantaneously into an individual slot in the log. Every finalized broadcast is eventually committed to the total ordering, so that all network bandwidth has utility in disseminating blocks.…
▽ More
This paper presents BBCA-LEDGER, a Byzantine log replication technology for partially synchronous networks enabling blocks to be broadcast in parallel, such that each broadcast is finalized independently and instantaneously into an individual slot in the log. Every finalized broadcast is eventually committed to the total ordering, so that all network bandwidth has utility in disseminating blocks. Finalizing log slots in parallel achieves both high throughput and low latency. BBCA-LEDGER is composed of two principal protocols that interweave together, a low-latency/high-throughput happy path, and a high-throughput DAG-based fallback path. The happy path employs a novel primitive called BBCA, a consistent broadcast enforcing unique slot numbering. In steady state, BBCA ensures that a transaction can be committed with low latency, in just 3 network steps. Under network partitions or faults, we harness recent advances in BFT and build a fallback mechanism on a direct acyclic graph (DAG) created by BBCA broadcasts. In this manner, BBCA-LEDGER exhibits the throughput benefits of DAG-based BFT in face of gaps.
△ Less
Submitted 26 June, 2023;
originally announced June 2023.
-
Lessons from HotStuff
Authors:
Dahlia Malkhi,
Maofan Yin
Abstract:
This article will take you on a journey to the core of blockchains, their Byzantine consensus engine, where HotStuff emerged as a new algorithmic foundation for the classical Byzantine generals consensus problem.
The first part of the article underscores the theoretical advances HotStuff enabled, including several models in which HotStuff-based solutions closed problems which were opened for dec…
▽ More
This article will take you on a journey to the core of blockchains, their Byzantine consensus engine, where HotStuff emerged as a new algorithmic foundation for the classical Byzantine generals consensus problem.
The first part of the article underscores the theoretical advances HotStuff enabled, including several models in which HotStuff-based solutions closed problems which were opened for decades.
The second part focuses on HotStuff performance in real life setting, where its simplicity drove adoption of HotStuff as the golden standard for blockchain design, and many variants and improvements built on top of it.
Both parts of this document are meant to describe lessons drawn from HotStuff as well as dispel certain myths.
△ Less
Submitted 22 May, 2023;
originally announced May 2023.
-
Maximal Extractable Value (MEV) Protection on a DAG
Authors:
Dahlia Malkhi,
Pawel Szalachowski
Abstract:
Many cryptocurrency platforms are vulnerable to Maximal Extractable Value (MEV) attacks, where a malicious consensus leader can inject transactions or change the order of user transactions to maximize its profit. A promising line of research in MEV mitigation is to enhance the Byzantine fault tolerance (BFT) consensus core of blockchains by new functionalities, like hiding transaction contents, su…
▽ More
Many cryptocurrency platforms are vulnerable to Maximal Extractable Value (MEV) attacks, where a malicious consensus leader can inject transactions or change the order of user transactions to maximize its profit. A promising line of research in MEV mitigation is to enhance the Byzantine fault tolerance (BFT) consensus core of blockchains by new functionalities, like hiding transaction contents, such that malicious parties cannot analyze and exploit them until they are ordered. An orthogonal line of research demonstrates excellent performance for BFT protocols designed around Directed Acyclic Graphs (DAG). They provide high throughput by keeping high network utilization, decoupling transactions' dissemination from their metadata ordering, and encoding consensus logic efficiently over a DAG representing a causal ordering of disseminated messages. This paper explains how to combine these two advances. It introduces a DAG-based protocol called Fino, that integrates MEV-resistance features into DAG-based BFT without delaying the steady spreading of transactions by the DAG transport and with zero message overhead. The scheme operates without complex secret share verifiability or recoverability, and avoids costly threshold encryption.
△ Less
Submitted 23 December, 2022; v1 submitted 1 August, 2022;
originally announced August 2022.
-
Block-STM: Scaling Blockchain Execution by Turning Ordering Curse to a Performance Blessing
Authors:
Rati Gelashvili,
Alexander Spiegelman,
Zhuolun Xiang,
George Danezis,
Zekun Li,
Dahlia Malkhi,
Yu Xia,
Runtian Zhou
Abstract:
Block-STM is a parallel execution engine for smart contracts, built around the principles of Software Transactional Memory. Transactions are grouped in blocks, and every execution of the block must yield the same deterministic outcome. Block-STM further enforces that the outcome is consistent with executing transactions according to a preset order, leveraging this order to dynamically detect depen…
▽ More
Block-STM is a parallel execution engine for smart contracts, built around the principles of Software Transactional Memory. Transactions are grouped in blocks, and every execution of the block must yield the same deterministic outcome. Block-STM further enforces that the outcome is consistent with executing transactions according to a preset order, leveraging this order to dynamically detect dependencies and avoid conflicts during speculative transaction execution. At the core of Block-STM is a novel, low-overhead collaborative scheduler of execution and validation tasks.
Block-STM is implemented on the main branch of the Diem Blockchain code-base and runs in production at Aptos. Our evaluation demonstrates that Block-STM is adaptive to workloads with different conflict rates and utilizes the inherent parallelism therein. Block-STM achieves up to $110k$ tps in the Diem benchmarks and up to $170k$ tps in the Aptos Benchmarks, which is a $20$x and $17$x improvement over the sequential baseline with $32$ threads, respectively. The throughput on a contended workload is up to $50k$ tps and $80k$ tps in Diem and Aptos benchmarks, respectively.
△ Less
Submitted 25 August, 2022; v1 submitted 14 March, 2022;
originally announced March 2022.
-
Be Aware of Your Leaders
Authors:
Shir Cohen,
Rati Gelashvili,
Lefteris Kokoris Kogias,
Zekun Li,
Dahlia Malkhi,
Alberto Sonnino,
Alexander Spiegelman
Abstract:
Advances in blockchains have influenced the State-Machine-Replication (SMR) world and many state-of-the-art blockchain-SMR solutions are based on two pillars: Chaining and Leader-rotation. A predetermined round-robin mechanism used for Leader-rotation, however, has an undesirable behavior: crashed parties become designated leaders infinitely often, slowing down overall system performance. In this…
▽ More
Advances in blockchains have influenced the State-Machine-Replication (SMR) world and many state-of-the-art blockchain-SMR solutions are based on two pillars: Chaining and Leader-rotation. A predetermined round-robin mechanism used for Leader-rotation, however, has an undesirable behavior: crashed parties become designated leaders infinitely often, slowing down overall system performance. In this paper, we provide a new Leader-Aware SMR framework that, among other desirable properties, formalizes a Leader-utilization requirement that bounds the number of rounds whose leaders are faulty in crash-only executions. We introduce Carousel, a novel, reputation-based Leader-rotation solution to achieve Leader-Aware SMR. The challenge in adaptive Leader-rotation is that it cannot rely on consensus to determine a leader, since consensus itself needs a leader. Carousel uses the available on-chain information to determine a leader locally and achieves Liveness despite this difficulty. A HotStuff implementation fitted with Carousel demonstrates drastic performance improvements: it increases throughput over 2x in faultless settings and provided a 20x throughput increase and 5x latency reduction in the presence of faults.
△ Less
Submitted 3 October, 2021;
originally announced October 2021.
-
Strengthened Fault Tolerance in Byzantine Fault Tolerant Replication
Authors:
Zhuolun Xiang,
Dahlia Malkhi,
Kartik Nayak,
Ling Ren
Abstract:
Byzantine fault tolerant (BFT) state machine replication (SMR) is an important building block for constructing permissioned blockchain systems. In contrast to Nakamoto Consensus where any block obtains higher assurance as buried deeper in the blockchain, in BFT SMR, any committed block is secure has a fixed resilience threshold. In this paper, we investigate strengthened fault tolerance (SFT) in B…
▽ More
Byzantine fault tolerant (BFT) state machine replication (SMR) is an important building block for constructing permissioned blockchain systems. In contrast to Nakamoto Consensus where any block obtains higher assurance as buried deeper in the blockchain, in BFT SMR, any committed block is secure has a fixed resilience threshold. In this paper, we investigate strengthened fault tolerance (SFT) in BFT SMR under partial synchrony, which provides gradually increased resilience guarantees (like Nakamoto Consensus) during an optimistic period when the network is synchronous and the number of Byzantine faults is small. Moreover, the committed blocks can tolerate more than one-third (up to two-thirds) corruptions even after the optimistic period. Compared to the prior best solution Flexible BFT which requires quadratic message complexity, our solution maintains the linear message complexity of state-of-the-art BFT SMR protocols and requires only marginal bookkeeping overhead. We implement our solution over the open-source Diem project, and give experimental results that demonstrate its efficiency under real-world scenarios.
△ Less
Submitted 11 January, 2021;
originally announced January 2021.
-
Twins: BFT Systems Made Robust
Authors:
Shehar Bano,
Alberto Sonnino,
Andrey Chursin,
Dmitri Perelman,
Zekun Li,
Avery Ching,
Dahlia Malkhi
Abstract:
This paper presents Twins, an automated unit test generator of Byzantine attacks. Twins implements three types of Byzantine behaviors: (i) leader equivocation, (ii) double voting, and (iii) losing internal state such as forgetting 'locks' guarding voted values. To emulate interesting attacks by a Byzantine node, it instantiates twin copies of the node instead of one, giving both twins the same ide…
▽ More
This paper presents Twins, an automated unit test generator of Byzantine attacks. Twins implements three types of Byzantine behaviors: (i) leader equivocation, (ii) double voting, and (iii) losing internal state such as forgetting 'locks' guarding voted values. To emulate interesting attacks by a Byzantine node, it instantiates twin copies of the node instead of one, giving both twins the same identities and network credentials. To the rest of the system, the twins appear indistinguishable from a single node behaving in a 'questionable' manner. Twins can systematically generate Byzantine attack scenarios at scale, execute them in a controlled manner, and examine their behavior. Twins scenarios iterate over protocol rounds and vary the communication patterns among nodes. Twins runs in a production setting within DiemBFT where it can execute 44M Twins-generated scenarios daily. Whereas the system at hand did not manifest errors, subtle safety bugs that were deliberately injected for the purpose of validating the implementation of Twins itself were exposed within minutes. Twins can prevent developers from regressing correctness when updating the codebase, introducing new features, or performing routine maintenance tasks. Twins only requires a thin wrapper over DiemBFT, we thus envision other systems using it. Building on this idea, one new attack and several known attacks against other BFT protocols were materialized as Twins scenarios. In all cases, the target protocols break within fewer than a dozen protocol rounds, hence it is realistic for the Twins approach to expose the problems.
△ Less
Submitted 14 January, 2022; v1 submitted 22 April, 2020;
originally announced April 2020.
-
Rainblock: Faster Transaction Processing in Public Blockchains
Authors:
Soujanya Ponnapalli,
Aashaka Shah,
Amy Tai,
Souvik Banerjee,
Vijay Chidambaram,
Dahlia Malkhi,
Michael Wei
Abstract:
Public blockchains like Ethereum use Merkle trees to verify transactions received from untrusted servers before applying them to the blockchain. We empirically show that the low throughput of such blockchains is due to the I/O bottleneck associated with using Merkle trees for processing transactions. We present RAINBLOCK, a new architecture for public blockchains that increases throughput without…
▽ More
Public blockchains like Ethereum use Merkle trees to verify transactions received from untrusted servers before applying them to the blockchain. We empirically show that the low throughput of such blockchains is due to the I/O bottleneck associated with using Merkle trees for processing transactions. We present RAINBLOCK, a new architecture for public blockchains that increases throughput without affecting security. RAINBLOCK achieves this by tackling the I/O bottleneck on two fronts: first, decoupling transaction processing from I/O, and removing I/O from the critical path; second, reducing I/O amplification by customizing storage for blockchains. RAINBLOCK uses a novel variant of the Merkle tree, the Distributed Sharded Merkle tree (DSM-TREE) to store system state. We evaluate RAINBLOCK using workloads based on public Ethereum traces (including smart contracts) and show that RAINBLOCK processes 20K transactions per second in a geo-distributed setting with four regions spread across three continents.
△ Less
Submitted 15 October, 2020; v1 submitted 25 September, 2019;
originally announced September 2019.
-
Cogsworth: Byzantine View Synchronization
Authors:
Oded Naor,
Mathieu Baudet,
Dahlia Malkhi,
Alexander Spiegelman
Abstract:
Most methods for Byzantine fault tolerance (BFT) in the partial synchrony setting divide the local state of the nodes into views, and the transition from one view to the next dictates a leader change. In order to provide liveness, all honest nodes need to stay in the same view for a sufficiently long time. This requires \emph{view synchronization}, a requisite of BFT that we extract and formally d…
▽ More
Most methods for Byzantine fault tolerance (BFT) in the partial synchrony setting divide the local state of the nodes into views, and the transition from one view to the next dictates a leader change. In order to provide liveness, all honest nodes need to stay in the same view for a sufficiently long time. This requires \emph{view synchronization}, a requisite of BFT that we extract and formally define here.
Existing approaches for Byzantine view synchronization incur quadratic communication (in $n$, the number of parties). A cascade of $O(n)$ view changes may thus result in $O(n^3)$ communication complexity. This paper presents a new Byzantine view synchronization algorithm named Cogsworth, that has optimistically linear communication complexity and constant latency. Faced with benign failures, Cogsworth has expected linear communication and constant latency.
The result here serves as an important step towards reaching solutions that have overall quadratic communication, the known lower bound on Byzantine fault tolerant consensus. Cogsworth is particularly useful for a family of BFT protocols that already exhibit linear communication under various circumstances, but suffer quadratic overhead due to view synchronization.
△ Less
Submitted 6 February, 2020; v1 submitted 11 September, 2019;
originally announced September 2019.
-
FairLedger: A Fair Blockchain Protocol for Financial Institutions
Authors:
Kfir Lev-Ari,
Alexander Spiegelman,
Idit Keidar,
Dahlia Malkhi
Abstract:
Financial institutions are currently looking into technologies for permissioned blockchains. A major effort in this direction is Hyperledger, an open source project hosted by the Linux Foundation and backed by a consortium of over a hundred companies. A key component in permissioned blockchain protocols is a byzantine fault tolerant (BFT) consensus engine that orders transactions. However, current…
▽ More
Financial institutions are currently looking into technologies for permissioned blockchains. A major effort in this direction is Hyperledger, an open source project hosted by the Linux Foundation and backed by a consortium of over a hundred companies. A key component in permissioned blockchain protocols is a byzantine fault tolerant (BFT) consensus engine that orders transactions. However, currently available BFT solutions in Hyperledger (as well as in the literature at large) are inadequate for financial settings; they are not designed to ensure fairness or to tolerate selfish behavior that arises when financial institutions strive to maximize their own profit.
We present FairLedger, a permissioned blockchain BFT protocol, which is fair, designed to deal with rational behavior, and, no less important, easy to understand and implement. The secret sauce of our protocol is a new communication abstraction, called detectable all-to-all (DA2A), which allows us to detect participants (byzantine or rational) that deviate from the protocol, and punish them. We implement FairLedger in the Hyperledger open source project, using Iroha framework, one of the biggest projects therein. To evaluate FairLegder's performance, we also implement it in the PBFT framework and compare the two protocols. Our results show that in failure-free scenarios FairLedger achieves better throughput than both Iroha's implementation and PBFT in wide-area settings.
△ Less
Submitted 10 June, 2019;
originally announced June 2019.
-
Flexible Byzantine Fault Tolerance
Authors:
Dahlia Malkhi,
Kartik Nayak,
Ling Ren
Abstract:
This paper introduces Flexible BFT, a new approach for BFT consensus solution design revolving around two pillars, stronger resilience and diversity. The first pillar, stronger resilience, involves a new fault model called alive-but-corrupt faults. Alive-but-corrupt replicas may arbitrarily deviate from the protocol in an attempt to break safety of the protocol. However, if they cannot break safet…
▽ More
This paper introduces Flexible BFT, a new approach for BFT consensus solution design revolving around two pillars, stronger resilience and diversity. The first pillar, stronger resilience, involves a new fault model called alive-but-corrupt faults. Alive-but-corrupt replicas may arbitrarily deviate from the protocol in an attempt to break safety of the protocol. However, if they cannot break safety, they will not try to prevent liveness of the protocol. Combining alive-but-corrupt faults into the model, Flexible BFT is resilient to higher corruption levels than possible in a pure Byzantine fault model. The second pillar, diversity, designs consensus solutions whose protocol transcript is used to draw different commit decisions under diverse beliefs. With this separation, the same Flexible BFT solution supports synchronous and asynchronous beliefs, as well as varying resilience threshold combinations of Byzantine and alive-but-corrupt faults.
At a technical level, Flexible BFT achieves the above results using two new ideas. First, it introduces a synchronous BFT protocol in which only the commit step requires to know the network delay bound and thus replicas execute the protocol without any synchrony assumption. Second, it introduces a notion called Flexible Byzantine Quorums by dissecting the roles of different quorums in existing consensus protocols.
△ Less
Submitted 30 May, 2019; v1 submitted 22 April, 2019;
originally announced April 2019.
-
Validated Asynchronous Byzantine Agreement with Optimal Resilience and Asymptotically Optimal Time and Word Communication
Authors:
Ittai Abraham,
Dahlia Malkhi,
Alexander Spiegelman
Abstract:
We provide a new protocol for Validated Asynchronous Byzantine Agreement. Validated (multi-valued) Asynchronous Byzantine Agreement is a key building block in constructing Atomic Broadcast and fault-tolerant state machine replication in the asynchronous setting. Our protocol can withstand the optimal number $f<n/3$ of Byzantine failures and reaches agreement in the asymptotically optimal expected…
▽ More
We provide a new protocol for Validated Asynchronous Byzantine Agreement. Validated (multi-valued) Asynchronous Byzantine Agreement is a key building block in constructing Atomic Broadcast and fault-tolerant state machine replication in the asynchronous setting. Our protocol can withstand the optimal number $f<n/3$ of Byzantine failures and reaches agreement in the asymptotically optimal expected $O(1)$ running time. Honest parties in our protocol send only an expected $O(n^2)$ messages where each message contains a value and a constant number of signatures. Hence our total expected communication is $O(n^2)$ words. The best previous result of Cachin et al. from 2001 solves Validated Byzantine Agreement with optimal resilience and $O(1)$ expected time but with $O(n^3)$ expected word communication. Our work addresses an open question of Cachin et al. from 2001 and improves the expected word communication from $O(n^3)$ to the asymptotically optimal $O(n^2)$.
△ Less
Submitted 4 November, 2018;
originally announced November 2018.
-
sAVSS: Scalable Asynchronous Verifiable Secret Sharing in BFT Protocols
Authors:
Soumya Basu,
Alin Tomescu,
Ittai Abraham,
Dahlia Malkhi,
Michael K. Reiter,
Emin Gün Sirer
Abstract:
This paper introduces a new way to incorporate verifiable secret sharing (VSS) schemes into Byzantine Fault Tolerance (BFT) protocols. This technique extends the threshold guarantee of classical Byzantine Fault Tolerant algorithms to include privacy as well. This provides applications with a powerful primitive: a threshold trusted third party, which simplifies many difficult problems such as a fai…
▽ More
This paper introduces a new way to incorporate verifiable secret sharing (VSS) schemes into Byzantine Fault Tolerance (BFT) protocols. This technique extends the threshold guarantee of classical Byzantine Fault Tolerant algorithms to include privacy as well. This provides applications with a powerful primitive: a threshold trusted third party, which simplifies many difficult problems such as a fair exchange. In order to incorporate VSS into BFT, we introduced sAVSS, a framework that transforms any VSS scheme into an asynchronous VSS scheme with constant overhead. By incorporating Kate et al.'s scheme into our framework, we obtain an asynchronous VSS that has constant overhead on each replica -- the first of its kind. We show that a key-value store built using BFT replication and sAVSS supports writing secret-shared values with about a 30% - 50% throughput overhead with less than 35 millisecond request latencies.
△ Less
Submitted 21 December, 2018; v1 submitted 10 July, 2018;
originally announced July 2018.
-
SBFT: a Scalable and Decentralized Trust Infrastructure
Authors:
Guy Golan Gueta,
Ittai Abraham,
Shelly Grossman,
Dahlia Malkhi,
Benny Pinkas,
Michael K. Reiter,
Dragos-Adrian Seredinschi,
Orr Tamir,
Alin Tomescu
Abstract:
SBFT is a state of the art Byzantine fault tolerant permissioned blockchain system that addresses the challenges of scalability, decentralization and world-scale geo-replication. SBFTis optimized for decentralization and can easily handle more than 200 active replicas in a real world-scale deployment. We evaluate \sysname in a world-scale geo-replicated deployment with 209 replicas withstanding f=…
▽ More
SBFT is a state of the art Byzantine fault tolerant permissioned blockchain system that addresses the challenges of scalability, decentralization and world-scale geo-replication. SBFTis optimized for decentralization and can easily handle more than 200 active replicas in a real world-scale deployment. We evaluate \sysname in a world-scale geo-replicated deployment with 209 replicas withstanding f=64 Byzantine failures. We provide experiments that show how the different algorithmic ingredients of \sysname increase its performance and scalability. The results show that SBFT simultaneously provides almost 2x better throughput and about 1.5x better latency relative to a highly optimized system that implements the PBFT protocol. To achieve this performance improvement, SBFT uses a combination of four ingredients: using collectors and threshold signatures to reduce communication to linear, using an optimistic fast path, reducing client communication and utilizing redundant servers for the fast path.
△ Less
Submitted 2 January, 2019; v1 submitted 4 April, 2018;
originally announced April 2018.
-
HotStuff: BFT Consensus in the Lens of Blockchain
Authors:
Maofan Yin,
Dahlia Malkhi,
Michael K. Reiter,
Guy Golan Gueta,
Ittai Abraham
Abstract:
We present HotStuff, a leader-based Byzantine fault-tolerant replication protocol for the partially synchronous model. Once network communication becomes synchronous, HotStuff enables a correct leader to drive the protocol to consensus at the pace of actual (vs. maximum) network delay--a property called responsiveness--and with communication complexity that is linear in the number of replicas. To…
▽ More
We present HotStuff, a leader-based Byzantine fault-tolerant replication protocol for the partially synchronous model. Once network communication becomes synchronous, HotStuff enables a correct leader to drive the protocol to consensus at the pace of actual (vs. maximum) network delay--a property called responsiveness--and with communication complexity that is linear in the number of replicas. To our knowledge, HotStuff is the first partially synchronous BFT replication protocol exhibiting these combined properties. HotStuff is built around a novel framework that forms a bridge between classical BFT foundations and blockchains. It allows the expression of other known protocols (DLS, PBFT, Tendermint, Casper), and ours, in a common framework.
Our deployment of HotStuff over a network with over 100 replicas achieves throughput and latency comparable to that of BFT-SMaRt, while enjoying linear communication footprint during leader failover (vs. quadratic with BFT-SMaRt).
△ Less
Submitted 23 July, 2019; v1 submitted 13 March, 2018;
originally announced March 2018.
-
Stable and Consistent Membership at Scale with Rapid
Authors:
Lalith Suresh,
Dahlia Malkhi,
Parikshit Gopalan,
Ivan Porto Carreiro,
Zeeshan Lokhandwala
Abstract:
We present the design and evaluation of Rapid, a distributed membership service. At Rapid's core is a scheme for multi-process cut detection (CD) that revolves around two key insights: (i) it suspects a failure of a process only after alerts arrive from multiple sources, and (ii) when a group of processes experience problems, it detects failures of the entire group, rather than conclude about each…
▽ More
We present the design and evaluation of Rapid, a distributed membership service. At Rapid's core is a scheme for multi-process cut detection (CD) that revolves around two key insights: (i) it suspects a failure of a process only after alerts arrive from multiple sources, and (ii) when a group of processes experience problems, it detects failures of the entire group, rather than conclude about each process individually. Implementing these insights translates into a simple membership algorithm with low communication overhead.
We present evidence that our strategy suffices to drive unanimous detection almost-everywhere, even when complex network conditions arise, such as one-way reachability problems, firewall misconfigurations, and high packet loss. Furthermore, we present both empirical evidence and analyses that proves that the almost-everywhere detection happens with high probability. To complete the design, Rapid contains a leaderless consensus protocol that converts multi-process cut detections into a view-change decision. The resulting membership service works both in fully decentralized as well as logically centralized modes.
We present an evaluation of Rapid in moderately scalable cloud settings. Rapid bootstraps 2000 node clusters 2-5.8x faster than prevailing tools such as Memberlist and ZooKeeper, remains stable in face of complex failure scenarios, and provides strong consistency guarantees. It is easy to integrate Rapid into existing distributed applications, of which we demonstrate two.
△ Less
Submitted 9 March, 2018;
originally announced March 2018.
-
Revisiting Fast Practical Byzantine Fault Tolerance: Thelma, Velma, and Zelma
Authors:
Ittai Abraham,
Guy Gueta,
Dahlia Malkhi,
Jean-Philippe Martin
Abstract:
In a previous note (arXiv:1712.01367 [cs.DC]) , we observed a safety violation in Zyzzyva and a liveness violation in FaB. In this manuscript, we sketch fixes to both. The same view-change core is applied in the two schemes, and additionally, applied to combine them and create a single, enhanced scheme that has the benefits of both approaches.
In a previous note (arXiv:1712.01367 [cs.DC]) , we observed a safety violation in Zyzzyva and a liveness violation in FaB. In this manuscript, we sketch fixes to both. The same view-change core is applied in the two schemes, and additionally, applied to combine them and create a single, enhanced scheme that has the benefits of both approaches.
△ Less
Submitted 26 January, 2018;
originally announced January 2018.
-
Revisiting Fast Practical Byzantine Fault Tolerance
Authors:
Ittai Abraham,
Guy Gueta,
Dahlia Malkhi,
Lorenzo Alvisi,
Rama Kotla,
Jean-Philippe Martin
Abstract:
In this note, we observe a safety violation in Zyzzyva and a liveness violation in FaB. To demonstrate these issues, we require relatively simple scenarios, involving only four replicas, and one or two view changes. In all of them, the problem is manifested already in the first log slot.
In this note, we observe a safety violation in Zyzzyva and a liveness violation in FaB. To demonstrate these issues, we require relatively simple scenarios, involving only four replicas, and one or two view changes. In all of them, the problem is manifested already in the first log slot.
△ Less
Submitted 4 December, 2017;
originally announced December 2017.
-
Solida: A Blockchain Protocol Based on Reconfigurable Byzantine Consensus
Authors:
Ittai Abraham,
Dahlia Malkhi,
Kartik Nayak,
Ling Ren,
Alexander Spiegelman
Abstract:
The decentralized cryptocurrency Bitcoin has experienced great success but also encountered many challenges. One of the challenges has been the long confirmation time. Another challenge is the lack of incentives at certain steps of the protocol, raising concerns for transaction withholding, selfish mining, etc. To address these challenges, we propose Solida, a decentralized blockchain protocol bas…
▽ More
The decentralized cryptocurrency Bitcoin has experienced great success but also encountered many challenges. One of the challenges has been the long confirmation time. Another challenge is the lack of incentives at certain steps of the protocol, raising concerns for transaction withholding, selfish mining, etc. To address these challenges, we propose Solida, a decentralized blockchain protocol based on reconfigurable Byzantine consensus augmented by proof-of-work. Solida improves on Bitcoin in confirmation time, and provides safety and liveness assuming the adversary control less than (roughly) one-third of the total mining power.
△ Less
Submitted 18 November, 2017; v1 submitted 8 December, 2016;
originally announced December 2016.
-
Flexible Paxos: Quorum intersection revisited
Authors:
Heidi Howard,
Dahlia Malkhi,
Alexander Spiegelman
Abstract:
Distributed consensus is integral to modern distributed systems. The widely adopted Paxos algorithm uses two phases, each requiring majority agreement, to reliably reach consensus. In this paper, we demonstrate that Paxos, which lies at the foundation of many production systems, is conservative. Specifically, we observe that each of the phases of Paxos may use non-intersecting quorums. Majority qu…
▽ More
Distributed consensus is integral to modern distributed systems. The widely adopted Paxos algorithm uses two phases, each requiring majority agreement, to reliably reach consensus. In this paper, we demonstrate that Paxos, which lies at the foundation of many production systems, is conservative. Specifically, we observe that each of the phases of Paxos may use non-intersecting quorums. Majority quorums are not necessary as intersection is required only across phases.
Using this weakening of the requirements made in the original formulation, we propose Flexible Paxos, which generalizes over the Paxos algorithm to provide flexible quorums. We show that Flexible Paxos is safe, efficient and easy to utilize in existing distributed systems. We conclude by discussing the wide reaching implications of this result. Examples include improved availability from reducing the size of second phase quorums by one when the number of acceptors is even and utilizing small disjoint phase-2 quorums to speed up the steady-state.
△ Less
Submitted 23 August, 2016;
originally announced August 2016.
-
Optimal Gossip with Direct Addressing
Authors:
Bernhard Haeupler,
Dahlia Malkhi
Abstract:
Gossip algorithms spread information by having nodes repeatedly forward information to a few random contacts. By their very nature, gossip algorithms tend to be distributed and fault tolerant. If done right, they can also be fast and message-efficient. A common model for gossip communication is the random phone call model, in which in each synchronous round each node can PUSH or PULL information t…
▽ More
Gossip algorithms spread information by having nodes repeatedly forward information to a few random contacts. By their very nature, gossip algorithms tend to be distributed and fault tolerant. If done right, they can also be fast and message-efficient. A common model for gossip communication is the random phone call model, in which in each synchronous round each node can PUSH or PULL information to or from a random other node. For example, Karp et al. [FOCS 2000] gave algorithms in this model that spread a message to all nodes in $Θ(\log n)$ rounds while sending only $O(\log \log n)$ messages per node on average.
Recently, Avin and Elsässer [DISC 2013], studied the random phone call model with the natural and commonly used assumption of direct addressing. Direct addressing allows nodes to directly contact nodes whose ID (e.g., IP address) was learned before. They show that in this setting, one can "break the $\log n$ barrier" and achieve a gossip algorithm running in $O(\sqrt{\log n})$ rounds, albeit while using $O(\sqrt{\log n})$ messages per node.
We study the same model and give a simple gossip algorithm which spreads a message in only $O(\log \log n)$ rounds. We also prove a matching $Ω(\log \log n)$ lower bound which shows that this running time is best possible. In particular we show that any gossip algorithm takes with high probability at least $0.99 \log \log n$ rounds to terminate. Lastly, our algorithm can be tweaked to send only $O(1)$ messages per node on average with only $O(\log n)$ bits per message. Our algorithm therefore simultaneously achieves the optimal round-, message-, and bit-complexity for this setting. As all prior gossip algorithms, our algorithm is also robust against failures. In particular, if in the beginning an oblivious adversary fails any $F$ nodes our algorithm still, with high probability, informs all but $o(F)$ surviving nodes.
△ Less
Submitted 11 February, 2014;
originally announced February 2014.
-
Papillon: Greedy Routing in Rings
Authors:
Ittai Abraham,
Dahlia Malkhi,
Gurmeet Singh Manku
Abstract:
We study {\sc greedy} routing over $n$ nodes placed in a ring, with the \emph{distance} between two nodes defined to be the clockwise or the absolute distance between them along the ring. Such graphs arise in the context of modeling social networks and in routing networks for peer-to-peer systems. We construct the first network over $n$ nodes in which {\sc greedy} routing takes…
▽ More
We study {\sc greedy} routing over $n$ nodes placed in a ring, with the \emph{distance} between two nodes defined to be the clockwise or the absolute distance between them along the ring. Such graphs arise in the context of modeling social networks and in routing networks for peer-to-peer systems. We construct the first network over $n$ nodes in which {\sc greedy} routing takes $O(\log n / \log d)$ hops in the worst-case, with $d$ out-going links per node. Our result has the first asymptotically optimal greedy routing complexity. Previous constructions required $O(\frac{\log^2 n}{d})$ hops.
△ Less
Submitted 13 July, 2005;
originally announced July 2005.
-
The Load and Availability of Byzantine Quorum Systems
Authors:
Dahlia Malkhi,
Michael Reiter,
Avishai Wool
Abstract:
Replicated services accessed via {\em quorums} enable each access to be performed at only a subset (quorum) of the servers, and achieve consistency across accesses by requiring any two quorums to intersect. Recently, $b$-masking quorum systems, whose intersections contain at least $2b+1$ servers, have been proposed to construct replicated services tolerant of $b$ arbitrary (Byzantine) server fai…
▽ More
Replicated services accessed via {\em quorums} enable each access to be performed at only a subset (quorum) of the servers, and achieve consistency across accesses by requiring any two quorums to intersect. Recently, $b$-masking quorum systems, whose intersections contain at least $2b+1$ servers, have been proposed to construct replicated services tolerant of $b$ arbitrary (Byzantine) server failures. In this paper we consider a hybrid fault model allowing benign failures in addition to the Byzantine ones. We present four novel constructions for $b$-masking quorum systems in this model, each of which has optimal {\em load} (the probability of access of the busiest server) or optimal availability (probability of some quorum surviving failures). To show optimality we also prove lower bounds on the load and availability of any $b$-masking quorum system in this model.
△ Less
Submitted 12 August, 1999;
originally announced August 1999.
-
On Propagating Updates in a Byzantine Environment
Authors:
Dahlia Malkhi,
Yishay Mansour,
Michael Reiter
Abstract:
We study how to efficiently diffuse updates to a large distributed system of data replicas, some of which may exhibit arbitrary (Byzantine) failures. We assume that strictly fewer than $t$ replicas fail, and that each update is initially received by at least $t$ correct replicas. The goal is to diffuse each update to all correct replicas while ensuring that correct replicas accept no updates gen…
▽ More
We study how to efficiently diffuse updates to a large distributed system of data replicas, some of which may exhibit arbitrary (Byzantine) failures. We assume that strictly fewer than $t$ replicas fail, and that each update is initially received by at least $t$ correct replicas. The goal is to diffuse each update to all correct replicas while ensuring that correct replicas accept no updates generated spuriously by faulty replicas. To achieve reliable diffusion, each correct replica accepts an update only after receiving it from at least $t$ others. We provide the first analysis of epidemic-style protocols for such environments. This analysis is fundamentally different from known analyses for the benign case due to our treatment of fully Byzantine failures---which, among other things, precludes the use of digital signatures for authenticating forwarded updates. We propose two epidemic-style diffusion algorithms and two measures that characterize the efficiency of diffusion algorithms in general. We characterize both of our algorithms according to these measures, and also prove lower bounds with regards to these measures that show that our algorithms are close to optimal.
△ Less
Submitted 12 August, 1999;
originally announced August 1999.
-
Secure Execution of Java Applets using a Remote Playground
Authors:
Dahlia Malkhi,
Michael Reiter
Abstract:
Mobile code presents a number of threats to machines that execute it. We introduce an approach for protecting machines and the resources they hold from mobile code, and describe a system based on our approach for protecting host machines from Java 1.1 applets. In our approach, each Java applet downloaded to the protected domain is rerouted to a dedicated machine (or set of machines), the {\em pl…
▽ More
Mobile code presents a number of threats to machines that execute it. We introduce an approach for protecting machines and the resources they hold from mobile code, and describe a system based on our approach for protecting host machines from Java 1.1 applets. In our approach, each Java applet downloaded to the protected domain is rerouted to a dedicated machine (or set of machines), the {\em playground}, at which it is executed. Prior to execution the applet is transformed to use the downloading user's web browser as a graphics terminal for its input and output, and so the user has the illusion that the applet is running on her own machine. In reality, however, mobile code runs only in the sanitized environment of the playground, where user files cannot be mounted and from which only limited network connections are accepted by machines in the protected domain. Our playground thus provides a second level of defense against mobile code that circumvents language-based defenses. The paper presents the design and implementation of a playground for Java 1.1 applets, and discusses extensions of it for other forms of mobile code including Java 1.2.
△ Less
Submitted 12 August, 1999;
originally announced August 1999.
-
Secure Multicast in a WAN
Authors:
Dahlia Malkhi,
Michael Merritt,
Ohad Rodeh
Abstract:
A secure reliable multicast protocol enables a process to send a message to a group of recipients such that all correct destinations receive the same message, despite the malicious efforts of fewer than a third of the total number of processes, including the sender. This has been sh own to be a useful tool in building secure distributed services, albeit with a cost that typically grows linearly…
▽ More
A secure reliable multicast protocol enables a process to send a message to a group of recipients such that all correct destinations receive the same message, despite the malicious efforts of fewer than a third of the total number of processes, including the sender. This has been sh own to be a useful tool in building secure distributed services, albeit with a cost that typically grows linearly with the size of the system. For very large networks, for which this is prohibitive, we present two approaches for reducing the cost: First, we show a protocol whose cost is on the order of the number of tolerated failures. Secondly, we show how relaxing the consistency requirement to a probabilistic guarantee can reduce the associated cost, effectively to a constant.
△ Less
Submitted 12 August, 1999;
originally announced August 1999.