subscribe to arXiv mailings

On Orchestrating Parallel Broadcasts for Distributed Ledgers

Authors: Peiyao Sheng, Chenyuan Wu, Dahlia Malkhi, Michael K. Reiter, Chrysoula Stathakopoulou, Michael Wei, Maofan Yin

Abstract: This paper introduces and develops the concept of ``ticketing'', through which atomic broadcasts are orchestrated by nodes in a distributed system. The paper studies different ticketing regimes that allow parallelism, yet prevent slow nodes from hampering overall progress. It introduces a hybrid scheme which combines managed and unmanaged ticketing regimes, striking a balance between adaptivity an… ▽ More This paper introduces and develops the concept of ``ticketing'', through which atomic broadcasts are orchestrated by nodes in a distributed system. The paper studies different ticketing regimes that allow parallelism, yet prevent slow nodes from hampering overall progress. It introduces a hybrid scheme which combines managed and unmanaged ticketing regimes, striking a balance between adaptivity and resilience. The performance evaluation demonstrates how managed and unmanaged ticketing regimes benefit throughput in systems with heterogeneous resources both in static and dynamic scenarios, with the managed ticketing regime performing better among the two as it adapts better. Finally, it demonstrates how using the hybrid ticketing regime performance can enjoy both the adaptivity of the managed regime and the liveness guarantees of the unmanaged regime. △ Less

Submitted 17 May, 2024; originally announced July 2024.

arXiv:2311.08091 [pdf, ps, other]

Lumiere: Making Optimal BFT for Partial Synchrony Practical

Authors: Andrew Lewis-Pye, Dahlia Malkhi, Oded Naor, Kartik Nayak

Abstract: The view synchronization problem lies at the heart of many Byzantine Fault Tolerant (BFT) State Machine Replication (SMR) protocols in the partial synchrony model, since these protocols are usually based on views. Liveness is guaranteed if honest processors spend a sufficiently long time in the same view during periods of synchrony, and if the leader of the view is honest. Ensuring that these cond… ▽ More The view synchronization problem lies at the heart of many Byzantine Fault Tolerant (BFT) State Machine Replication (SMR) protocols in the partial synchrony model, since these protocols are usually based on views. Liveness is guaranteed if honest processors spend a sufficiently long time in the same view during periods of synchrony, and if the leader of the view is honest. Ensuring that these conditions occur, known as Byzantine View Synchronization (BVS), has turned out to be the performance bottleneck of many BFT SMR protocols. A recent line of work has shown that, by using an appropriate view synchronization protocol, BFT SMR protocols can achieve $O(n^2)$ communication complexity in the worst case after GST, thereby finally matching the lower bound established by Dolev and Reischuk in 1985. However, these protocols suffer from two major issues: (1) When implemented so as to be optimistically responsive, even a single Byzantine processor may infinitely often cause $Ω(nΔ)$ latency between consecutive consensus decisions. (2) Even in the absence of Byzantine action, infinitely many views require honest processors to send $Ω(n^2)$ messages. Here, we present Lumiere, an optimistically responsive BVS protocol which maintains optimal worst-case communication complexity while simultaneously addressing the two issues above: for the first time, Lumiere enables BFT consensus solutions in the partial synchrony setting that have $O(n^2)$ worst-case communication complexity, and that eventually always (i.e., except for a small constant number of "warmup" decisions) have communication complexity and latency which is linear in the number of actual faults in the execution. △ Less

Submitted 16 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

arXiv:2310.06335 [pdf, other]

BBCA-CHAIN: Low Latency, High Throughput BFT Consensus on a DAG

Authors: Dahlia Malkhi, Chrysoula Stathakopoulou, Maofan Yin

Abstract: This paper presents a partially synchronous BFT consensus protocol powered by BBCA, a lightly modified Byzantine Consistent Broadcast (BCB) primitive. BBCA provides a Complete-Adopt semantic through an added probing interface to allow either aborting the broadcast by correct nodes or exclusively, adopting the message consistently in case of a potential delivery. It does not introduce any extra typ… ▽ More This paper presents a partially synchronous BFT consensus protocol powered by BBCA, a lightly modified Byzantine Consistent Broadcast (BCB) primitive. BBCA provides a Complete-Adopt semantic through an added probing interface to allow either aborting the broadcast by correct nodes or exclusively, adopting the message consistently in case of a potential delivery. It does not introduce any extra types of messages or additional communication costs to BCB. BBCA is harnessed into BBCA-CHAIN to make direct commits on a chained backbone of a causally ordered graph of blocks, without any additional voting blocks or artificial layering. With the help of Complete-Adopt, the additional knowledge gained from the underlying BCB completely removes the voting latency in popular DAG-based protocols. At the same time, causal ordering allows nodes to propose blocks in parallel and achieve high throughput. BBCA-CHAIN thus closes up the gap between protocols built by consistent broadcasts (e.g., Bullshark) to those without such an abstraction (e.g., PBFT/HotStuff), emphasizing their shared fundamental principles. Using a Bracha-style BCB as an example, we fully specify BBCA-CHAIN with simplicity, serving as a solid basis for high-performance replication systems (and blockchains). △ Less

Submitted 24 May, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

arXiv:2306.14757 [pdf, other]

BBCA-LEDGER: High Throughput Consensus meets Low Latency

Authors: Chrysoula Stathakopoulou, Michael Wei, Maofan Yin, Hongbo Zhang, Dahlia Malkhi

Abstract: This paper presents BBCA-LEDGER, a Byzantine log replication technology for partially synchronous networks enabling blocks to be broadcast in parallel, such that each broadcast is finalized independently and instantaneously into an individual slot in the log. Every finalized broadcast is eventually committed to the total ordering, so that all network bandwidth has utility in disseminating blocks.… ▽ More This paper presents BBCA-LEDGER, a Byzantine log replication technology for partially synchronous networks enabling blocks to be broadcast in parallel, such that each broadcast is finalized independently and instantaneously into an individual slot in the log. Every finalized broadcast is eventually committed to the total ordering, so that all network bandwidth has utility in disseminating blocks. Finalizing log slots in parallel achieves both high throughput and low latency. BBCA-LEDGER is composed of two principal protocols that interweave together, a low-latency/high-throughput happy path, and a high-throughput DAG-based fallback path. The happy path employs a novel primitive called BBCA, a consistent broadcast enforcing unique slot numbering. In steady state, BBCA ensures that a transaction can be committed with low latency, in just 3 network steps. Under network partitions or faults, we harness recent advances in BFT and build a fallback mechanism on a direct acyclic graph (DAG) created by BBCA broadcasts. In this manner, BBCA-LEDGER exhibits the throughput benefits of DAG-based BFT in face of gaps. △ Less

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2305.13556 [pdf, other]

Lessons from HotStuff

Authors: Dahlia Malkhi, Maofan Yin

Abstract: This article will take you on a journey to the core of blockchains, their Byzantine consensus engine, where HotStuff emerged as a new algorithmic foundation for the classical Byzantine generals consensus problem. The first part of the article underscores the theoretical advances HotStuff enabled, including several models in which HotStuff-based solutions closed problems which were opened for dec… ▽ More This article will take you on a journey to the core of blockchains, their Byzantine consensus engine, where HotStuff emerged as a new algorithmic foundation for the classical Byzantine generals consensus problem. The first part of the article underscores the theoretical advances HotStuff enabled, including several models in which HotStuff-based solutions closed problems which were opened for decades. The second part focuses on HotStuff performance in real life setting, where its simplicity drove adoption of HotStuff as the golden standard for blockchain design, and many variants and improvements built on top of it. Both parts of this document are meant to describe lessons drawn from HotStuff as well as dispel certain myths. △ Less

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2208.00940 [pdf, other]

Maximal Extractable Value (MEV) Protection on a DAG

Authors: Dahlia Malkhi, Pawel Szalachowski

Abstract: Many cryptocurrency platforms are vulnerable to Maximal Extractable Value (MEV) attacks, where a malicious consensus leader can inject transactions or change the order of user transactions to maximize its profit. A promising line of research in MEV mitigation is to enhance the Byzantine fault tolerance (BFT) consensus core of blockchains by new functionalities, like hiding transaction contents, su… ▽ More Many cryptocurrency platforms are vulnerable to Maximal Extractable Value (MEV) attacks, where a malicious consensus leader can inject transactions or change the order of user transactions to maximize its profit. A promising line of research in MEV mitigation is to enhance the Byzantine fault tolerance (BFT) consensus core of blockchains by new functionalities, like hiding transaction contents, such that malicious parties cannot analyze and exploit them until they are ordered. An orthogonal line of research demonstrates excellent performance for BFT protocols designed around Directed Acyclic Graphs (DAG). They provide high throughput by keeping high network utilization, decoupling transactions' dissemination from their metadata ordering, and encoding consensus logic efficiently over a DAG representing a causal ordering of disseminated messages. This paper explains how to combine these two advances. It introduces a DAG-based protocol called Fino, that integrates MEV-resistance features into DAG-based BFT without delaying the steady spreading of transactions by the DAG transport and with zero message overhead. The scheme operates without complex secret share verifiability or recoverability, and avoids costly threshold encryption. △ Less

Submitted 23 December, 2022; v1 submitted 1 August, 2022; originally announced August 2022.

arXiv:2203.06871 [pdf, other]

Block-STM: Scaling Blockchain Execution by Turning Ordering Curse to a Performance Blessing

Authors: Rati Gelashvili, Alexander Spiegelman, Zhuolun Xiang, George Danezis, Zekun Li, Dahlia Malkhi, Yu Xia, Runtian Zhou

Abstract: Block-STM is a parallel execution engine for smart contracts, built around the principles of Software Transactional Memory. Transactions are grouped in blocks, and every execution of the block must yield the same deterministic outcome. Block-STM further enforces that the outcome is consistent with executing transactions according to a preset order, leveraging this order to dynamically detect depen… ▽ More Block-STM is a parallel execution engine for smart contracts, built around the principles of Software Transactional Memory. Transactions are grouped in blocks, and every execution of the block must yield the same deterministic outcome. Block-STM further enforces that the outcome is consistent with executing transactions according to a preset order, leveraging this order to dynamically detect dependencies and avoid conflicts during speculative transaction execution. At the core of Block-STM is a novel, low-overhead collaborative scheduler of execution and validation tasks. Block-STM is implemented on the main branch of the Diem Blockchain code-base and runs in production at Aptos. Our evaluation demonstrates that Block-STM is adaptive to workloads with different conflict rates and utilizes the inherent parallelism therein. Block-STM achieves up to $110k$ tps in the Diem benchmarks and up to $170k$ tps in the Aptos Benchmarks, which is a $20$x and $17$x improvement over the sequential baseline with $32$ threads, respectively. The throughput on a contended workload is up to $50k$ tps and $80k$ tps in Diem and Aptos benchmarks, respectively. △ Less

Submitted 25 August, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

arXiv:2110.00960 [pdf, other]

Be Aware of Your Leaders

Authors: Shir Cohen, Rati Gelashvili, Lefteris Kokoris Kogias, Zekun Li, Dahlia Malkhi, Alberto Sonnino, Alexander Spiegelman

Abstract: Advances in blockchains have influenced the State-Machine-Replication (SMR) world and many state-of-the-art blockchain-SMR solutions are based on two pillars: Chaining and Leader-rotation. A predetermined round-robin mechanism used for Leader-rotation, however, has an undesirable behavior: crashed parties become designated leaders infinitely often, slowing down overall system performance. In this… ▽ More Advances in blockchains have influenced the State-Machine-Replication (SMR) world and many state-of-the-art blockchain-SMR solutions are based on two pillars: Chaining and Leader-rotation. A predetermined round-robin mechanism used for Leader-rotation, however, has an undesirable behavior: crashed parties become designated leaders infinitely often, slowing down overall system performance. In this paper, we provide a new Leader-Aware SMR framework that, among other desirable properties, formalizes a Leader-utilization requirement that bounds the number of rounds whose leaders are faulty in crash-only executions. We introduce Carousel, a novel, reputation-based Leader-rotation solution to achieve Leader-Aware SMR. The challenge in adaptive Leader-rotation is that it cannot rely on consensus to determine a leader, since consensus itself needs a leader. Carousel uses the available on-chain information to determine a leader locally and achieves Liveness despite this difficulty. A HotStuff implementation fitted with Carousel demonstrates drastic performance improvements: it increases throughput over 2x in faultless settings and provided a 20x throughput increase and 5x latency reduction in the presence of faults. △ Less

Submitted 3 October, 2021; originally announced October 2021.

arXiv:2101.03715 [pdf, ps, other]

Strengthened Fault Tolerance in Byzantine Fault Tolerant Replication

Authors: Zhuolun Xiang, Dahlia Malkhi, Kartik Nayak, Ling Ren

Abstract: Byzantine fault tolerant (BFT) state machine replication (SMR) is an important building block for constructing permissioned blockchain systems. In contrast to Nakamoto Consensus where any block obtains higher assurance as buried deeper in the blockchain, in BFT SMR, any committed block is secure has a fixed resilience threshold. In this paper, we investigate strengthened fault tolerance (SFT) in B… ▽ More Byzantine fault tolerant (BFT) state machine replication (SMR) is an important building block for constructing permissioned blockchain systems. In contrast to Nakamoto Consensus where any block obtains higher assurance as buried deeper in the blockchain, in BFT SMR, any committed block is secure has a fixed resilience threshold. In this paper, we investigate strengthened fault tolerance (SFT) in BFT SMR under partial synchrony, which provides gradually increased resilience guarantees (like Nakamoto Consensus) during an optimistic period when the network is synchronous and the number of Byzantine faults is small. Moreover, the committed blocks can tolerate more than one-third (up to two-thirds) corruptions even after the optimistic period. Compared to the prior best solution Flexible BFT which requires quadratic message complexity, our solution maintains the linear message complexity of state-of-the-art BFT SMR protocols and requires only marginal bookkeeping overhead. We implement our solution over the open-source Diem project, and give experimental results that demonstrate its efficiency under real-world scenarios. △ Less

Submitted 11 January, 2021; originally announced January 2021.

arXiv:2004.10617 [pdf, other]

Twins: BFT Systems Made Robust

Authors: Shehar Bano, Alberto Sonnino, Andrey Chursin, Dmitri Perelman, Zekun Li, Avery Ching, Dahlia Malkhi

Abstract: This paper presents Twins, an automated unit test generator of Byzantine attacks. Twins implements three types of Byzantine behaviors: (i) leader equivocation, (ii) double voting, and (iii) losing internal state such as forgetting 'locks' guarding voted values. To emulate interesting attacks by a Byzantine node, it instantiates twin copies of the node instead of one, giving both twins the same ide… ▽ More This paper presents Twins, an automated unit test generator of Byzantine attacks. Twins implements three types of Byzantine behaviors: (i) leader equivocation, (ii) double voting, and (iii) losing internal state such as forgetting 'locks' guarding voted values. To emulate interesting attacks by a Byzantine node, it instantiates twin copies of the node instead of one, giving both twins the same identities and network credentials. To the rest of the system, the twins appear indistinguishable from a single node behaving in a 'questionable' manner. Twins can systematically generate Byzantine attack scenarios at scale, execute them in a controlled manner, and examine their behavior. Twins scenarios iterate over protocol rounds and vary the communication patterns among nodes. Twins runs in a production setting within DiemBFT where it can execute 44M Twins-generated scenarios daily. Whereas the system at hand did not manifest errors, subtle safety bugs that were deliberately injected for the purpose of validating the implementation of Twins itself were exposed within minutes. Twins can prevent developers from regressing correctness when updating the codebase, introducing new features, or performing routine maintenance tasks. Twins only requires a thin wrapper over DiemBFT, we thus envision other systems using it. Building on this idea, one new attack and several known attacks against other BFT protocols were materialized as Twins scenarios. In all cases, the target protocols break within fewer than a dozen protocol rounds, hence it is realistic for the Twins approach to expose the problems. △ Less

Submitted 14 January, 2022; v1 submitted 22 April, 2020; originally announced April 2020.

arXiv:1909.11590 [pdf, other]

Rainblock: Faster Transaction Processing in Public Blockchains

Authors: Soujanya Ponnapalli, Aashaka Shah, Amy Tai, Souvik Banerjee, Vijay Chidambaram, Dahlia Malkhi, Michael Wei

Abstract: Public blockchains like Ethereum use Merkle trees to verify transactions received from untrusted servers before applying them to the blockchain. We empirically show that the low throughput of such blockchains is due to the I/O bottleneck associated with using Merkle trees for processing transactions. We present RAINBLOCK, a new architecture for public blockchains that increases throughput without… ▽ More Public blockchains like Ethereum use Merkle trees to verify transactions received from untrusted servers before applying them to the blockchain. We empirically show that the low throughput of such blockchains is due to the I/O bottleneck associated with using Merkle trees for processing transactions. We present RAINBLOCK, a new architecture for public blockchains that increases throughput without affecting security. RAINBLOCK achieves this by tackling the I/O bottleneck on two fronts: first, decoupling transaction processing from I/O, and removing I/O from the critical path; second, reducing I/O amplification by customizing storage for blockchains. RAINBLOCK uses a novel variant of the Merkle tree, the Distributed Sharded Merkle tree (DSM-TREE) to store system state. We evaluate RAINBLOCK using workloads based on public Ethereum traces (including smart contracts) and show that RAINBLOCK processes 20K transactions per second in a geo-distributed setting with four regions spread across three continents. △ Less

Submitted 15 October, 2020; v1 submitted 25 September, 2019; originally announced September 2019.

arXiv:1909.05204 [pdf, ps, other]

Cogsworth: Byzantine View Synchronization

Authors: Oded Naor, Mathieu Baudet, Dahlia Malkhi, Alexander Spiegelman

Abstract: Most methods for Byzantine fault tolerance (BFT) in the partial synchrony setting divide the local state of the nodes into views, and the transition from one view to the next dictates a leader change. In order to provide liveness, all honest nodes need to stay in the same view for a sufficiently long time. This requires \emph{view synchronization}, a requisite of BFT that we extract and formally d… ▽ More Most methods for Byzantine fault tolerance (BFT) in the partial synchrony setting divide the local state of the nodes into views, and the transition from one view to the next dictates a leader change. In order to provide liveness, all honest nodes need to stay in the same view for a sufficiently long time. This requires \emph{view synchronization}, a requisite of BFT that we extract and formally define here. Existing approaches for Byzantine view synchronization incur quadratic communication (in $n$, the number of parties). A cascade of $O(n)$ view changes may thus result in $O(n^3)$ communication complexity. This paper presents a new Byzantine view synchronization algorithm named Cogsworth, that has optimistically linear communication complexity and constant latency. Faced with benign failures, Cogsworth has expected linear communication and constant latency. The result here serves as an important step towards reaching solutions that have overall quadratic communication, the known lower bound on Byzantine fault tolerant consensus. Cogsworth is particularly useful for a family of BFT protocols that already exhibit linear communication under various circumstances, but suffer quadratic overhead due to view synchronization. △ Less

Submitted 6 February, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

arXiv:1906.03819 [pdf, other]

FairLedger: A Fair Blockchain Protocol for Financial Institutions

Authors: Kfir Lev-Ari, Alexander Spiegelman, Idit Keidar, Dahlia Malkhi

Abstract: Financial institutions are currently looking into technologies for permissioned blockchains. A major effort in this direction is Hyperledger, an open source project hosted by the Linux Foundation and backed by a consortium of over a hundred companies. A key component in permissioned blockchain protocols is a byzantine fault tolerant (BFT) consensus engine that orders transactions. However, current… ▽ More Financial institutions are currently looking into technologies for permissioned blockchains. A major effort in this direction is Hyperledger, an open source project hosted by the Linux Foundation and backed by a consortium of over a hundred companies. A key component in permissioned blockchain protocols is a byzantine fault tolerant (BFT) consensus engine that orders transactions. However, currently available BFT solutions in Hyperledger (as well as in the literature at large) are inadequate for financial settings; they are not designed to ensure fairness or to tolerate selfish behavior that arises when financial institutions strive to maximize their own profit. We present FairLedger, a permissioned blockchain BFT protocol, which is fair, designed to deal with rational behavior, and, no less important, easy to understand and implement. The secret sauce of our protocol is a new communication abstraction, called detectable all-to-all (DA2A), which allows us to detect participants (byzantine or rational) that deviate from the protocol, and punish them. We implement FairLedger in the Hyperledger open source project, using Iroha framework, one of the biggest projects therein. To evaluate FairLegder's performance, we also implement it in the PBFT framework and compare the two protocols. Our results show that in failure-free scenarios FairLedger achieves better throughput than both Iroha's implementation and PBFT in wide-area settings. △ Less

Submitted 10 June, 2019; originally announced June 2019.

arXiv:1904.10067 [pdf, other]

Flexible Byzantine Fault Tolerance

Authors: Dahlia Malkhi, Kartik Nayak, Ling Ren

Abstract: This paper introduces Flexible BFT, a new approach for BFT consensus solution design revolving around two pillars, stronger resilience and diversity. The first pillar, stronger resilience, involves a new fault model called alive-but-corrupt faults. Alive-but-corrupt replicas may arbitrarily deviate from the protocol in an attempt to break safety of the protocol. However, if they cannot break safet… ▽ More This paper introduces Flexible BFT, a new approach for BFT consensus solution design revolving around two pillars, stronger resilience and diversity. The first pillar, stronger resilience, involves a new fault model called alive-but-corrupt faults. Alive-but-corrupt replicas may arbitrarily deviate from the protocol in an attempt to break safety of the protocol. However, if they cannot break safety, they will not try to prevent liveness of the protocol. Combining alive-but-corrupt faults into the model, Flexible BFT is resilient to higher corruption levels than possible in a pure Byzantine fault model. The second pillar, diversity, designs consensus solutions whose protocol transcript is used to draw different commit decisions under diverse beliefs. With this separation, the same Flexible BFT solution supports synchronous and asynchronous beliefs, as well as varying resilience threshold combinations of Byzantine and alive-but-corrupt faults. At a technical level, Flexible BFT achieves the above results using two new ideas. First, it introduces a synchronous BFT protocol in which only the commit step requires to know the network delay bound and thus replicas execute the protocol without any synchrony assumption. Second, it introduces a notion called Flexible Byzantine Quorums by dissecting the roles of different quorums in existing consensus protocols. △ Less

Submitted 30 May, 2019; v1 submitted 22 April, 2019; originally announced April 2019.

arXiv:1811.01332 [pdf, ps, other]

Validated Asynchronous Byzantine Agreement with Optimal Resilience and Asymptotically Optimal Time and Word Communication

Authors: Ittai Abraham, Dahlia Malkhi, Alexander Spiegelman

Abstract: We provide a new protocol for Validated Asynchronous Byzantine Agreement. Validated (multi-valued) Asynchronous Byzantine Agreement is a key building block in constructing Atomic Broadcast and fault-tolerant state machine replication in the asynchronous setting. Our protocol can withstand the optimal number $f<n/3$ of Byzantine failures and reaches agreement in the asymptotically optimal expected… ▽ More We provide a new protocol for Validated Asynchronous Byzantine Agreement. Validated (multi-valued) Asynchronous Byzantine Agreement is a key building block in constructing Atomic Broadcast and fault-tolerant state machine replication in the asynchronous setting. Our protocol can withstand the optimal number $f<n/3$ of Byzantine failures and reaches agreement in the asymptotically optimal expected $O(1)$ running time. Honest parties in our protocol send only an expected $O(n^2)$ messages where each message contains a value and a constant number of signatures. Hence our total expected communication is $O(n^2)$ words. The best previous result of Cachin et al. from 2001 solves Validated Byzantine Agreement with optimal resilience and $O(1)$ expected time but with $O(n^3)$ expected word communication. Our work addresses an open question of Cachin et al. from 2001 and improves the expected word communication from $O(n^3)$ to the asymptotically optimal $O(n^2)$. △ Less

Submitted 4 November, 2018; originally announced November 2018.

arXiv:1807.03720 [pdf, other]

sAVSS: Scalable Asynchronous Verifiable Secret Sharing in BFT Protocols

Authors: Soumya Basu, Alin Tomescu, Ittai Abraham, Dahlia Malkhi, Michael K. Reiter, Emin Gün Sirer

Abstract: This paper introduces a new way to incorporate verifiable secret sharing (VSS) schemes into Byzantine Fault Tolerance (BFT) protocols. This technique extends the threshold guarantee of classical Byzantine Fault Tolerant algorithms to include privacy as well. This provides applications with a powerful primitive: a threshold trusted third party, which simplifies many difficult problems such as a fai… ▽ More This paper introduces a new way to incorporate verifiable secret sharing (VSS) schemes into Byzantine Fault Tolerance (BFT) protocols. This technique extends the threshold guarantee of classical Byzantine Fault Tolerant algorithms to include privacy as well. This provides applications with a powerful primitive: a threshold trusted third party, which simplifies many difficult problems such as a fair exchange. In order to incorporate VSS into BFT, we introduced sAVSS, a framework that transforms any VSS scheme into an asynchronous VSS scheme with constant overhead. By incorporating Kate et al.'s scheme into our framework, we obtain an asynchronous VSS that has constant overhead on each replica -- the first of its kind. We show that a key-value store built using BFT replication and sAVSS supports writing secret-shared values with about a 30% - 50% throughput overhead with less than 35 millisecond request latencies. △ Less

Submitted 21 December, 2018; v1 submitted 10 July, 2018; originally announced July 2018.

arXiv:1804.01626 [pdf, other]

SBFT: a Scalable and Decentralized Trust Infrastructure

Authors: Guy Golan Gueta, Ittai Abraham, Shelly Grossman, Dahlia Malkhi, Benny Pinkas, Michael K. Reiter, Dragos-Adrian Seredinschi, Orr Tamir, Alin Tomescu

Abstract: SBFT is a state of the art Byzantine fault tolerant permissioned blockchain system that addresses the challenges of scalability, decentralization and world-scale geo-replication. SBFTis optimized for decentralization and can easily handle more than 200 active replicas in a real world-scale deployment. We evaluate \sysname in a world-scale geo-replicated deployment with 209 replicas withstanding f=… ▽ More SBFT is a state of the art Byzantine fault tolerant permissioned blockchain system that addresses the challenges of scalability, decentralization and world-scale geo-replication. SBFTis optimized for decentralization and can easily handle more than 200 active replicas in a real world-scale deployment. We evaluate \sysname in a world-scale geo-replicated deployment with 209 replicas withstanding f=64 Byzantine failures. We provide experiments that show how the different algorithmic ingredients of \sysname increase its performance and scalability. The results show that SBFT simultaneously provides almost 2x better throughput and about 1.5x better latency relative to a highly optimized system that implements the PBFT protocol. To achieve this performance improvement, SBFT uses a combination of four ingredients: using collectors and threshold signatures to reduce communication to linear, using an optimistic fast path, reducing client communication and utilizing redundant servers for the fast path. △ Less

Submitted 2 January, 2019; v1 submitted 4 April, 2018; originally announced April 2018.

arXiv:1803.05069 [pdf, other]

HotStuff: BFT Consensus in the Lens of Blockchain

Authors: Maofan Yin, Dahlia Malkhi, Michael K. Reiter, Guy Golan Gueta, Ittai Abraham

Abstract: We present HotStuff, a leader-based Byzantine fault-tolerant replication protocol for the partially synchronous model. Once network communication becomes synchronous, HotStuff enables a correct leader to drive the protocol to consensus at the pace of actual (vs. maximum) network delay--a property called responsiveness--and with communication complexity that is linear in the number of replicas. To… ▽ More We present HotStuff, a leader-based Byzantine fault-tolerant replication protocol for the partially synchronous model. Once network communication becomes synchronous, HotStuff enables a correct leader to drive the protocol to consensus at the pace of actual (vs. maximum) network delay--a property called responsiveness--and with communication complexity that is linear in the number of replicas. To our knowledge, HotStuff is the first partially synchronous BFT replication protocol exhibiting these combined properties. HotStuff is built around a novel framework that forms a bridge between classical BFT foundations and blockchains. It allows the expression of other known protocols (DLS, PBFT, Tendermint, Casper), and ours, in a common framework. Our deployment of HotStuff over a network with over 100 replicas achieves throughput and latency comparable to that of BFT-SMaRt, while enjoying linear communication footprint during leader failover (vs. quadratic with BFT-SMaRt). △ Less

Submitted 23 July, 2019; v1 submitted 13 March, 2018; originally announced March 2018.

Comments: a shorter version of this paper has been published in PODC'19, which does not include interpretation of other protocols using the framework, system evaluation or additional proofs in appendices

arXiv:1803.03620 [pdf, other]

Stable and Consistent Membership at Scale with Rapid

Authors: Lalith Suresh, Dahlia Malkhi, Parikshit Gopalan, Ivan Porto Carreiro, Zeeshan Lokhandwala

Abstract: We present the design and evaluation of Rapid, a distributed membership service. At Rapid's core is a scheme for multi-process cut detection (CD) that revolves around two key insights: (i) it suspects a failure of a process only after alerts arrive from multiple sources, and (ii) when a group of processes experience problems, it detects failures of the entire group, rather than conclude about each… ▽ More We present the design and evaluation of Rapid, a distributed membership service. At Rapid's core is a scheme for multi-process cut detection (CD) that revolves around two key insights: (i) it suspects a failure of a process only after alerts arrive from multiple sources, and (ii) when a group of processes experience problems, it detects failures of the entire group, rather than conclude about each process individually. Implementing these insights translates into a simple membership algorithm with low communication overhead. We present evidence that our strategy suffices to drive unanimous detection almost-everywhere, even when complex network conditions arise, such as one-way reachability problems, firewall misconfigurations, and high packet loss. Furthermore, we present both empirical evidence and analyses that proves that the almost-everywhere detection happens with high probability. To complete the design, Rapid contains a leaderless consensus protocol that converts multi-process cut detections into a view-change decision. The resulting membership service works both in fully decentralized as well as logically centralized modes. We present an evaluation of Rapid in moderately scalable cloud settings. Rapid bootstraps 2000 node clusters 2-5.8x faster than prevailing tools such as Memberlist and ZooKeeper, remains stable in face of complex failure scenarios, and provides strong consistency guarantees. It is easy to integrate Rapid into existing distributed applications, of which we demonstrate two. △ Less

Submitted 9 March, 2018; originally announced March 2018.

Comments: 15 pages

arXiv:1801.10022 [pdf, ps, other]

Revisiting Fast Practical Byzantine Fault Tolerance: Thelma, Velma, and Zelma

Authors: Ittai Abraham, Guy Gueta, Dahlia Malkhi, Jean-Philippe Martin

Abstract: In a previous note (arXiv:1712.01367 [cs.DC]) , we observed a safety violation in Zyzzyva and a liveness violation in FaB. In this manuscript, we sketch fixes to both. The same view-change core is applied in the two schemes, and additionally, applied to combine them and create a single, enhanced scheme that has the benefits of both approaches. In a previous note (arXiv:1712.01367 [cs.DC]) , we observed a safety violation in Zyzzyva and a liveness violation in FaB. In this manuscript, we sketch fixes to both. The same view-change core is applied in the two schemes, and additionally, applied to combine them and create a single, enhanced scheme that has the benefits of both approaches. △ Less

Submitted 26 January, 2018; originally announced January 2018.

Comments: arXiv admin note: text overlap with arXiv:1712.01367

arXiv:1712.01367 [pdf, ps, other]

Revisiting Fast Practical Byzantine Fault Tolerance

Authors: Ittai Abraham, Guy Gueta, Dahlia Malkhi, Lorenzo Alvisi, Rama Kotla, Jean-Philippe Martin

Abstract: In this note, we observe a safety violation in Zyzzyva and a liveness violation in FaB. To demonstrate these issues, we require relatively simple scenarios, involving only four replicas, and one or two view changes. In all of them, the problem is manifested already in the first log slot. In this note, we observe a safety violation in Zyzzyva and a liveness violation in FaB. To demonstrate these issues, we require relatively simple scenarios, involving only four replicas, and one or two view changes. In all of them, the problem is manifested already in the first log slot. △ Less

Submitted 4 December, 2017; originally announced December 2017.

arXiv:1612.02916 [pdf, other]

Solida: A Blockchain Protocol Based on Reconfigurable Byzantine Consensus

Authors: Ittai Abraham, Dahlia Malkhi, Kartik Nayak, Ling Ren, Alexander Spiegelman

Abstract: The decentralized cryptocurrency Bitcoin has experienced great success but also encountered many challenges. One of the challenges has been the long confirmation time. Another challenge is the lack of incentives at certain steps of the protocol, raising concerns for transaction withholding, selfish mining, etc. To address these challenges, we propose Solida, a decentralized blockchain protocol bas… ▽ More The decentralized cryptocurrency Bitcoin has experienced great success but also encountered many challenges. One of the challenges has been the long confirmation time. Another challenge is the lack of incentives at certain steps of the protocol, raising concerns for transaction withholding, selfish mining, etc. To address these challenges, we propose Solida, a decentralized blockchain protocol based on reconfigurable Byzantine consensus augmented by proof-of-work. Solida improves on Bitcoin in confirmation time, and provides safety and liveness assuming the adversary control less than (roughly) one-third of the total mining power. △ Less

Submitted 18 November, 2017; v1 submitted 8 December, 2016; originally announced December 2016.

arXiv:1608.06696 [pdf, other]

Flexible Paxos: Quorum intersection revisited

Authors: Heidi Howard, Dahlia Malkhi, Alexander Spiegelman

Abstract: Distributed consensus is integral to modern distributed systems. The widely adopted Paxos algorithm uses two phases, each requiring majority agreement, to reliably reach consensus. In this paper, we demonstrate that Paxos, which lies at the foundation of many production systems, is conservative. Specifically, we observe that each of the phases of Paxos may use non-intersecting quorums. Majority qu… ▽ More Distributed consensus is integral to modern distributed systems. The widely adopted Paxos algorithm uses two phases, each requiring majority agreement, to reliably reach consensus. In this paper, we demonstrate that Paxos, which lies at the foundation of many production systems, is conservative. Specifically, we observe that each of the phases of Paxos may use non-intersecting quorums. Majority quorums are not necessary as intersection is required only across phases. Using this weakening of the requirements made in the original formulation, we propose Flexible Paxos, which generalizes over the Paxos algorithm to provide flexible quorums. We show that Flexible Paxos is safe, efficient and easy to utilize in existing distributed systems. We conclude by discussing the wide reaching implications of this result. Examples include improved availability from reducing the size of second phase quorums by one when the number of acceptors is even and utilizing small disjoint phase-2 quorums to speed up the steady-state. △ Less

Submitted 23 August, 2016; originally announced August 2016.

ACM Class: C.2.4

arXiv:1402.2701 [pdf, ps, other]

Optimal Gossip with Direct Addressing

Authors: Bernhard Haeupler, Dahlia Malkhi

Abstract: Gossip algorithms spread information by having nodes repeatedly forward information to a few random contacts. By their very nature, gossip algorithms tend to be distributed and fault tolerant. If done right, they can also be fast and message-efficient. A common model for gossip communication is the random phone call model, in which in each synchronous round each node can PUSH or PULL information t… ▽ More Gossip algorithms spread information by having nodes repeatedly forward information to a few random contacts. By their very nature, gossip algorithms tend to be distributed and fault tolerant. If done right, they can also be fast and message-efficient. A common model for gossip communication is the random phone call model, in which in each synchronous round each node can PUSH or PULL information to or from a random other node. For example, Karp et al. [FOCS 2000] gave algorithms in this model that spread a message to all nodes in $Θ(\log n)$ rounds while sending only $O(\log \log n)$ messages per node on average. Recently, Avin and Elsässer [DISC 2013], studied the random phone call model with the natural and commonly used assumption of direct addressing. Direct addressing allows nodes to directly contact nodes whose ID (e.g., IP address) was learned before. They show that in this setting, one can "break the $\log n$ barrier" and achieve a gossip algorithm running in $O(\sqrt{\log n})$ rounds, albeit while using $O(\sqrt{\log n})$ messages per node. We study the same model and give a simple gossip algorithm which spreads a message in only $O(\log \log n)$ rounds. We also prove a matching $Ω(\log \log n)$ lower bound which shows that this running time is best possible. In particular we show that any gossip algorithm takes with high probability at least $0.99 \log \log n$ rounds to terminate. Lastly, our algorithm can be tweaked to send only $O(1)$ messages per node on average with only $O(\log n)$ bits per message. Our algorithm therefore simultaneously achieves the optimal round-, message-, and bit-complexity for this setting. As all prior gossip algorithms, our algorithm is also robust against failures. In particular, if in the beginning an oblivious adversary fails any $F$ nodes our algorithm still, with high probability, informs all but $o(F)$ surviving nodes. △ Less

Submitted 11 February, 2014; originally announced February 2014.

arXiv:cs/0507034 [pdf, ps, other]

Papillon: Greedy Routing in Rings

Authors: Ittai Abraham, Dahlia Malkhi, Gurmeet Singh Manku

Abstract: We study {\sc greedy} routing over $n$ nodes placed in a ring, with the \emph{distance} between two nodes defined to be the clockwise or the absolute distance between them along the ring. Such graphs arise in the context of modeling social networks and in routing networks for peer-to-peer systems. We construct the first network over $n$ nodes in which {\sc greedy} routing takes… ▽ More We study {\sc greedy} routing over $n$ nodes placed in a ring, with the \emph{distance} between two nodes defined to be the clockwise or the absolute distance between them along the ring. Such graphs arise in the context of modeling social networks and in routing networks for peer-to-peer systems. We construct the first network over $n$ nodes in which {\sc greedy} routing takes $O(\log n / \log d)$ hops in the worst-case, with $d$ out-going links per node. Our result has the first asymptotically optimal greedy routing complexity. Previous constructions required $O(\frac{\log^2 n}{d})$ hops. △ Less

Submitted 13 July, 2005; originally announced July 2005.

arXiv:cs/9908011 [pdf, ps, other]

The Load and Availability of Byzantine Quorum Systems

Authors: Dahlia Malkhi, Michael Reiter, Avishai Wool

Abstract: Replicated services accessed via {\em quorums} enable each access to be performed at only a subset (quorum) of the servers, and achieve consistency across accesses by requiring any two quorums to intersect. Recently, $b$-masking quorum systems, whose intersections contain at least $2b+1$ servers, have been proposed to construct replicated services tolerant of $b$ arbitrary (Byzantine) server fai… ▽ More Replicated services accessed via {\em quorums} enable each access to be performed at only a subset (quorum) of the servers, and achieve consistency across accesses by requiring any two quorums to intersect. Recently, $b$-masking quorum systems, whose intersections contain at least $2b+1$ servers, have been proposed to construct replicated services tolerant of $b$ arbitrary (Byzantine) server failures. In this paper we consider a hybrid fault model allowing benign failures in addition to the Byzantine ones. We present four novel constructions for $b$-masking quorum systems in this model, each of which has optimal {\em load} (the probability of access of the busiest server) or optimal availability (probability of some quorum surviving failures). To show optimality we also prove lower bounds on the load and availability of any $b$-masking quorum system in this model. △ Less

Submitted 12 August, 1999; originally announced August 1999.

Comments: preprint of a paper to appear in the SIAM Journal of Computing

ACM Class: C.2.4; C.4

arXiv:cs/9908010 [pdf, ps, other]

On Propagating Updates in a Byzantine Environment

Authors: Dahlia Malkhi, Yishay Mansour, Michael Reiter

Abstract: We study how to efficiently diffuse updates to a large distributed system of data replicas, some of which may exhibit arbitrary (Byzantine) failures. We assume that strictly fewer than $t$ replicas fail, and that each update is initially received by at least $t$ correct replicas. The goal is to diffuse each update to all correct replicas while ensuring that correct replicas accept no updates gen… ▽ More We study how to efficiently diffuse updates to a large distributed system of data replicas, some of which may exhibit arbitrary (Byzantine) failures. We assume that strictly fewer than $t$ replicas fail, and that each update is initially received by at least $t$ correct replicas. The goal is to diffuse each update to all correct replicas while ensuring that correct replicas accept no updates generated spuriously by faulty replicas. To achieve reliable diffusion, each correct replica accepts an update only after receiving it from at least $t$ others. We provide the first analysis of epidemic-style protocols for such environments. This analysis is fundamentally different from known analyses for the benign case due to our treatment of fully Byzantine failures---which, among other things, precludes the use of digital signatures for authenticating forwarded updates. We propose two epidemic-style diffusion algorithms and two measures that characterize the efficiency of diffusion algorithms in general. We characterize both of our algorithms according to these measures, and also prove lower bounds with regards to these measures that show that our algorithms are close to optimal. △ Less

Submitted 12 August, 1999; originally announced August 1999.

ACM Class: C.2.0; C.2.4; C.4

arXiv:cs/9908009 [pdf, ps, other]

Secure Execution of Java Applets using a Remote Playground

Authors: Dahlia Malkhi, Michael Reiter

Abstract: Mobile code presents a number of threats to machines that execute it. We introduce an approach for protecting machines and the resources they hold from mobile code, and describe a system based on our approach for protecting host machines from Java 1.1 applets. In our approach, each Java applet downloaded to the protected domain is rerouted to a dedicated machine (or set of machines), the {\em pl… ▽ More Mobile code presents a number of threats to machines that execute it. We introduce an approach for protecting machines and the resources they hold from mobile code, and describe a system based on our approach for protecting host machines from Java 1.1 applets. In our approach, each Java applet downloaded to the protected domain is rerouted to a dedicated machine (or set of machines), the {\em playground}, at which it is executed. Prior to execution the applet is transformed to use the downloading user's web browser as a graphics terminal for its input and output, and so the user has the illusion that the applet is running on her own machine. In reality, however, mobile code runs only in the sanitized environment of the playground, where user files cannot be mounted and from which only limited network connections are accepted by machines in the protected domain. Our playground thus provides a second level of defense against mobile code that circumvents language-based defenses. The paper presents the design and implementation of a playground for Java 1.1 applets, and discusses extensions of it for other forms of mobile code including Java 1.2. △ Less

Submitted 12 August, 1999; originally announced August 1999.

Comments: preprint of a paper to appear in IEEE Transactions on Software Engineering

ACM Class: C.2.0

arXiv:cs/9908008 [pdf, ps, other]

Secure Multicast in a WAN

Authors: Dahlia Malkhi, Michael Merritt, Ohad Rodeh

Abstract: A secure reliable multicast protocol enables a process to send a message to a group of recipients such that all correct destinations receive the same message, despite the malicious efforts of fewer than a third of the total number of processes, including the sender. This has been sh own to be a useful tool in building secure distributed services, albeit with a cost that typically grows linearly… ▽ More A secure reliable multicast protocol enables a process to send a message to a group of recipients such that all correct destinations receive the same message, despite the malicious efforts of fewer than a third of the total number of processes, including the sender. This has been sh own to be a useful tool in building secure distributed services, albeit with a cost that typically grows linearly with the size of the system. For very large networks, for which this is prohibitive, we present two approaches for reducing the cost: First, we show a protocol whose cost is on the order of the number of tolerated failures. Secondly, we show how relaxing the consistency requirement to a probabilistic guarantee can reduce the associated cost, effectively to a constant. △ Less

Submitted 12 August, 1999; originally announced August 1999.

Comments: preprint of a paper to appear in the Distributed Computing Journal

ACM Class: c.2.0; c.2.4; c.4

Showing 1–29 of 29 results for author: Malkhi, D