-
FACTS About Building Retrieval Augmented Generation-based Chatbots
Authors:
Rama Akkiraju,
Anbang Xu,
Deepak Bora,
Tan Yu,
Lu An,
Vishal Seth,
Aaditya Shukla,
Pritam Gundecha,
Hridhay Mehta,
Ashwin Jha,
Prithvi Raj,
Abhinav Balasubramanian,
Murali Maram,
Guru Muthusamy,
Shivakesh Reddy Annepally,
Sidney Knowles,
Min Du,
Nick Burnett,
Sean Javiya,
Ashok Marannan,
Mamta Kumari,
Surbhi Jha,
Ethan Dereszenski,
Anupam Chakraborty,
Subhash Ranjan
, et al. (13 additional authors not shown)
Abstract:
Enterprise chatbots, powered by generative AI, are emerging as key applications to enhance employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and orchestration frameworks like Langchain and Llamaindex are crucial for building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This…
▽ More
Enterprise chatbots, powered by generative AI, are emerging as key applications to enhance employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and orchestration frameworks like Langchain and Llamaindex are crucial for building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This includes fine-tuning embeddings and LLMs, extracting documents from vector databases, rephrasing queries, reranking results, designing prompts, honoring document access controls, providing concise responses, including references, safeguarding personal information, and building orchestration agents. We present a framework for building RAG-based chatbots based on our experience with three NVIDIA chatbots: for IT/HR benefits, financial earnings, and general content. Our contributions are three-fold: introducing the FACTS framework (Freshness, Architectures, Cost, Testing, Security), presenting fifteen RAG pipeline control points, and providing empirical results on accuracy-latency tradeoffs between large and small LLMs. To the best of our knowledge, this is the first paper of its kind that provides a holistic view of the factors as well as solutions for building secure enterprise-grade chatbots."
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
Predicting Visual Attention in Graphic Design Documents
Authors:
Souradeep Chakraborty,
Zijun Wei,
Conor Kelton,
Seoyoung Ahn,
Aruna Balasubramanian,
Gregory J. Zelinsky,
Dimitris Samaras
Abstract:
We present a model for predicting visual attention during the free viewing of graphic design documents. While existing works on this topic have aimed at predicting static saliency of graphic designs, our work is the first attempt to predict both spatial attention and dynamic temporal order in which the document regions are fixated by gaze using a deep learning based model. We propose a two-stage m…
▽ More
We present a model for predicting visual attention during the free viewing of graphic design documents. While existing works on this topic have aimed at predicting static saliency of graphic designs, our work is the first attempt to predict both spatial attention and dynamic temporal order in which the document regions are fixated by gaze using a deep learning based model. We propose a two-stage model for predicting dynamic attention on such documents, with webpages being our primary choice of document design for demonstration. In the first stage, we predict the saliency maps for each of the document components (e.g. logos, banners, texts, etc. for webpages) conditioned on the type of document layout. These component saliency maps are then jointly used to predict the overall document saliency. In the second stage, we use these layout-specific component saliency maps as the state representation for an inverse reinforcement learning model of fixation scanpath prediction during document viewing. To test our model, we collected a new dataset consisting of eye movements from 41 people freely viewing 450 webpages (the largest dataset of its kind). Experimental results show that our model outperforms existing models in both saliency and scanpath prediction for webpages, and also generalizes very well to other graphic design documents such as comics, posters, mobile UIs, etc. and natural images.
△ Less
Submitted 2 July, 2024;
originally announced July 2024.
-
Decidability and Complexity of Decision Problems for Affine Continuous VASS
Authors:
A. R. Balasubramanian
Abstract:
Vector addition system with states (VASS) is a popular model for the verification of concurrent systems. VASS consists of finitely many control states and a set of counters which can be incremented and decremented, but not tested for zero. VASS is a relatively well-studied model of computation and many results regarding the decidability of decision problems for VASS are well-known. Given that the…
▽ More
Vector addition system with states (VASS) is a popular model for the verification of concurrent systems. VASS consists of finitely many control states and a set of counters which can be incremented and decremented, but not tested for zero. VASS is a relatively well-studied model of computation and many results regarding the decidability of decision problems for VASS are well-known. Given that the complexity of solving almost all problems for VASS is very high, various tractable over-approximations of the reachability relation of VASS have been proposed in the literature. One such tractable over-approximation is the so-called continuous VASS, in which counters are allowed to have non-negative rational values and whenever an update is performed, the update is first scaled by an arbitrary non-zero fraction.
In this paper, we consider affine continuous VASS, which extend continuous VASS by allowing integer affine operations. Affine continuous VASS serve as an over-approximation to the model of affine VASS, in the same way that continuous VASS over-approximates the reachability relation of VASS. We investigate the tractability of affine continuous VASS with respect to the reachability, coverability and state-reachability problems for different classes of affine operations and we prove an almost-complete classification of the decidability of these problems. Namely, except for the coverability problem for a single family of classes of affine operations, we completely determine the decidability status of these problems for all classes. Furthermore, except for this single family, we also complement all of our decidability results with tight complexity-theoretic upper and lower bounds.
△ Less
Submitted 17 May, 2024;
originally announced May 2024.
-
Reachability in Continuous Pushdown VASS
Authors:
A. R. Balasubramanian,
Rupak Majumdar,
Ramanathan S. Thinniyam,
Georg Zetzsche
Abstract:
Pushdown Vector Addition Systems with States (PVASS) consist of finitely many control states, a pushdown stack, and a set of counters that can be incremented and decremented, but not tested for zero. Whether the reachability problem is decidable for PVASS is a long-standing open problem.
We consider continuous PVASS, which are PVASS with a continuous semantics. This means, the counter values are…
▽ More
Pushdown Vector Addition Systems with States (PVASS) consist of finitely many control states, a pushdown stack, and a set of counters that can be incremented and decremented, but not tested for zero. Whether the reachability problem is decidable for PVASS is a long-standing open problem.
We consider continuous PVASS, which are PVASS with a continuous semantics. This means, the counter values are rational numbers and whenever a vector is added to the current counter values, this vector is first scaled with an arbitrarily chosen rational factor between zero and one. We show that reachability in continuous PVASS is NEXPTIME-complete. Our result is unusually robust: Reachability can be decided in NEXPTIME even if all numbers are specified in binary. On the other hand, NEXPTIME-hardness already holds for coverability, in fixed dimension, for bounded stack, and even if all numbers are specified in unary.
△ Less
Submitted 31 October, 2023; v1 submitted 25 October, 2023;
originally announced October 2023.
-
Transformer-based Sequence Labeling for Audio Classification based on MFCCs
Authors:
C. S. Sonali,
Chinmayi B S,
Ahana Balasubramanian
Abstract:
Audio classification is vital in areas such as speech and music recognition. Feature extraction from the audio signal, such as Mel-Spectrograms and MFCCs, is a critical step in audio classification. These features are transformed into spectrograms for classification. Researchers have explored various techniques, including traditional machine and deep learning methods to classify spectrograms, but…
▽ More
Audio classification is vital in areas such as speech and music recognition. Feature extraction from the audio signal, such as Mel-Spectrograms and MFCCs, is a critical step in audio classification. These features are transformed into spectrograms for classification. Researchers have explored various techniques, including traditional machine and deep learning methods to classify spectrograms, but these can be computationally expensive. To simplify this process, a more straightforward approach inspired by sequence classification in NLP can be used. This paper proposes a Transformer-encoder-based model for audio classification using MFCCs. The model was benchmarked against the ESC-50, Speech Commands v0.02 and UrbanSound8k datasets and has shown strong performance, with the highest accuracy of 95.2% obtained upon training the model on the UrbanSound8k dataset. The model consisted of a mere 127,544 total parameters, making it light-weight yet highly efficient at the audio classification task.
△ Less
Submitted 5 July, 2023; v1 submitted 30 April, 2023;
originally announced May 2023.
-
Parameterized Verification of Coverability in Infinite State Broadcast Networks
Authors:
A. R. Balasubramanian
Abstract:
Parameterized verification of coverability in broadcast networks with finite state processes has been studied for different types of models and topologies. In this paper, we attempt to develop a theory of broadcast networks in which the processes can be well-structured transition systems. The resulting formalism is called well-structured broadcast networks. For various types of communication topol…
▽ More
Parameterized verification of coverability in broadcast networks with finite state processes has been studied for different types of models and topologies. In this paper, we attempt to develop a theory of broadcast networks in which the processes can be well-structured transition systems. The resulting formalism is called well-structured broadcast networks. For various types of communication topologies, we prove the decidability of coverability in the static case, i.e, when the network topology is not allowed to change. We do this by showing that for these types of static communication topologies, the broadcast network itself is a well-structured transition system, hence proving the decidability of coverability in the broadcast network. We also give an algorithm to decide coverability of well-structured broadcast networks when reconfiguration of links between nodes is allowed. Finally, with minor modifications of this algorithm we prove decidability of coverability when the underlying process is a pushdown automaton.
△ Less
Submitted 21 April, 2023;
originally announced April 2023.
-
Coefficient Synthesis for Threshold Automata
Authors:
A. R. Balasubramanian
Abstract:
Threshold automata are a formalism for modeling fault-tolerant distributed algorithms. The main feature of threshold automata is the notion of a threshold guard, which allows us to compare the number of received messages with the total number of different types of processes. In this paper, we consider the coefficient synthesis problem for threshold automata, in which we are given a sketch of a thr…
▽ More
Threshold automata are a formalism for modeling fault-tolerant distributed algorithms. The main feature of threshold automata is the notion of a threshold guard, which allows us to compare the number of received messages with the total number of different types of processes. In this paper, we consider the coefficient synthesis problem for threshold automata, in which we are given a sketch of a threshold automaton (with the constants in the threshold guards left unspecified) and a specification and we want to synthesize a set of constants which when plugged into the sketch, gives a threshold automaton satisfying the specification. Our main result is that this problem is undecidable, even when the specification is a coverability specification and the underlying sketch is acyclic.
△ Less
Submitted 18 April, 2023;
originally announced April 2023.
-
Analyzing DCTCP and Cubic Buffer Sharing under Diverse Router Configurations
Authors:
Santiago Vargas,
Aruna Balasubramanian,
Srikanth Sundaresan
Abstract:
In this work, we look at the impact of router configurations on DCTCP and Cubic traffic when both algorithms share router buffers in the data center. Modern data centers host traffic with mixed congestion controls, including DCTCP and Cubic traffic. Both DCTCP and Cubic in the data center can compete with each other and potentially starve and/or be unfair to each other when sharing buffer space in…
▽ More
In this work, we look at the impact of router configurations on DCTCP and Cubic traffic when both algorithms share router buffers in the data center. Modern data centers host traffic with mixed congestion controls, including DCTCP and Cubic traffic. Both DCTCP and Cubic in the data center can compete with each other and potentially starve and/or be unfair to each other when sharing buffer space in the data center. This happens since both algorithms are at odds with each other in terms of buffer utilization paradigms where DCTCP attempts to limit buffer utilization while Cubic generally fills buffers to obtain high throughput. As a result, we propose methods for a measurement-driven analysis of DCTCP and Cubic performance when sharing buffers in data center routers via simulation. We run around 10000 simulation experiments with unique router configurations and network conditions. Afterwards, we present a generalizable ML model to capture the effect that different buffer settings have on DCTCP and Cubic streaming traffic in the data center. Finally, we suggest that this model can be used to tune buffer settings in the data center.
△ Less
Submitted 11 February, 2023;
originally announced February 2023.
-
Parameterized Analysis of Reconfigurable Broadcast Networks (Long Version)
Authors:
A. R. Balasubramanian,
Lucie Guillou,
Chana Weil-Kennedy
Abstract:
Reconfigurable broadcast networks (RBN) are a model of distributed computation in which agents can broadcast messages to other agents using some underlying communication topology which can change arbitrarily over the course of executions. In this paper, we conduct parameterized analysis of RBN. We consider cubes,(infinite) sets of configurations in the form of lower and upper bounds on the number…
▽ More
Reconfigurable broadcast networks (RBN) are a model of distributed computation in which agents can broadcast messages to other agents using some underlying communication topology which can change arbitrarily over the course of executions. In this paper, we conduct parameterized analysis of RBN. We consider cubes,(infinite) sets of configurations in the form of lower and upper bounds on the number of agents in each state, and we show that we can evaluate boolean combinations over cubes and reachability sets of cubes in PSPACE. In particular, reachability from a cube to another cube is a PSPACE-complete problem.
To prove the upper bound for this parameterized analysis, we prove some structural properties about the reachability sets and the symbolic graph abstraction of RBN, which might be of independent interest. We justify this claim by providing two applications of these results. First, we show that the almost-sure coverability problem is PSPACE-complete for RBN, thereby closing a complexity gap from a previous paper. Second, we define a computation model using RBN, à la population protocols, called RBN protocols. We characterize precisely the set of predicates that can be computed by such protocols.
△ Less
Submitted 11 July, 2022; v1 submitted 25 January, 2022;
originally announced January 2022.
-
Efficient Data Exchange in Unmanned Aerial Vehicle Networks Utilizing Unsupervised Learning-Based Clustering
Authors:
Hao Song,
Lingjia Liu,
Ananth Balasubramanian
Abstract:
An unmanned aerial vehicle (UAV) network can serve as an aerial relay to periodically receive packets from macro base stations (BSs). Severe packet loss may happen especially when UAVs have bad wireless connections to a BS. In this paper, a data exchange scheme is proposed utilizing unsupervised learning to enable efficient lost packet retrieval through reliable wireless transmissions between UAVs…
▽ More
An unmanned aerial vehicle (UAV) network can serve as an aerial relay to periodically receive packets from macro base stations (BSs). Severe packet loss may happen especially when UAVs have bad wireless connections to a BS. In this paper, a data exchange scheme is proposed utilizing unsupervised learning to enable efficient lost packet retrieval through reliable wireless transmissions between UAVs instead of through retransmissions of macro BSs with a longer delay and higher overhead. With the proposed scheme, all UAVs are assigned to multiple clusters and a UAV can only request its lost packets to UAVs in the same cluster. By this way, UAVs in different clusters could carry out their lost packets retrieval processes simultaneously to expedite data exchange. The agglomerative hierarchical clustering, a type of unsupervised learning, is used to conduct clustering, guaranteeing that UAVs clustered together could supply and supplement each other's lost packets. To further enhance data exchange efficiency, a data exchange mechanism is designed, where the priority of UAVs performing data exchange is determined by the number of their lost packets or the number of requested packets that they can provide. The introduced data exchange mechanism can make each request-reply process maximally beneficial to other UAVs' lost packet retrieval in the same cluster. A new random backoff procedure based on the carrier sense multiple access with collision avoidance (CSMA/CA) is designed to support the data exchange mechanism. Finally, simulation studies are conducted to demonstrate the effectiveness and superiority of our proposed data exchange scheme.
△ Less
Submitted 22 December, 2021;
originally announced December 2021.
-
Reconfigurable Broadcast Networks and Asynchronous Shared-Memory Systems are Equivalent
Authors:
A. R. Balasubramanian,
Chana Weil-Kennedy
Abstract:
We show the equivalence of two distributed computing models, namely reconfigurable broadcast networks (RBN) and asynchronous shared-memory systems (ASMS), that were introduced independently. Both RBN and ASMS are systems in which a collection of anonymous, finite-state processes run the same protocol. In RBN, the processes communicate by selective broadcast: a process can broadcast a message which…
▽ More
We show the equivalence of two distributed computing models, namely reconfigurable broadcast networks (RBN) and asynchronous shared-memory systems (ASMS), that were introduced independently. Both RBN and ASMS are systems in which a collection of anonymous, finite-state processes run the same protocol. In RBN, the processes communicate by selective broadcast: a process can broadcast a message which is received by all of its neighbors, and the set of neighbors of a process can change arbitrarily over time. In ASMS, the processes communicate by shared memory: a process can either write to or read from a shared register. Our main result is that RBN and ASMS can simulate each other, i.e. they are equivalent with respect to parameterized reachability, where we are given two (possibly infinite) sets of configurations C and C' defined by upper and lower bounds on the number of processes in each state and we would like to decide if some configuration in C can reach some configuration in C'. Using this simulation equivalence, we transfer results of RBN to ASMS and vice versa. Finally, we show that RBN and ASMS can simulate a third distributed model called immediate observation (IO) nets. Moreover, for a slightly stronger notion of simulation (which is satisfied by all the simulations given in this paper), we show that IO nets cannot simulate RBN.
△ Less
Submitted 16 September, 2021;
originally announced September 2021.
-
Reconfigurable Broadcast Networks and Asynchronous Shared-Memory Systems are Equivalent (Long Version)
Authors:
A. R. Balasubramanian,
Chana Weil-Kennedy
Abstract:
We show the equivalence of two distributed computing models, namely reconfigurable broadcast networks (RBN) and asynchronous shared-memory systems (ASMS), that were introduced independently. Both RBN and ASMS are systems in which a collection of anonymous, finite-state processes run the same protocol. In RBN, the processes communicate by selective broadcast: a process can broadcast a message which…
▽ More
We show the equivalence of two distributed computing models, namely reconfigurable broadcast networks (RBN) and asynchronous shared-memory systems (ASMS), that were introduced independently. Both RBN and ASMS are systems in which a collection of anonymous, finite-state processes run the same protocol. In RBN, the processes communicate by selective broadcast: a process can broadcast a message which is received by all of its neighbors, and the set of neighbors of a process can change arbitrarily over time. In ASMS, the processes communicate by shared memory: a process can either write to or read from a shared register. Our main result is that RBN and ASMS can simulate each other, i.e. they are equivalent with respect to parameterized reachability, where we are given two (possibly infinite) sets of configurations C and C' defined by upper and lower bounds on the number of processes in each state and we would like to decide if some configuration in C can reach some configuration in C'. Using this simulation equivalence, we transfer results of RBN to ASMS and vice versa. Finally, we show that RBN and ASMS can simulate a third distributed model called immediate observation (IO) nets. Moreover, for a slightly stronger notion of simulation (which is satisfied by all the simulations given in this paper), we show that IO nets cannot simulate RBN.
△ Less
Submitted 26 August, 2021; v1 submitted 17 August, 2021;
originally announced August 2021.
-
IrEne: Interpretable Energy Prediction for Transformers
Authors:
Qingqing Cao,
Yash Kumar Lal,
Harsh Trivedi,
Aruna Balasubramanian,
Niranjan Balasubramanian
Abstract:
Existing software-based energy measurements of NLP models are not accurate because they do not consider the complex interactions between energy consumption and model execution. We present IrEne, an interpretable and extensible energy prediction system that accurately predicts the inference energy consumption of a wide range of Transformer-based NLP models. IrEne constructs a model tree graph that…
▽ More
Existing software-based energy measurements of NLP models are not accurate because they do not consider the complex interactions between energy consumption and model execution. We present IrEne, an interpretable and extensible energy prediction system that accurately predicts the inference energy consumption of a wide range of Transformer-based NLP models. IrEne constructs a model tree graph that breaks down the NLP model into modules that are further broken down into low-level machine learning (ML) primitives. IrEne predicts the inference energy consumption of the ML primitives as a function of generalizable features and fine-grained runtime resource usage. IrEne then aggregates these low-level predictions recursively to predict the energy of each module and finally of the entire model. Experiments across multiple Transformer models show IrEne predicts inference energy consumption of transformer models with an error of under 7% compared to the ground truth. In contrast, existing energy models see an error of over 50%. We also show how IrEne can be used to conduct energy bottleneck analysis and to easily evaluate the energy impact of different architectural choices. We release the code and data at https://github.com/StonyBrookNLP/irene.
△ Less
Submitted 2 June, 2021;
originally announced June 2021.
-
Decidability and Complexity in Weakening and Contraction Hypersequent Substructural Logics
Authors:
A. R. Balasubramanian,
Timo Lang,
Revantha Ramanayake
Abstract:
We establish decidability for the infinitely many axiomatic extensions of the commutative Full Lambek logic with weakening FLew (i.e. IMALLW) that have a cut-free hypersequent proof calculus (specifically: every analytic structural rule extension). Decidability for the corresponding extensions of its contraction counterpart FLec was established recently but their computational complexity was left…
▽ More
We establish decidability for the infinitely many axiomatic extensions of the commutative Full Lambek logic with weakening FLew (i.e. IMALLW) that have a cut-free hypersequent proof calculus (specifically: every analytic structural rule extension). Decidability for the corresponding extensions of its contraction counterpart FLec was established recently but their computational complexity was left unanswered. In the second part of this paper, we introduce just enough on length functions for well-quasi-orderings and the fast-growing complexity classes to obtain complexity upper bounds for both the weakening and contraction extensions. A specific instance of this result yields the first complexity bound for the prominent fuzzy logic MTL (monoidal t-norm based logic) providing an answer to a long-standing open problem.
△ Less
Submitted 19 April, 2021;
originally announced April 2021.
-
Adaptive Synchronisation of Pushdown Automata
Authors:
A. R. Balasubramanian,
K. S. Thejaswini
Abstract:
We introduce the notion of adaptive synchronisation for pushdown automata, in which there is an external observer who has no knowledge about the current state of the pushdown automaton, but can observe the contents of the stack. The observer would then like to decide if it is possible to bring the automaton from any state into some predetermined state by giving inputs to it in an \emph{adaptive} m…
▽ More
We introduce the notion of adaptive synchronisation for pushdown automata, in which there is an external observer who has no knowledge about the current state of the pushdown automaton, but can observe the contents of the stack. The observer would then like to decide if it is possible to bring the automaton from any state into some predetermined state by giving inputs to it in an \emph{adaptive} manner, i.e., the next input letter to be given can depend on how the contents of the stack changed after the current input letter. We show that for non-deterministic pushdown automata, this problem is 2-EXPTIME-complete and for deterministic pushdown automata, we show EXPTIME-completeness.
To prove the lower bounds, we first introduce (different variants of) subset-synchronisation and show that these problems are polynomial-time equivalent with the adaptive synchronisation problem. We then prove hardness results for the subset-synchronisation problems. For proving the upper bounds, we consider the problem of deciding if a given alternating pushdown system has an accepting run with at most $k$ leaves and we provide an $n^{O(k^2)}$ time algorithm for this problem.
△ Less
Submitted 13 February, 2021;
originally announced February 2021.
-
Accelerating Deep Learning Inference via Learned Caches
Authors:
Arjun Balasubramanian,
Adarsh Kumar,
Yuhan Liu,
Han Cao,
Shivaram Venkataraman,
Aditya Akella
Abstract:
Deep Neural Networks (DNNs) are witnessing increased adoption in multiple domains owing to their high accuracy in solving real-world problems. However, this high accuracy has been achieved by building deeper networks, posing a fundamental challenge to the low latency inference desired by user-facing applications. Current low latency solutions trade-off on accuracy or fail to exploit the inherent t…
▽ More
Deep Neural Networks (DNNs) are witnessing increased adoption in multiple domains owing to their high accuracy in solving real-world problems. However, this high accuracy has been achieved by building deeper networks, posing a fundamental challenge to the low latency inference desired by user-facing applications. Current low latency solutions trade-off on accuracy or fail to exploit the inherent temporal locality in prediction serving workloads.
We observe that caching hidden layer outputs of the DNN can introduce a form of late-binding where inference requests only consume the amount of computation needed. This enables a mechanism for achieving low latencies, coupled with an ability to exploit temporal locality. However, traditional caching approaches incur high memory overheads and lookup latencies, leading us to design learned caches - caches that consist of simple ML models that are continuously updated. We present the design of GATI, an end-to-end prediction serving system that incorporates learned caches for low-latency DNN inference. Results show that GATI can reduce inference latency by up to 7.69X on realistic workloads.
△ Less
Submitted 18 January, 2021;
originally announced January 2021.
-
Bew: Towards Answering Business-Entity-Related Web Questions
Authors:
Qingqing Cao,
Oriana Riva,
Aruna Balasubramanian,
Niranjan Balasubramanian
Abstract:
We present BewQA, a system specifically designed to answer a class of questions that we call Bew questions. Bew questions are related to businesses/services such as restaurants, hotels, and movie theaters; for example, "Until what time is happy hour?". These questions are challenging to answer because the answers are found in open-domain Web, are present in short sentences without surrounding cont…
▽ More
We present BewQA, a system specifically designed to answer a class of questions that we call Bew questions. Bew questions are related to businesses/services such as restaurants, hotels, and movie theaters; for example, "Until what time is happy hour?". These questions are challenging to answer because the answers are found in open-domain Web, are present in short sentences without surrounding context, and are dynamic since the webpage information can be updated frequently. Under these conditions, existing QA systems perform poorly. We present a practical approach, called BewQA, that can answer Bew queries by mining a template of the business-related webpages and using the template to guide the search. We show how we can extract the template automatically by leveraging aggregator websites that aggregate information about business entities in a domain (e.g., restaurants). We answer a given question by identifying the section from the extracted template that is most likely to contain the answer. By doing so we can extract the answers even when the answer span does not have sufficient context. Importantly, BewQA does not require any training. We crowdsource a new dataset of 1066 Bew questions and ground-truth answers in the restaurant domain. Compared to state-of-the-art QA models, BewQA has a 27 percent point improvement in F1 score. Compared to a commercial search engine, BewQA answered correctly 29% more Bew questions.
△ Less
Submitted 10 December, 2020;
originally announced December 2020.
-
Finding Cut-Offs in Leaderless Rendez-Vous Protocols is Easy
Authors:
A. R. Balasubramanian,
Javier Esparza,
Mikhail Raskin
Abstract:
In rendez-vous protocols an arbitrarily large number of indistinguishable finite-state agents interact in pairs. The cut-off problem asks if there exists a number $B$ such that all initial configurations of the protocol with at least $B$ agents in a given initial state can reach a final configuration with all agents in a given final state. In a recent paper (Horn and Sangnier, CONCUR 2020), Horn a…
▽ More
In rendez-vous protocols an arbitrarily large number of indistinguishable finite-state agents interact in pairs. The cut-off problem asks if there exists a number $B$ such that all initial configurations of the protocol with at least $B$ agents in a given initial state can reach a final configuration with all agents in a given final state. In a recent paper (Horn and Sangnier, CONCUR 2020), Horn and Sangnier proved that the cut-off problem is decidable (and at least as hard as the Petri net reachability problem) for protocols with a leader, and in EXPSPACE for leaderless protocols. Further, for the special class of symmetric protocols they reduce these bounds to PSPACE and NP, respectively. The problem of lowering these upper bounds or finding matching lower bounds was left open. We show that the cut-off problem is P-complete for leaderless protocols and in NC for leaderless symmetric protocols. Further, we also consider a variant of the cut-off problem suggested in (Horn and Sangnier, CONCUR 2020), which we call the bounded-loss cut-off problem and prove that this problem is P-complete for leaderless protocols and NL-complete for leaderless symmetric protocols. Finally, by reusing some of the techniques applied for the analysis of leaderless protocols, we show that the cut-off problem for symmetric protocols with a leader is NP-complete, thereby improving upon all the elementary upper bounds of (Horn and Sangnier, CONCUR 2020).
△ Less
Submitted 11 October, 2023; v1 submitted 19 October, 2020;
originally announced October 2020.
-
Towards Accurate and Reliable Energy Measurement of NLP Models
Authors:
Qingqing Cao,
Aruna Balasubramanian,
Niranjan Balasubramanian
Abstract:
Accurate and reliable measurement of energy consumption is critical for making well-informed design choices when choosing and training large scale NLP models. In this work, we show that existing software-based energy measurements are not accurate because they do not take into account hardware differences and how resource utilization affects energy consumption. We conduct energy measurement experim…
▽ More
Accurate and reliable measurement of energy consumption is critical for making well-informed design choices when choosing and training large scale NLP models. In this work, we show that existing software-based energy measurements are not accurate because they do not take into account hardware differences and how resource utilization affects energy consumption. We conduct energy measurement experiments with four different models for a question answering task. We quantify the error of existing software-based energy measurements by using a hardware power meter that provides highly accurate energy measurements. Our key takeaway is the need for a more accurate energy estimation model that takes into account hardware variabilities and the non-linear relationship between resource utilization and energy consumption. We release the code and data at https://github.com/csarron/sustainlp2020-energy.
△ Less
Submitted 11 October, 2020;
originally announced October 2020.
-
Complexity of Verification and Synthesis of Threshold Automata
Authors:
A. R. Balasubramanian,
Javier Esparza,
Marijana Lazic
Abstract:
Threshold automata are a formalism for modeling and analyzing fault-tolerant distributed algorithms, recently introduced by Konnov, Veith, and Widder, describing protocols executed by a fixed but arbitrary number of processes. We conduct the first systematic study of the complexity of verification and synthesis problems for threshold automata. We prove that the coverability, reachability, safety,…
▽ More
Threshold automata are a formalism for modeling and analyzing fault-tolerant distributed algorithms, recently introduced by Konnov, Veith, and Widder, describing protocols executed by a fixed but arbitrary number of processes. We conduct the first systematic study of the complexity of verification and synthesis problems for threshold automata. We prove that the coverability, reachability, safety, and liveness problems are NP-complete, and that the bounded synthesis problem is $Σ_p^2$ complete. A key to our results is a novel characterization of the reachability relation of a threshold automaton as an existential Presburger formula. The characterization also leads to novel verification and synthesis algorithms. We report on an implementation, and provide experimental results.
△ Less
Submitted 13 July, 2020;
originally announced July 2020.
-
DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Authors:
Qingqing Cao,
Harsh Trivedi,
Aruna Balasubramanian,
Niranjan Balasubramanian
Abstract:
Transformer-based QA models use input-wide self-attention -- i.e. across both the question and the input passage -- at all layers, causing them to be slow and memory-intensive. It turns out that we can get by without input-wide self-attention at all layers, especially in the lower layers. We introduce DeFormer, a decomposed transformer, which substitutes the full self-attention with question-wide…
▽ More
Transformer-based QA models use input-wide self-attention -- i.e. across both the question and the input passage -- at all layers, causing them to be slow and memory-intensive. It turns out that we can get by without input-wide self-attention at all layers, especially in the lower layers. We introduce DeFormer, a decomposed transformer, which substitutes the full self-attention with question-wide and passage-wide self-attentions in the lower layers. This allows for question-independent processing of the input text representations, which in turn enables pre-computing passage representations reducing runtime compute drastically. Furthermore, because DeFormer is largely similar to the original model, we can initialize DeFormer with the pre-training weights of a standard transformer, and directly fine-tune on the target QA dataset. We show DeFormer versions of BERT and XLNet can be used to speed up QA by over 4.3x and with simple distillation-based losses they incur only a 1% drop in accuracy. We open source the code at https://github.com/StonyBrookNLP/deformer.
△ Less
Submitted 2 May, 2020;
originally announced May 2020.
-
Characterizing consensus in the Heard-Of model
Authors:
A. R. Balasubramanian,
Igor Walukiewicz
Abstract:
The Heard-Of model is a simple and relatively expressive model of distributed computation. Because of this, it has gained a considerable attention of the verification community. We give a characterization of all algorithms solving consensus in a fragment of this model. The fragment is big enough to cover many prominent consensus algorithms. The characterization is purely syntactic: it is expressed…
▽ More
The Heard-Of model is a simple and relatively expressive model of distributed computation. Because of this, it has gained a considerable attention of the verification community. We give a characterization of all algorithms solving consensus in a fragment of this model. The fragment is big enough to cover many prominent consensus algorithms. The characterization is purely syntactic: it is expressed in terms of some conditions on the text of the algorithm. One of the recent methods of verification of distributed algorithms is to abstract an algorithm to the Heard-Of model and then to verify the abstract algorithm using semi-automatic procedures. Our results allow, in some cases, to avoid the second step in this methodology.
△ Less
Submitted 20 April, 2020;
originally announced April 2020.
-
Accelerating Deep Learning Inference via Freezing
Authors:
Adarsh Kumar,
Arjun Balasubramanian,
Shivaram Venkataraman,
Aditya Akella
Abstract:
Over the last few years, Deep Neural Networks (DNNs) have become ubiquitous owing to their high accuracy on real-world tasks. However, this increase in accuracy comes at the cost of computationally expensive models leading to higher prediction latencies. Prior efforts to reduce this latency such as quantization, model distillation, and any-time prediction models typically trade-off accuracy for pe…
▽ More
Over the last few years, Deep Neural Networks (DNNs) have become ubiquitous owing to their high accuracy on real-world tasks. However, this increase in accuracy comes at the cost of computationally expensive models leading to higher prediction latencies. Prior efforts to reduce this latency such as quantization, model distillation, and any-time prediction models typically trade-off accuracy for performance. In this work, we observe that caching intermediate layer outputs can help us avoid running all the layers of a DNN for a sizeable fraction of inference requests. We find that this can potentially reduce the number of effective layers by half for 91.58% of CIFAR-10 requests run on ResNet-18. We present Freeze Inference, a system that introduces approximate caching at each intermediate layer and we discuss techniques to reduce the cache size and improve the cache hit rate. Finally, we discuss some of the open research challenges in realizing such a design.
△ Less
Submitted 7 February, 2020;
originally announced February 2020.
-
Archipelago: A Scalable Low-Latency Serverless Platform
Authors:
Arjun Singhvi,
Kevin Houck,
Arjun Balasubramanian,
Mohammed Danish Shaikh,
Shivaram Venkataraman,
Aditya Akella
Abstract:
The increased use of micro-services to build web applications has spurred the rapid growth of Function-as-a-Service (FaaS) or serverless computing platforms. While FaaS simplifies provisioning and scaling for application developers, it introduces new challenges in resource management that need to be handled by the cloud provider. Our analysis of popular serverless workloads indicates that schedule…
▽ More
The increased use of micro-services to build web applications has spurred the rapid growth of Function-as-a-Service (FaaS) or serverless computing platforms. While FaaS simplifies provisioning and scaling for application developers, it introduces new challenges in resource management that need to be handled by the cloud provider. Our analysis of popular serverless workloads indicates that schedulers need to handle functions that are very short-lived, have unpredictable arrival patterns, and require expensive setup of sandboxes. The challenge of running a large number of such functions in a multi-tenant cluster makes existing scheduling frameworks unsuitable.
We present Archipelago, a platform that enables low latency request execution in a multi-tenant serverless setting. Archipelago views each application as a DAG of functions, and every DAG in associated with a latency deadline. Archipelago achieves its per-DAG request latency goals by: (1) partitioning a given cluster into a number of smaller worker pools, and associating each pool with a semi-global scheduler (SGS), (2) using a latency-aware scheduler within each SGS along with proactive sandbox allocation to reduce overheads, and (3) using a load balancing layer to route requests for different DAGs to the appropriate SGS, and automatically scale the number of SGSs per DAG. Our testbed results show that Archipelago meets the latency deadline for more than 99% of realistic application request workloads, and reduces tail latencies by up to 36X compared to state-of-the-art serverless platforms.
△ Less
Submitted 21 November, 2019;
originally announced November 2019.
-
Complexity of controlled bad sequences over finite sets of $\mathbb{N}^d$
Authors:
A. R. Balasubramanian
Abstract:
We provide upper and lower bounds for the length of controlled bad sequences over the majoring and the minoring orderings of finite sets of $\mathbb{N}^d$. The results are obtained by bounding the length of such sequences by functions from the Cichon hierarchy. This allows us to translate these results to bounds over the fast-growing complexity classes.
The obtained bounds are proven to be tight…
▽ More
We provide upper and lower bounds for the length of controlled bad sequences over the majoring and the minoring orderings of finite sets of $\mathbb{N}^d$. The results are obtained by bounding the length of such sequences by functions from the Cichon hierarchy. This allows us to translate these results to bounds over the fast-growing complexity classes.
The obtained bounds are proven to be tight for the majoring ordering, which solves a problem left open by Abriola, Figueira and Senno (Theor. Comp. Sci, Vol. 603). Finally, we use the results on controlled bad sequences to prove upper bounds for the emptiness problem of some classes of automata.
△ Less
Submitted 8 June, 2020; v1 submitted 4 September, 2019;
originally announced September 2019.
-
Themis: Fair and Efficient GPU Cluster Scheduling
Authors:
Kshiteej Mahajan,
Arjun Balasubramanian,
Arjun Singhvi,
Shivaram Venkataraman,
Aditya Akella,
Amar Phanishayee,
Shuchi Chawla
Abstract:
Modern distributed machine learning (ML) training workloads benefit significantly from leveraging GPUs. However, significant contention ensues when multiple such workloads are run atop a shared cluster of GPUs. A key question is how to fairly apportion GPUs across workloads. We find that established cluster scheduling disciplines are a poor fit because of ML workloads' unique attributes: ML jobs h…
▽ More
Modern distributed machine learning (ML) training workloads benefit significantly from leveraging GPUs. However, significant contention ensues when multiple such workloads are run atop a shared cluster of GPUs. A key question is how to fairly apportion GPUs across workloads. We find that established cluster scheduling disciplines are a poor fit because of ML workloads' unique attributes: ML jobs have long-running tasks that need to be gang-scheduled, and their performance is sensitive to tasks' relative placement.
We propose Themis, a new scheduling framework for ML training workloads. It's GPU allocation policy enforces that ML workloads complete in a finish-time fair manner, a new notion we introduce. To capture placement sensitivity and ensure efficiency, Themis uses a two-level scheduling architecture where ML workloads bid on available resources that are offered in an auction run by a central arbiter. Our auction design allocates GPUs to winning bids by trading off efficiency for fairness in the short term but ensuring finish-time fairness in the long term. Our evaluation on a production trace shows that Themis can improve fairness by more than 2.25X and is ~5% to 250% more cluster efficient in comparison to state-of-the-art schedulers.
△ Less
Submitted 29 October, 2019; v1 submitted 2 July, 2019;
originally announced July 2019.
-
Generalized threshold arrangements
Authors:
A. R. Balasubramanian
Abstract:
An arrangement of hyperplanes is a finite collection of hyperplanes in a real Euclidean space. To such a collection one associates the characteristic polynomial that encodes the combinatorics of intersections of the hyperplanes. Finding the characteristic polynomial of the Shi threshold and the Catalan threshold arrangements was an open problem in Stanley's list of problems in [1]. Seunghyun Seo s…
▽ More
An arrangement of hyperplanes is a finite collection of hyperplanes in a real Euclidean space. To such a collection one associates the characteristic polynomial that encodes the combinatorics of intersections of the hyperplanes. Finding the characteristic polynomial of the Shi threshold and the Catalan threshold arrangements was an open problem in Stanley's list of problems in [1]. Seunghyun Seo solved both the problems by clever arguments using the finite field method in [3,4]. However, in his paper, he left open the problem of computing the characteristic polynomial of a broader class of threshold arrangements, the so-called "generalized threshold" arrangements whose defining set of hyperplanes is given by $x_i + x_j = -l,-l+1,...,m-1,m$ for $1 \le i < j \le n$ where $l,m \in \mathbb{N}$. In this paper, we present a method for computing the characteristic polynomial of this family of arrangements.
△ Less
Submitted 18 April, 2019;
originally announced April 2019.
-
Fast, accurate, and transferable many-body interatomic potentials by symbolic regression
Authors:
Alberto Hernandez,
Adarsh Balasubramanian,
Fenglin Yuan,
Simon Mason,
Tim Mueller
Abstract:
The length and time scales of atomistic simulations are limited by the computational cost of the methods used to predict material properties. In recent years there has been great progress in the use of machine learning algorithms to develop fast and accurate interatomic potential models, but it remains a challenge to develop models that generalize well and are fast enough to be used at extreme tim…
▽ More
The length and time scales of atomistic simulations are limited by the computational cost of the methods used to predict material properties. In recent years there has been great progress in the use of machine learning algorithms to develop fast and accurate interatomic potential models, but it remains a challenge to develop models that generalize well and are fast enough to be used at extreme time and length scales. To address this challenge, we have developed a machine learning algorithm based on symbolic regression in the form of genetic programming that is capable of discovering accurate, computationally efficient manybody potential models. The key to our approach is to explore a hypothesis space of models based on fundamental physical principles and select models within this hypothesis space based on their accuracy, speed, and simplicity. The focus on simplicity reduces the risk of overfitting the training data and increases the chances of discovering a model that generalizes well. Our algorithm was validated by rediscovering an exact Lennard-Jones potential and a Sutton Chen embedded atom method potential from training data generated using these models. By using training data generated from density functional theory calculations, we found potential models for elemental copper that are simple, as fast as embedded atom models, and capable of accurately predicting properties outside of their training set. Our approach requires relatively small sets of training data, making it possible to generate training data using highly accurate methods at a reasonable computational cost. We present our approach, the forms of the discovered models, and assessments of their transferability, accuracy and speed.
△ Less
Submitted 17 August, 2019; v1 submitted 1 April, 2019;
originally announced April 2019.
-
Parameterized Verification of Coverability in Well-Structured Broadcast Networks
Authors:
A. R. Balasubramanian
Abstract:
Parameterized verification of coverability in broadcast networks with finite state processes has been studied for different types of models and topologies. In this paper, we attempt to develop a theory of broadcast networks in which the processes can be well-structured transition systems. The resulting formalism is called well-structured broadcast networks. We give an algorithm to decide coverabil…
▽ More
Parameterized verification of coverability in broadcast networks with finite state processes has been studied for different types of models and topologies. In this paper, we attempt to develop a theory of broadcast networks in which the processes can be well-structured transition systems. The resulting formalism is called well-structured broadcast networks. We give an algorithm to decide coverability of well-structured broadcast networks when reconfiguration of links between nodes is allowed. Further, for various types of communication topologies, we also prove the decidability of coverability in the static case as well. We do this by showing that for these types of static communication topologies, the broadcast network itself is a well-structured transition system, hence proving the decidability of coverability in the broadcast network.
△ Less
Submitted 9 September, 2018;
originally announced September 2018.
-
Parameterized verification of synchronization in constrained reconfigurable broadcast networks
Authors:
A. R. Balasubramanian,
Nathalie Bertrand,
Nicolas Markey
Abstract:
Reconfigurable broadcast networks provide a convenient formalism for modelling and reasoning about networks of mobile agents broadcasting messages to other agents following some (evolving) communication topology. The parameterized verification of such models aims at checking whether a given property holds irrespective of the initial configuration (number of agents, initial states and initial commu…
▽ More
Reconfigurable broadcast networks provide a convenient formalism for modelling and reasoning about networks of mobile agents broadcasting messages to other agents following some (evolving) communication topology. The parameterized verification of such models aims at checking whether a given property holds irrespective of the initial configuration (number of agents, initial states and initial communication topology). We focus here on the synchronization property, asking whether all agents converge to a set of target states after some execution. This problem is known to be decidable in polynomial time when no constraints are imposed on the evolution of the communication topology (while it is undecidable for static broadcast networks).
In this paper we investigate how various constraints on reconfigurations affect the decidability and complexity of the synchronization problem. In particular, we show that when bounding the number of reconfigured links between two communications steps by a constant, synchronization becomes undecidable; on the other hand, synchronization remains decidable in PTIME when the bound grows with the number of agents.
△ Less
Submitted 23 February, 2018;
originally announced February 2018.
-
MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU
Authors:
Qingqing Cao,
Niranjan Balasubramanian,
Aruna Balasubramanian
Abstract:
In this paper, we explore optimizations to run Recurrent Neural Network (RNN) models locally on mobile devices. RNN models are widely used for Natural Language Processing, Machine Translation, and other tasks. However, existing mobile applications that use RNN models do so on the cloud. To address privacy and efficiency concerns, we show how RNN models can be run locally on mobile devices. Existin…
▽ More
In this paper, we explore optimizations to run Recurrent Neural Network (RNN) models locally on mobile devices. RNN models are widely used for Natural Language Processing, Machine Translation, and other tasks. However, existing mobile applications that use RNN models do so on the cloud. To address privacy and efficiency concerns, we show how RNN models can be run locally on mobile devices. Existing work on porting deep learning models to mobile devices focus on Convolution Neural Networks (CNNs) and cannot be applied directly to RNN models. In response, we present MobiRNN, a mobile-specific optimization framework that implements GPU offloading specifically for mobile GPUs. Evaluations using an RNN model for activity recognition shows that MobiRNN does significantly decrease the latency of running RNN models on phones.
△ Less
Submitted 2 June, 2017;
originally announced June 2017.
-
Secure Symmetrical Multilevel Diversity Coding
Authors:
Anantharaman Balasubramanian,
Hung D. Ly,
Shuo Li,
Tie Liu,
Scott L. Miller
Abstract:
Symmetrical Multilevel Diversity Coding (SMDC) is a network compression problem introduced by Roche (1992) and Yeung (1995). In this setting, a simple separate coding strategy known as superposition coding was shown to be optimal in terms of achieving the minimum sum rate (Roche, Yeung, and Hau, 1997) and the entire admissible rate region (Yeung and Zhang, 1999) of the problem. This paper consider…
▽ More
Symmetrical Multilevel Diversity Coding (SMDC) is a network compression problem introduced by Roche (1992) and Yeung (1995). In this setting, a simple separate coding strategy known as superposition coding was shown to be optimal in terms of achieving the minimum sum rate (Roche, Yeung, and Hau, 1997) and the entire admissible rate region (Yeung and Zhang, 1999) of the problem. This paper considers a natural generalization of SMDC to the secure communication setting with an additional eavesdropper. It is required that all sources need to be kept perfectly secret from the eavesdropper as long as the number of encoder outputs available at the eavesdropper is no more than a given threshold. First, the problem of encoding individual sources is studied. A precise characterization of the entire admissible rate region is established via a connection to the problem of secure coding over a three-layer wiretap network and utilizing some basic polyhedral structure of the admissible rate region. Building on this result, it is then shown that superposition coding remains optimal in terms of achieving the minimum sum rate for the general secure SMDC problem.
△ Less
Submitted 9 January, 2012;
originally announced January 2012.