FACTS About Building Retrieval Augmented Generation-based Chatbots

Authors: Rama Akkiraju, Anbang Xu, Deepak Bora, Tan Yu, Lu An, Vishal Seth, Aaditya Shukla, Pritam Gundecha, Hridhay Mehta, Ashwin Jha, Prithvi Raj, Abhinav Balasubramanian, Murali Maram, Guru Muthusamy, Shivakesh Reddy Annepally, Sidney Knowles, Min Du, Nick Burnett, Sean Javiya, Ashok Marannan, Mamta Kumari, Surbhi Jha, Ethan Dereszenski, Anupam Chakraborty, Subhash Ranjan, et al. (13 additional authors not shown)

Abstract: Enterprise chatbots, powered by generative AI, are emerging as key applications to enhance employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and orchestration frameworks like Langchain and Llamaindex are crucial for building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This includes fine-tuning embeddings and LLMs, extracting documents from vector databases, rephrasing queries, reranking results, designing prompts, honoring document access controls, providing concise responses, including references, safeguarding personal information, and building orchestration agents. We present a framework for building RAG-based chatbots based on our experience with three NVIDIA chatbots: for IT/HR benefits, financial earnings, and general content. Our contributions are three-fold: introducing the FACTS framework (Freshness, Architectures, Cost, Testing, Security), presenting fifteen RAG pipeline control points, and providing empirical results on accuracy-latency tradeoffs between large and small LLMs. To the best of our knowledge, this is the first paper of its kind that provides a holistic view of the factors as well as solutions for building secure enterprise-grade chatbots.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 8 pages, 6 figures, 2 tables, Preprint submission to ACM CIKM 2024

arXiv:2407.02439 [pdf, other]

doi 10.1109/TMM.2022.3176942

Predicting Visual Attention in Graphic Design Documents

Authors: Souradeep Chakraborty, Zijun Wei, Conor Kelton, Seoyoung Ahn, Aruna Balasubramanian, Gregory J. Zelinsky, Dimitris Samaras

Abstract: We present a model for predicting visual attention during the free viewing of graphic design documents. While existing works on this topic have aimed at predicting static saliency of graphic designs, our work is the first attempt to predict both spatial attention and dynamic temporal order in which the document regions are fixated by gaze using a deep learning based model. We propose a two-stage model for predicting dynamic attention on such documents, with webpages being our primary choice of document design for demonstration. In the first stage, we predict the saliency maps for each of the document components (e.g. logos, banners, texts, etc. for webpages) conditioned on the type of document layout. These component saliency maps are then jointly used to predict the overall document saliency. In the second stage, we use these layout-specific component saliency maps as the state representation for an inverse reinforcement learning model of fixation scanpath prediction during document viewing. To test our model, we collected a new dataset consisting of eye movements from 41 people freely viewing 450 webpages (the largest dataset of its kind). Experimental results show that our model outperforms existing models in both saliency and scanpath prediction for webpages, and also generalizes very well to other graphic design documents such as comics, posters, mobile UIs, etc. and natural images.

Submitted 2 July, 2024; originally announced July 2024.

Journal ref: IEEE Transactions on Multimedia 25 (2022): 4478-4493

arXiv:2405.11085 [pdf, ps, other]

Decidability and Complexity of Decision Problems for Affine Continuous VASS

Authors: A. R. Balasubramanian

Abstract: Vector addition system with states (VASS) is a popular model for the verification of concurrent systems. VASS consists of finitely many control states and a set of counters which can be incremented and decremented, but not tested for zero. VASS is a relatively well-studied model of computation and many results regarding the decidability of decision problems for VASS are well-known. Given that the complexity of solving almost all problems for VASS is very high, various tractable over-approximations of the reachability relation of VASS have been proposed in the literature. One such tractable over-approximation is the so-called continuous VASS, in which counters are allowed to have non-negative rational values and whenever an update is performed, the update is first scaled by an arbitrary non-zero fraction. In this paper, we consider affine continuous VASS, which extend continuous VASS by allowing integer affine operations. Affine continuous VASS serve as an over-approximation to the model of affine VASS, in the same way that continuous VASS over-approximates the reachability relation of VASS. We investigate the tractability of affine continuous VASS with respect to the reachability, coverability and state-reachability problems for different classes of affine operations and we prove an almost-complete classification of the decidability of these problems. Namely, except for the coverability problem for a single family of classes of affine operations, we completely determine the decidability status of these problems for all classes. Furthermore, except for this single family, we also complement all of our decidability results with tight complexity-theoretic upper and lower bounds.

Submitted 17 May, 2024; originally announced May 2024.
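
Note on the entry above: the continuous semantics described in the abstract can be illustrated with a minimal sketch. It assumes the scaling fraction lies in (0, 1], which is the usual convention for continuous VASS but is not stated in this abstract, and the function name `continuous_step` is hypothetical. Control states and affine operations are left out.

```python
from fractions import Fraction


def continuous_step(counters, update, alpha):
    """Apply one continuous-VASS style update to a counter vector.

    The update vector is first scaled by `alpha` and then added.
    Assumption: `alpha` is a rational in (0, 1].
    Returns the new counters, or None if the step is disabled because
    some counter would drop below zero.
    """
    alpha = Fraction(alpha)
    if not (0 < alpha <= 1):
        raise ValueError("alpha must lie in (0, 1]")
    new_counters = [c + alpha * u for c, u in zip(counters, update)]
    if any(value < 0 for value in new_counters):
        return None
    return new_counters


start = [Fraction(1), Fraction(0)]
update = [-2, 1]

# Firing the update in full would make the first counter negative.
print(continuous_step(start, update, 1))
# Scaling by 1/2 keeps every counter non-negative.
print(continuous_step(start, update, Fraction(1, 2)))
```

The second call shows why the continuous semantics over-approximates ordinary VASS reachability: a transition that is blocked at full strength can still fire after scaling.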

arXiv:2310.16798 [pdf, other]

Reachability in Continuous Pushdown VASS

Authors: A. R. Balasubramanian, Rupak Majumdar, Ramanathan S. Thinniyam, Georg Zetzsche

Abstract: Pushdown Vector Addition Systems with States (PVASS) consist of finitely many control states, a pushdown stack, and a set of counters that can be incremented and decremented, but not tested for zero. Whether the reachability problem is decidable for PVASS is a long-standing open problem. We consider continuous PVASS, which are PVASS with a continuous semantics. This means, the counter values are rational numbers and whenever a vector is added to the current counter values, this vector is first scaled with an arbitrarily chosen rational factor between zero and one. We show that reachability in continuous PVASS is NEXPTIME-complete. Our result is unusually robust: Reachability can be decided in NEXPTIME even if all numbers are specified in binary. On the other hand, NEXPTIME-hardness already holds for coverability, in fixed dimension, for bounded stack, and even if all numbers are specified in unary.

Submitted 31 October, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

arXiv:2305.00417

Transformer-based Sequence Labeling for Audio Classification based on MFCCs

Authors: C. S. Sonali, Chinmayi B S, Ahana Balasubramanian

Abstract: Audio classification is vital in areas such as speech and music recognition. Feature extraction from the audio signal, such as Mel-Spectrograms and MFCCs, is a critical step in audio classification. These features are transformed into spectrograms for classification. Researchers have explored various techniques, including traditional machine and deep learning methods to classify spectrograms, but these can be computationally expensive. To simplify this process, a more straightforward approach inspired by sequence classification in NLP can be used. This paper proposes a Transformer-encoder-based model for audio classification using MFCCs. The model was benchmarked against the ESC-50, Speech Commands v0.02 and UrbanSound8k datasets and has shown strong performance, with the highest accuracy of 95.2% obtained upon training the model on the UrbanSound8k dataset. The model consisted of a mere 127,544 total parameters, making it light-weight yet highly efficient at the audio classification task.

Submitted 5 July, 2023; v1 submitted 30 April, 2023; originally announced May 2023.

Comments: Error in the explanation as well as inadequate results and conclusion

arXiv:2304.13065 [pdf, other]

Parameterized Verification of Coverability in Infinite State Broadcast Networks

Authors: A. R. Balasubramanian

Abstract: Parameterized verification of coverability in broadcast networks with finite state processes has been studied for different types of models and topologies. In this paper, we attempt to develop a theory of broadcast networks in which the processes can be well-structured transition systems. The resulting formalism is called well-structured broadcast networks. For various types of communication topologies, we prove the decidability of coverability in the static case, i.e., when the network topology is not allowed to change. We do this by showing that for these types of static communication topologies, the broadcast network itself is a well-structured transition system, hence proving the decidability of coverability in the broadcast network. We also give an algorithm to decide coverability of well-structured broadcast networks when reconfiguration of links between nodes is allowed. Finally, with minor modifications of this algorithm we prove decidability of coverability when the underlying process is a pushdown automaton.

Submitted 21 April, 2023; originally announced April 2023.

Comments: Full journal version of arXiv:1809.03099

arXiv:2304.08917 [pdf, ps, other]

Coefficient Synthesis for Threshold Automata

Authors: A. R. Balasubramanian

Abstract: Threshold automata are a formalism for modeling fault-tolerant distributed algorithms. The main feature of threshold automata is the notion of a threshold guard, which allows us to compare the number of received messages with the total number of different types of processes. In this paper, we consider the coefficient synthesis problem for threshold automata, in which we are given a sketch of a threshold automaton (with the constants in the threshold guards left unspecified) and a specification and we want to synthesize a set of constants which when plugged into the sketch, gives a threshold automaton satisfying the specification. Our main result is that this problem is undecidable, even when the specification is a coverability specification and the underlying sketch is acyclic.

Submitted 18 April, 2023; originally announced April 2023.

arXiv:2302.05771 [pdf, other]

Analyzing DCTCP and Cubic Buffer Sharing under Diverse Router Configurations

Authors: Santiago Vargas, Aruna Balasubramanian, Srikanth Sundaresan

Abstract: In this work, we look at the impact of router configurations on DCTCP and Cubic traffic when both algorithms share router buffers in the data center. Modern data centers host traffic with mixed congestion controls, including DCTCP and Cubic traffic. Both DCTCP and Cubic in the data center can compete with each other and potentially starve and/or be unfair to each other when sharing buffer space in the data center. This happens since both algorithms are at odds with each other in terms of buffer utilization paradigms where DCTCP attempts to limit buffer utilization while Cubic generally fills buffers to obtain high throughput. As a result, we propose methods for a measurement-driven analysis of DCTCP and Cubic performance when sharing buffers in data center routers via simulation. We run around 10000 simulation experiments with unique router configurations and network conditions. Afterwards, we present a generalizable ML model to capture the effect that different buffer settings have on DCTCP and Cubic streaming traffic in the data center. Finally, we suggest that this model can be used to tune buffer settings in the data center.

Submitted 11 February, 2023; originally announced February 2023.

arXiv:2201.10432 [pdf, ps, other]

Parameterized Analysis of Reconfigurable Broadcast Networks (Long Version)

Authors: A. R. Balasubramanian, Lucie Guillou, Chana Weil-Kennedy

Abstract: Reconfigurable broadcast networks (RBN) are a model of distributed computation in which agents can broadcast messages to other agents using some underlying communication topology which can change arbitrarily over the course of executions. In this paper, we conduct parameterized analysis of RBN. We consider cubes, (infinite) sets of configurations in the form of lower and upper bounds on the number of agents in each state, and we show that we can evaluate boolean combinations over cubes and reachability sets of cubes in PSPACE. In particular, reachability from a cube to another cube is a PSPACE-complete problem. To prove the upper bound for this parameterized analysis, we prove some structural properties about the reachability sets and the symbolic graph abstraction of RBN, which might be of independent interest. We justify this claim by providing two applications of these results. First, we show that the almost-sure coverability problem is PSPACE-complete for RBN, thereby closing a complexity gap from a previous paper. Second, we define a computation model using RBN, à la population protocols, called RBN protocols. We characterize precisely the set of predicates that can be computed by such protocols.

Submitted 11 July, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

Comments: This is the long version of a paper accepted at FoSSaCS 2022. Erratum: The proof of Theorem 2 contains a mistake, kindly pointed out by Nicolas Waldburger. We are working on a solution

arXiv:2112.14704 [pdf, ps, other]

Efficient Data Exchange in Unmanned Aerial Vehicle Networks Utilizing Unsupervised Learning-Based Clustering

Authors: Hao Song, Lingjia Liu, Ananth Balasubramanian

Abstract: An unmanned aerial vehicle (UAV) network can serve as an aerial relay to periodically receive packets from macro base stations (BSs). Severe packet loss may happen especially when UAVs have bad wireless connections to a BS. In this paper, a data exchange scheme is proposed utilizing unsupervised learning to enable efficient lost packet retrieval through reliable wireless transmissions between UAVs instead of through retransmissions of macro BSs with a longer delay and higher overhead. With the proposed scheme, all UAVs are assigned to multiple clusters and a UAV can only request its lost packets to UAVs in the same cluster. By this way, UAVs in different clusters could carry out their lost packets retrieval processes simultaneously to expedite data exchange. The agglomerative hierarchical clustering, a type of unsupervised learning, is used to conduct clustering, guaranteeing that UAVs clustered together could supply and supplement each other's lost packets. To further enhance data exchange efficiency, a data exchange mechanism is designed, where the priority of UAVs performing data exchange is determined by the number of their lost packets or the number of requested packets that they can provide. The introduced data exchange mechanism can make each request-reply process maximally beneficial to other UAVs' lost packet retrieval in the same cluster. A new random backoff procedure based on the carrier sense multiple access with collision avoidance (CSMA/CA) is designed to support the data exchange mechanism. Finally, simulation studies are conducted to demonstrate the effectiveness and superiority of our proposed data exchange scheme.

Submitted 22 December, 2021; originally announced December 2021.

Comments: 9 pages, 8 figures

arXiv:2109.08315 [pdf, ps, other]

doi 10.4204/EPTCS.346.2

Reconfigurable Broadcast Networks and Asynchronous Shared-Memory Systems are Equivalent

Authors: A. R. Balasubramanian, Chana Weil-Kennedy

Abstract: We show the equivalence of two distributed computing models, namely reconfigurable broadcast networks (RBN) and asynchronous shared-memory systems (ASMS), that were introduced independently. Both RBN and ASMS are systems in which a collection of anonymous, finite-state processes run the same protocol. In RBN, the processes communicate by selective broadcast: a process can broadcast a message which is received by all of its neighbors, and the set of neighbors of a process can change arbitrarily over time. In ASMS, the processes communicate by shared memory: a process can either write to or read from a shared register. Our main result is that RBN and ASMS can simulate each other, i.e. they are equivalent with respect to parameterized reachability, where we are given two (possibly infinite) sets of configurations C and C' defined by upper and lower bounds on the number of processes in each state and we would like to decide if some configuration in C can reach some configuration in C'. Using this simulation equivalence, we transfer results of RBN to ASMS and vice versa. Finally, we show that RBN and ASMS can simulate a third distributed model called immediate observation (IO) nets. Moreover, for a slightly stronger notion of simulation (which is satisfied by all the simulations given in this paper), we show that IO nets cannot simulate RBN.

Submitted 16 September, 2021; originally announced September 2021.

Comments: In Proceedings GandALF 2021, arXiv:2109.07798. A long version of this paper, containing all proofs, appears at arXiv:2108.07510

Journal ref: EPTCS 346, 2021, pp. 18-34

arXiv:2108.07510 [pdf, ps, other]

Reconfigurable Broadcast Networks and Asynchronous Shared-Memory Systems are Equivalent (Long Version)

Authors: A. R. Balasubramanian, Chana Weil-Kennedy

Abstract: We show the equivalence of two distributed computing models, namely reconfigurable broadcast networks (RBN) and asynchronous shared-memory systems (ASMS), that were introduced independently. Both RBN and ASMS are systems in which a collection of anonymous, finite-state processes run the same protocol. In RBN, the processes communicate by selective broadcast: a process can broadcast a message which is received by all of its neighbors, and the set of neighbors of a process can change arbitrarily over time. In ASMS, the processes communicate by shared memory: a process can either write to or read from a shared register. Our main result is that RBN and ASMS can simulate each other, i.e. they are equivalent with respect to parameterized reachability, where we are given two (possibly infinite) sets of configurations C and C' defined by upper and lower bounds on the number of processes in each state and we would like to decide if some configuration in C can reach some configuration in C'. Using this simulation equivalence, we transfer results of RBN to ASMS and vice versa. Finally, we show that RBN and ASMS can simulate a third distributed model called immediate observation (IO) nets. Moreover, for a slightly stronger notion of simulation (which is satisfied by all the simulations given in this paper), we show that IO nets cannot simulate RBN.

Submitted 26 August, 2021; v1 submitted 17 August, 2021; originally announced August 2021.

Comments: Long version of the paper accepted at GandALF 2021

arXiv:2106.01199 [pdf, other]

IrEne: Interpretable Energy Prediction for Transformers

Authors: Qingqing Cao, Yash Kumar Lal, Harsh Trivedi, Aruna Balasubramanian, Niranjan Balasubramanian

Abstract: Existing software-based energy measurements of NLP models are not accurate because they do not consider the complex interactions between energy consumption and model execution. We present IrEne, an interpretable and extensible energy prediction system that accurately predicts the inference energy consumption of a wide range of Transformer-based NLP models. IrEne constructs a model tree graph that breaks down the NLP model into modules that are further broken down into low-level machine learning (ML) primitives. IrEne predicts the inference energy consumption of the ML primitives as a function of generalizable features and fine-grained runtime resource usage. IrEne then aggregates these low-level predictions recursively to predict the energy of each module and finally of the entire model. Experiments across multiple Transformer models show IrEne predicts inference energy consumption of transformer models with an error of under 7% compared to the ground truth. In contrast, existing energy models see an error of over 50%. We also show how IrEne can be used to conduct energy bottleneck analysis and to easily evaluate the energy impact of different architectural choices. We release the code and data at https://github.com/StonyBrookNLP/irene.

Submitted 2 June, 2021; originally announced June 2021.

Comments: ACL 2021 camera ready
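
Note on the entry above: the recursive aggregation over a model tree that the abstract describes might look roughly like the sketch below. This is not the IrEne code or API. The names `Node` and `predict_energy`, the plain summation at internal nodes, and the toy per-primitive predictor are all assumptions made for illustration; the abstract does not say how IrEne combines child predictions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """A node in a hypothetical model tree: a module or an ML primitive."""
    name: str
    features: Dict[str, float] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def predict_energy(node: Node, leaf_model: Callable[[Dict[str, float]], float]) -> float:
    """Predict energy bottom-up over the tree.

    Leaves (primitives) are scored by `leaf_model` from their features.
    Internal nodes (modules) sum their children, which is a simplifying
    assumption rather than the aggregation used by the paper.
    """
    if not node.children:
        return leaf_model(node.features)
    return sum(predict_energy(child, leaf_model) for child in node.children)


def toy_leaf_model(features: Dict[str, float]) -> float:
    # Toy predictor (assumption): energy is proportional to a "flops" feature.
    return 2.0 * features["flops"]


model = Node("model", children=[
    Node("attention", children=[
        Node("matmul", {"flops": 3.0}),
        Node("softmax", {"flops": 1.0}),
    ]),
    Node("feed_forward", children=[
        Node("linear", {"flops": 4.0}),
    ]),
])

for module in model.children:
    print(module.name, predict_energy(module, toy_leaf_model))
print(model.name, predict_energy(model, toy_leaf_model))
```

Because every module gets its own prediction, the same traversal gives a per-module breakdown, which is the kind of view an energy bottleneck analysis would start from.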

arXiv:2104.09716 [pdf, ps, other]

Decidability and Complexity in Weakening and Contraction Hypersequent Substructural Logics

Authors: A. R. Balasubramanian, Timo Lang, Revantha Ramanayake

Abstract: We establish decidability for the infinitely many axiomatic extensions of the commutative Full Lambek logic with weakening FLew (i.e. IMALLW) that have a cut-free hypersequent proof calculus (specifically: every analytic structural rule extension). Decidability for the corresponding extensions of its contraction counterpart FLec was established recently but their computational complexity was left unanswered. In the second part of this paper, we introduce just enough on length functions for well-quasi-orderings and the fast-growing complexity classes to obtain complexity upper bounds for both the weakening and contraction extensions. A specific instance of this result yields the first complexity bound for the prominent fuzzy logic MTL (monoidal t-norm based logic) providing an answer to a long-standing open problem.

Submitted 19 April, 2021; originally announced April 2021.

Comments: Accepted for publication in the proceedings of LICS 2021

arXiv:2102.06897 [pdf, other]

Adaptive Synchronisation of Pushdown Automata

Authors: A. R. Balasubramanian, K. S. Thejaswini

Abstract: We introduce the notion of adaptive synchronisation for pushdown automata, in which there is an external observer who has no knowledge about the current state of the pushdown automaton, but can observe the contents of the stack. The observer would then like to decide if it is possible to bring the automaton from any state into some predetermined state by giving inputs to it in an adaptive manner, i.e., the next input letter to be given can depend on how the contents of the stack changed after the current input letter. We show that for non-deterministic pushdown automata, this problem is 2-EXPTIME-complete and for deterministic pushdown automata, we show EXPTIME-completeness. To prove the lower bounds, we first introduce (different variants of) subset-synchronisation and show that these problems are polynomial-time equivalent with the adaptive synchronisation problem. We then prove hardness results for the subset-synchronisation problems. For proving the upper bounds, we consider the problem of deciding if a given alternating pushdown system has an accepting run with at most $k$ leaves and we provide an $n^{O(k^2)}$ time algorithm for this problem.

Submitted 13 February, 2021; originally announced February 2021.

Comments: 29 pages, 5 figures

MSC Class: 68Q45; 68Q17

arXiv:2101.07344 [pdf, other]

Accelerating Deep Learning Inference via Learned Caches

Authors: Arjun Balasubramanian, Adarsh Kumar, Yuhan Liu, Han Cao, Shivaram Venkataraman, Aditya Akella

Abstract: Deep Neural Networks (DNNs) are witnessing increased adoption in multiple domains owing to their high accuracy in solving real-world problems. However, this high accuracy has been achieved by building deeper networks, posing a fundamental challenge to the low latency inference desired by user-facing applications. Current low latency solutions trade off accuracy or fail to exploit the inherent temporal locality in prediction serving workloads. We observe that caching hidden layer outputs of the DNN can introduce a form of late-binding where inference requests only consume the amount of computation needed. This enables a mechanism for achieving low latencies, coupled with an ability to exploit temporal locality. However, traditional caching approaches incur high memory overheads and lookup latencies, leading us to design learned caches - caches that consist of simple ML models that are continuously updated. We present the design of GATI, an end-to-end prediction serving system that incorporates learned caches for low-latency DNN inference. Results show that GATI can reduce inference latency by up to 7.69X on realistic workloads.

Submitted 18 January, 2021; originally announced January 2021.

arXiv:2012.05818 [pdf, other]

Bew: Towards Answering Business-Entity-Related Web Questions

Authors: Qingqing Cao, Oriana Riva, Aruna Balasubramanian, Niranjan Balasubramanian

Abstract: We present BewQA, a system specifically designed to answer a class of questions that we call Bew questions. Bew questions are related to businesses/services such as restaurants, hotels, and movie theaters; for example, "Until what time is happy hour?". These questions are challenging to answer because the answers are found in open-domain Web, are present in short sentences without surrounding context, and are dynamic since the webpage information can be updated frequently. Under these conditions, existing QA systems perform poorly. We present a practical approach, called BewQA, that can answer Bew queries by mining a template of the business-related webpages and using the template to guide the search. We show how we can extract the template automatically by leveraging aggregator websites that aggregate information about business entities in a domain (e.g., restaurants). We answer a given question by identifying the section from the extracted template that is most likely to contain the answer. By doing so we can extract the answers even when the answer span does not have sufficient context. Importantly, BewQA does not require any training. We crowdsource a new dataset of 1066 Bew questions and ground-truth answers in the restaurant domain. Compared to state-of-the-art QA models, BewQA has a 27 percentage point improvement in F1 score. Compared to a commercial search engine, BewQA answered correctly 29% more Bew questions.

Submitted 10 December, 2020; originally announced December 2020.

arXiv:2010.09471 [pdf, other]

doi 10.46298/lmcs-19(4:2)2023

Finding Cut-Offs in Leaderless Rendez-Vous Protocols is Easy

Authors: A. R. Balasubramanian, Javier Esparza, Mikhail Raskin

Abstract: In rendez-vous protocols an arbitrarily large number of indistinguishable finite-state agents interact in pairs. The cut-off problem asks if there exists a number $B$ such that all initial configurations of the protocol with at least $B$ agents in a given initial state can reach a final configuration with all agents in a given final state. In a recent paper (Horn and Sangnier, CONCUR 2020), Horn and Sangnier proved that the cut-off problem is decidable (and at least as hard as the Petri net reachability problem) for protocols with a leader, and in EXPSPACE for leaderless protocols. Further, for the special class of symmetric protocols they reduce these bounds to PSPACE and NP, respectively. The problem of lowering these upper bounds or finding matching lower bounds was left open. We show that the cut-off problem is P-complete for leaderless protocols and in NC for leaderless symmetric protocols. Further, we also consider a variant of the cut-off problem suggested in (Horn and Sangnier, CONCUR 2020), which we call the bounded-loss cut-off problem and prove that this problem is P-complete for leaderless protocols and NL-complete for leaderless symmetric protocols. Finally, by reusing some of the techniques applied for the analysis of leaderless protocols, we show that the cut-off problem for symmetric protocols with a leader is NP-complete, thereby improving upon all the elementary upper bounds of (Horn and Sangnier, CONCUR 2020).

Submitted 11 October, 2023; v1 submitted 19 October, 2020; originally announced October 2020.

Journal ref: Logical Methods in Computer Science, Volume 19, Issue 4 (October 12, 2023) lmcs:8354

arXiv:2010.05248 [pdf, other]

Towards Accurate and Reliable Energy Measurement of NLP Models

Authors: Qingqing Cao, Aruna Balasubramanian, Niranjan Balasubramanian

Abstract: Accurate and reliable measurement of energy consumption is critical for making well-informed design choices when choosing and training large scale NLP models. In this work, we show that existing software-based energy measurements are not accurate because they do not take into account hardware differences and how resource utilization affects energy consumption. We conduct energy measurement experiments with four different models for a question answering task. We quantify the error of existing software-based energy measurements by using a hardware power meter that provides highly accurate energy measurements. Our key takeaway is the need for a more accurate energy estimation model that takes into account hardware variabilities and the non-linear relationship between resource utilization and energy consumption. We release the code and data at https://github.com/csarron/sustainlp2020-energy.

Submitted 11 October, 2020; originally announced October 2020.

Comments: Accepted to SustaiNLP 2020 (co-located with EMNLP 2020)

arXiv:2007.06248 [pdf, ps, other]

Complexity of Verification and Synthesis of Threshold Automata

Authors: A. R. Balasubramanian, Javier Esparza, Marijana Lazic

Abstract: Threshold automata are a formalism for modeling and analyzing fault-tolerant distributed algorithms, recently introduced by Konnov, Veith, and Widder, describing protocols executed by a fixed but arbitrary number of processes. We conduct the first systematic study of the complexity of verification and synthesis problems for threshold automata. We prove that the coverability, reachability, safety, and liveness problems are NP-complete, and that the bounded synthesis problem is $\Sigma_2^p$-complete. A key to our results is a novel characterization of the reachability relation of a threshold automaton as an existential Presburger formula. The characterization also leads to novel verification and synthesis algorithms. We report on an implementation, and provide experimental results.

Submitted 13 July, 2020; originally announced July 2020.

Comments: Accepted at ATVA20

arXiv:2005.00697 [pdf, other]

DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering

Authors: Qingqing Cao, Harsh Trivedi, Aruna Balasubramanian, Niranjan Balasubramanian

Abstract: Transformer-based QA models use input-wide self-attention -- i.e., across both the question and the input passage -- at all layers, causing them to be slow and memory-intensive. It turns out that we can get by without input-wide self-attention at all layers, especially in the lower layers. We introduce DeFormer, a decomposed transformer, which substitutes the full self-attention with question-wide and passage-wide self-attentions in the lower layers. This allows for question-independent processing of the input text representations, which in turn enables pre-computing passage representations, reducing runtime compute drastically. Furthermore, because DeFormer is largely similar to the original model, we can initialize DeFormer with the pre-training weights of a standard transformer, and directly fine-tune on the target QA dataset. We show DeFormer versions of BERT and XLNet can be used to speed up QA by over 4.3x and with simple distillation-based losses they incur only a 1% drop in accuracy. We open source the code at https://github.com/StonyBrookNLP/deformer.

Submitted 2 May, 2020; originally announced May 2020.

Comments: ACL 2020 camera ready

arXiv:2004.09621 [pdf, other]

Characterizing consensus in the Heard-Of model

Authors: A. R. Balasubramanian, Igor Walukiewicz

Abstract: The Heard-Of model is a simple and relatively expressive model of distributed computation. Because of this, it has gained considerable attention from the verification community. We give a characterization of all algorithms solving consensus in a fragment of this model. The fragment is big enough to cover many prominent consensus algorithms. The characterization is purely syntactic: it is expressed in terms of some conditions on the text of the algorithm. One of the recent methods of verification of distributed algorithms is to abstract an algorithm to the Heard-Of model and then to verify the abstract algorithm using semi-automatic procedures. Our results make it possible, in some cases, to avoid the second step in this methodology.

Submitted 20 April, 2020; originally announced April 2020.

arXiv:2002.02645 [pdf, other]

Accelerating Deep Learning Inference via Freezing

Authors: Adarsh Kumar, Arjun Balasubramanian, Shivaram Venkataraman, Aditya Akella

Abstract: Over the last few years, Deep Neural Networks (DNNs) have become ubiquitous owing to their high accuracy on real-world tasks. However, this increase in accuracy comes at the cost of computationally expensive models leading to higher prediction latencies. Prior efforts to reduce this latency such as quantization, model distillation, and any-time prediction models typically trade off accuracy for performance. In this work, we observe that caching intermediate layer outputs can help us avoid running all the layers of a DNN for a sizeable fraction of inference requests. We find that this can potentially reduce the number of effective layers by half for 91.58% of CIFAR-10 requests run on ResNet-18. We present Freeze Inference, a system that introduces approximate caching at each intermediate layer and we discuss techniques to reduce the cache size and improve the cache hit rate. Finally, we discuss some of the open research challenges in realizing such a design.

Submitted 7 February, 2020; originally announced February 2020.

Comments: 11th USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2019

arXiv:1911.09849 [pdf, other]

Archipelago: A Scalable Low-Latency Serverless Platform

Authors: Arjun Singhvi, Kevin Houck, Arjun Balasubramanian, Mohammed Danish Shaikh, Shivaram Venkataraman, Aditya Akella

Abstract: The increased use of micro-services to build web applications has spurred the rapid growth of Function-as-a-Service (FaaS) or serverless computing platforms. While FaaS simplifies provisioning and scaling for application developers, it introduces new challenges in resource management that need to be handled by the cloud provider. Our analysis of popular serverless workloads indicates that schedulers need to handle functions that are very short-lived, have unpredictable arrival patterns, and require expensive setup of sandboxes. The challenge of running a large number of such functions in a multi-tenant cluster makes existing scheduling frameworks unsuitable. We present Archipelago, a platform that enables low latency request execution in a multi-tenant serverless setting. Archipelago views each application as a DAG of functions, and every DAG is associated with a latency deadline. Archipelago achieves its per-DAG request latency goals by: (1) partitioning a given cluster into a number of smaller worker pools, and associating each pool with a semi-global scheduler (SGS), (2) using a latency-aware scheduler within each SGS along with proactive sandbox allocation to reduce overheads, and (3) using a load balancing layer to route requests for different DAGs to the appropriate SGS, and automatically scale the number of SGSs per DAG. Our testbed results show that Archipelago meets the latency deadline for more than 99% of realistic application request workloads, and reduces tail latencies by up to 36X compared to state-of-the-art serverless platforms.

Submitted 21 November, 2019; originally announced November 2019.

Comments: 14 pages

arXiv:1909.01667 [pdf, ps, other]

Complexity of controlled bad sequences over finite sets of $\mathbb{N}^d$

Authors: A. R. Balasubramanian

Abstract: We provide upper and lower bounds for the length of controlled bad sequences over the majoring and the minoring orderings of finite sets of $\mathbb{N}^d$. The results are obtained by bounding the length of such sequences by functions from the Cichon hierarchy. This allows us to translate these results to bounds over the fast-growing complexity classes. The obtained bounds are proven to be tight for the majoring ordering, which solves a problem left open by Abriola, Figueira and Senno (Theor. Comp. Sci, Vol. 603). Finally, we use the results on controlled bad sequences to prove upper bounds for the emptiness problem of some classes of automata.

Submitted 8 June, 2020; v1 submitted 4 September, 2019; originally announced September 2019.

arXiv:1907.01484 [pdf, other]

Themis: Fair and Efficient GPU Cluster Scheduling

Authors: Kshiteej Mahajan, Arjun Balasubramanian, Arjun Singhvi, Shivaram Venkataraman, Aditya Akella, Amar Phanishayee, Shuchi Chawla

Abstract: Modern distributed machine learning (ML) training workloads benefit significantly from leveraging GPUs. However, significant contention ensues when multiple such workloads are run atop a shared cluster of GPUs. A key question is how to fairly apportion GPUs across workloads. We find that established cluster scheduling disciplines are a poor fit because of ML workloads' unique attributes: ML jobs have long-running tasks that need to be gang-scheduled, and their performance is sensitive to tasks' relative placement. We propose Themis, a new scheduling framework for ML training workloads. Its GPU allocation policy enforces that ML workloads complete in a finish-time fair manner, a new notion we introduce. To capture placement sensitivity and ensure efficiency, Themis uses a two-level scheduling architecture where ML workloads bid on available resources that are offered in an auction run by a central arbiter. Our auction design allocates GPUs to winning bids by trading off efficiency for fairness in the short term but ensuring finish-time fairness in the long term. Our evaluation on a production trace shows that Themis can improve fairness by more than 2.25X and is ~5% to 250% more cluster efficient in comparison to state-of-the-art schedulers.

Submitted 29 October, 2019; v1 submitted 2 July, 2019; originally announced July 2019.

arXiv:1904.08903 [pdf, ps, other]

Generalized threshold arrangements

Authors: A. R. Balasubramanian

Abstract: An arrangement of hyperplanes is a finite collection of hyperplanes in a real Euclidean space. To such a collection one associates the characteristic polynomial that encodes the combinatorics of intersections of the hyperplanes. Finding the characteristic polynomial of the Shi threshold and the Catalan threshold arrangements was an open problem in Stanley's list of problems in [1]. Seunghyun Seo solved both problems by clever arguments using the finite field method in [3,4]. However, in his paper, he left open the problem of computing the characteristic polynomial of a broader class of threshold arrangements, the so-called "generalized threshold" arrangements whose defining set of hyperplanes is given by $x_i + x_j = -l,-l+1,...,m-1,m$ for $1 \le i < j \le n$ where $l,m \in \mathbb{N}$. In this paper, we present a method for computing the characteristic polynomial of this family of arrangements.

Submitted 18 April, 2019; originally announced April 2019.

arXiv:1904.01095 [pdf]

Fast, accurate, and transferable many-body interatomic potentials by symbolic regression

Authors: Alberto Hernandez, Adarsh Balasubramanian, Fenglin Yuan, Simon Mason, Tim Mueller

Abstract: The length and time scales of atomistic simulations are limited by the computational cost of the methods used to predict material properties. In recent years there has been great progress in the use of machine learning algorithms to develop fast and accurate interatomic potential models, but it remains a challenge to develop models that generalize well and are fast enough to be used at extreme time and length scales. To address this challenge, we have developed a machine learning algorithm based on symbolic regression in the form of genetic programming that is capable of discovering accurate, computationally efficient many-body potential models. The key to our approach is to explore a hypothesis space of models based on fundamental physical principles and select models within this hypothesis space based on their accuracy, speed, and simplicity. The focus on simplicity reduces the risk of overfitting the training data and increases the chances of discovering a model that generalizes well. Our algorithm was validated by rediscovering an exact Lennard-Jones potential and a Sutton Chen embedded atom method potential from training data generated using these models. By using training data generated from density functional theory calculations, we found potential models for elemental copper that are simple, as fast as embedded atom models, and capable of accurately predicting properties outside of their training set. Our approach requires relatively small sets of training data, making it possible to generate training data using highly accurate methods at a reasonable computational cost. We present our approach, the forms of the discovered models, and assessments of their transferability, accuracy and speed.

Submitted 17 August, 2019; v1 submitted 1 April, 2019; originally announced April 2019.

arXiv:1809.03099 [pdf, ps, other]

doi 10.4204/EPTCS.277.10

Parameterized Verification of Coverability in Well-Structured Broadcast Networks

Authors: A. R. Balasubramanian

Abstract: Parameterized verification of coverability in broadcast networks with finite state processes has been studied for different types of models and topologies. In this paper, we attempt to develop a theory of broadcast networks in which the processes can be well-structured transition systems. The resulting formalism is called well-structured broadcast networks. We give an algorithm to decide coverability of well-structured broadcast networks when reconfiguration of links between nodes is allowed. Further, for various types of communication topologies, we also prove the decidability of coverability in the static case. We do this by showing that for these types of static communication topologies, the broadcast network itself is a well-structured transition system, hence proving the decidability of coverability in the broadcast network.

Submitted 9 September, 2018; originally announced September 2018.

Comments: In Proceedings GandALF 2018, arXiv:1809.02416

Journal ref: EPTCS 277, 2018, pp. 133-146

arXiv:1802.08469 [pdf, ps, other]

Parameterized verification of synchronization in constrained reconfigurable broadcast networks

Authors: A. R. Balasubramanian, Nathalie Bertrand, Nicolas Markey

Abstract: Reconfigurable broadcast networks provide a convenient formalism for modelling and reasoning about networks of mobile agents broadcasting messages to other agents following some (evolving) communication topology. The parameterized verification of such models aims at checking whether a given property holds irrespective of the initial configuration (number of agents, initial states and initial communication topology). We focus here on the synchronization property, asking whether all agents converge to a set of target states after some execution. This problem is known to be decidable in polynomial time when no constraints are imposed on the evolution of the communication topology (while it is undecidable for static broadcast networks). In this paper we investigate how various constraints on reconfigurations affect the decidability and complexity of the synchronization problem. In particular, we show that when bounding the number of reconfigured links between two communication steps by a constant, synchronization becomes undecidable; on the other hand, synchronization remains decidable in PTIME when the bound grows with the number of agents.

Submitted 23 February, 2018; originally announced February 2018.

Comments: Accepted for publication in TACAS 2018

arXiv:1706.00878 [pdf, other]

MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU

Authors: Qingqing Cao, Niranjan Balasubramanian, Aruna Balasubramanian

Abstract: In this paper, we explore optimizations to run Recurrent Neural Network (RNN) models locally on mobile devices. RNN models are widely used for Natural Language Processing, Machine Translation, and other tasks. However, existing mobile applications that use RNN models do so on the cloud. To address privacy and efficiency concerns, we show how RNN models can be run locally on mobile devices. Existing work on porting deep learning models to mobile devices focuses on Convolutional Neural Networks (CNNs) and cannot be applied directly to RNN models. In response, we present MobiRNN, a mobile-specific optimization framework that implements GPU offloading specifically for mobile GPUs. Evaluations using an RNN model for activity recognition show that MobiRNN significantly decreases the latency of running RNN models on phones.

Submitted 2 June, 2017; originally announced June 2017.

Comments: Published at 1st International Workshop on Embedded and Mobile Deep Learning colocated with MobiSys 2017

arXiv:1201.1935 [pdf, ps, other]

doi 10.1109/TIT.2013.2245395

Secure Symmetrical Multilevel Diversity Coding

Authors: Anantharaman Balasubramanian, Hung D. Ly, Shuo Li, Tie Liu, Scott L. Miller

Abstract: Symmetrical Multilevel Diversity Coding (SMDC) is a network compression problem introduced by Roche (1992) and Yeung (1995). In this setting, a simple separate coding strategy known as superposition coding was shown to be optimal in terms of achieving the minimum sum rate (Roche, Yeung, and Hau, 1997) and the entire admissible rate region (Yeung and Zhang, 1999) of the problem. This paper considers a natural generalization of SMDC to the secure communication setting with an additional eavesdropper. It is required that all sources be kept perfectly secret from the eavesdropper as long as the number of encoder outputs available at the eavesdropper is no more than a given threshold. First, the problem of encoding individual sources is studied. A precise characterization of the entire admissible rate region is established via a connection to the problem of secure coding over a three-layer wiretap network and utilizing some basic polyhedral structure of the admissible rate region. Building on this result, it is then shown that superposition coding remains optimal in terms of achieving the minimum sum rate for the general secure SMDC problem.

Submitted 9 January, 2012; originally announced January 2012.

Comments: Submitted to the IEEE Transactions on Information Theory in May 2011. Minor revision made to the current version in January 2012

Showing 1–32 of 32 results for author: Balasubramanian, A