subscribe to arXiv mailings

Linear-Complexity Self-Supervised Learning for Speech Processing

Authors: Shucong Zhang, Titouan Parcollet, Rogier van Dalen, Sourav Bhattacharya

Abstract: Self-supervised learning (SSL) models usually require weeks of pre-training with dozens of high-end GPUs. These models typically have a multi-headed self-attention (MHSA) context encoder. However, MHSA takes quadratic time and space in the input length, contributing to the high pre-training cost. Linear-complexity alternatives to MHSA have been proposed. For instance, in supervised training, the S… ▽ More Self-supervised learning (SSL) models usually require weeks of pre-training with dozens of high-end GPUs. These models typically have a multi-headed self-attention (MHSA) context encoder. However, MHSA takes quadratic time and space in the input length, contributing to the high pre-training cost. Linear-complexity alternatives to MHSA have been proposed. For instance, in supervised training, the SummaryMixing model is the first to outperform MHSA across multiple speech processing tasks. However, these cheaper alternatives have not been explored for SSL yet. This paper studies a linear-complexity context encoder for SSL for the first time. With better or equivalent performance for the downstream tasks of the MP3S benchmark, SummaryMixing reduces the pre-training time and peak VRAM of wav2vec 2.0 model by 18% and by 23%, respectively, leading to the pre-training of a 155M wav2vec 2.0 model finished within one week with 4 Tesla A100 GPUs. Code is available at https://github.com/SamsungLabs/SummaryMixing. △ Less

Submitted 18 July, 2024; originally announced July 2024.

Comments: Interspeech 2024

arXiv:2407.12063 [pdf, other]

Unveiling Scaling Laws in the Regulatory Functions of Reddit

Authors: Shambhobi Bhattacharya, Jisung Yoon, Hyejin Youn

Abstract: Online platforms like Reddit, Wikipedia, and Facebook are integral to modern life, enabling content creation and sharing through posts, comments, and discussions. Despite their virtual and often anonymous nature, these platforms need rules and oversight to maintain a safe and productive environment. As these communities grow, a key question arises: how does the need for regulatory functions scale?… ▽ More Online platforms like Reddit, Wikipedia, and Facebook are integral to modern life, enabling content creation and sharing through posts, comments, and discussions. Despite their virtual and often anonymous nature, these platforms need rules and oversight to maintain a safe and productive environment. As these communities grow, a key question arises: how does the need for regulatory functions scale? Do larger groups require more regulatory actions and oversight per person, or can they manage with less? Our analysis of Reddit's regulatory functions reveals robust scaling relationships across different subreddits, suggesting universal patterns between community size and the amount of regulation needed. We found that the number of comments and moderator actions, such as comment removals, grew faster than the community size, with superlinear exponents of 1.12 and 1.18, respectively. However, bot-based rule enforcement did not keep pace with community growth, exhibiting a slightly sublinear exponent of 0.95. Further analysis of the residuals from these scaling behaviors identified a 'trade-off axis,' where one-way coordination mechanisms (bots and moderators) counteract two-way interactions (comments) and vice versa. Our findings suggest that a more proactive moderation approach, characterized by increased bot activity and moderator comment removals, tends to result in less user engagement under the scaling framework. Understanding these natural scaling patterns and interactions can help platform administrators and policymakers foster healthy online communities while mitigating harmful behaviors such as harassment, doxxing, and misinformation. Without proper regulation, these negative behaviors can proliferate and cause significant damage. Targeted interventions based on these insights are key to ensuring online platforms remain safe and beneficial spaces. △ Less

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.05141 [pdf, other]

Impact of Network Topology on Byzantine Resilience in Decentralized Federated Learning

Authors: Siddhartha Bhattacharya, Daniel Helo, Joshua Siegel

Abstract: Federated learning (FL) enables a collaborative environment for training machine learning models without sharing training data between users. This is typically achieved by aggregating model gradients on a central server. Decentralized federated learning is a rising paradigm that enables users to collaboratively train machine learning models in a peer-to-peer manner, without the need for a central… ▽ More Federated learning (FL) enables a collaborative environment for training machine learning models without sharing training data between users. This is typically achieved by aggregating model gradients on a central server. Decentralized federated learning is a rising paradigm that enables users to collaboratively train machine learning models in a peer-to-peer manner, without the need for a central aggregation server. However, before applying decentralized FL in real-world use training environments, nodes that deviate from the FL process (Byzantine nodes) must be considered when selecting an aggregation function. Recent research has focused on Byzantine-robust aggregation for client-server or fully connected networks, but has not yet evaluated such aggregation schemes for complex topologies possible with decentralized FL. Thus, the need for empirical evidence of Byzantine robustness in differing network topologies is evident. This work investigates the effects of state-of-the-art Byzantine-robust aggregation methods in complex, large-scale network structures. We find that state-of-the-art Byzantine robust aggregation strategies are not resilient within large non-fully connected networks. As such, our findings point the field towards the development of topology-aware aggregation schemes, especially necessary within the context of large scale real-world deployment. △ Less

Submitted 6 July, 2024; originally announced July 2024.

Comments: 8 pages, 6 figures

ACM Class: I.2.11; C.4; C.2.4

arXiv:2406.15527 [pdf, other]

Data Efficient Evaluation of Large Language Models and Text-to-Image Models via Adaptive Sampling

Authors: Cong Xu, Gayathri Saranathan, Mahammad Parwez Alam, Arpit Shah, James Lim, Soon Yee Wong, Foltin Martin, Suparna Bhattacharya

Abstract: Evaluating LLMs and text-to-image models is a computationally intensive task often overlooked. Efficient evaluation is crucial for understanding the diverse capabilities of these models and enabling comparisons across a growing number of new models and benchmarks. To address this, we introduce SubLIME, a data-efficient evaluation framework that employs adaptive sampling techniques, such as cluster… ▽ More Evaluating LLMs and text-to-image models is a computationally intensive task often overlooked. Efficient evaluation is crucial for understanding the diverse capabilities of these models and enabling comparisons across a growing number of new models and benchmarks. To address this, we introduce SubLIME, a data-efficient evaluation framework that employs adaptive sampling techniques, such as clustering and quality-based methods, to create representative subsets of benchmarks. Our approach ensures statistically aligned model rankings compared to full datasets, evidenced by high Pearson correlation coefficients. Empirical analysis across six NLP benchmarks reveals that: (1) quality-based sampling consistently achieves strong correlations (0.85 to 0.95) with full datasets at a 10\% sampling rate such as Quality SE and Quality CPD (2) clustering methods excel in specific benchmarks such as MMLU (3) no single method universally outperforms others across all metrics. Extending this framework, we leverage the HEIM leaderboard to cover 25 text-to-image models on 17 different benchmarks. SubLIME dynamically selects the optimal technique for each benchmark, significantly reducing evaluation costs while preserving ranking integrity and score distribution. Notably, a minimal sampling rate of 1% proves effective for benchmarks like MMLU. Additionally, we demonstrate that employing difficulty-based sampling to target more challenging benchmark segments enhances model differentiation with broader score distributions. We also combine semantic search, tool use, and GPT-4 review to identify redundancy across benchmarks within specific LLM categories, such as coding benchmarks. This allows us to further reduce the number of samples needed to maintain targeted rank preservation. Overall, SubLIME offers a versatile and cost-effective solution for the robust evaluation of LLMs and text-to-image models. △ Less

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.13944 [pdf, other]

Generalization error of min-norm interpolators in transfer learning

Authors: Yanke Song, Sohom Bhattacharya, Pragya Sur

Abstract: This paper establishes the generalization error of pooled min-$\ell_2$-norm interpolation in transfer learning where data from diverse distributions are available. Min-norm interpolators emerge naturally as implicit regularized limits of modern machine learning algorithms. Previous work characterized their out-of-distribution risk when samples from the test distribution are unavailable during trai… ▽ More This paper establishes the generalization error of pooled min-$\ell_2$-norm interpolation in transfer learning where data from diverse distributions are available. Min-norm interpolators emerge naturally as implicit regularized limits of modern machine learning algorithms. Previous work characterized their out-of-distribution risk when samples from the test distribution are unavailable during training. However, in many applications, a limited amount of test data may be available during training, yet properties of min-norm interpolation in this setting are not well-understood. We address this gap by characterizing the bias and variance of pooled min-$\ell_2$-norm interpolation under covariate and model shifts. The pooled interpolator captures both early fusion and a form of intermediate fusion. Our results have several implications: under model shift, for low signal-to-noise ratio (SNR), adding data always hurts. For higher SNR, transfer learning helps as long as the shift-to-signal (SSR) ratio lies below a threshold that we characterize explicitly. By consistently estimating these ratios, we provide a data-driven method to determine: (i) when the pooled interpolator outperforms the target-based interpolator, and (ii) the optimal number of target samples that minimizes the generalization error. Under covariate shift, if the source sample size is small relative to the dimension, heterogeneity between between domains improves the risk, and vice versa. We establish a novel anisotropic local law to achieve these characterizations, which may be of independent interest in random matrix theory. We supplement our theoretical characterizations with comprehensive simulations that demonstrate the finite-sample efficacy of our results. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: 53 pages, 2 figures

arXiv:2405.19164 [pdf, other]

Learning from Litigation: Graphs and LLMs for Retrieval and Reasoning in eDiscovery

Authors: Sounak Lahiri, Sumit Pai, Tim Weninger, Sanmitra Bhattacharya

Abstract: Electronic Discovery (eDiscovery) involves identifying relevant documents from a vast collection based on legal production requests. The integration of artificial intelligence (AI) and natural language processing (NLP) has transformed this process, helping document review and enhance efficiency and cost-effectiveness. Although traditional approaches like BM25 or fine-tuned pre-trained models are c… ▽ More Electronic Discovery (eDiscovery) involves identifying relevant documents from a vast collection based on legal production requests. The integration of artificial intelligence (AI) and natural language processing (NLP) has transformed this process, helping document review and enhance efficiency and cost-effectiveness. Although traditional approaches like BM25 or fine-tuned pre-trained models are common in eDiscovery, they face performance, computational, and interpretability challenges. In contrast, Large Language Model (LLM)-based methods prioritize interpretability but sacrifice performance and throughput. This paper introduces DISCOvery Graph (DISCOG), a hybrid approach that combines the strengths of two worlds: a heterogeneous graph-based method for accurate document relevance prediction and subsequent LLM-driven approach for reasoning. Graph representational learning generates embeddings and predicts links, ranking the corpus for a given request, and the LLMs provide reasoning for document relevance. Our approach handles datasets with balanced and imbalanced distributions, outperforming baselines in F1-score, precision, and recall by an average of 12%, 3%, and 16%, respectively. In an enterprise context, our approach drastically reduces document review costs by 99.9% compared to manual processes and by 95% compared to LLM-based classification methods △ Less

Submitted 29 May, 2024; originally announced May 2024.

Comments: 8 pages, 2 tables, 6 figures

arXiv:2405.15449 [pdf, ps, other]

Faster $(Δ+ 1)$-Edge Coloring: Breaking the $m \sqrt{n}$ Time Barrier

Authors: Sayan Bhattacharya, Din Carmon, Martín Costa, Shay Solomon, Tianyi Zhang

Abstract: Vizing's theorem states that any $n$-vertex $m$-edge graph of maximum degree $Δ$ can be {\em edge colored} using at most $Δ+ 1$ different colors [Diskret.~Analiz, '64]. Vizing's original proof is algorithmic and shows that such an edge coloring can be found in $\tilde{O}(mn)$ time. This was subsequently improved to $\tilde O(m\sqrt{n})$, independently by Arjomandi [1982] and by Gabow et al.~[1985]… ▽ More Vizing's theorem states that any $n$-vertex $m$-edge graph of maximum degree $Δ$ can be {\em edge colored} using at most $Δ+ 1$ different colors [Diskret.~Analiz, '64]. Vizing's original proof is algorithmic and shows that such an edge coloring can be found in $\tilde{O}(mn)$ time. This was subsequently improved to $\tilde O(m\sqrt{n})$, independently by Arjomandi [1982] and by Gabow et al.~[1985]. In this paper we present an algorithm that computes such an edge coloring in $\tilde O(mn^{1/3})$ time, giving the first polynomial improvement for this fundamental problem in over 40 years. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: Started to circulate in April 2024

arXiv:2404.14855 [pdf, other]

The Geometry of the Set of Equivalent Linear Neural Networks

Authors: Jonathan Richard Shewchuk, Sagnik Bhattacharya

Abstract: We characterize the geometry and topology of the set of all weight vectors for which a linear neural network computes the same linear transformation $W$. This set of weight vectors is called the fiber of $W$ (under the matrix multiplication map), and it is embedded in the Euclidean weight space of all possible weight vectors. The fiber is an algebraic variety that is not necessarily a manifold. We… ▽ More We characterize the geometry and topology of the set of all weight vectors for which a linear neural network computes the same linear transformation $W$. This set of weight vectors is called the fiber of $W$ (under the matrix multiplication map), and it is embedded in the Euclidean weight space of all possible weight vectors. The fiber is an algebraic variety that is not necessarily a manifold. We describe a natural way to stratify the fiber--that is, to partition the algebraic variety into a finite set of manifolds of varying dimensions called strata. We call this set of strata the rank stratification. We derive the dimensions of these strata and the relationships by which they adjoin each other. Although the strata are disjoint, their closures are not. Our strata satisfy the frontier condition: if a stratum intersects the closure of another stratum, then the former stratum is a subset of the closure of the latter stratum. Each stratum is a manifold of class $C^\infty$ embedded in weight space, so it has a well-defined tangent space and normal space at every point (weight vector). We show how to determine the subspaces tangent to and normal to a specified stratum at a specified point on the stratum, and we construct elegant bases for those subspaces. To help achieve these goals, we first derive what we call a Fundamental Theorem of Linear Neural Networks, analogous to what Strang calls the Fundamental Theorem of Linear Algebra. We show how to decompose each layer of a linear neural network into a set of subspaces that show how information flows through the neural network. Each stratum of the fiber represents a different pattern by which information flows (or fails to flow) through the neural network. The topology of a stratum depends solely on this decomposition. So does its geometry, up to a linear transformation in weight space. △ Less

Submitted 23 April, 2024; originally announced April 2024.

Comments: 99 pages, 14 figures

arXiv:2404.13855 [pdf, other]

Understanding the role of FFNs in driving multilingual behaviour in LLMs

Authors: Sunit Bhattacharya, Ondřej Bojar

Abstract: Multilingualism in Large Language Models (LLMs) is an yet under-explored area. In this paper, we conduct an in-depth analysis of the multilingual capabilities of a family of a Large Language Model, examining its architecture, activation patterns, and processing mechanisms across languages. We introduce novel metrics to probe the model's multilingual behaviour at different layers and shed light on… ▽ More Multilingualism in Large Language Models (LLMs) is an yet under-explored area. In this paper, we conduct an in-depth analysis of the multilingual capabilities of a family of a Large Language Model, examining its architecture, activation patterns, and processing mechanisms across languages. We introduce novel metrics to probe the model's multilingual behaviour at different layers and shed light on the impact of architectural choices on multilingual processing. Our findings reveal different patterns of multilinugal processing in the sublayers of Feed-Forward Networks of the models. Furthermore, we uncover the phenomenon of "over-layerization" in certain model configurations, where increasing layer depth without corresponding adjustments to other parameters may degrade model performance. Through comparisons within and across languages, we demonstrate the interplay between model architecture, layer depth, and multilingual processing capabilities of LLMs trained on multiple languages. △ Less

Submitted 21 April, 2024; originally announced April 2024.

Comments: 10 pages

arXiv:2404.07645 [pdf, other]

Simba: Mamba augmented U-ShiftGCN for Skeletal Action Recognition in Videos

Authors: Soumyabrata Chaudhuri, Saumik Bhattacharya

Abstract: Skeleton Action Recognition (SAR) involves identifying human actions using skeletal joint coordinates and their interconnections. While plain Transformers have been attempted for this task, they still fall short compared to the current leading methods, which are rooted in Graph Convolutional Networks (GCNs) due to the absence of structural priors. Recently, a novel selective state space model, Mam… ▽ More Skeleton Action Recognition (SAR) involves identifying human actions using skeletal joint coordinates and their interconnections. While plain Transformers have been attempted for this task, they still fall short compared to the current leading methods, which are rooted in Graph Convolutional Networks (GCNs) due to the absence of structural priors. Recently, a novel selective state space model, Mamba, has surfaced as a compelling alternative to the attention mechanism in Transformers, offering efficient modeling of long sequences. In this work, to the utmost extent of our awareness, we present the first SAR framework incorporating Mamba. Each fundamental block of our model adopts a novel U-ShiftGCN architecture with Mamba as its core component. The encoder segment of the U-ShiftGCN is devised to extract spatial features from the skeletal data using downsampling vanilla Shift S-GCN blocks. These spatial features then undergo intermediate temporal modeling facilitated by the Mamba block before progressing to the encoder section, which comprises vanilla upsampling Shift S-GCN blocks. Additionally, a Shift T-GCN (ShiftTCN) temporal modeling unit is employed before the exit of each fundamental block to refine temporal representations. This particular integration of downsampling spatial, intermediate temporal, upsampling spatial, and ultimate temporal subunits yields promising results for skeleton action recognition. We dub the resulting model \textbf{Simba}, which attains state-of-the-art performance across three well-known benchmark skeleton action recognition datasets: NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA. Interestingly, U-ShiftGCN (Simba without Intermediate Mamba Block) by itself is capable of performing reasonably well and surpasses our baseline. △ Less

Submitted 11 April, 2024; originally announced April 2024.

Comments: 20 pages, 6 tables, 1 figure

arXiv:2403.05174 [pdf, other]

VTruST: Controllable value function based subset selection for Data-Centric Trustworthy AI

Authors: Soumi Das, Shubhadip Nag, Shreyyash Sharma, Suparna Bhattacharya, Sourangshu Bhattacharya

Abstract: Trustworthy AI is crucial to the widespread adoption of AI in high-stakes applications with fairness, robustness, and accuracy being some of the key trustworthiness metrics. In this work, we propose a controllable framework for data-centric trustworthy AI (DCTAI)- VTruST, that allows users to control the trade-offs between the different trustworthiness metrics of the constructed training datasets.… ▽ More Trustworthy AI is crucial to the widespread adoption of AI in high-stakes applications with fairness, robustness, and accuracy being some of the key trustworthiness metrics. In this work, we propose a controllable framework for data-centric trustworthy AI (DCTAI)- VTruST, that allows users to control the trade-offs between the different trustworthiness metrics of the constructed training datasets. A key challenge in implementing an efficient DCTAI framework is to design an online value-function-based training data subset selection algorithm. We pose the training data valuation and subset selection problem as an online sparse approximation formulation. We propose a novel online version of the Orthogonal Matching Pursuit (OMP) algorithm for solving this problem. Experimental results show that VTruST outperforms the state-of-the-art baselines on social, image, and scientific datasets. We also show that the data values generated by VTruST can provide effective data-centric explanations for different trustworthiness metrics. △ Less

Submitted 8 March, 2024; originally announced March 2024.

Comments: Accepted in ICLR 2024 DMLR workshop

arXiv:2403.03767 [pdf, other]

doi 10.1021/acs.jctc.4c00314

Predicting the Temperature Dependence of Surfactant CMCs Using Graph Neural Networks

Authors: Christoforos Brozos, Jan G. Rittig, Sandip Bhattacharya, Elie Akanny, Christina Kohlmann, Alexander Mitsos

Abstract: The critical micelle concentration (CMC) of surfactant molecules is an essential property for surfactant applications in industry. Recently, classical QSPR and Graph Neural Networks (GNNs), a deep learning technique, have been successfully applied to predict the CMC of surfactants at room temperature. However, these models have not yet considered the temperature dependency of the CMC, which is hig… ▽ More The critical micelle concentration (CMC) of surfactant molecules is an essential property for surfactant applications in industry. Recently, classical QSPR and Graph Neural Networks (GNNs), a deep learning technique, have been successfully applied to predict the CMC of surfactants at room temperature. However, these models have not yet considered the temperature dependency of the CMC, which is highly relevant for practical applications. We herein develop a GNN model for temperature-dependent CMC prediction of surfactants. We collect about 1400 data points from public sources for all surfactant classes, i.e., ionic, nonionic, and zwitterionic, at multiple temperatures. We test the predictive quality of the model for following scenarios: i) when CMC data for surfactants are present in the training of the model in at least one different temperature, and ii) CMC data for surfactants are not present in the training, i.e., generalizing to unseen surfactants. In both test scenarios, our model exhibits a high predictive performance of R$^2 \geq $ 0.94 on test data. We also find that the model performance varies by surfactant class. Finally, we evaluate the model for sugar-based surfactants with complex molecular structures, as these represent a more sustainable alternative to synthetic surfactants and are therefore of great interest for future applications in the personal and home care industries. △ Less

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.02765 [pdf, other]

G4-Attention: Deep Learning Model with Attention for predicting DNA G-Quadruplexes

Authors: Shrimon Mukherjee, Pulakesh Pramanik, Partha Basuchowdhuri, Santanu Bhattacharya

Abstract: G-Quadruplexes are the four-stranded non-canonical nucleic acid secondary structures, formed by the stacking arrangement of the guanine tetramers. They are involved in a wide range of biological roles because of their exceptionally unique and distinct structural characteristics. After the completion of the human genome sequencing project, a lot of bioinformatic algorithms were introduced to predic… ▽ More G-Quadruplexes are the four-stranded non-canonical nucleic acid secondary structures, formed by the stacking arrangement of the guanine tetramers. They are involved in a wide range of biological roles because of their exceptionally unique and distinct structural characteristics. After the completion of the human genome sequencing project, a lot of bioinformatic algorithms were introduced to predict the active G4s regions \textit{in vitro} based on the canonical G4 sequence elements, G-\textit{richness}, and G-\textit{skewness}, as well as the non-canonical sequence features. Recently, sequencing techniques like G4-seq and G4-ChIP-seq were developed to map the G4s \textit{in vitro}, and \textit{in vivo} respectively at a few hundred base resolution. Subsequently, several machine learning approaches were developed for predicting the G4 regions using the existing databases. However, their prediction models were simplistic, and the prediction accuracy was notably poor. In response, here, we propose a novel convolutional neural network with Bi-LSTM and attention layers, named G4-attention, to predict the G4 forming sequences with improved accuracy. G4-attention achieves high accuracy and attains state-of-the-art results in the G4 prediction task. Our model also predicts the G4 regions accurately in the highly class-imbalanced datasets. In addition, the developed model trained on the human genome dataset can be applied to any non-human genome DNA sequences to predict the G4 formation propensities. △ Less

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.17778 [pdf, other]

Dynamic Anchor Selection and Real-Time Pose Prediction for Ultra-wideband Tagless Gate

Authors: Junyoung Choi, Sagnik Bhattacharya, Joohyun Lee

Abstract: Ultra-wideband (UWB) is emerging as a promising solution that can realize proximity services, such as UWB tagless gate (UTG), thanks to centimeter-level localization accuracy based on two different ranging methods such as downlink time-difference of arrival (DL-TDoA) and double-sided two-way ranging (DS-TWR). The UTG is a UWB-based proximity service that provides a seamless gate pass system withou… ▽ More Ultra-wideband (UWB) is emerging as a promising solution that can realize proximity services, such as UWB tagless gate (UTG), thanks to centimeter-level localization accuracy based on two different ranging methods such as downlink time-difference of arrival (DL-TDoA) and double-sided two-way ranging (DS-TWR). The UTG is a UWB-based proximity service that provides a seamless gate pass system without requiring real-time mobile device (MD) tapping. The location of MD is calculated using DL-TDoA, and the MD communicates with the nearest UTG using DS-TWR to open the gate. Therefore, the knowledge about the exact location of MD is the main challenge of UTG, and hence we provide the solutions for both DL-TDoA and DS-TWR. In this paper, we propose dynamic anchor selection for extremely accurate DL-TDoA localization and pose prediction for DS-TWR, called DynaPose. The pose is defined as the actual location of MD on the human body, which affects the localization accuracy. DynaPose is based on line-of-sight (LOS) and non-LOS (NLOS) classification using deep learning for anchor selection and pose prediction. Deep learning models use the UWB channel impulse response and the inertial measurement unit embedded in the smartphone. DynaPose is implemented on Samsung Galaxy Note20 Ultra and Qorvo UWB board to show the feasibility and applicability. DynaPose achieves a LOS/NLOS classification accuracy of 0.984, 62% higher DL-TDoA localization accuracy, and ultimately detects four different poses with an accuracy of 0.961 in real-time. △ Less

Submitted 22 February, 2024; originally announced February 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2402.08399

arXiv:2402.13952 [pdf, ps, other]

Aaronson-Ambainis Conjecture Is True For Random Restrictions

Authors: Sreejata Kishor Bhattacharya

Abstract: In an attempt to show that the acceptance probability of a quantum query algorithm making $q$ queries can be well-approximated almost everywhere by a classical decision tree of depth $\leq \text{poly}(q)$, Aaronson and Ambainis proposed the following conjecture: let $f: \{ \pm 1\}^n \rightarrow [0,1]$ be a degree $d$ polynomial with variance $\geq ε$. Then, there exists a coordinate of $f$ with in… ▽ More In an attempt to show that the acceptance probability of a quantum query algorithm making $q$ queries can be well-approximated almost everywhere by a classical decision tree of depth $\leq \text{poly}(q)$, Aaronson and Ambainis proposed the following conjecture: let $f: \{ \pm 1\}^n \rightarrow [0,1]$ be a degree $d$ polynomial with variance $\geq ε$. Then, there exists a coordinate of $f$ with influence $\geq \text{poly} (ε, 1/d)$. We show that for any polynomial $f: \{ \pm 1\}^n \rightarrow [0,1]$ of degree $d$ $(d \geq 2)$ and variance $\text{Var}[f] \geq 1/d$, if $ρ$ denotes a random restriction with survival probability $\dfrac{\log(d)}{C_1 d}$, $$ \text{Pr} \left[f_ρ \text{ has a coordinate with influence} \geq \dfrac{\text{Var}[f]^2 }{d^{C_2}} \right] \geq \dfrac{\text{Var}[f] \log(d)}{50C_1 d}$$ where $C_1, C_2>0$ are universal constants. Thus, Aaronson-Ambainis conjecture is true for a non-negligible fraction of random restrictions of the given polynomial assuming its variance is not too low. △ Less

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.10515 [pdf, other]

Power-Efficient Indoor Localization Using Adaptive Channel-aware Ultra-wideband DL-TDOA

Authors: Sagnik Bhattacharya, Junyoung Choi, Joohyun Lee

Abstract: Among the various Ultra-wideband (UWB) ranging methods, the absence of uplink communication or centralized computation makes downlink time-difference-of-arrival (DL-TDOA) localization the most suitable for large-scale industrial deployments. However, temporary or permanent obstacles in the deployment region often lead to non-line-of-sight (NLOS) channel path and signal outage effects, which result… ▽ More Among the various Ultra-wideband (UWB) ranging methods, the absence of uplink communication or centralized computation makes downlink time-difference-of-arrival (DL-TDOA) localization the most suitable for large-scale industrial deployments. However, temporary or permanent obstacles in the deployment region often lead to non-line-of-sight (NLOS) channel path and signal outage effects, which result in localization errors. Prior research has addressed this problem by increasing the ranging frequency, which leads to a heavy increase in the user device power consumption. It also does not contribute to any increase in localization accuracy under line-of-sight (LOS) conditions. In this paper, we propose and implement a novel low-power channel-aware dynamic frequency DL-TDOA ranging algorithm. It comprises NLOS probability predictor based on a convolutional neural network (CNN), a dynamic ranging frequency control module, and an IMU sensor-based ranging filter. Based on the conducted experiments, we show that the proposed algorithm achieves 50% higher accuracy in NLOS conditions while having 46% lower power consumption in LOS conditions compared to baseline methods from prior research. △ Less

Submitted 16 February, 2024; originally announced February 2024.

Journal ref: IEEE GLOBECOM 2023

arXiv:2402.08399 [pdf, other]

Deep Learning-based Real-time Smartphone Pose Detection for Ultra-wideband Tagless Gate

Authors: Junyoung Choi, Sagnik Bhattacharya

Abstract: As commercial interest in proximity services increased, the development of various wireless localization techniques was promoted. In line with this trend, Ultra-wideband (UWB) is emerging as a promising solution that can realize proximity services thanks to centimeter-level localization accuracy. In addition, since the actual location of the mobile device (MD) on the human body, called pose, affec… ▽ More As commercial interest in proximity services increased, the development of various wireless localization techniques was promoted. In line with this trend, Ultra-wideband (UWB) is emerging as a promising solution that can realize proximity services thanks to centimeter-level localization accuracy. In addition, since the actual location of the mobile device (MD) on the human body, called pose, affects the localization accuracy, poses are also important to provide accurate proximity services, especially for the UWB tagless gate (UTG). In this paper, a real-time pose detector, termed D3, is proposed to estimate the pose of MD when users pass through UTG. D3 is based on line-of-sight (LOS) and non-LOS (NLOS) classification using UWB channel impulse response and utilizes the inertial measurement unit embedded in the smartphone to estimate the pose. D3 is implemented on Samsung Galaxy Note20 Ultra (i.e., SMN986B) and Qorvo UWB board to show the feasibility and applicability. D3 achieved an LOS/NLOS classification accuracy of 0.984, and ultimately detected four different poses of MD with an accuracy of 0.961 in real-time. △ Less

Submitted 13 February, 2024; originally announced February 2024.

Journal ref: IEEE GLOBECOM 2023

arXiv:2402.06738 [pdf, other]

EntGPT: Linking Generative Large Language Models with Knowledge Bases

Authors: Yifan Ding, Amrit Poudel, Qingkai Zeng, Tim Weninger, Balaji Veeramani, Sanmitra Bhattacharya

Abstract: The ability of Large Language Models (LLMs) to generate factually correct output remains relatively unexplored due to the lack of fact-checking and knowledge grounding during training and inference. In this work, we aim to address this challenge through the Entity Disambiguation (ED) task. We first consider prompt engineering, and design a three-step hard-prompting method to probe LLMs' ED perform… ▽ More The ability of Large Language Models (LLMs) to generate factually correct output remains relatively unexplored due to the lack of fact-checking and knowledge grounding during training and inference. In this work, we aim to address this challenge through the Entity Disambiguation (ED) task. We first consider prompt engineering, and design a three-step hard-prompting method to probe LLMs' ED performance without supervised fine-tuning (SFT). Overall, the prompting method improves the micro-F_1 score of the original vanilla models by a large margin, on some cases up to 36% and higher, and obtains comparable performance across 10 datasets when compared to existing methods with SFT. We further improve the knowledge grounding ability through instruction tuning (IT) with similar prompts and responses. The instruction-tuned model not only achieves higher micro-F1 score performance as compared to several baseline methods on supervised entity disambiguation tasks with an average micro-F_1 improvement of 2.1% over the existing baseline models, but also obtains higher accuracy on six Question Answering (QA) tasks in the zero-shot setting. Our methodologies apply to both open- and closed-source LLMs. △ Less

Submitted 9 February, 2024; originally announced February 2024.

arXiv:2402.04364 [pdf, ps, other]

Exponential Separation Between Powers of Regular and General Resolution Over Parities

Authors: Sreejata Kishor Bhattacharya, Arkadev Chattopadhyay, Pavel Dvořák

Abstract: Proving super-polynomial lower bounds on the size of proofs of unsatisfiability of Boolean formulas using resolution over parities is an outstanding problem that has received a lot of attention after its introduction by Raz and Tzamaret [Ann. Pure Appl. Log.'08]. Very recently, Efremenko, Garlík and Itsykson [ECCC'23] proved the first exponential lower bounds on the size of ResLin proofs that were… ▽ More Proving super-polynomial lower bounds on the size of proofs of unsatisfiability of Boolean formulas using resolution over parities is an outstanding problem that has received a lot of attention after its introduction by Raz and Tzamaret [Ann. Pure Appl. Log.'08]. Very recently, Efremenko, Garlík and Itsykson [ECCC'23] proved the first exponential lower bounds on the size of ResLin proofs that were additionally restricted to be bottom-regular. We show that there are formulas for which such regular ResLin proofs of unsatisfiability continue to have exponential size even though there exists short proofs of their unsatisfiability in ordinary, non-regular resolution. This is the first super-polynomial separation between the power of general ResLin and and that of regular ResLin for any natural notion of regularity. Our argument, while building upon the work of Efremenko et al., uses additional ideas from the literature on lifting theorems. △ Less

Submitted 23 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2401.09243 [pdf, other]

DiffClone: Enhanced Behaviour Cloning in Robotics with Diffusion-Driven Policy Learning

Authors: Sabariswaran Mani, Sreyas Venkataraman, Abhranil Chandra, Adyan Rizvi, Yash Sirvi, Soumojit Bhattacharya, Aritra Hazra

Abstract: Robot learning tasks are extremely compute-intensive and hardware-specific. Thus the avenues of tackling these challenges, using a diverse dataset of offline demonstrations that can be used to train robot manipulation agents, is very appealing. The Train-Offline-Test-Online (TOTO) Benchmark provides a well-curated open-source dataset for offline training comprised mostly of expert data and also be… ▽ More Robot learning tasks are extremely compute-intensive and hardware-specific. Thus the avenues of tackling these challenges, using a diverse dataset of offline demonstrations that can be used to train robot manipulation agents, is very appealing. The Train-Offline-Test-Online (TOTO) Benchmark provides a well-curated open-source dataset for offline training comprised mostly of expert data and also benchmark scores of the common offline-RL and behaviour cloning agents. In this paper, we introduce DiffClone, an offline algorithm of enhanced behaviour cloning agent with diffusion-based policy learning, and measured the efficacy of our method on real online physical robots at test time. This is also our official submission to the Train-Offline-Test-Online (TOTO) Benchmark Challenge organized at NeurIPS 2023. We experimented with both pre-trained visual representation and agent policies. In our experiments, we find that MOCO finetuned ResNet50 performs the best in comparison to other finetuned representations. Goal state conditioning and mapping to transitions resulted in a minute increase in the success rate and mean-reward. As for the agent policy, we developed DiffClone, a behaviour cloning agent improved using conditional diffusion. △ Less

Submitted 23 May, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: NeurIPS 2023 Train Offline Test Online Workshop and Competition (Best Paper Oral Presentation / Winning Competition Submission)

arXiv:2401.01874 [pdf, other]

doi 10.1016/j.colsurfa.2024.134133

Graph Neural Networks for Surfactant Multi-Property Prediction

Authors: Christoforos Brozos, Jan G. Rittig, Sandip Bhattacharya, Elie Akanny, Christina Kohlmann, Alexander Mitsos

Abstract: Surfactants are of high importance in different industrial sectors such as cosmetics, detergents, oil recovery and drug delivery systems. Therefore, many quantitative structure-property relationship (QSPR) models have been developed for surfactants. Each predictive model typically focuses on one surfactant class, mostly nonionics. Graph Neural Networks (GNNs) have exhibited a great predictive perf… ▽ More Surfactants are of high importance in different industrial sectors such as cosmetics, detergents, oil recovery and drug delivery systems. Therefore, many quantitative structure-property relationship (QSPR) models have been developed for surfactants. Each predictive model typically focuses on one surfactant class, mostly nonionics. Graph Neural Networks (GNNs) have exhibited a great predictive performance for property prediction of ionic liquids, polymers and drugs in general. Specifically for surfactants, GNNs can successfully predict critical micelle concentration (CMC), a key surfactant property associated with micellization. A key factor in the predictive ability of QSPR and GNN models is the data available for training. Based on extensive literature search, we create the largest available CMC database with 429 molecules and the first large data collection for surface excess concentration ($Γ$$_{m}$), another surfactant property associated with foaming, with 164 molecules. Then, we develop GNN models to predict the CMC and $Γ$$_{m}$ and we explore different learning approaches, i.e., single- and multi-task learning, as well as different training strategies, namely ensemble and transfer learning. We find that a multi-task GNN with ensemble learning trained on all $Γ$$_{m}$ and CMC data performs best. Finally, we test the ability of our CMC model to generalize on industrial grade pure component surfactants. The GNN yields highly accurate predictions for CMC, showing great potential for future industrial applications. △ Less

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2401.01008 [pdf, other]

Fast Sampling Through The Reuse Of Attention Maps In Diffusion Models

Authors: Rosco Hunter, Łukasz Dudziak, Mohamed S. Abdelfattah, Abhinav Mehrotra, Sourav Bhattacharya, Hongkai Wen

Abstract: Text-to-image diffusion models have demonstrated unprecedented capabilities for flexible and realistic image synthesis. Nevertheless, these models rely on a time-consuming sampling procedure, which has motivated attempts to reduce their latency. When improving efficiency, researchers often use the original diffusion model to train an additional network designed specifically for fast image generati… ▽ More Text-to-image diffusion models have demonstrated unprecedented capabilities for flexible and realistic image synthesis. Nevertheless, these models rely on a time-consuming sampling procedure, which has motivated attempts to reduce their latency. When improving efficiency, researchers often use the original diffusion model to train an additional network designed specifically for fast image generation. In contrast, our approach seeks to reduce latency directly, without any retraining, fine-tuning, or knowledge distillation. In particular, we find the repeated calculation of attention maps to be costly yet redundant, and instead suggest reusing them during sampling. Our specific reuse strategies are based on ODE theory, which implies that the later a map is reused, the smaller the distortion in the final image. We empirically compare these reuse strategies with few-step sampling procedures of comparable latency, finding that reuse generates images that are closer to those produced by the original high-latency diffusion model. △ Less

Submitted 24 May, 2024; v1 submitted 13 December, 2023; originally announced January 2024.

arXiv:2312.17726 [pdf, ps, other]

Comparing Effectiveness and Efficiency of Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP) Tools in a Large Java-based System

Authors: Aishwarya Seth, Saikath Bhattacharya, Sarah Elder, Nusrat Zahan, Laurie Williams

Abstract: Security resources are scarce, and practitioners need guidance in the effective and efficient usage of techniques and tools available in the cybersecurity industry. Two emerging tool types, Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP), have not been thoroughly evaluated against well-established counterparts such as Dynamic Application Security Test… ▽ More Security resources are scarce, and practitioners need guidance in the effective and efficient usage of techniques and tools available in the cybersecurity industry. Two emerging tool types, Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP), have not been thoroughly evaluated against well-established counterparts such as Dynamic Application Security Testing (DAST) and Static Application Security Testing (SAST). The goal of this research is to aid practitioners in making informed choices about the use of Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP) tools through an analysis of their effectiveness and efficiency in comparison with different vulnerability detection and prevention techniques and tools. We apply IAST and RASP on OpenMRS, an open-source Java-based online application. We compare the efficiency and effectiveness of IAST and RASP with techniques applied on OpenMRS in prior work. We measure efficiency and effectiveness in terms of the number and type of vulnerabilities detected and prevented per hour. Our study shows IAST performed relatively well compared to other techniques, performing second-best in both efficiency and effectiveness. IAST detected eight Top-10 OWASP security risks compared to nine by SMPT and seven for EMPT, DAST, and SAST. IAST found more vulnerabilities than SMPT. The efficiency of IAST (2.14 VpH) is second to only EMPT (2.22 VpH). These findings imply that our study benefited from using IAST when conducting black-box security testing. In the context of a large, enterprise-scale web application such as OpenMRS, RASP does not replace vulnerability detection, while IAST is a powerful tool that complements other techniques. △ Less

Submitted 29 December, 2023; originally announced December 2023.

arXiv:2312.04825 [pdf, other]

Weighted Combinatorial Laplacian and its Application to Coverage Repair in Sensor Networks

Authors: Shunsaku Yadokoro, Subhrajit Bhattacharya

Abstract: We define the weighted combinatorial Laplacian operators on a simplicial complex and investigate their spectral properties. Eigenvalues close to zero and the corresponding eigenvectors of them are especially of our interest, and we show that they can detect almost $n$-dimensional holes in the given complex. Real-valued weights on simplices allow gradient descent based optimization, which in turn g… ▽ More We define the weighted combinatorial Laplacian operators on a simplicial complex and investigate their spectral properties. Eigenvalues close to zero and the corresponding eigenvectors of them are especially of our interest, and we show that they can detect almost $n$-dimensional holes in the given complex. Real-valued weights on simplices allow gradient descent based optimization, which in turn gives an efficient dynamic coverage repair algorithm for the sensor network of a mobile robot team. Using the theory of relative homology, we also extend the problem of dynamic coverage repair to environments with obstacles. △ Less

Submitted 14 April, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: 26 pages, 11 figures

arXiv:2311.12046 [pdf, other]

LATIS: Lambda Abstraction-based Thermal Image Super-resolution

Authors: Gargi Panda, Soumitra Kundu, Saumik Bhattacharya, Aurobinda Routray

Abstract: Single image super-resolution (SISR) is an effective technique to improve the quality of low-resolution thermal images. Recently, transformer-based methods have achieved significant performance in SISR. However, in the SR task, only a small number of pixels are involved in the transformers self-attention (SA) mechanism due to the computational complexity of the attention mechanism. The lambda abst… ▽ More Single image super-resolution (SISR) is an effective technique to improve the quality of low-resolution thermal images. Recently, transformer-based methods have achieved significant performance in SISR. However, in the SR task, only a small number of pixels are involved in the transformers self-attention (SA) mechanism due to the computational complexity of the attention mechanism. The lambda abstraction is a promising alternative to SA in modeling long-range interactions while being computationally more efficient. This paper presents lambda abstraction-based thermal image super-resolution (LATIS), a novel lightweight architecture for SISR of thermal images. LATIS sequentially captures local and global information using the local and global feature block (LGFB). In LGFB, we introduce a global feature extraction (GFE) module based on the lambda abstraction mechanism, channel-shuffle and convolution (CSConv) layer to encode local context. Besides, to improve the performance further, we propose a differentiable patch-wise histogram-based loss function. Experimental results demonstrate that our LATIS, with the least model parameters and complexity, achieves better or comparable performance with state-of-the-art methods across multiple datasets. △ Less

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2311.08367 [pdf, other]

Arboricity-Dependent Algorithms for Edge Coloring

Authors: Sayan Bhattacharya, Martín Costa, Nadav Panski, Shay Solomon

Abstract: The problem of edge coloring has been extensively studied over the years. Recently, this problem has received significant attention in the dynamic setting, where we are given a dynamic graph evolving via a sequence of edge insertions and deletions and our objective is to maintain an edge coloring of the graph. Currently, it is not known whether it is possible to maintain a $(Δ+ O(Δ^{1 - μ}))$-ed… ▽ More The problem of edge coloring has been extensively studied over the years. Recently, this problem has received significant attention in the dynamic setting, where we are given a dynamic graph evolving via a sequence of edge insertions and deletions and our objective is to maintain an edge coloring of the graph. Currently, it is not known whether it is possible to maintain a $(Δ+ O(Δ^{1 - μ}))$-edge coloring in $\tilde{O}(1)$ update time, for any constant $μ> 0$, where $Δ$ is the maximum degree of the graph. In this paper, we show how to efficiently maintain a $(Δ+ O(α))$-edge coloring in $\tilde O(1)$ amortized update time, where $α$ is the arboricty of the graph. Thus, we answer this question in the affirmative for graphs of sufficiently small arboricity. △ Less

Submitted 7 February, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: Started to circulate in September 2023

arXiv:2311.08027 [pdf, other]

A practical key-recovery attack on LWE-based key-encapsulation mechanism schemes using Rowhammer

Authors: Puja Mondal, Suparna Kundu, Sarani Bhattacharya, Angshuman Karmakar, Ingrid Verbauwhede

Abstract: Physical attacks are serious threats to cryptosystems deployed in the real world. In this work, we propose a microarchitectural end-to-end attack methodology on generic lattice-based post-quantum key encapsulation mechanisms to recover the long-term secret key. Our attack targets a critical component of a Fujisaki-Okamoto transform that is used in the construction of almost all lattice-based key e… ▽ More Physical attacks are serious threats to cryptosystems deployed in the real world. In this work, we propose a microarchitectural end-to-end attack methodology on generic lattice-based post-quantum key encapsulation mechanisms to recover the long-term secret key. Our attack targets a critical component of a Fujisaki-Okamoto transform that is used in the construction of almost all lattice-based key encapsulation mechanisms. We demonstrate our attack model on practical schemes such as Kyber and Saber by using Rowhammer. We show that our attack is highly practical and imposes little preconditions on the attacker to succeed. As an additional contribution, we propose an improved version of the plaintext checking oracle, which is used by almost all physical attack strategies on lattice-based key-encapsulation mechanisms. Our improvement reduces the number of queries to the plaintext checking oracle by as much as $39\%$ for Saber and approximately $23\%$ for Kyber768. This can be of independent interest and can also be used to reduce the complexity of other attacks. △ Less

Submitted 14 November, 2023; originally announced November 2023.

ACM Class: E.3.3

arXiv:2311.03267 [pdf, ps, other]

Nibbling at Long Cycles: Dynamic (and Static) Edge Coloring in Optimal Time

Authors: Sayan Bhattacharya, Martín Costa, Nadav Panski, Shay Solomon

Abstract: We consider the problem of maintaining a $(1+ε)Δ$-edge coloring in a dynamic graph $G$ with $n$ nodes and maximum degree at most $Δ$. The state-of-the-art update time is $O_ε(\text{polylog}(n))$, by Duan, He and Zhang [SODA'19] and by Christiansen [STOC'23], and more precisely $O(\log^7 n/ε^2)$, where $Δ= Ω(\log^2 n / ε^2)$. The following natural question arises: What is the best possible update… ▽ More We consider the problem of maintaining a $(1+ε)Δ$-edge coloring in a dynamic graph $G$ with $n$ nodes and maximum degree at most $Δ$. The state-of-the-art update time is $O_ε(\text{polylog}(n))$, by Duan, He and Zhang [SODA'19] and by Christiansen [STOC'23], and more precisely $O(\log^7 n/ε^2)$, where $Δ= Ω(\log^2 n / ε^2)$. The following natural question arises: What is the best possible update time of an algorithm for this task? More specifically, \textbf{ can we bring it all the way down to some constant} (for constant $ε$)? This question coincides with the \emph{static} time barrier for the problem: Even for $(2Δ-1)$-coloring, there is only a naive $O(m \log Δ)$-time algorithm. We answer this fundamental question in the affirmative, by presenting a dynamic $(1+ε)Δ$-edge coloring algorithm with $O(\log^4 (1/ε)/ε^9)$ update time, provided $Δ= Ω_ε(\text{polylog}(n))$. As a corollary, we also get the first linear time (for constant $ε$) \emph{static} algorithm for $(1+ε)Δ$-edge coloring; in particular, we achieve a running time of $O(m \log (1/ε)/ε^2)$. We obtain our results by carefully combining a variant of the \textsc{Nibble} algorithm from Bhattacharya, Grandoni and Wajc [SODA'21] with the subsampling technique of Kulkarni, Liu, Sah, Sawhney and Tarnawski [STOC'22]. △ Less

Submitted 6 November, 2023; originally announced November 2023.

Comments: Accepted at SODA 2024

arXiv:2311.03084 [pdf, other]

A Simple yet Efficient Ensemble Approach for AI-generated Text Detection

Authors: Harika Abburi, Kalyani Roy, Michael Suesserman, Nirmala Pudota, Balaji Veeramani, Edward Bowen, Sanmitra Bhattacharya

Abstract: Recent Large Language Models (LLMs) have demonstrated remarkable capabilities in generating text that closely resembles human writing across wide range of styles and genres. However, such capabilities are prone to potential abuse, such as fake news generation, spam email creation, and misuse in academic assignments. Hence, it is essential to build automated approaches capable of distinguishing bet… ▽ More Recent Large Language Models (LLMs) have demonstrated remarkable capabilities in generating text that closely resembles human writing across wide range of styles and genres. However, such capabilities are prone to potential abuse, such as fake news generation, spam email creation, and misuse in academic assignments. Hence, it is essential to build automated approaches capable of distinguishing between artificially generated text and human-authored text. In this paper, we propose a simple yet efficient solution to this problem by ensembling predictions from multiple constituent LLMs. Compared to previous state-of-the-art approaches, which are perplexity-based or uses ensembles with a number of LLMs, our condensed ensembling approach uses only two constituent LLMs to achieve comparable performance. Experiments conducted on four benchmark datasets for generative text classification show performance improvements in the range of 0.5 to 100\% compared to previous state-of-the-art approaches. We also study the influence that the training data from individual LLMs have on model performance. We found that substituting commercially-restrictive Generative Pre-trained Transformer (GPT) data with data generated from other open language models such as Falcon, Large Language Model Meta AI (LLaMA2), and Mosaic Pretrained Transformers (MPT) is a feasible alternative when developing generative text detectors. Furthermore, to demonstrate zero-shot generalization, we experimented with an English essays dataset, and results suggest that our ensembling approach can handle new data effectively. △ Less

Submitted 7 November, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

arXiv:2311.01534 [pdf, other]

Approximate Multiagent Reinforcement Learning for On-Demand Urban Mobility Problem on a Large Map (extended version)

Authors: Daniel Garces, Sushmita Bhattacharya, Dimitri Bertsekas, Stephanie Gil

Abstract: In this paper, we focus on the autonomous multiagent taxi routing problem for a large urban environment where the location and number of future ride requests are unknown a-priori, but can be estimated by an empirical distribution. Recent theory has shown that a rollout algorithm with a stable base policy produces a near-optimal stable policy. In the routing setting, a policy is stable if its execu… ▽ More In this paper, we focus on the autonomous multiagent taxi routing problem for a large urban environment where the location and number of future ride requests are unknown a-priori, but can be estimated by an empirical distribution. Recent theory has shown that a rollout algorithm with a stable base policy produces a near-optimal stable policy. In the routing setting, a policy is stable if its execution keeps the number of outstanding requests uniformly bounded over time. Although, rollout-based approaches are well-suited for learning cooperative multiagent policies with considerations for future demand, applying such methods to a large urban environment can be computationally expensive due to the large number of taxis required for stability. In this paper, we aim to address the computational bottleneck of multiagent rollout by proposing an approximate multiagent rollout-based two phase algorithm that reduces computational costs, while still achieving a stable near-optimal policy. Our approach partitions the graph into sectors based on the predicted demand and the maximum number of taxis that can run sequentially given the user's computational resources. The algorithm then applies instantaneous assignment (IA) for re-balancing taxis across sectors and a sector-wide multiagent rollout algorithm that is executed in parallel for each sector. We provide two main theoretical results: 1) characterize the number of taxis $m$ that is sufficient for IA to be stable; 2) derive a necessary condition on $m$ to maintain stability for IA as time goes to infinity. Our numerical results show that our approach achieves stability for an $m$ that satisfies the theoretical conditions. We also empirically demonstrate that our proposed two phase algorithm has equivalent performance to the one-at-a-time rollout over the entire map, but with significantly lower runtimes. △ Less

Submitted 8 March, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

Comments: 11 pages, 5 figures, 1 lemma, and 2 theorems

arXiv:2310.20638 [pdf, other]

Histopathological Image Analysis with Style-Augmented Feature Domain Mixing for Improved Generalization

Authors: Vaibhav Khamankar, Sutanu Bera, Saumik Bhattacharya, Debashis Sen, Prabir Kumar Biswas

Abstract: Histopathological images are essential for medical diagnosis and treatment planning, but interpreting them accurately using machine learning can be challenging due to variations in tissue preparation, staining and imaging protocols. Domain generalization aims to address such limitations by enabling the learning models to generalize to new datasets or populations. Style transfer-based data augmenta… ▽ More Histopathological images are essential for medical diagnosis and treatment planning, but interpreting them accurately using machine learning can be challenging due to variations in tissue preparation, staining and imaging protocols. Domain generalization aims to address such limitations by enabling the learning models to generalize to new datasets or populations. Style transfer-based data augmentation is an emerging technique that can be used to improve the generalizability of machine learning models for histopathological images. However, existing style transfer-based methods can be computationally expensive, and they rely on artistic styles, which can negatively impact model accuracy. In this study, we propose a feature domain style mixing technique that uses adaptive instance normalization to generate style-augmented versions of images. We compare our proposed method with existing style transfer-based data augmentation methods and found that it performs similarly or better, despite requiring less computation and time. Our results demonstrate the potential of feature domain statistics mixing in the generalization of learning models for histopathological image analysis. △ Less

Submitted 31 October, 2023; originally announced October 2023.

Comments: Paper is published in MedAGI 2023 (MICCAI 2023 1st International Workshop on Foundation Models for General Medical AI) Code link: https://github.com/Vaibhav-Khamankar/FuseStyle Paper link: https://nbviewer.org/github/MedAGI/medagi.github.io/blob/main/src/assets/papers/P17.pdf

arXiv:2310.18371 [pdf, ps, other]

In-Context Ability Transfer for Question Decomposition in Complex QA

Authors: Venktesh V, Sourangshu Bhattacharya, Avishek Anand

Abstract: Answering complex questions is a challenging task that requires question decomposition and multistep reasoning for arriving at the solution. While existing supervised and unsupervised approaches are specialized to a certain task and involve training, recently proposed prompt-based approaches offer generalizable solutions to tackle a wide variety of complex question-answering (QA) tasks. However, e… ▽ More Answering complex questions is a challenging task that requires question decomposition and multistep reasoning for arriving at the solution. While existing supervised and unsupervised approaches are specialized to a certain task and involve training, recently proposed prompt-based approaches offer generalizable solutions to tackle a wide variety of complex question-answering (QA) tasks. However, existing prompt-based approaches that are effective for complex QA tasks involve expensive hand annotations from experts in the form of rationales and are not generalizable to newer complex QA scenarios and tasks. We propose, icat (In-Context Ability Transfer) which induces reasoning capabilities in LLMs without any LLM fine-tuning or manual annotation of in-context samples. We transfer the ability to decompose complex questions to simpler questions or generate step-by-step rationales to LLMs, by careful selection from available data sources of related tasks. We also propose an automated uncertainty-aware exemplar selection approach for selecting examples from transfer data sources. Finally, we conduct large-scale experiments on a variety of complex QA tasks involving numerical reasoning, compositional complex QA, and heterogeneous complex QA which require decomposed reasoning. We show that ICAT convincingly outperforms existing prompt-based solutions without involving any model training, showcasing the benefits of re-using existing abilities. △ Less

Submitted 26 October, 2023; originally announced October 2023.

Comments: 10 pages

arXiv:2310.17420 [pdf, other]

Fully Dynamic $k$-Clustering in $\tilde O(k)$ Update Time

Authors: Sayan Bhattacharya, Martín Costa, Silvio Lattanzi, Nikos Parotsidis

Abstract: We present a $O(1)$-approximate fully dynamic algorithm for the $k$-median and $k$-means problems on metric spaces with amortized update time $\tilde O(k)$ and worst-case query time $\tilde O(k^2)$. We complement our theoretical analysis with the first in-depth experimental study for the dynamic $k$-median problem on general metrics, focusing on comparing our dynamic algorithm to the current state… ▽ More We present a $O(1)$-approximate fully dynamic algorithm for the $k$-median and $k$-means problems on metric spaces with amortized update time $\tilde O(k)$ and worst-case query time $\tilde O(k^2)$. We complement our theoretical analysis with the first in-depth experimental study for the dynamic $k$-median problem on general metrics, focusing on comparing our dynamic algorithm to the current state-of-the-art by Henzinger and Kale [ESA'20]. Finally, we also provide a lower bound for dynamic $k$-median which shows that any $O(1)$-approximate algorithm with $\tilde O(\text{poly}(k))$ query time must have $\tilde Ω(k)$ amortized update time, even in the incremental setting. △ Less

Submitted 26 October, 2023; originally announced October 2023.

Comments: Accepted at NeurIPS 2023

arXiv:2310.15552 [pdf, other]

Unveiling Multilinguality in Transformer Models: Exploring Language Specificity in Feed-Forward Networks

Authors: Sunit Bhattacharya, Ondrej Bojar

Abstract: Recent research suggests that the feed-forward module within Transformers can be viewed as a collection of key-value memories, where the keys learn to capture specific patterns from the input based on the training examples. The values then combine the output from the 'memories' of the keys to generate predictions about the next token. This leads to an incremental process of prediction that gradual… ▽ More Recent research suggests that the feed-forward module within Transformers can be viewed as a collection of key-value memories, where the keys learn to capture specific patterns from the input based on the training examples. The values then combine the output from the 'memories' of the keys to generate predictions about the next token. This leads to an incremental process of prediction that gradually converges towards the final token choice near the output layers. This interesting perspective raises questions about how multilingual models might leverage this mechanism. Specifically, for autoregressive models trained on two or more languages, do all neurons (across layers) respond equally to all languages? No! Our hypothesis centers around the notion that during pretraining, certain model parameters learn strong language-specific features, while others learn more language-agnostic (shared across languages) features. To validate this, we conduct experiments utilizing parallel corpora of two languages that the model was initially pretrained on. Our findings reveal that the layers closest to the network's input or output tend to exhibit more language-specific behaviour compared to the layers in the middle. △ Less

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.05172 [pdf, other]

On the Amplification of Cache Occupancy Attacks in Randomized Cache Architectures

Authors: Anirban Chakraborty, Nimish Mishra, Sayandeep Saha, Sarani Bhattacharya, Debdeep Mukhopadhyay

Abstract: In this work, we explore the applicability of cache occupancy attacks and the implications of secured cache design rationales on such attacks. In particular, we show that one of the well-known cache randomization schemes, MIRAGE, touted to be resilient against eviction-based attacks, amplifies the chances of cache occupancy attack, making it more vulnerable compared to contemporary designs. We lev… ▽ More In this work, we explore the applicability of cache occupancy attacks and the implications of secured cache design rationales on such attacks. In particular, we show that one of the well-known cache randomization schemes, MIRAGE, touted to be resilient against eviction-based attacks, amplifies the chances of cache occupancy attack, making it more vulnerable compared to contemporary designs. We leverage MIRAGE's global eviction property to demonstrate covert channel with byte-level granularity, with far less cache occupancy requirement (just $10\%$ of LLC) than other schemes. For instance, ScatterCache (a randomisation scheme with lesser security guarantees than MIRAGE) and generic set-associative caches require $40\%$ and $30\%$ cache occupancy, respectively, to exhibit covert communication. Furthermore, we extend our attack vectors to include side-channel, template-based fingerprinting of workloads in a cross-core setting. We demonstrate the potency of such fingerprinting on both inhouse LLC simulator as well as on SPEC2017 workloads on gem5. Finally, we pinpoint implementation inconsistencies in MIRAGE's publicly available gem5 artifact which motivates a re-evaluation of the performance statistics of MIRAGE with respect to ScatterCache and baseline set-associative cache. We find MIRAGE, in reality, performs worse than what is previously reported in literature, a concern that should be addressed in successor generations of secured caches. △ Less

Submitted 8 October, 2023; originally announced October 2023.

arXiv:2310.04453 [pdf, other]

COVID-19 South African Vaccine Hesitancy Models Show Boost in Performance Upon Fine-Tuning on M-pox Tweets

Authors: Nicholas Perikli, Srimoy Bhattacharya, Blessing Ogbuokiri, Zahra Movahedi Nia, Benjamin Lieberman, Nidhi Tripathi, Salah-Eddine Dahbi, Finn Stevenson, Nicola Bragazzi, Jude Kong, Bruce Mellado

Abstract: Very large numbers of M-pox cases have, since the start of May 2022, been reported in non-endemic countries leading many to fear that the M-pox Outbreak would rapidly transition into another pandemic, while the COVID-19 pandemic ravages on. Given the similarities of M-pox with COVID-19, we chose to test the performance of COVID-19 models trained on South African twitter data on a hand-labelled M-p… ▽ More Very large numbers of M-pox cases have, since the start of May 2022, been reported in non-endemic countries leading many to fear that the M-pox Outbreak would rapidly transition into another pandemic, while the COVID-19 pandemic ravages on. Given the similarities of M-pox with COVID-19, we chose to test the performance of COVID-19 models trained on South African twitter data on a hand-labelled M-pox dataset before and after fine-tuning. More than 20k M-pox-related tweets from South Africa were hand-labelled as being either positive, negative or neutral. After fine-tuning these COVID-19 models on the M-pox dataset, the F1-scores increased by more than 8% falling just short of 70%, but still outperforming state-of-the-art models and well-known classification algorithms. An LDA-based topic modelling procedure was used to compare the miss-classified M-pox tweets of the original COVID-19 RoBERTa model with its fine-tuned version, and from this analysis, we were able to draw conclusions on how to build more sophisticated models. △ Less

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2310.00917 [pdf, other]

Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance

Authors: Alloy Das, Sanket Biswas, Ayan Banerjee, Josep Lladós, Umapada Pal, Saumik Bhattacharya

Abstract: The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions. However, existing state-of-the-art (SOTA) approaches usually incorporate scene text detection and recognition simply by pretraining on natural scene text datasets, which do not directly exploit the intermediate feature representations between multiple domains. Here… ▽ More The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions. However, existing state-of-the-art (SOTA) approaches usually incorporate scene text detection and recognition simply by pretraining on natural scene text datasets, which do not directly exploit the intermediate feature representations between multiple domains. Here, we investigate the problem of domain-adaptive scene text spotting, i.e., training a model on multi-domain source data such that it can directly adapt to target domains rather than being specialized for a specific domain or scenario. Further, we investigate a transformer baseline called Swin-TESTR to focus on solving scene-text spotting for both regular and arbitrary-shaped scene text along with an exhaustive evaluation. The results clearly demonstrate the potential of intermediate representations to achieve significant performance on text spotting benchmarks across multiple domains (e.g. language, synth-to-real, and documents). both in terms of accuracy and efficiency. △ Less

Submitted 1 November, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: Accepted to the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)

arXiv:2309.07755 [pdf, other]

Generative AI Text Classification using Ensemble LLM Approaches

Authors: Harika Abburi, Michael Suesserman, Nirmala Pudota, Balaji Veeramani, Edward Bowen, Sanmitra Bhattacharya

Abstract: Large Language Models (LLMs) have shown impressive performance across a variety of Artificial Intelligence (AI) and natural language processing tasks, such as content creation, report generation, etc. However, unregulated malign application of these models can create undesirable consequences such as generation of fake news, plagiarism, etc. As a result, accurate detection of AI-generated language… ▽ More Large Language Models (LLMs) have shown impressive performance across a variety of Artificial Intelligence (AI) and natural language processing tasks, such as content creation, report generation, etc. However, unregulated malign application of these models can create undesirable consequences such as generation of fake news, plagiarism, etc. As a result, accurate detection of AI-generated language can be crucial in responsible usage of LLMs. In this work, we explore 1) whether a certain body of text is AI generated or written by human, and 2) attribution of a specific language model in generating a body of text. Texts in both English and Spanish are considered. The datasets used in this study are provided as part of the Automated Text Identification (AuTexTification) shared task. For each of the research objectives stated above, we propose an ensemble neural model that generates probabilities from different pre-trained LLMs which are used as features to a Traditional Machine Learning (TML) classifier following it. For the first task of distinguishing between AI and human generated text, our model ranked in fifth and thirteenth place (with macro $F1$ scores of 0.733 and 0.649) for English and Spanish texts, respectively. For the second task on model attribution, our model ranked in first place with macro $F1$ scores of 0.625 and 0.653 for English and Spanish texts, respectively. △ Less

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2308.11854 [pdf, other]

doi 10.5120/ijca2023923042

Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0

Authors: Anmol Chaure, Ashok Kumar Behera, Sudip Bhattacharya

Abstract: Climate projections using data driven machine learning models acting as emulators, is one of the prevailing areas of research to enable policy makers make informed decisions. Use of machine learning emulators as surrogates for computationally heavy GCM simulators reduces time and carbon footprints. In this direction, ClimateBench [1] is a recently curated benchmarking dataset for evaluating the pe… ▽ More Climate projections using data driven machine learning models acting as emulators, is one of the prevailing areas of research to enable policy makers make informed decisions. Use of machine learning emulators as surrogates for computationally heavy GCM simulators reduces time and carbon footprints. In this direction, ClimateBench [1] is a recently curated benchmarking dataset for evaluating the performance of machine learning emulators designed for climate data. Recent studies have reported that despite being considered fundamental, regression models offer several advantages pertaining to climate emulations. In particular, by leveraging the kernel trick, regression models can capture complex relationships and improve their predictive capabilities. This study focuses on evaluating non-linear regression models using the aforementioned dataset. Specifically, we compare the emulation capabilities of three non-linear regression models. Among them, Gaussian Process Regressor demonstrates the best-in-class performance against standard evaluation metrics used for climate field emulation studies. However, Gaussian Process Regression suffers from being computational resource hungry in terms of space and time complexity. Alternatively, Support Vector and Kernel Ridge models also deliver competitive results and but there are certain trade-offs to be addressed. Additionally, we are actively investigating the performance of composite kernels and techniques such as variational inference to further enhance the performance of the regression models and effectively model complex non-linear patterns, including phenomena like precipitation. △ Less

Submitted 22 August, 2023; originally announced August 2023.

Journal ref: International Journal of Computer Applications 185(29):31-39, August 2023

arXiv:2308.09104 [pdf, other]

A comprehensive study of spike and slab shrinkage priors for structurally sparse Bayesian neural networks

Authors: Sanket Jantre, Shrijita Bhattacharya, Tapabrata Maiti

Abstract: Network complexity and computational efficiency have become increasingly significant aspects of deep learning. Sparse deep learning addresses these challenges by recovering a sparse representation of the underlying target function by reducing heavily over-parameterized deep neural networks. Specifically, deep neural architectures compressed via structured sparsity (e.g. node sparsity) provide low… ▽ More Network complexity and computational efficiency have become increasingly significant aspects of deep learning. Sparse deep learning addresses these challenges by recovering a sparse representation of the underlying target function by reducing heavily over-parameterized deep neural networks. Specifically, deep neural architectures compressed via structured sparsity (e.g. node sparsity) provide low latency inference, higher data throughput, and reduced energy consumption. In this paper, we explore two well-established shrinkage techniques, Lasso and Horseshoe, for model compression in Bayesian neural networks. To this end, we propose structurally sparse Bayesian neural networks which systematically prune excessive nodes with (i) Spike-and-Slab Group Lasso (SS-GL), and (ii) Spike-and-Slab Group Horseshoe (SS-GHS) priors, and develop computationally tractable variational inference including continuous relaxation of Bernoulli variables. We establish the contraction rates of the variational posterior of our proposed models as a function of the network topology, layer-wise node cardinalities, and bounds on the network weights. We empirically demonstrate the competitive performance of our models compared to the baseline models in prediction accuracy, model compression, and inference latency. △ Less

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2308.08917 [pdf, other]

Unfolding for Joint Channel Estimation and Symbol Detection in MIMO Communication Systems

Authors: Swati Bhattacharya, K. V. S. Hari, Yonina C. Eldar

Abstract: This paper proposes a Joint Channel Estimation and Symbol Detection (JED) scheme for Multiple-Input Multiple-Output (MIMO) wireless communication systems. Our proposed method for JED using Alternating Direction Method of Multipliers (JED-ADMM) and its model-based neural network version JED using Unfolded ADMM (JED-U-ADMM) markedly improve the symbol detection performance over JED using Alternating… ▽ More This paper proposes a Joint Channel Estimation and Symbol Detection (JED) scheme for Multiple-Input Multiple-Output (MIMO) wireless communication systems. Our proposed method for JED using Alternating Direction Method of Multipliers (JED-ADMM) and its model-based neural network version JED using Unfolded ADMM (JED-U-ADMM) markedly improve the symbol detection performance over JED using Alternating Minimization (JED-AM) for a range of MIMO antenna configurations. Both proposed algorithms exploit the non-smooth constraint, that occurs as a result of the Quadrature Amplitude Modulation (QAM) data symbols, to effectively improve the performance using the ADMM iterations. The proposed unfolded network JED-U-ADMM consists of a few trainable parameters and requires a small training set. We show the efficacy of the proposed methods for both uncorrelated and correlated MIMO channels. For certain configurations, the gain in SNR for a desired BER of $10^{-2}$ for the proposed JED-ADMM and JED-U-ADMM is upto $4$ dB and is also accompanied by a significant reduction in computational complexity of upto $75\%$, depending on the MIMO configuration, as compared to the complexity of JED-AM. △ Less

Submitted 21 August, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

Comments: 14 pages, 19 figures, submitted to IEEE Transactions on Signal Processing

arXiv:2308.04178 [pdf, ps, other]

Assistive Chatbots for healthcare: a succinct review

Authors: Basabdatta Sen Bhattacharya, Vibhav Sinai Pissurlenkar

Abstract: Artificial Intelligence (AI) for supporting healthcare services has never been more necessitated than by the recent global pandemic. Here, we review the state-of-the-art in AI-enabled Chatbots in healthcare proposed during the last 10 years (2013-2023). The focus on AI-enabled technology is because of its potential for enhancing the quality of human-machine interaction via Chatbots, reducing depen… ▽ More Artificial Intelligence (AI) for supporting healthcare services has never been more necessitated than by the recent global pandemic. Here, we review the state-of-the-art in AI-enabled Chatbots in healthcare proposed during the last 10 years (2013-2023). The focus on AI-enabled technology is because of its potential for enhancing the quality of human-machine interaction via Chatbots, reducing dependence on human-human interaction and saving man-hours. Our review indicates that there are a handful of (commercial) Chatbots that are being used for patient support, while there are others (non-commercial) that are in the clinical trial phases. However, there is a lack of trust on this technology regarding patient safety and data protection, as well as a lack of wider awareness on its benefits among the healthcare workers and professionals. Also, patients have expressed dissatisfaction with Natural Language Processing (NLP) skills of the Chatbots in comparison to humans. Notwithstanding the recent introduction of ChatGPT that has raised the bar for the NLP technology, this Chatbot cannot be trusted with patient safety and medical ethics without thorough and rigorous checks to serve in the `narrow' domain of assistive healthcare. Our review suggests that to enable deployment and integration of AI-enabled Chatbots in public health services, the need of the hour is: to build technology that is simple and safe to use; to build confidence on the technology among: (a) the medical community by focussed training and development; (b) the patients and wider community through outreach. △ Less

Submitted 8 August, 2023; originally announced August 2023.

arXiv:2308.03908 [pdf, other]

ViLP: Knowledge Exploration using Vision, Language, and Pose Embeddings for Video Action Recognition

Authors: Soumyabrata Chaudhuri, Saumik Bhattacharya

Abstract: Video Action Recognition (VAR) is a challenging task due to its inherent complexities. Though different approaches have been explored in the literature, designing a unified framework to recognize a large number of human actions is still a challenging problem. Recently, Multi-Modal Learning (MML) has demonstrated promising results in this domain. In literature, 2D skeleton or pose modality has ofte… ▽ More Video Action Recognition (VAR) is a challenging task due to its inherent complexities. Though different approaches have been explored in the literature, designing a unified framework to recognize a large number of human actions is still a challenging problem. Recently, Multi-Modal Learning (MML) has demonstrated promising results in this domain. In literature, 2D skeleton or pose modality has often been used for this task, either independently or in conjunction with the visual information (RGB modality) present in videos. However, the combination of pose, visual information, and text attributes has not been explored yet, though text and pose attributes independently have been proven to be effective in numerous computer vision tasks. In this paper, we present the first pose augmented Vision-language model (VLM) for VAR. Notably, our scheme achieves an accuracy of 92.81% and 73.02% on two popular human video action recognition benchmark datasets, UCF-101 and HMDB-51, respectively, even without any video data pre-training, and an accuracy of 96.11% and 75.75% after kinetics pre-training. △ Less

Submitted 7 August, 2023; originally announced August 2023.

Comments: 7 pages, 3 figures, 2 Tables

arXiv:2308.02905 [pdf, other]

FAST: Font-Agnostic Scene Text Editing

Authors: Alloy Das, Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal, Michael Blumenstein

Abstract: Scene Text Editing (STE) is a challenging research problem, and it aims to modify existing texts in an image while preserving the background and the font style of the original text of the image. Due to its various real-life applications, researchers have explored several approaches toward STE in recent years. However, most of the existing STE methods show inferior editing performance because of (1… ▽ More Scene Text Editing (STE) is a challenging research problem, and it aims to modify existing texts in an image while preserving the background and the font style of the original text of the image. Due to its various real-life applications, researchers have explored several approaches toward STE in recent years. However, most of the existing STE methods show inferior editing performance because of (1) complex image backgrounds, (2) various font styles, and (3) varying word lengths within the text. To address such inferior editing performance issues, in this paper, we propose a novel font-agnostic scene text editing framework, named FAST, for simultaneously generating text in arbitrary styles and locations while preserving a natural and realistic appearance through combined mask generation and style transfer. The proposed approach differs from the existing methods as they directly modify all image pixels. Instead, the proposed method has introduced a filtering mechanism to remove background distractions, allowing the network to focus solely on the text regions where editing is required. Additionally, a text-style transfer module has been designed to mitigate the challenges posed by varying word lengths. Extensive experiments and ablations have been conducted, and the results demonstrate that the proposed method outperforms the existing methods both qualitatively and quantitatively. △ Less

Submitted 5 August, 2023; originally announced August 2023.

Comments: 13 pages, in submission

arXiv:2308.01140 [pdf, other]

Dynamically Scaled Temperature in Self-Supervised Contrastive Learning

Authors: Siladittya Manna, Soumitri Chattopadhyay, Rakesh Dey, Saumik Bhattacharya, Umapada Pal

Abstract: In contemporary self-supervised contrastive algorithms like SimCLR, MoCo, etc., the task of balancing attraction between two semantically similar samples and repulsion between two samples of different classes is primarily affected by the presence of hard negative samples. While the InfoNCE loss has been shown to impose penalties based on hardness, the temperature hyper-parameter is the key to regu… ▽ More In contemporary self-supervised contrastive algorithms like SimCLR, MoCo, etc., the task of balancing attraction between two semantically similar samples and repulsion between two samples of different classes is primarily affected by the presence of hard negative samples. While the InfoNCE loss has been shown to impose penalties based on hardness, the temperature hyper-parameter is the key to regulating the penalties and the trade-off between uniformity and tolerance. In this work, we focus our attention on improving the performance of InfoNCE loss in self-supervised learning by proposing a novel cosine similarity dependent temperature scaling function to effectively optimize the distribution of the samples in the feature space. We also provide mathematical analyses to support the construction of such a dynamically scaled temperature function. Experimental evidence shows that the proposed framework outperforms the contrastive loss-based SSL algorithms. △ Less

Submitted 10 May, 2024; v1 submitted 2 August, 2023; originally announced August 2023.

arXiv:2307.15844 [pdf, other]

doi 10.1109/TIT.2024.3353769

Shared Information for a Markov Chain on a Tree

Authors: Sagnik Bhattacharya, Prakash Narayan

Abstract: Shared information is a measure of mutual dependence among multiple jointly distributed random variables with finite alphabets. For a Markov chain on a tree with a given joint distribution, we give a new proof of an explicit characterization of shared information. The Markov chain on a tree is shown to possess a global Markov property based on graph separation; this property plays a key role in ou… ▽ More Shared information is a measure of mutual dependence among multiple jointly distributed random variables with finite alphabets. For a Markov chain on a tree with a given joint distribution, we give a new proof of an explicit characterization of shared information. The Markov chain on a tree is shown to possess a global Markov property based on graph separation; this property plays a key role in our proofs. When the underlying joint distribution is not known, we exploit the special form of this characterization to provide a multiarmed bandit algorithm for estimating shared information, and analyze its error performance. △ Less

Submitted 21 January, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

Comments: 13 pages, 4 figures, submitted to IEEE Transactions on Information Theory

arXiv:2307.15072 [pdf, other]

Detecting the Presence of COVID-19 Vaccination Hesitancy from South African Twitter Data Using Machine Learning

Authors: Nicholas Perikli, Srimoy Bhattacharya, Blessing Ogbuokiri, Zahra Movahedi Nia, Benjamin Lieberman, Nidhi Tripathi, Salah-Eddine Dahbi, Finn Stevenson, Nicola Bragazzi, Jude Kong, Bruce Mellado

Abstract: Very few social media studies have been done on South African user-generated content during the COVID-19 pandemic and even fewer using hand-labelling over automated methods. Vaccination is a major tool in the fight against the pandemic, but vaccine hesitancy jeopardizes any public health effort. In this study, sentiment analysis on South African tweets related to vaccine hesitancy was performed, w… ▽ More Very few social media studies have been done on South African user-generated content during the COVID-19 pandemic and even fewer using hand-labelling over automated methods. Vaccination is a major tool in the fight against the pandemic, but vaccine hesitancy jeopardizes any public health effort. In this study, sentiment analysis on South African tweets related to vaccine hesitancy was performed, with the aim of training AI-mediated classification models and assessing their reliability in categorizing UGC. A dataset of 30000 tweets from South Africa were extracted and hand-labelled into one of three sentiment classes: positive, negative, neutral. The machine learning models used were LSTM, bi-LSTM, SVM, BERT-base-cased and the RoBERTa-base models, whereby their hyperparameters were carefully chosen and tuned using the WandB platform. We used two different approaches when we pre-processed our data for comparison: one was semantics-based, while the other was corpus-based. The pre-processing of the tweets in our dataset was performed using both methods, respectively. All models were found to have low F1-scores within a range of 45$\%$-55$\%$, except for BERT and RoBERTa which both achieved significantly better measures with overall F1-scores of 60$\%$ and 61$\%$, respectively. Topic modelling using an LDA was performed on the miss-classified tweets of the RoBERTa model to gain insight on how to further improve model accuracy. △ Less

Submitted 12 July, 2023; originally announced July 2023.

arXiv:2307.07421 [pdf, other]

SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding

Authors: Titouan Parcollet, Rogier van Dalen, Shucong Zhang, Sourav Bhattacharya

Abstract: Modern speech processing systems rely on self-attention. Unfortunately, token mixing with self-attention takes quadratic time in the length of the speech utterance, slowing down inference and training and increasing memory consumption. Cheaper alternatives to self-attention for ASR have been developed, but they fail to consistently reach the same level of accuracy. This paper, therefore, proposes… ▽ More Modern speech processing systems rely on self-attention. Unfortunately, token mixing with self-attention takes quadratic time in the length of the speech utterance, slowing down inference and training and increasing memory consumption. Cheaper alternatives to self-attention for ASR have been developed, but they fail to consistently reach the same level of accuracy. This paper, therefore, proposes a novel linear-time alternative to self-attention. It summarises an utterance with the mean over vectors for all time steps. This single summary is then combined with time-specific information. We call this method "SummaryMixing". Introducing SummaryMixing in state-of-the-art ASR models makes it feasible to preserve or exceed previous speech recognition performance while making training and inference up to 28% faster and reducing memory use by half. △ Less

Submitted 11 July, 2024; v1 submitted 12 July, 2023; originally announced July 2023.

Comments: Interspeech 2024

arXiv:2307.06659 [pdf, other]

A Comprehensive Analysis of Blockchain Applications for Securing Computer Vision Systems

Authors: Ramalingam M, Chemmalar Selvi, Nancy Victor, Rajeswari Chengoden, Sweta Bhattacharya, Praveen Kumar Reddy Maddikunta, Duehee Lee, Md. Jalil Piran, Neelu Khare, Gokul Yendri, Thippa Reddy Gadekallu

Abstract: Blockchain (BC) and Computer Vision (CV) are the two emerging fields with the potential to transform various sectors.The ability of BC can help in offering decentralized and secure data storage, while CV allows machines to learn and understand visual data. This integration of the two technologies holds massive promise for developing innovative applications that can provide solutions to the challen… ▽ More Blockchain (BC) and Computer Vision (CV) are the two emerging fields with the potential to transform various sectors.The ability of BC can help in offering decentralized and secure data storage, while CV allows machines to learn and understand visual data. This integration of the two technologies holds massive promise for developing innovative applications that can provide solutions to the challenges in various sectors such as supply chain management, healthcare, smart cities, and defense. This review explores a comprehensive analysis of the integration of BC and CV by examining their combination and potential applications. It also provides a detailed analysis of the fundamental concepts of both technologies, highlighting their strengths and limitations. This paper also explores current research efforts that make use of the benefits offered by this combination. The effort includes how BC can be used as an added layer of security in CV systems and also ensure data integrity, enabling decentralized image and video analytics using BC. The challenges and open issues associated with this integration are also identified, and appropriate potential future directions are also proposed. △ Less

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2307.02415 [pdf, ps, other]

Density-Sensitive Algorithms for $(Δ+ 1)$-Edge Coloring

Authors: Sayan Bhattacharya, Martín Costa, Nadav Panski, Shay Solomon

Abstract: Vizing's theorem asserts the existence of a {$(Δ+1)$-edge coloring} for any graph $G$, where $Δ= Δ(G)$ denotes the maximum degree of $G$. Several polynomial time $(Δ+1)$-edge coloring algorithms are known, and the state-of-the-art running time (up to polylogarithmic factors) is $\tilde{O}(\min\{m \cdot \sqrt{n}, m \cdot Δ\})$, by Gabow et al.\ from 1985, where $n$ and $m$ denote the number of vert… ▽ More Vizing's theorem asserts the existence of a {$(Δ+1)$-edge coloring} for any graph $G$, where $Δ= Δ(G)$ denotes the maximum degree of $G$. Several polynomial time $(Δ+1)$-edge coloring algorithms are known, and the state-of-the-art running time (up to polylogarithmic factors) is $\tilde{O}(\min\{m \cdot \sqrt{n}, m \cdot Δ\})$, by Gabow et al.\ from 1985, where $n$ and $m$ denote the number of vertices and edges in the graph, respectively. (The $\tilde{O}$ notation suppresses polylogarithmic factors.) Recently, Sinnamon shaved off a polylogarithmic factor from the time bound of Gabow et al. The {arboricity} $α= α(G)$ of a graph $G$ is the minimum number of edge-disjoint forests into which its edge set can be partitioned, and it is a measure of the graph's ``uniform density''. While $α\le Δ$ in any graph, many natural and real-world graphs exhibit a significant separation between $α$ and $Δ$. In this work we design a $(Δ+1)$-edge coloring algorithm with a running time of $\tilde{O}(\min\{m \cdot \sqrt{n}, m \cdot Δ\})\cdot \fracαΔ$, thus improving the longstanding time barrier by a factor of $\fracαΔ$. In particular, we achieve a near-linear runtime for bounded arboricity graphs (i.e., $α= \tilde{O}(1)$) as well as when $α= \tilde{O}(\fracΔ{\sqrt{n}})$. Our algorithm builds on Sinnamon's algorithm, and can be viewed as a density-sensitive refinement of it. △ Less

Submitted 5 July, 2023; originally announced July 2023.

Showing 1–50 of 247 results for author: Bhattacharya, S