arXiv:2407.19551 [pdf, other]

doi 10.1007/s11263-023-01810-0

Improving Domain Adaptation Through Class Aware Frequency Transformation

Authors: Vikash Kumar, Himanshu Patil, Rohit Lal, Anirban Chakraborty

Abstract: In this work, we explore the usage of the Frequency Transformation for reducing the domain shift between the source and target domain (e.g., synthetic image and real image respectively) towards solving the Domain Adaptation task. Most of the Unsupervised Domain Adaptation (UDA) algorithms focus on reducing the global domain shift between labelled source and unlabelled target domains by matching the marginal distributions under a small domain gap assumption. UDA performance degrades for the cases where the domain gap between source and target distribution is large. In order to bring the source and the target domains closer, we propose a novel approach based on a traditional image processing technique, Class Aware Frequency Transformation (CAFT), that utilizes pseudo label based class consistent low-frequency swapping for improving the overall performance of the existing UDA algorithms. The proposed approach, when compared with the state-of-the-art deep learning based methods, is computationally more efficient and can easily be plugged into any existing UDA algorithm to improve its performance. Additionally, we introduce a novel approach based on absolute difference of top-2 class prediction probabilities (ADT2P) for filtering target pseudo labels into clean and noisy sets. Samples with clean pseudo labels can be used to improve the performance of unsupervised learning algorithms. We name the overall framework CAFT++. We evaluate it on top of different UDA algorithms across many public domain adaptation datasets. Our extensive experiments indicate that CAFT++ is able to achieve significant performance gains across all the popular benchmarks.

Submitted 28 July, 2024; originally announced July 2024.

Comments: Accepted at the International Journal of Computer Vision
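
Illustrative note on ADT2P: the abstract above describes filtering pseudo labels by the absolute difference of the top-2 class prediction probabilities. The following is a minimal sketch of that idea only, not the authors' implementation. The function names, the use of a softmax over raw logits, and the threshold value of 0.5 are all assumptions made for illustration; the paper may select the threshold or compute probabilities differently.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adt2p(probs):
    """Absolute difference between the two largest class probabilities."""
    top2 = sorted(probs, reverse=True)[:2]
    return abs(top2[0] - top2[1])


def split_pseudo_labels(batch_logits, threshold=0.5):
    """Split samples into clean and noisy sets of (index, pseudo_label).

    `threshold` is a hypothetical cut-off: a large top-2 gap is treated
    as a confident (clean) pseudo label, a small gap as a noisy one.
    """
    clean, noisy = [], []
    for idx, logits in enumerate(batch_logits):
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        target = clean if adt2p(probs) >= threshold else noisy
        target.append((idx, label))
    return clean, noisy


if __name__ == "__main__":
    # First sample is confidently class 0; second is ambiguous between 0 and 1.
    clean, noisy = split_pseudo_labels([[5.0, 0.0, 0.0], [1.0, 0.9, 0.0]])
    print("clean:", clean)
    print("noisy:", noisy)
```

A gap-based score like this separates a confident prediction from one where two classes compete, which the maximum probability alone can miss.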

arXiv:2407.19099 [pdf, other]

Sponsored is the New Organic: Implications of Sponsored Results on Quality of Search Results in the Amazon Marketplace

Authors: Abhisek Dash, Saptarshi Ghosh, Animesh Mukherjee, Abhijnan Chakraborty, Krishna P. Gummadi

Abstract: Interleaving sponsored results (advertisements) amongst organic results on search engine result pages (SERP) has become a common practice across multiple digital platforms. Advertisements have catered to consumer satisfaction and fostered competition in digital public spaces, making them an appealing gateway for businesses to reach their consumers. However, especially in the context of digital marketplaces, due to the competitive nature of the sponsored results with the organic ones, multiple unwanted repercussions have surfaced affecting different stakeholders. From the consumers' perspective the sponsored ads/results may cause degradation of search quality and nudge consumers to potentially irrelevant and costlier products. The sponsored ads may also affect the level playing field of the competition in the marketplaces among sellers. To understand and unravel these potential concerns, we analyse the Amazon digital marketplace in four different countries by simulating 4,800 search operations. Our analyses over SERPs consisting of 2M organic and 638K sponsored results show items with poor organic ranks (beyond 100th position) appear as sponsored results even before the top organic results on the first page of Amazon SERP. Moreover, we also observe that in the majority of cases, these top sponsored results are costlier and are of poorer quality than the top organic results. We believe these observations can motivate researchers for further deliberation to bring in more transparency and guard rails in the advertising practices followed in digital marketplaces.

Submitted 26 July, 2024; originally announced July 2024.

Comments: This work has been accepted as a full paper in AAAI/ACM conference on Artificial Intelligence, Ethics and Society (AIES) 2024

arXiv:2407.07858 [pdf, other]

FACTS About Building Retrieval Augmented Generation-based Chatbots

Authors: Rama Akkiraju, Anbang Xu, Deepak Bora, Tan Yu, Lu An, Vishal Seth, Aaditya Shukla, Pritam Gundecha, Hridhay Mehta, Ashwin Jha, Prithvi Raj, Abhinav Balasubramanian, Murali Maram, Guru Muthusamy, Shivakesh Reddy Annepally, Sidney Knowles, Min Du, Nick Burnett, Sean Javiya, Ashok Marannan, Mamta Kumari, Surbhi Jha, Ethan Dereszenski, Anupam Chakraborty, Subhash Ranjan, et al. (13 additional authors not shown)

Abstract: Enterprise chatbots, powered by generative AI, are emerging as key applications to enhance employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and orchestration frameworks like Langchain and Llamaindex are crucial for building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This includes fine-tuning embeddings and LLMs, extracting documents from vector databases, rephrasing queries, reranking results, designing prompts, honoring document access controls, providing concise responses, including references, safeguarding personal information, and building orchestration agents. We present a framework for building RAG-based chatbots based on our experience with three NVIDIA chatbots: for IT/HR benefits, financial earnings, and general content. Our contributions are three-fold: introducing the FACTS framework (Freshness, Architectures, Cost, Testing, Security), presenting fifteen RAG pipeline control points, and providing empirical results on accuracy-latency tradeoffs between large and small LLMs. To the best of our knowledge, this is the first paper of its kind that provides a holistic view of the factors as well as solutions for building secure enterprise-grade chatbots.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 8 pages, 6 figures, 2 tables, Preprint submission to ACM CIKM 2024

arXiv:2407.01732 [pdf, other]

Investigating Nudges toward Related Sellers on E-commerce Marketplaces: A Case Study on Amazon

Authors: Abhisek Dash, Abhijnan Chakraborty, Saptarshi Ghosh, Animesh Mukherjee, Krishna P. Gummadi

Abstract: E-commerce marketplaces provide business opportunities to millions of sellers worldwide. Some of these sellers have special relationships with the marketplace by virtue of using their subsidiary services (e.g., fulfillment and/or shipping services provided by the marketplace) -- we refer to such sellers collectively as Related Sellers. When multiple sellers offer to sell the same product, the marketplace helps a customer in selecting an offer (by a seller) through (a) a default offer selection algorithm, (b) showing features about each of the offers and the corresponding sellers (price, seller performance metrics, seller's number of ratings etc.), and (c) finally evaluating the sellers along these features. In this paper, we perform an end-to-end investigation into how the above apparatus can nudge customers toward the Related Sellers on Amazon's four different marketplaces in India, USA, Germany and France. We find that given explicit choices, customers' preferred offers and algorithmically selected offers can be significantly different. We highlight that Amazon is adopting different performance metric evaluation policies for different sellers, potentially benefiting Related Sellers. For instance, such policies result in notable discrepancy between the actual performance metric and the presented performance metric of Related Sellers. We further observe that among the seller-centric features visible to customers, sellers' number of ratings influences their decisions the most, yet it may not reflect the true quality of service by the seller, rather reflecting the scale at which the seller operates, thereby implicitly steering customers toward larger Related Sellers. Moreover, when customers are shown the rectified metrics for the different sellers, their preference toward Related Sellers is almost halved.

Submitted 1 July, 2024; originally announced July 2024.

Comments: This work has been accepted for presentation at the ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) 2024. It will appear in Proceedings of the ACM on Human-Computer Interaction

arXiv:2407.00067 [pdf]

doi 10.22214/ijraset.2023.49044

Perceptron Collaborative Filtering

Authors: Arya Chakraborty

Abstract: While multivariate logistic regression classifiers are a great way of implementing collaborative filtering - a method of making automatic predictions about the interests of a user by collecting preferences or taste information from many other users, we can also achieve similar results using neural networks. A recommender system is a subclass of information filtering system that provides suggestions for items that are most pertinent to a particular user. A perceptron or a neural network is a machine learning model designed for fitting complex datasets using backpropagation and gradient descent. When coupled with advanced optimization techniques, the model may prove to be a great substitute for classical logistic classifiers. The optimizations include feature scaling, mean normalization, regularization, hyperparameter tuning and using stochastic/mini-batch gradient descent instead of regular gradient descent. In this use case, we will use the perceptron in the recommender system to fit the parameters, i.e., the data from a multitude of users, and use it to predict the preference/interest of a particular user.

Submitted 17 June, 2024; originally announced July 2024.

Comments: 11 pages, 7 figures

ACM Class: I.2.6; I.2.8

Journal ref: International Journal for Research in Applied Science and Engineering Technology, Volume 11, Issue II (2023) 437-447

arXiv:2406.15809 [pdf, other]

LaMSUM: A Novel Framework for Extractive Summarization of User Generated Content using LLMs

Authors: Garima Chhikara, Anurag Sharma, V. Gurucharan, Kripabandhu Ghosh, Abhijnan Chakraborty

Abstract: Large Language Models (LLMs) have demonstrated impressive performance across a wide range of NLP tasks, including summarization. Inherently LLMs produce abstractive summaries, and the task of achieving extractive summaries through LLMs still remains largely unexplored. To bridge this gap, in this work, we propose a novel framework LaMSUM to generate extractive summaries through LLMs for large user-generated text by leveraging voting algorithms. Our evaluation on three popular open-source LLMs (Llama 3, Mixtral and Gemini) reveals that LaMSUM outperforms state-of-the-art extractive summarization methods. We further attempt to provide the rationale behind the output summary produced by LLMs. Overall, this is one of the early attempts to achieve extractive summarization for large user-generated text by utilizing LLMs, and is likely to generate further interest in the community.

Submitted 22 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2406.15510 [pdf]

doi 10.22214/ijraset.2023.50714

Calculation of the Comparative Efficiency of Algorithms Using a Single Metric

Authors: Arya Chakraborty

Abstract: While time complexity and space complexity of an algorithm help to determine its efficiency when time or space needs to be optimized respectively, they fail to determine the more efficient algorithm when time and space both need to be optimized simultaneously. This resulted in the development of the A1-Score Factor, which solves the problem, i.e., helps to find the algorithm which optimizes both time and space simultaneously. The following research paper contains the hypothesis, the proof, the theoretical and the graphical implementation of the A1-Score Factor along with the use cases of the same.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 7 pages, 3 figures

ACM Class: F.2.0

Journal ref: International Journal for Research in Applied Science and Engineering Technology, Volume 11, Issue IV (2023) 2549-2553

arXiv:2406.08695 [pdf, other]

Global AI Governance in Healthcare: A Cross-Jurisdictional Regulatory Analysis

Authors: Attrayee Chakraborty, Mandar Karhade

Abstract: Artificial Intelligence (AI) is being adopted across the world and promises a new revolution in healthcare. While AI-enabled medical devices in North America dominate 42.3% of the global market, the use of AI-enabled medical devices in other countries is still a story waiting to be unfolded. We aim to delve deeper into global regulatory approaches towards AI use in healthcare, with a focus on how common themes are emerging globally. We compare these themes to the World Health Organization's (WHO) regulatory considerations and principles on ethical use of AI for healthcare applications. Our work seeks to take a global perspective on AI policy by analyzing 14 legal jurisdictions including countries representative of various regions in the world (North America, South America, South East Asia, Middle East, Africa, Australia, and the Asia-Pacific). Our eventual goal is to foster a global conversation on the ethical use of AI in healthcare and the regulations that will guide it. We propose solutions to promote international harmonization of AI regulations and examine the requirements for regulating generative AI, using China and Singapore as examples of countries with well-developed policies in this area.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 32 pages, 8 figures, 5 tables

MSC Class: K.4.1; K.6; K.5.2; J.3

arXiv:2406.06755 [pdf, other]

Optimal Federated Learning for Nonparametric Regression with Heterogeneous Distributed Differential Privacy Constraints

Authors: T. Tony Cai, Abhinav Chakraborty, Lasse Vuursteen

Abstract: This paper studies federated learning for nonparametric regression in the context of distributed samples across different servers, each adhering to distinct differential privacy constraints. The setting we consider is heterogeneous, encompassing both varying sample sizes and differential privacy constraints across servers. Within this framework, both global and pointwise estimation are considered, and optimal rates of convergence over the Besov spaces are established. Distributed privacy-preserving estimators are proposed and their risk properties are investigated. Matching minimax lower bounds, up to a logarithmic factor, are established for both global and pointwise estimation. Together, these findings shed light on the tradeoff between statistical accuracy and privacy preservation. In particular, we characterize the compromise not only in terms of the privacy budget but also concerning the loss incurred by distributing data within the privacy framework as a whole. This insight captures the folklore wisdom that it is easier to retain privacy in larger samples, and explores the differences between pointwise and global estimation under distributed privacy constraints.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 49 pages total, consisting of an article (24 pages) and a supplement (25 pages)

MSC Class: 62G08; 62C20; 68P27; 62F30;

arXiv:2406.06749 [pdf, other]

Federated Nonparametric Hypothesis Testing with Differential Privacy Constraints: Optimal Rates and Adaptive Tests

Authors: T. Tony Cai, Abhinav Chakraborty, Lasse Vuursteen

Abstract: Federated learning has attracted significant recent attention due to its applicability across a wide range of settings where data is collected and analyzed across disparate locations. In this paper, we study federated nonparametric goodness-of-fit testing in the white-noise-with-drift model under distributed differential privacy (DP) constraints. We first establish matching lower and upper bounds, up to a logarithmic factor, on the minimax separation rate. This optimal rate serves as a benchmark for the difficulty of the testing problem, factoring in model characteristics such as the number of observations, noise level, and regularity of the signal class, along with the strictness of the $(ε,δ)$-DP requirement. The results demonstrate interesting and novel phase transition phenomena. Furthermore, the results reveal an interesting phenomenon that distributed one-shot protocols with access to shared randomness outperform those without access to shared randomness. We also construct a data-driven testing procedure that possesses the ability to adapt to an unknown regularity parameter over a large collection of function classes with minimal additional cost, all while maintaining adherence to the same set of DP constraints.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 77 pages total; consisting of a main article (28 pages) and supplement (49 pages)

MSC Class: 62G10; 62C20; 68P27; 62F30

arXiv:2406.06034 [pdf, other]

Shesha: Multi-head Microarchitectural Leakage Discovery in new-generation Intel Processors

Authors: Anirban Chakraborty, Nimish Mishra, Debdeep Mukhopadhyay

Abstract: Transient execution attacks have been one of the widely explored microarchitectural side channels since the discovery of Spectre and Meltdown. However, much of the research has been driven by manual discovery of new transient paths through well-known speculative events. Although a few attempts exist in the literature on automating transient leakage discovery, such tools focus on finding variants of known transient attacks and explore a small subset of the instruction set. Further, they take a random fuzzing approach that does not scale as the complexity of the search space increases. In this work, we identify that the search space of bad speculation is disjointedly fragmented into equivalence classes, and then use this observation to develop a framework named Shesha, inspired by Particle Swarm Optimization, which exhibits faster convergence rates than state-of-the-art fuzzing techniques for automatic discovery of transient execution attacks. We then use Shesha to explore the vast search space of extensions to the x86 Instruction Set Architecture (ISA), thereby focusing on previously unexplored avenues of bad speculation. As such, we report five previously unreported transient execution paths in Instruction Set Extensions (ISEs) on the new generation of Intel processors. We then perform extensive reverse engineering of each of the transient execution paths and provide root-cause analysis. Using the discovered transient execution paths, we develop attack building blocks to exhibit exploitable transient windows. Finally, we demonstrate data leakage from Fused Multiply-Add instructions through SIMD buffer and extract victim data from various cryptographic implementations.

Submitted 14 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: USENIX Security Symposium, 2024

arXiv:2406.04360 [pdf, other]

Size biased Multinomial Modelling of detection data in Software testing

Authors: Pallabi Ghosh, Ashis Kr. Chakraborty, Soumen Dey

Abstract: Estimation of software reliability often poses a considerable challenge, particularly for critical software. Several methods of estimation of reliability of software are already available in the literature. But so far, almost nobody has used the concept of size of a bug for estimating software reliability. In this article we make use of the bug size or the eventual bug size, which helps us to determine reliability of software more precisely. The size-biased model developed here can also be used for similar fields like hydrocarbon exploration. The model has been validated through simulation and subsequently used for a critical space application software testing data. The estimated results match the actual observations to a large extent.

Submitted 24 May, 2024; originally announced June 2024.

Comments: Submitted to OPSEARCH

arXiv:2406.02794 [pdf, other]

PriME: Privacy-aware Membership profile Estimation in networks

Authors: Abhinav Chakraborty, Sayak Chatterjee, Sagnik Nandy

Abstract: This paper presents a novel approach to estimating community membership probabilities for network vertices generated by the Degree Corrected Mixed Membership Stochastic Block Model while preserving individual edge privacy. Operating within the $\varepsilon$-edge local differential privacy framework, we introduce an optimal private algorithm based on a symmetric edge flip mechanism and spectral clustering for accurate estimation of vertex community memberships. We conduct a comprehensive analysis of the estimation risk and establish the optimality of our procedure by providing matching lower bounds to the minimax risk under privacy constraints. To validate our approach, we demonstrate its performance through numerical simulations and its practical application to real-world data. This work represents a significant step forward in balancing accurate community membership estimation with stringent privacy preservation in network data analysis.

Submitted 4 June, 2024; originally announced June 2024.
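
Illustrative note on the edge flip mechanism: the abstract above mentions a symmetric edge flip mechanism under $\varepsilon$-edge local differential privacy. The sketch below assumes this takes the standard randomized-response form, in which each potential edge is flipped independently with probability 1 / (1 + e^ε); that assumption, the function names, and the debiasing step are illustrative and are not taken from the paper. The spectral clustering stage is not shown.

```python
import math
import random


def flip_probability(epsilon):
    """Flip probability p with (1 - p) / p = e^epsilon (randomized response)."""
    return 1.0 / (1.0 + math.exp(epsilon))


def privatize_edges(adj, epsilon, rng):
    """Flip each upper-triangular entry of a 0/1 adjacency matrix with
    probability p, then mirror it so the output stays symmetric."""
    n = len(adj)
    p = flip_probability(epsilon)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            bit = adj[i][j]
            if rng.random() < p:
                bit = 1 - bit
            out[i][j] = out[j][i] = bit
    return out


def debias(noisy, epsilon):
    """Unbiased estimate of the original entries, for epsilon > 0.

    Since E[noisy] = (1 - 2p) * adj + p, invert that affine map
    off the diagonal and leave the diagonal at zero.
    """
    p = flip_probability(epsilon)
    n = len(noisy)
    return [
        [0.0 if i == j else (noisy[i][j] - p) / (1.0 - 2.0 * p) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    private = privatize_edges(graph, epsilon=1.0, rng=random.Random(0))
    print(private)
    print(debias(private, epsilon=1.0))
```

Smaller values of ε push the flip probability toward 1/2, so the released graph carries less information about any single edge and the debiased estimate becomes noisier.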

arXiv:2405.20935 [pdf, other]

Effective Interplay between Sparsity and Quantization: From Theory to Practice

Authors: Simla Burcu Harma, Ayan Chakraborty, Elizaveta Kostenok, Danila Mishin, Dongho Ha, Babak Falsafi, Martin Jaggi, Ming Liu, Yunho Oh, Suvinay Subramanian, Amir Yazdanbakhsh

Abstract: The increasing size of deep neural networks necessitates effective model compression to improve computational efficiency and reduce their memory footprint. Sparsity and quantization are two prominent compression methods that have individually demonstrated significant reduction in computational and memory footprints while preserving model accuracy. While effective, the interplay between these two methods remains an open question. In this paper, we investigate the interaction between these two methods and assess whether their combination impacts final model accuracy. We mathematically prove that applying sparsity before quantization is the optimal sequence for these operations, minimizing error in computation. Our empirical studies across a wide range of models, including OPT and Llama model families (125M-8B) and ViT corroborate these theoretical findings. In addition, through rigorous analysis, we demonstrate that sparsity and quantization are not orthogonal; their interaction can significantly harm model accuracy, with quantization error playing a dominant role in this degradation. Our findings extend to the efficient deployment of large models in resource-limited compute platforms and reduce serving cost, offering insights into best practices for applying these compression methods to maximize efficacy without compromising accuracy.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.19561 [pdf, other]

Quo Vadis ChatGPT? From Large Language Models to Large Knowledge Models

Authors: Venkat Venkatasubramanian, Arijit Chakraborty

Abstract: The startling success of ChatGPT and other large language models (LLMs) using transformer-based generative neural network architecture in applications such as natural language processing and image synthesis has many researchers excited about potential opportunities in process systems engineering (PSE). The almost human-like performance of LLMs in these areas is indeed very impressive, surprising, and a major breakthrough. Their capabilities are very useful in certain tasks, such as writing first drafts of documents, code writing assistance, text summarization, etc. However, their success is limited in highly scientific domains as they cannot yet reason, plan, or explain due to their lack of in-depth domain knowledge. This is a problem in domains such as chemical engineering as they are governed by fundamental laws of physics and chemistry (and biology), constitutive relations, and highly technical knowledge about materials, processes, and systems. Although purely data-driven machine learning has its immediate uses, the long-term success of AI in scientific and engineering domains would depend on developing hybrid AI systems that use first principles and technical knowledge effectively. We call these hybrid AI systems Large Knowledge Models (LKMs), as they will not be limited to only NLP-based techniques or NLP-like applications. In this paper, we discuss the challenges and opportunities in developing such systems in chemical engineering.

Submitted 29 May, 2024; originally announced May 2024.

ACM Class: I.2.0; I.2.7

arXiv:2405.14707 [pdf]

Artificial Intelligence (AI) in Legal Data Mining

Authors: Aniket Deroy, Naksatra Kumar Bailung, Kripabandhu Ghosh, Saptarshi Ghosh, Abhijnan Chakraborty

Abstract: Despite the availability of vast amounts of data, legal data is often unstructured, making it difficult even for law practitioners to ingest and comprehend the same. It is important to organise the legal information in a way that is useful for practitioners and downstream automation tasks. The word ontology was used by Greek philosophers to discuss concepts of existence, being, becoming and reality. Today, scientists use this term to describe the relation between concepts, data, and entities. A great example of a working ontology was developed by Dhani and Bhatt. This ontology deals with Indian court cases on intellectual property rights (IPR). The future of legal ontologies is likely to be handled by computer experts and legal experts alike.

Submitted 23 May, 2024; originally announced May 2024.

Comments: Book name-Technology and Analytics for Law and Justice, Page no-273-297, Chapter no-14

arXiv:2405.07828 [pdf, other]

Can LLMs Help Predict Elections? (Counter)Evidence from the World's Largest Democracy

Authors: Pratik Gujral, Kshitij Awaldhi, Navya Jain, Bhavuk Bhandula, Abhijnan Chakraborty

Abstract: The study of how social media affects the formation of public opinion and its influence on political results has been a popular field of inquiry. However, current approaches frequently offer a limited comprehension of the complex political phenomena, yielding inconsistent outcomes. In this work, we introduce a new method: harnessing the capabilities of Large Language Models (LLMs) to examine social media data and forecast election outcomes. Our research diverges from traditional methodologies in two crucial respects. First, we utilize the sophisticated capabilities of foundational LLMs, which can comprehend the complex linguistic subtleties and contextual details present in social media data. Second, we focus on data from X (Twitter) in India to predict state assembly election outcomes. Our method entails sentiment analysis of election-related tweets through LLMs to forecast the actual election results, and we demonstrate the superiority of our LLM-based method against more traditional exit and opinion polls. Overall, our research offers valuable insights into the unique dynamics of Indian politics and the remarkable impact of social media in molding public attitudes within this context.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2404.19260 [pdf, ps, other]

Aspect and Opinion Term Extraction Using Graph Attention Network

Authors: Abir Chakraborty

Abstract: In this work we investigate the capability of Graph Attention Network for extracting aspect and opinion terms. Aspect and opinion term extraction is posed as a token-level classification task akin to named entity recognition. We use the dependency tree of the input query as an additional feature in a Graph Attention Network along with the token and part-of-speech features. We show that the dependency structure is a powerful feature that in the presence of a CRF layer substantially improves the performance and generates the best result on the commonly used datasets from SemEval 2014, 2015 and 2016. We experiment with additional layers like BiLSTM and Transformer in addition to the CRF layer. We also show that our approach works well in the presence of multiple aspects or sentiments in the same query and it is not necessary to modify the dependency tree based on a single aspect as was the original application for sentiment classification.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.19234 [pdf, other]

Multi-hop Question Answering over Knowledge Graphs using Large Language Models

Authors: Abir Chakraborty

Abstract: Knowledge graphs (KGs) are large datasets with specific structures representing large knowledge bases (KB) where each node represents a key entity and relations amongst them are typed edges. Natural language queries formed to extract information from a KB entail starting from specific nodes and reasoning over multiple edges of the corresponding KG to arrive at the correct set of answer nodes. Traditional approaches to question answering over KGs are based on either (a) semantic parsing (SP), where a logical form (e.g., S-expression, SPARQL query, etc.) is generated using node and edge embeddings and then reasoning over these representations or tuning language models to generate the final answer directly, or (b) information retrieval (IR), which works by extracting entities and relations sequentially. In this work, we evaluate the capability of large language models (LLMs) to answer questions over KGs that involve multiple hops. We show that depending upon the size and nature of the KG we need different approaches to extract and feed the relevant information to an LLM since every LLM comes with a fixed context window. We evaluate our approach on six KGs with and without the availability of example-specific sub-graphs and show that both the IR and SP-based methods can be adopted by LLMs resulting in an extremely competitive performance.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.11949 [pdf, other]

Sketch-guided Image Inpainting with Partial Discrete Diffusion Process

Authors: Nakul Sharma, Aditay Tripathi, Anirban Chakraborty, Anand Mishra

Abstract: In this work, we study the task of sketch-guided image inpainting. Unlike the well-explored natural language-guided image inpainting, which excels in capturing semantic details, the relatively less-studied sketch-guided inpainting offers greater user control in specifying the object's shape and pose to be inpainted. As one of the early solutions to this task, we introduce a novel partial discrete diffusion process (PDDP). The forward pass of the PDDP corrupts the masked regions of the image and the backward pass reconstructs these masked regions conditioned on hand-drawn sketches using our proposed sketch-guided bi-directional transformer. The proposed novel transformer module accepts two inputs -- the image containing the masked region to be inpainted and the query sketch to model the reverse diffusion process. This strategy effectively addresses the domain gap between sketches and natural images, thereby enhancing the quality of inpainting results. In the absence of a large-scale dataset specific to this task, we synthesize a dataset from MS-COCO to train and extensively evaluate our proposed framework against various competent approaches in the literature. The qualitative and quantitative results and user studies establish that the proposed method inpaints realistic objects that fit the context in terms of the visual appearance of the provided sketch. To aid further research, we have made our code publicly available at https://github.com/vl2g/Sketch-Inpainting.

Submitted 18 April, 2024; originally announced April 2024.

Comments: Accepted to NTIRE Workshop @ CVPR 2024

arXiv:2404.08893 [pdf, other]

Early detection of disease outbreaks and non-outbreaks using incidence data

Authors: Shan Gao, Amit K. Chakraborty, Russell Greiner, Mark A. Lewis, Hao Wang

Abstract: Forecasting the occurrence and absence of novel disease outbreaks is essential for disease management. Here, we develop a general model, with no real-world training data, that accurately forecasts outbreaks and non-outbreaks. We propose a novel framework, using a feature-based time series classification method to forecast outbreaks and non-outbreaks. We tested our methods on synthetic data from a Susceptible-Infected-Recovered model for slowly changing, noisy disease dynamics. Outbreak sequences give a transcritical bifurcation within a specified future time window, whereas non-outbreak (null bifurcation) sequences do not. We identified incipient differences in time series of infectives leading to future outbreaks and non-outbreaks. These differences are reflected in 22 statistical features and 5 early warning signal indicators. Classifier performance, given by the area under the receiver-operating curve, ranged from 0.99 for large expanding windows of training data to 0.7 for small rolling windows. Real-world performances of classifiers were tested on two empirical datasets, COVID-19 data from Singapore and SARS data from Hong Kong, with two classifiers exhibiting high accuracy. In summary, we showed that there are statistical features that distinguish outbreak and non-outbreak sequences long before outbreaks occur. We could detect these differences in synthetic and real-world data sets, well before potential outbreaks occur.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2403.18623 [pdf, other]

Antitrust, Amazon, and Algorithmic Auditing

Authors: Abhisek Dash, Abhijnan Chakraborty, Saptarshi Ghosh, Animesh Mukherjee, Jens Frankenreiter, Stefan Bechtold, Krishna P. Gummadi

Abstract: In digital markets, antitrust law and special regulations aim to ensure that markets remain competitive despite the dominating role that digital platforms play today in everyone's life. Unlike traditional markets, market participant behavior is easily observable in these markets. We present a series of empirical investigations into the extent to which Amazon engages in practices that are typically described as self-preferencing. We discuss how the computer science tools used in this paper can be used in a regulatory environment that is based on algorithmic auditing and requires regulating digital markets at scale.

Submitted 25 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: The paper has been accepted to appear at Journal of Institutional and Theoretical Economics (JITE) 2024

arXiv:2403.16233 [pdf, other]

An early warning indicator trained on stochastic disease-spreading models with different noises

Authors: Amit K. Chakraborty, Shan Gao, Reza Miry, Pouria Ramazi, Russell Greiner, Mark A. Lewis, Hao Wang

Abstract: The timely detection of disease outbreaks through reliable early warning signals (EWSs) is indispensable for effective public health mitigation strategies. Nevertheless, the intricate dynamics of real-world disease spread, often influenced by diverse sources of noise and limited data in the early stages of outbreaks, pose a significant challenge in developing reliable EWSs, as the performance of existing indicators varies with extrinsic and intrinsic noises. Here, we address the challenge of modeling disease when the measurements are corrupted by noise, incorporating additive white noise, multiplicative environmental noise, and demographic noise into a standard epidemic mathematical model. To navigate the complexities introduced by these noise sources, we employ a deep learning algorithm that provides EWSs for infectious disease outbreaks by training on noise-induced disease-spreading models. The indicator's effectiveness is demonstrated through its application to real-world COVID-19 cases in Edmonton and simulated time series derived from diverse disease spread models affected by noise. Notably, the indicator captures an impending transition in a time series of disease outbreaks and outperforms existing indicators. This study contributes to advancing early warning capabilities by addressing the intricate dynamics inherent in real-world disease spread, presenting a promising avenue for enhancing public health preparedness and response efforts.

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.08053 [pdf, other]

Generating Clarification Questions for Disambiguating Contracts

Authors: Anmol Singhal, Chirag Jain, Preethu Rose Anish, Arkajyoti Chakraborty, Smita Ghaisas

Abstract: Enterprises frequently enter into commercial contracts that can serve as vital sources of project-specific requirements. Contractual clauses are obligatory, and the requirements derived from contracts can detail the downstream implementation activities that non-legal stakeholders, including requirement analysts, engineers, and delivery personnel, need to conduct. However, comprehending contracts is cognitively demanding and error-prone for such stakeholders due to the extensive use of Legalese and the inherent complexity of contract language. Furthermore, contracts often contain ambiguously worded clauses to ensure comprehensive coverage. In contrast, non-legal stakeholders require a detailed and unambiguous comprehension of contractual clauses to craft actionable requirements. In this work, we introduce a novel legal NLP task that involves generating clarification questions for contracts. These questions aim to identify contract ambiguities on a document level, thereby assisting non-legal stakeholders in obtaining the necessary details for eliciting requirements. This task is challenged by three core issues: (1) data availability, (2) the length and unstructured nature of contracts, and (3) the complexity of legal text. To address these issues, we propose ConRAP, a retrieval-augmented prompting framework for generating clarification questions to disambiguate contractual text. Experiments conducted on contracts sourced from the publicly available CUAD dataset show that ConRAP with ChatGPT can detect ambiguities with an F2 score of 0.87. 70% of the generated clarification questions are deemed useful by human evaluators.
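
Note on the reported metric: the F2 score is the F-beta measure with beta = 2, which weights recall more heavily than precision. The sketch below is only an illustration of that formula; the function name `f_beta` and the example precision and recall values are assumptions made here, not figures or code from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score: a weighted harmonic mean of precision and recall.

    beta > 1 favours recall; beta = 2 gives the F2 score.
    """
    if precision == 0.0 and recall == 0.0:
        # Convention: the score is 0 when both inputs are 0.
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical detector: every true ambiguity is found (recall 1.0),
    # but half of the flagged clauses are false alarms (precision 0.5).
    print(round(f_beta(0.5, 1.0), 3))
```

Because beta = 2 multiplies the precision term in the denominator by 4, a detector with high recall keeps a high F2 score even when its precision is moderate, which suits a task where missed ambiguities are costlier than false alarms.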

Submitted 12 March, 2024; originally announced March 2024.

Comments: 9 pages, 3 figures, accepted to LREC-COLING 2024

arXiv:2402.18502 [pdf, other]

Few-Shot Fairness: Unveiling LLM's Potential for Fairness-Aware Classification

Authors: Garima Chhikara, Anurag Sharma, Kripabandhu Ghosh, Abhijnan Chakraborty

Abstract: Employing Large Language Models (LLMs) in various downstream applications such as classification is crucial, especially for smaller companies lacking the expertise and resources required for fine-tuning a model. Fairness in LLMs helps ensure inclusivity and equal representation across factors such as race and gender, and promotes responsible AI deployment. As the use of LLMs has become increasingly prevalent, it is essential to assess whether LLMs can generate fair outcomes when subjected to considerations of fairness. In this study, we introduce a framework outlining fairness regulations aligned with various fairness definitions, with each definition being modulated by varying degrees of abstraction. We explore the configuration for in-context learning and the procedure for selecting in-context demonstrations using RAG, while incorporating fairness rules into the process. Experiments conducted with different LLMs indicate that GPT-4 delivers superior results in terms of both accuracy and fairness compared to other models. This work is one of the early attempts to achieve fairness in prediction tasks by utilizing LLMs through in-context learning.

Submitted 28 February, 2024; originally announced February 2024.

Comments: Under review

arXiv:2402.12759 [pdf, other]

doi 10.1145/3543507.3583398

Towards Fair Allocation in Social Commerce Platforms

Authors: Anjali Gupta, Shreyans J. Nagori, Abhijnan Chakraborty, Rohit Vaish, Sayan Ranu, Prajit Prashant Nadkarni, Narendra Varma Dasararaju, Muthusamy Chelliah

Abstract: Social commerce platforms are emerging businesses where producers sell products through re-sellers who advertise the products to other customers in their social network. Due to the increasing popularity of this business model, thousands of small producers and re-sellers are starting to depend on these platforms for their livelihood; thus, it is important to provide fair earning opportunities to them. The enormous product space in such platforms prohibits manual search, and motivates the need for recommendation algorithms to effectively allocate product exposure and, consequently, earning opportunities. In this work, we focus on the fairness of such allocations in social commerce platforms and formulate the problem of assigning products to re-sellers as a fair division problem with indivisible items under two-sided cardinality constraints, wherein each product must be given to at least a certain number of re-sellers and each re-seller must get a certain number of products. Our work systematically explores various well-studied benchmarks of fairness -- including Nash social welfare, envy-freeness up to one item (EF1), and equitability up to one item (EQ1) -- from both theoretical and experimental perspectives. We find that the existential and computational guarantees of these concepts known from the unconstrained setting do not extend to our constrained model. To address this limitation, we develop a mixed-integer linear program and other scalable heuristics that provide near-optimal approximation of Nash social welfare in simulated and real social commerce datasets. Overall, our work takes the first step towards achieving provable fairness alongside reasonable revenue guarantees on social commerce platforms.
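
Note on the fairness benchmarks named above: the sketch below illustrates two of them, Nash social welfare and EF1, under the textbook assumption of additive valuations. It is not the paper's mixed-integer program or heuristics. The names (`nash_social_welfare`, `is_ef1`), the choice of the geometric-mean form of Nash welfare, and the toy re-seller and product values are all assumptions made here for illustration.

```python
from math import prod
from typing import Dict, Set

Valuations = Dict[str, Dict[str, float]]  # agent -> item -> value
Allocation = Dict[str, Set[str]]          # agent -> bundle of items


def bundle_value(vals: Valuations, agent: str, bundle: Set[str]) -> float:
    """Additive valuation: an agent's value for a bundle is the sum over items."""
    return sum(vals[agent][item] for item in bundle)


def nash_social_welfare(vals: Valuations, alloc: Allocation) -> float:
    """Geometric mean of the agents' values for their own bundles."""
    utilities = [bundle_value(vals, a, alloc[a]) for a in alloc]
    return prod(utilities) ** (1.0 / len(utilities))


def is_ef1(vals: Valuations, alloc: Allocation) -> bool:
    """True if any envy can be removed by dropping one item from the envied bundle."""
    for a in alloc:
        own = bundle_value(vals, a, alloc[a])
        for b in alloc:
            if a == b:
                continue
            other = bundle_value(vals, a, alloc[b])
            if own >= other:
                continue  # a does not envy b
            # a envies b: look for a single item whose removal eliminates the envy.
            if not any(own >= other - vals[a][g] for g in alloc[b]):
                return False
    return True


if __name__ == "__main__":
    # Hypothetical re-sellers r1, r2 and products p1..p3.
    vals = {
        "r1": {"p1": 4.0, "p2": 1.0, "p3": 2.0},
        "r2": {"p1": 3.0, "p2": 3.0, "p3": 1.0},
    }
    alloc = {"r1": {"p1", "p3"}, "r2": {"p2"}}
    print(is_ef1(vals, alloc), nash_social_welfare(vals, alloc))
```

The checks do not require bundles to be disjoint, so the same functions apply when a product is shown to several re-sellers, as in the setting the abstract describes. Whether the paper defines EF1 in exactly this way under its cardinality constraints is not stated in the abstract.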

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.01874 [pdf, other]

The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models

Authors: Moschoula Pternea, Prerna Singh, Abir Chakraborty, Yagna Oruganti, Mirco Milletari, Sayli Bapat, Kebei Jiang

Abstract: In this work, we review research studies that combine Reinforcement Learning (RL) and Large Language Models (LLMs), two areas that owe their momentum to the development of deep neural networks. We propose a novel taxonomy of three main classes based on the way that the two model types interact with each other. The first class, RL4LLM, includes studies where RL is leveraged to improve the performance of LLMs on tasks related to Natural Language Processing. RL4LLM is divided into two sub-categories depending on whether RL is used to directly fine-tune an existing LLM or to improve the prompt of the LLM. In the second class, LLM4RL, an LLM assists the training of an RL model that performs a task that is not inherently related to natural language. We further break down LLM4RL based on the component of the RL training framework that the LLM assists or replaces, namely reward shaping, goal generation, and policy function. Finally, in the third class, RL+LLM, an LLM and an RL agent are embedded in a common planning framework without either of them contributing to training or fine-tuning of the other. We further branch this class to distinguish between studies with and without natural language feedback. We use this taxonomy to explore the motivations behind the synergy of LLMs and RL and explain the reasons for its success, while pinpointing potential shortcomings and areas where further research is needed, as well as alternative methodologies that serve the same goal.

Submitted 2 February, 2024; originally announced February 2024.

Comments: 30 pages (including bibliography), 1 figure, 7 tables

arXiv:2401.16596 [pdf, other]

PrIsing: Privacy-Preserving Peer Effect Estimation via Ising Model

Authors: Abhinav Chakraborty, Anirban Chatterjee, Abhinandan Dalal

Abstract: The Ising model, originally developed as a spin-glass model for ferromagnetic elements, has gained popularity as a network-based model for capturing dependencies in agents' outputs. Its increasing adoption in healthcare and the social sciences has raised privacy concerns regarding the confidentiality of agents' responses. In this paper, we present a novel $(\varepsilon,\delta)$-differentially private algorithm specifically designed to protect the privacy of individual agents' outcomes. Our algorithm allows for precise estimation of the natural parameter using a single network through an objective perturbation technique. Furthermore, we establish regret bounds for this algorithm and assess its performance on synthetic datasets and two real-world networks: one involving HIV status in a social network and the other concerning the political leaning of online blogs.

Submitted 29 January, 2024; originally announced January 2024.

Comments: To Appear in AISTATS 2024

arXiv:2401.15502 [pdf, other]

Differentially private Bayesian tests

Authors: Abhisek Chakraborty, Saptati Datta

Abstract: Differential privacy has emerged as a significant cornerstone in the realm of scientific hypothesis testing utilizing confidential data. In reporting scientific discoveries, Bayesian tests are widely adopted since they effectively avoid the key criticisms of P-values, namely, lack of interpretability and inability to quantify evidence in support of the competing hypotheses. We present a novel differentially private Bayesian hypothesis testing framework that arises naturally under a principled data generative mechanism, inherently maintaining the interpretability of the resulting inferences. Furthermore, by focusing on differentially private Bayes factors based on widely used test statistics, we circumvent the need to model the complete data generative mechanism and ensure substantial computational benefits. We also provide a set of sufficient conditions to establish results on Bayes factor consistency under the proposed framework. The utility of the devised technology is showcased via several numerical experiments.

Submitted 1 May, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

arXiv:2312.11280 [pdf, other]

Towards Fairness in Online Service with k Servers and its Application on Fair Food Delivery

Authors: Daman Deep Singh, Amit Kumar, Abhijnan Chakraborty

Abstract: The k-SERVER problem is one of the most prominent problems in online algorithms with several variants and extensions. However, simplifying assumptions like instantaneous server movements and zero service time have hitherto limited its applicability to real-world problems. In this paper, we introduce a realistic generalization of k-SERVER without such assumptions - the k-FOOD problem, where requests with source-destination locations and an associated pickup time window arrive in an online fashion, and each has to be served by exactly one of the available k servers. The k-FOOD problem offers the versatility to model a variety of real-world use cases such as food delivery, ride sharing, and quick commerce. Moreover, motivated by the need for fairness in online platforms, we introduce the FAIR k-FOOD problem with the max-min objective. We establish that both k-FOOD and FAIR k-FOOD problems are strongly NP-hard and develop an optimal offline algorithm that arises naturally from a time-expanded flow network. Subsequently, we propose an online algorithm DOC4FOOD involving virtual movements of servers to the nearest request location. Experiments on a real-world food-delivery dataset, alongside synthetic datasets, establish the efficacy of the proposed algorithm against state-of-the-art fair food delivery algorithms.

Submitted 18 December, 2023; originally announced December 2023.

Comments: AAAI 2024 Conference

arXiv:2311.02328 [pdf, other]

An Operator Learning Framework for Spatiotemporal Super-resolution of Scientific Simulations

Authors: Valentin Duruisseaux, Amit Chakraborty

Abstract: In numerous contexts, high-resolution solutions to partial differential equations are required to faithfully capture essential dynamics which occur at small spatiotemporal scales, but these solutions can be very difficult and slow to obtain using traditional methods due to limited computational resources. A recent direction to circumvent these computational limitations is to use machine learning techniques for super-resolution, to reconstruct high-resolution numerical solutions from low-resolution simulations which can be obtained more efficiently. The proposed approach, the Super Resolution Operator Network (SROpNet), frames super-resolution as an operator learning problem and draws inspiration from existing architectures to learn continuous representations of solutions to parametric differential equations from low-resolution approximations, which can then be evaluated at any desired location. In addition, no restrictions are imposed on the locations of (the fixed number of) spatiotemporal sensors at which the low-resolution approximations are provided, thereby enabling the consideration of a broader spectrum of problems arising in practice, for which many existing super-resolution approaches are not well-suited.

Submitted 6 April, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

Comments: 31 pages

arXiv:2310.19692 [pdf]

Elimination of Static Hazards in Asynchronous Sequential Circuits using Quantum dot Cellular Automata

Authors: Angshuman Khan, Chiradeep Mukherjee, Ankan Kumar Chakraborty, Ratna Chakrabarty, Debashis De

Abstract: Among emerging technologies, Quantum-dot Cellular Automata (QCA), which relies on electrostatic interaction between electrons within a cell, uniquely combines high speed, low-power operation, and high packaging density. The literature lacks hazard-free designs of QCA circuits. Hazards create ambiguous and unpredictable outputs, which can be avoided. This work considers asynchronous sequential circuits both with and without hazards; the two are compared in terms of kink energy, and the better one is proposed. The circuit simulation has been verified in the QCADesigner tool.

Submitted 30 October, 2023; originally announced October 2023.

Comments: In Proc. 2015 2nd International Conference on Microelectronics, Circuits and Systems (Micro 2015), Kolkata, India, 2015, vol. II, pp. 140-145

arXiv:2310.12447 [pdf, other]

Constrained Reweighting of Distributions: an Optimal Transport Approach

Authors: Abhisek Chakraborty, Anirban Bhattacharya, Debdeep Pati

Abstract: We commonly encounter the problem of identifying an optimally weight adjusted version of the empirical distribution of observed data, adhering to predefined constraints on the weights. Such constraints often manifest as restrictions on the moments, tail behaviour, shapes, number of modes, etc., of the resulting weight adjusted empirical distribution. In this article, we substantially enhance the flexibility of such methodology by introducing nonparametrically imbued distributional constraints on the weights, and developing a general framework leveraging the maximum entropy principle and tools from optimal transport. The key idea is to ensure that the maximum entropy weight adjusted empirical distribution of the observed data is close to a pre-specified probability distribution in terms of the optimal transport metric while allowing for subtle departures. The versatility of the framework is demonstrated in the context of three disparate applications where data re-weighting is warranted to satisfy side constraints on the optimization problem at the heart of the statistical task: namely, portfolio allocation, semi-parametric inference for complex surveys, and ensuring algorithmic fairness in machine learning algorithms.

Submitted 16 January, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

Comments: arXiv admin note: text overlap with arXiv:2303.10085

arXiv:2310.10862 [pdf, other]

The Invisible Map: Visual-Inertial SLAM with Fiducial Markers for Smartphone-based Indoor Navigation

Authors: Paul Ruvolo, Ayush Chakraborty, Rucha Dave, Richard Li, Duncan Mazza, Xierui Shen, Raiyan Siddique, Krishna Suresh

Abstract: We present a system for creating building-scale, easily navigable 3D maps using mainstream smartphones. In our approach, we formulate the 3D-mapping problem as an instance of Graph SLAM and infer the position of both building landmarks (fiducial markers) and navigable paths through the environment (phone poses). Our results demonstrate the system's ability to create accurate 3D maps. Further, we highlight the importance of careful selection of mapping hyperparameters and provide a novel technique for tuning these hyperparameters to adapt our algorithm to new environments.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.09813 [pdf, other]

Parking Problem by Oblivious Mobile Robots in Infinite Grids

Authors: Abhinav Chakraborty, Krishnendu Mukhopadhyaya

Abstract: In this paper, the parking problem of a swarm of mobile robots has been studied. The robots are deployed at the nodes of an infinite grid, which has a subset of prefixed nodes marked as parking nodes. Each parking node p_i has a capacity of k_i which is given as input and equals the maximum number of robots a parking node can accommodate. As a solution to the parking problem, robots need to partition themselves into groups so that each parking node contains a number of robots that are equal to the capacity of the node in the final configuration. It is assumed that the number of robots in the initial configuration represents the sum of the capacities of the parking nodes. The robots are assumed to be autonomous, anonymous, homogeneous, identical and oblivious. They operate under an asynchronous scheduler. They neither have any agreement on the coordinate axes nor do they agree on a common chirality. All the initial configurations for which the problem is unsolvable have been identified. A deterministic distributed algorithm has been proposed for the remaining configurations, ensuring the solvability of the problem.

Submitted 15 October, 2023; originally announced October 2023.

arXiv:2310.05172 [pdf, other]

On the Amplification of Cache Occupancy Attacks in Randomized Cache Architectures

Authors: Anirban Chakraborty, Nimish Mishra, Sayandeep Saha, Sarani Bhattacharya, Debdeep Mukhopadhyay

Abstract: In this work, we explore the applicability of cache occupancy attacks and the implications of secured cache design rationales on such attacks. In particular, we show that one of the well-known cache randomization schemes, MIRAGE, touted to be resilient against eviction-based attacks, amplifies the chances of cache occupancy attack, making it more vulnerable compared to contemporary designs. We leverage MIRAGE's global eviction property to demonstrate a covert channel with byte-level granularity, with a far lower cache occupancy requirement (just $10\%$ of LLC) than other schemes. For instance, ScatterCache (a randomisation scheme with lesser security guarantees than MIRAGE) and generic set-associative caches require $40\%$ and $30\%$ cache occupancy, respectively, to exhibit covert communication. Furthermore, we extend our attack vectors to include side-channel, template-based fingerprinting of workloads in a cross-core setting. We demonstrate the potency of such fingerprinting on both an in-house LLC simulator and on SPEC2017 workloads on gem5. Finally, we pinpoint implementation inconsistencies in MIRAGE's publicly available gem5 artifact, which motivates a re-evaluation of the performance statistics of MIRAGE with respect to ScatterCache and baseline set-associative cache. We find MIRAGE, in reality, performs worse than what is previously reported in literature, a concern that should be addressed in successor generations of secured caches.

Submitted 8 October, 2023; originally announced October 2023.

arXiv:2309.05132 [pdf, other]

DAD++: Improved Data-free Test Time Adversarial Defense

Authors: Gaurav Kumar Nayak, Inder Khatri, Shubham Randive, Ruchit Rawal, Anirban Chakraborty

Abstract: With the increasing deployment of deep neural networks in safety-critical applications such as self-driving cars, medical imaging, anomaly detection, etc., adversarial robustness has become a crucial concern in the reliability of these networks in real-world scenarios. A plethora of works based on adversarial training and regularization-based techniques have been proposed to make these deep networks robust against adversarial attacks. However, these methods require either retraining models or training them from scratch, making them infeasible to defend pre-trained models when access to training data is restricted. To address this problem, we propose a test time Data-free Adversarial Defense (DAD) containing detection and correction frameworks. Moreover, to further improve the efficacy of the correction framework in cases when the detector is under-confident, we propose a soft-detection scheme (dubbed as "DAD++"). We conduct a wide range of experiments and ablations on several datasets and network architectures to show the efficacy of our proposed approach. Furthermore, we demonstrate the applicability of our approach in imparting adversarial defense at test time under data-free (or data-efficient) applications/setups, such as Data-free Knowledge Distillation and Source-free Unsupervised Domain Adaptation, as well as Semi-supervised classification frameworks. We observe that in all the experiments and applications, our DAD++ gives an impressive performance against various adversarial attacks with a minimal drop in clean accuracy.
The source code is available at: https://github.com/vcl-iisc/Improved-Data-free-Test-Time-Adversarial-Defense

Submitted 10 September, 2023; originally announced September 2023.

Comments: IJCV Journal (Under Review)

arXiv:2309.00726 [pdf, other]

An Anisotropic $hp$-Adaptation Framework for Ultraweak Discontinuous Petrov-Galerkin Formulations

Authors: Ankit Chakraborty, Stefan Henneking, Leszek Demkowicz

Abstract: In this article, we present a three-dimensional anisotropic $hp$-mesh refinement strategy for ultraweak discontinuous Petrov--Galerkin (DPG) formulations with optimal test functions. The refinement strategy utilizes the built-in residual-based error estimator accompanying the DPG discretization. The refinement strategy is a two-step process: (a) use the built-in error estimator to mark and isotropically $hp$-refine elements of the (coarse) mesh to generate a finer mesh; (b) use the reference solution on the finer mesh to compute optimal $h$- and $p$-refinements of the selected elements in the coarse mesh. The process is repeated with coarse and fine mesh being generated in every adaptation cycle, until a prescribed error tolerance is achieved. We demonstrate the performance of the proposed refinement strategy using several numerical examples on hexahedral meshes.

Submitted 1 September, 2023; originally announced September 2023.

arXiv:2307.10177 [pdf, other]

Bayesian Spike Train Inference via Non-Local Priors

Authors: Abhisek Chakraborty

Abstract: Advances in neuroscience have enabled researchers to measure the activities of large numbers of neurons simultaneously in behaving animals. We have access to the fluorescence of each of the neurons which provides a first-order approximation of the neural activity over time. Determining the exact spike of a neuron from this fluorescence trace constitutes an active area of research within the field of computational neuroscience. We propose a novel Bayesian approach based on a mixture of half-non-local prior densities and point masses for this task. Instead of a computationally expensive MCMC algorithm, we adopt a stochastic search-based approach that is capable of taking advantage of modern computing environments often equipped with multiple processors, to explore all possible arrangements of spikes and lack thereof in an observed spike train. It then reports the highest posterior probability arrangement of spikes and posterior probability for a spike at each location of the spike train. Our proposals lead to substantial improvements over existing proposals based on L1 regularization, and enjoy comparable estimation accuracy to the state-of-the-art L0 proposal, in simulations, and on recent calcium imaging data sets. Notably, contrary to optimization-based frequentist approaches, our methodology yields automatic uncertainty quantification associated with the spike-train inference.

Submitted 27 May, 2023; originally announced July 2023.

arXiv:2307.04819 [pdf, ps, other]

A Kalman Filter based Low Complexity Throughput Prediction Algorithm for 5G Cellular Networks

Authors: Mayukh Biswas, Ayan Chakraborty, Basabdatta Palit

Abstract: Throughput Prediction is one of the primary preconditions for the uninterrupted operation of several network-aware mobile applications, namely video streaming. Recent works have advocated using Machine Learning (ML) and Deep Learning (DL) for cellular network throughput prediction. In contrast, this work has proposed a simple, low-complexity solution that models the future throughput as a multiple linear regression of several present network parameters and present throughput. It then feeds the variance of prediction error and measurement error, which is inherent in any measurement setup but unaccounted for in existing works, to a Kalman filter-based prediction-correction approach to obtain the optimal estimates of the future throughput. Extensive experiments across seven publicly available 5G throughput datasets for different prediction window lengths have shown that the proposed method outperforms the baseline ML and DL algorithms by delivering more accurate results within a shorter timeframe for inferencing and retraining. Furthermore, in comparison to its ML and DL counterparts, the proposed throughput prediction method is also found to deliver higher QoE to both streaming and live video users when used in conjunction with popular Model Predictive Control (MPC) based adaptive bitrate streaming algorithms.

Submitted 26 November, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

Comments: 13 pages, 14 figures

arXiv:2306.16391 [pdf, other]

The Power of Telemetry: Uncovering Software-Based Side-Channel Attacks on Apple M1/M2 Systems

Authors: Nikhil Chawla, Chen Liu, Abhishek Chakraborty, Igor Chervatyuk, Ke Sun, Thais Moreira Hamasaki, Henrique Kawakami

Abstract: Power analysis is a class of side-channel attacks, where power consumption data is used to infer sensitive information and extract secrets from a system. Traditionally, such attacks required physical access to the target, as well as specialized devices to measure the power consumption with enough precision. The PLATYPUS attack has shown that on-chip power meter capabilities exposed to a software interface might form a new class of power side-channel attacks. This paper presents a software-based power side-channel attack on Apple Silicon M1/M2 platforms, exploiting the System Management Controller (SMC) and its power-related keys, which provides access to the on-chip power meters through a software interface to user space software. We observed data-dependent power consumption reporting from such keys and analyzed the correlations between the power consumption and the processed data. Our work also demonstrated how an unprivileged user mode application successfully recovers bytes from an AES encryption key from a cryptographic service supported by a kernel mode driver in macOS. Furthermore, we discuss the impact of software-based power side-channels in the industry, possible countermeasures, and the overall implications of software interfaces for modern on-chip power management systems.

Submitted 28 June, 2023; originally announced June 2023.

Comments: 6 pages, 4 figures, 5 tables

arXiv:2306.06034 [pdf, other]

RANS-PINN based Simulation Surrogates for Predicting Turbulent Flows

Authors: Shinjan Ghosh, Amit Chakraborty, Georgia Olympia Brikis, Biswadip Dey

Abstract: Physics-informed neural networks (PINNs) provide a framework to build surrogate models for dynamical systems governed by differential equations. During the learning process, PINNs incorporate a physics-based regularization term within the loss function to enhance generalization performance. Since simulating dynamics controlled by partial differential equations (PDEs) can be computationally expensive, PINNs have gained popularity in learning parametric surrogates for fluid flow problems governed by Navier-Stokes equations. In this work, we introduce RANS-PINN, a modified PINN framework, to predict flow fields (i.e., velocity and pressure) in high Reynolds number turbulent flow regimes. To account for the additional complexity introduced by turbulence, RANS-PINN employs a 2-equation eddy viscosity model based on a Reynolds-averaged Navier-Stokes (RANS) formulation. Furthermore, we adopt a novel training approach that ensures effective initialization and balance among the various components of the loss function. The effectiveness of the RANS-PINN framework is then demonstrated using a parametric PINN.

Submitted 11 August, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

Journal ref: Published at the 1st workshop on Synergy of Scientific and Machine Learning Modeling, ICML 2023

arXiv:2305.19600 [pdf, other]

Adaptive Self-Distillation for Minimizing Client Drift in Heterogeneous Federated Learning

Authors: M. Yashwanth, Gaurav Kumar Nayak, Arya Singh, Yogesh Simmhan, Anirban Chakraborty

Abstract: Federated Learning (FL) is a machine learning paradigm that enables clients to jointly train a global model by aggregating the locally trained models without sharing any local training data. In practice, there can often be substantial heterogeneity (e.g., class imbalance) across the local data distributions observed by each of these clients. Under such non-iid data distributions across clients, FL suffers from the 'client-drift' problem where every client drifts to its own local optimum. This results in slower convergence and poor performance of the aggregated model. To address this limitation, we propose a novel regularization technique based on adaptive self-distillation (ASD) for training models on the client side. Our regularization scheme adaptively adjusts to the client's training data based on the global model entropy and the client's label distribution. The proposed regularization can be easily integrated atop existing, state-of-the-art FL algorithms, leading to a further boost in the performance of these off-the-shelf methods. We theoretically explain how ASD reduces client-drift and also explain its generalization ability. We demonstrate the efficacy of our approach through extensive experiments on multiple real-world benchmarks and show substantial gains in performance over state-of-the-art methods.

Submitted 6 February, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

arXiv:2305.17557 [pdf, other]

Fair Clustering via Hierarchical Fair-Dirichlet Process

Authors: Abhisek Chakraborty, Anirban Bhattacharya, Debdeep Pati

Abstract: The advent of ML-driven decision-making and policy formation has led to an increasing focus on algorithmic fairness. As clustering is one of the most commonly used unsupervised machine learning approaches, there has naturally been a proliferation of literature on fair clustering. A popular notion of fairness in clustering mandates the clusters to be balanced, i.e., each level of a protected attribute must be approximately equally represented in each cluster. Building upon the original framework, this literature has rapidly expanded in various aspects. In this article, we offer a novel model-based formulation of fair clustering, complementing the existing literature which is almost exclusively based on optimizing appropriate objective functions.

Submitted 27 May, 2023; originally announced May 2023.

arXiv:2304.00955 [pdf, other]

A short note on the paper `Are Randomized Caches Really Random?'

Authors: Anirban Chakraborty, Sarani Bhattacharya, Sayandeep Saha, Debdeep Mukhopadhyay

Abstract: In this paper, we analyse the results and claims presented in the paper 'Are Randomized Caches Truly Random? Formal Analysis of Randomized Partitioned Caches', presented at HPCA conference 2023. In addition, we also analyse the applicability of 'Bucket and Ball' analytical model presented in MIRAGE (Usenix Security 2021) for its security estimation. We put forth the fallacies in the original bucket and ball model and discuss its implications. Finally, we demonstrate a cache occupancy attack on MIRAGE with just $10\%$ of total cache capacity and extend the framework to establish a covert channel and a template-based fingerprinting attack.

Submitted 3 April, 2023; originally announced April 2023.

arXiv:2303.10085 [pdf, other]

Robust probabilistic inference via a constrained transport metric

Authors: Abhisek Chakraborty, Anirban Bhattacharya, Debdeep Pati

Abstract: Flexible Bayesian models are typically constructed using limits of large parametric models with a multitude of parameters that are often uninterpretable. In this article, we offer a novel alternative by constructing an exponentially tilted empirical likelihood carefully designed to concentrate near a parametric family of distributions of choice with respect to a novel variant of the Wasserstein metric, which is then combined with a prior distribution on model parameters to obtain a robustified posterior. The proposed approach finds applications in a wide variety of robust inference problems, where we intend to perform inference on the parameters associated with the centering distribution in the presence of outliers. Our proposed transport metric enjoys great computational simplicity, exploiting the Sinkhorn regularization for discrete optimal transport problems, and being inherently parallelizable. We demonstrate superior performance of our methodology when compared against state-of-the-art robust Bayesian inference methods. We also demonstrate equivalence of our approach with a nonparametric Bayesian formulation under a suitable asymptotic framework, testifying to its flexibility. The constrained entropy maximization that sits at the heart of our likelihood formulation finds its utility beyond robust Bayesian inference; an illustration is provided in a trustworthy machine learning application.

Submitted 17 March, 2023; originally announced March 2023.

arXiv:2303.08784 [pdf, other]

Query-guided Attention in Vision Transformers for Localizing Objects Using a Single Sketch

Authors: Aditay Tripathi, Anand Mishra, Anirban Chakraborty

Abstract: In this work, we investigate the problem of sketch-based object localization on natural images, where given a crude hand-drawn sketch of an object, the goal is to localize all the instances of the same object on the target image. This problem proves difficult due to the abstract nature of hand-drawn sketches, variations in the style and quality of sketches, and the large domain gap existing between the sketches and the natural images. To mitigate these challenges, existing works proposed attention-based frameworks to incorporate query information into the image features. However, in these works, the query features are incorporated after the image features have already been independently learned, leading to inadequate alignment. In contrast, we propose a sketch-guided vision transformer encoder that uses cross-attention after each block of the transformer-based image encoder to learn query-conditioned image features leading to stronger alignment with the query sketch. Further, at the output of the decoder, the object and the sketch features are refined to bring the representation of relevant objects closer to the sketch query and thereby improve the localization. The proposed model also generalizes to the object categories not seen during training, as the target image features learned by our method are query-aware. Our localization framework can also utilize multiple sketch queries via a trainable novel sketch fusion strategy.
The model is evaluated on the images from the public object detection benchmark, namely MS-COCO, using the sketch queries from QuickDraw! and Sketchy datasets. Compared with existing localization methods, the proposed approach gives a $6.6\%$ and $8.0\%$ improvement in mAP for seen objects using sketch queries from QuickDraw! and Sketchy datasets, respectively, and a $12.2\%$ improvement in AP@50 for large objects that are 'unseen' during training.

Submitted 15 March, 2023; originally announced March 2023.

arXiv:2212.06011 [pdf, other]

A Neural ODE Interpretation of Transformer Layers

Authors: Yaofeng Desmond Zhong, Tongtao Zhang, Amit Chakraborty, Biswadip Dey

Abstract: Transformer layers, which use an alternating pattern of multi-head attention and multi-layer perceptron (MLP) layers, provide an effective tool for a variety of machine learning problems. As the transformer layers use residual connections to avoid the problem of vanishing gradients, they can be viewed as the numerical integration of a differential equation. In this extended abstract, we build upon this connection and propose a modification of the internal architecture of a transformer layer. The proposed model places the multi-head attention sublayer and the MLP sublayer parallel to each other. Our experiments show that this simple modification improves the performance of transformer networks in multiple tasks. Moreover, for the image classification task, we show that using neural ODE solvers with a sophisticated integration scheme further improves performance.

Submitted 12 December, 2022; originally announced December 2022.

Journal ref: Published at the DLDE Workshop in NeurIPS 2022

arXiv:2212.02048 [pdf, other]

Hodge Decomposition of the Remittance Network on the XRP ledger in the Price Hike of January 2018

Authors: Yuichi Ikeda, Abhijit Chakraborty

Abstract: This study analyzes the remittance transactions recorded on the XRP ledger for ETH and USD from July 2017 to June 2018, including the bubble period in early 2018. Using the Hodge decomposition, we estimate the "loop flow" in the international remittance of cryptoassets during the bubble period. We found characteristic differences between those fiat currencies and cryptoassets during the bubble period. For ETH, there was a significant increase in the loop flow during the cryptoasset price peak. This might be related to money laundering or arbitrage transactions. There was a slight increase in the loop flow for USD during the cryptoasset price peak.

Submitted 5 December, 2022; originally announced December 2022.

arXiv:2212.00749 [pdf, other]

Multimodal Query-guided Object Localization

Authors: Aditay Tripathi, Rajath R Dani, Anand Mishra, Anirban Chakraborty

Abstract: Consider a scenario in one-shot query-guided object localization where neither an image of the object nor the object category name is available as a query. In such a scenario, a hand-drawn sketch of the object could be a choice for a query. However, hand-drawn crude sketches alone, when used as queries, might be ambiguous for object localization, e.g., a sketch of a laptop could be confused for a sofa. On the other hand, a linguistic definition of the category, e.g., "a small portable computer small enough to use in your lap" along with the sketch query, gives better visual and semantic cues for object localization. In this work, we present a multimodal query-guided object localization approach under the challenging open-set setting. In particular, we use queries from two modalities, namely, hand-drawn sketch and description of the object (also known as gloss), to perform object localization. Multimodal query-guided object localization is a challenging task, especially when a large domain gap exists between the queries and the natural images, as well as due to the challenge of combining the complementary and minimal information present across the queries. For example, hand-drawn crude sketches contain abstract shape information of an object, while the text descriptions often capture partial semantic information about a given object category.
To address the aforementioned challenges, we present a novel cross-modal attention scheme that guides the region proposal network to generate object proposals relevant to the input queries and a novel orthogonal projection-based proposal scoring technique that scores each proposal with respect to the queries, thereby yielding the final localization results. ...

Submitted 24 July, 2024; v1 submitted 1 December, 2022; originally announced December 2022.

Comments: Accepted to MMTA

Showing 1–50 of 183 results for author: Chakraborty, A