Showing 1–50 of 180 results for author: Agrawal, P

  1. arXiv:2407.16677  [pdf, other]

    cs.RO cs.LG

    From Imitation to Refinement -- Residual RL for Precise Visual Assembly

    Authors: Lars Ankile, Anthony Simeonov, Idan Shenfeld, Marcel Torne, Pulkit Agrawal

    Abstract: Behavior cloning (BC) currently stands as a dominant paradigm for learning real-world visual manipulation. However, in tasks that require locally corrective behaviors like multi-part assembly, learning robust policies purely from human demonstrations remains challenging. Reinforcement learning (RL) can mitigate these limitations by allowing policies to acquire locally corrective behaviors through…

    Submitted 23 July, 2024; originally announced July 2024.

  2. arXiv:2407.16186  [pdf, other]

    cs.RO cs.AI cs.LG

    Automatic Environment Shaping is the Next Frontier in RL

    Authors: Younghyo Park, Gabriel B. Margolis, Pulkit Agrawal

    Abstract: Many roboticists dream of presenting a robot with a task in the evening and returning the next morning to find the robot capable of solving the task. What is preventing us from achieving this? Sim-to-real reinforcement learning (RL) has achieved impressive performance on challenging robotics tasks, but requires substantial human effort to set up the task in a way that is amenable to RL. It's our p…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: ICML 2024 Position Track; Website at https://auto-env-shaping.github.io/

  3. arXiv:2407.13755  [pdf, other]

    cs.LG

    Random Latent Exploration for Deep Reinforcement Learning

    Authors: Srinath Mahankali, Zhang-Wei Hong, Ayush Sekhari, Alexander Rakhlin, Pulkit Agrawal

    Abstract: The ability to efficiently explore high-dimensional state spaces is essential for the practical success of deep Reinforcement Learning (RL). This paper introduces a new exploration technique called Random Latent Exploration (RLE), that combines the strengths of bonus-based and noise-based (two popular approaches for effective exploration in deep RL) exploration strategies. RLE leverages the idea o…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to ICML 2024

  4. arXiv:2407.13743  [pdf, ps, other]

    cs.LG stat.ML

    Optimistic Q-learning for average reward and episodic reinforcement learning

    Authors: Priyank Agrawal, Shipra Agrawal

    Abstract: We present an optimistic Q-learning algorithm for regret minimization in average reward reinforcement learning under an additional assumption on the underlying MDP that for all policies, the expected time to visit some frequent state $s_0$ is finite and upper bounded by $H$. Our setting strictly generalizes the episodic setting and is significantly less restrictive than the assumption of bounded h…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 36 pages

  5. arXiv:2407.07884  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    Vegetable Peeling: A Case Study in Constrained Dexterous Manipulation

    Authors: Tao Chen, Eric Cousineau, Naveen Kuppuswamy, Pulkit Agrawal

    Abstract: Recent studies have made significant progress in addressing dexterous manipulation problems, particularly in in-hand object reorientation. However, there are few existing works that explore the potential utilization of developed dexterous manipulation controllers for downstream tasks. In this study, we focus on constrained dexterous manipulation for food peeling. Food peeling presents various cons…

    Submitted 10 July, 2024; originally announced July 2024.

  6. arXiv:2407.03995  [pdf, other]

    cs.LG cs.AI cs.RO

    ROER: Regularized Optimal Experience Replay

    Authors: Changling Li, Zhang-Wei Hong, Pulkit Agrawal, Divyansh Garg, Joni Pajarinen

    Abstract: Experience replay serves as a key component in the success of online reinforcement learning (RL). Prioritized experience replay (PER) reweights experiences by the temporal difference (TD) error empirically enhancing the performance. However, few works have explored the motivation of using TD error. In this work, we provide an alternative perspective on TD-error-based reweighting. We show the conne…

    Submitted 4 July, 2024; originally announced July 2024.

    Journal ref: Reinforcement Learning Journal, vol. 1, no. 1, 2024, pp. TBD

  7. arXiv:2406.00681  [pdf, other]

    cs.LG

    Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient

    Authors: Zechu Li, Rickmer Krohn, Tao Chen, Anurag Ajay, Pulkit Agrawal, Georgia Chalvatzaki

    Abstract: Deep reinforcement learning (RL) algorithms typically parameterize the policy as a deep network that outputs either a deterministic action or a stochastic one modeled as a Gaussian distribution, hence restricting learning to a single behavioral mode. Meanwhile, diffusion models emerged as a powerful framework for multimodal learning. However, the use of diffusion policies in online RL is hindered…

    Submitted 2 June, 2024; originally announced June 2024.

  8. arXiv:2405.14159  [pdf, other]

    cs.CL cs.AI

    Super Tiny Language Models

    Authors: Dylan Hillier, Leon Guertler, Cheston Tan, Palaash Agrawal, Chen Ruirui, Bobby Cheng

    Abstract: The rapid advancement of large language models (LLMs) has led to significant improvements in natural language processing but also poses challenges due to their high computational and energy demands. This paper introduces a series of research efforts focused on Super Tiny Language Models (STLMs), which aim to deliver high performance with significantly reduced parameter counts. We explore innovativ…

    Submitted 26 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 11 pages, 4 figures

    ACM Class: I.2.7

  9. arXiv:2405.06639  [pdf, other]

    cs.LG cs.AI cs.CL

    Value Augmented Sampling for Language Model Alignment and Personalization

    Authors: Seungwook Han, Idan Shenfeld, Akash Srivastava, Yoon Kim, Pulkit Agrawal

    Abstract: Aligning Large Language Models (LLMs) to cater to different human preferences, learning new skills, and unlearning harmful behavior is an important problem. Search-based methods, such as Best-of-N or Monte-Carlo Tree Search, are performant, but impractical for LLM adaptation due to their high inference cost. On the other hand, using Reinforcement Learning (RL) for adaptation is computationally eff…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: Website: https://sites.google.com/view/llm-vas

  10. arXiv:2405.05938  [pdf, other]

    cs.CL

    DOLOMITES: Domain-Specific Long-Form Methodical Tasks

    Authors: Chaitanya Malaviya, Priyanka Agrawal, Kuzman Ganchev, Pranesh Srinivasan, Fantine Huot, Jonathan Berant, Mark Yatskar, Dipanjan Das, Mirella Lapata, Chris Alberti

    Abstract: Experts in various fields routinely perform methodical writing tasks to plan, organize, and report their work. From a clinician writing a differential diagnosis for a patient, to a teacher writing a lesson plan for students, these tasks are pervasive, requiring to methodically generate structured long-form output for a given input. We develop a typology of methodical tasks structured in the form o…

    Submitted 28 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Dataset now available at https://dolomites-benchmark.github.io

  11. arXiv:2405.01402  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    Learning Force Control for Legged Manipulation

    Authors: Tifanny Portela, Gabriel B. Margolis, Yandong Ji, Pulkit Agrawal

    Abstract: Controlling contact forces during interactions is critical for locomotion and manipulation tasks. While sim-to-real reinforcement learning (RL) has succeeded in many contact-rich problems, current RL methods achieve forceful interactions implicitly without explicitly regulating forces. We propose a method for training RL policies for direct force control without requiring access to force sensing.…

    Submitted 20 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: This work has been accepted to ICRA24, as well as the Loco-manipulation workshop at ICRA24

  12. arXiv:2404.14735  [pdf, other]

    cs.RO

    Rank2Reward: Learning Shaped Reward Functions from Passive Video

    Authors: Daniel Yang, Davin Tjia, Jacob Berg, Dima Damen, Pulkit Agrawal, Abhishek Gupta

    Abstract: Teaching robots novel skills with demonstrations via human-in-the-loop data collection techniques like kinesthetic teaching or teleoperation puts a heavy burden on human supervisors. In contrast to this paradigm, it is often significantly easier to provide raw, action-free visual data of tasks being performed. Moreover, this data can even be mined from video datasets or the web. Ideally, this data…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: ICRA 2024

  13. arXiv:2404.04817  [pdf, other]

    cs.CL

    FRACTAL: Fine-Grained Scoring from Aggregate Text Labels

    Authors: Yukti Makhija, Priyanka Agrawal, Rishi Saket, Aravindan Raghuveer

    Abstract: Large language models (LLMs) are being increasingly tuned to power complex generation tasks such as writing, fact-seeking, querying and reasoning. Traditionally, human or model feedback for evaluating and further tuning LLM performance has been provided at the response level, enabling faster and more cost-effective assessments. However, recent works (Amplayo et al. [2022], Wu et al. [2023]) indica…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 22 pages, 1 figure

  14. arXiv:2404.03729  [pdf, other]

    cs.RO cs.LG

    JUICER: Data-Efficient Imitation Learning for Robotic Assembly

    Authors: Lars Ankile, Anthony Simeonov, Idan Shenfeld, Pulkit Agrawal

    Abstract: While learning from demonstrations is powerful for acquiring visuomotor policies, high-performance imitation without large demonstration datasets remains challenging for tasks requiring precise, long-horizon manipulation. This paper proposes a pipeline for improving imitation learning performance with a small human demonstration budget. We apply our approach to assembly tasks that require precisel…

    Submitted 9 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Project website: https://imitation-juicer.github.io/

  15. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  16. arXiv:2403.03949  [pdf, other]

    cs.RO cs.AI cs.LG

    Reconciling Reality through Simulation: A Real-to-Sim-to-Real Approach for Robust Manipulation

    Authors: Marcel Torne, Anthony Simeonov, Zechu Li, April Chan, Tao Chen, Abhishek Gupta, Pulkit Agrawal

    Abstract: Imitation learning methods need significant human supervision to learn policies robust to changes in object poses, physical disturbances, and visual distractors. Reinforcement learning, on the other hand, can explore the environment autonomously to learn robust behaviors but may require impractical amounts of unsafe real-world data collection. To learn performant, robust policies without the burde…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Project page: https://real-to-sim-to-real.github.io/RialTo/

  17. arXiv:2402.19464  [pdf, other]

    cs.LG cs.AI cs.CL

    Curiosity-driven Red-teaming for Large Language Models

    Authors: Zhang-Wei Hong, Idan Shenfeld, Tsun-Hsuan Wang, Yung-Sung Chuang, Aldo Pareja, James Glass, Akash Srivastava, Pulkit Agrawal

    Abstract: Large language models (LLMs) hold great potential for many natural language applications but risk generating incorrect or toxic content. To probe when an LLM generates unwanted content, the current paradigm is to recruit a \textit{red team} of human testers to design input prompts (i.e., test cases) that elicit undesirable responses from LLMs. However, relying solely on human testers is expensive…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: Published at ICLR 2024

  18. arXiv:2402.16828  [pdf, other]

    cs.LG cs.AI cs.CV

    Training Neural Networks from Scratch with Parallel Low-Rank Adapters

    Authors: Minyoung Huh, Brian Cheung, Jeremy Bernstein, Phillip Isola, Pulkit Agrawal

    Abstract: The scalability of deep learning models is fundamentally limited by computing resources, memory, and communication. Although methods like low-rank adaptation (LoRA) have reduced the cost of model finetuning, its application in model pre-training remains largely unexplored. This paper explores extending LoRA to model pre-training, identifying the inherent constraints and limitations of standard LoR…

    Submitted 26 July, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  19. Publicly auditable privacy-preserving electoral rolls

    Authors: Prashant Agrawal, Mahabir Prasad Jhanwar, Subodh Vishnu Sharma, Subhashis Banerjee

    Abstract: While existing literature on electronic voting has extensively addressed verifiability of voting protocols, the vulnerability of electoral rolls in large public elections remains a critical concern. To ensure integrity of electoral rolls, the current practice is to either make electoral rolls public or share them with the political parties. However, this enables construction of detailed voter prof…

    Submitted 2 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Report number: CSF 2024

    Journal ref: 2024 IEEE 37th Computer Security Foundations Symposium (CSF)

  20. arXiv:2402.01805  [pdf, other]

    cs.CL cs.AI

    Can LLMs perform structured graph reasoning?

    Authors: Palaash Agrawal, Shavak Vasania, Cheston Tan

    Abstract: Pretrained Large Language Models (LLMs) have demonstrated various reasoning capabilities through language-based prompts alone, particularly in unstructured task settings (tasks purely based on language semantics). However, LLMs often struggle with structured tasks, because of the inherent incompatibility of input representation. Reducing structured tasks to uni-dimensional language semantics often…

    Submitted 18 April, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  21. Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax

    Authors: Aditya Patil, Vikas Joshi, Purvi Agrawal, Rupesh Mehta

    Abstract: Even with several advancements in multilingual modeling, it is challenging to recognize multiple languages using a single neural model, without knowing the input language and most multilingual models assume the availability of the input language. In this work, we propose a novel bilingual end-to-end (E2E) modeling approach, where a single neural model can recognize both languages and also support…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: Published in IEEE's Spoken Language Technology (SLT) 2022, 8 pages (6 + 2 for references), 5 figures

    Journal ref: 2022 IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar, 2023, pp. 252-259

  22. arXiv:2401.10460  [pdf, other]

    cs.SD cs.LG eess.AS

    Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis

    Authors: Prabhav Agrawal, Thilo Koehler, Zhiping Xiu, Prashant Serai, Qing He

    Abstract: Neural vocoders model the raw audio waveform and synthesize high-quality audio, but even the highly efficient ones, like MB-MelGAN and LPCNet, fail to run real-time on a low-end device like a smartglass. A pure digital signal processing (DSP) based vocoder can be implemented via lightweight fast Fourier transforms (FFT), and therefore, is a magnitude faster than any neural vocoder. A DSP vocoder o…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted for ICASSP 2024

  23. arXiv:2401.06783  [pdf, other]

    cs.CL cs.AI cs.LG cs.SI

    MultiSiam: A Multiple Input Siamese Network For Social Media Text Classification And Duplicate Text Detection

    Authors: Sudhanshu Bhoi, Swapnil Markhedkar, Shruti Phadke, Prashant Agrawal

    Abstract: Social media accounts post increasingly similar content, creating a chaotic experience across platforms, which makes accessing desired information difficult. These posts can be organized by categorizing and grouping duplicates across social handles and accounts. There can be more than one duplicate of a post, however, a conventional Siamese neural network only considers a pair of inputs for duplic…

    Submitted 6 January, 2024; originally announced January 2024.

  24. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  25. arXiv:2311.09344  [pdf, other]

    cs.CL

    Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization

    Authors: Alexandra Chronopoulou, Jonas Pfeiffer, Joshua Maynez, Xinyi Wang, Sebastian Ruder, Priyanka Agrawal

    Abstract: Parameter-efficient fine-tuning (PEFT) using labeled task data can significantly improve the performance of large language models (LLMs) on the downstream task. However, there are 7000 languages in the world and many of these languages lack labeled data for real-world language generation tasks. In this paper, we propose to improve zero-shot cross-lingual transfer by composing language or task spec…

    Submitted 15 November, 2023; originally announced November 2023.

  26. arXiv:2311.07592  [pdf, other]

    cs.CL cs.AI cs.IR

    Hallucination-minimized Data-to-answer Framework for Financial Decision-makers

    Authors: Sohini Roychowdhury, Andres Alvarez, Brian Moore, Marko Krema, Maria Paz Gelpi, Federico Martin Rodriguez, Angel Rodriguez, Jose Ramon Cabrejas, Pablo Martinez Serrano, Punit Agrawal, Arijit Mukherjee

    Abstract: Large Language Models (LLMs) have been applied to build several automation and personalized question-answering prototypes so far. However, scaling such prototypes to robust products with minimized hallucinations or fake responses still remains an open challenge, especially in niche data-table heavy domains such as financial decision making. In this work, we present a novel Langchain-based framewor…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: 11 pages, 5 figures, 4 tables

  27. arXiv:2311.01405  [pdf, other]

    cs.RO cs.CV cs.LG

    Learning to See Physical Properties with Active Sensing Motor Policies

    Authors: Gabriel B. Margolis, Xiang Fu, Yandong Ji, Pulkit Agrawal

    Abstract: Knowledge of terrain's physical properties inferred from color images can aid in making efficient robotic locomotion plans. However, unlike image classification, it is unintuitive for humans to label image patches with physical properties. Without labeled data, building a vision system that takes as input the observed terrain and predicts physical properties remains challenging. We present a metho…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: In CoRL 2023. Website: https://gmargo11.github.io/active-sensing-loco/

  28. arXiv:2310.20608  [pdf, other]

    cs.LG cs.AI cs.RO

    Autonomous Robotic Reinforcement Learning with Asynchronous Human Feedback

    Authors: Max Balsells, Marcel Torne, Zihan Wang, Samedh Desai, Pulkit Agrawal, Abhishek Gupta

    Abstract: Ideally, we would place a robot in a real-world environment and leave it there improving on its own by gathering more experience autonomously. However, algorithms for autonomous robotic learning have been challenging to realize in the real world. While this has often been attributed to the challenge of sample complexity, even sample-efficient techniques are hampered by two major challenges - the d…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: Project website https://guided-exploration-autonomous-rl.github.io/GEAR/

  29. arXiv:2310.17550  [pdf, other]

    cs.LG cs.AI

    Human-Guided Complexity-Controlled Abstractions

    Authors: Andi Peng, Mycal Tucker, Eoin Kenny, Noga Zaslavsky, Pulkit Agrawal, Julie Shah

    Abstract: Neural networks often learn task-specific latent representations that fail to generalize to novel settings or tasks. Conversely, humans learn discrete representations (i.e., concepts or words) at a variety of abstraction levels (e.g., "bird" vs. "sparrow") and deploy the appropriate abstraction based on task. Inspired by this, we train neural models to generate a spectrum of discrete representatio…

    Submitted 27 October, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  30. arXiv:2310.17537  [pdf, other]

    cs.AI cs.LG

    Neuro-Inspired Fragmentation and Recall to Overcome Catastrophic Forgetting in Curiosity

    Authors: Jaedong Hwang, Zhang-Wei Hong, Eric Chen, Akhilan Boopathy, Pulkit Agrawal, Ila Fiete

    Abstract: Deep reinforcement learning methods exhibit impressive performance on a range of tasks but still struggle on hard exploration tasks in large environments with sparse rewards. To address this, intrinsic rewards can be generated using forward model prediction errors that decrease as the environment becomes known, and incentivize an agent to explore novel states. While prediction-based intrinsic rewa…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023 Workshop - Intrinsically Motivated Open-ended Learning

  31. arXiv:2310.08803  [pdf, other]

    cs.AI

    Advancing Perception in Artificial Intelligence through Principles of Cognitive Science

    Authors: Palaash Agrawal, Cheston Tan, Heena Rathore

    Abstract: Although artificial intelligence (AI) has achieved many feats at a rapid pace, there still exist open problems and fundamental shortcomings related to performance and resource efficiency. Since AI researchers benchmark a significant proportion of performance standards through human intelligence, cognitive sciences-inspired AI is a promising domain of research. Studying cognitive science can provid…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: Summary: a detailed review of the current state of perception models through the lens of cognitive AI

  32. arXiv:2310.04413  [pdf, other]

    cs.LG cs.AI

    Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

    Authors: Zhang-Wei Hong, Aviral Kumar, Sathwik Karnik, Abhishek Bhandwaldar, Akash Srivastava, Joni Pajarinen, Romain Laroche, Abhishek Gupta, Pulkit Agrawal

    Abstract: Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning techniques such as behavior cloning is to find a policy that achieves a higher average return than the trajectories constituting the dataset. However, we empirica…

    Submitted 11 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: Accepted NeurIPS 2023

    Journal ref: NeurIPS 2023

  33. arXiv:2310.00488  [pdf, other]

    cs.LG cs.AI

    On Memorization and Privacy Risks of Sharpness Aware Minimization

    Authors: Young In Kim, Pratiksha Agrawal, Johannes O. Royset, Rajiv Khanna

    Abstract: In many recent works, there is an increased focus on designing algorithms that seek flatter optima for neural network loss optimization as there is empirical evidence that it leads to better generalization performance in many datasets. In this work, we dissect these performance gains through the lens of data memorization in overparameterized models. We define a new metric that helps us identify wh…

    Submitted 3 January, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

  34. InFER: A Multi-Ethnic Indian Facial Expression Recognition Dataset

    Authors: Syed Sameen Ahmad Rizvi, Preyansh Agrawal, Jagat Sesh Challa, Pratik Narang

    Abstract: The rapid advancement in deep learning over the past decade has transformed Facial Expression Recognition (FER) systems, as newer methods have been proposed that outperform the existing traditional handcrafted techniques. However, such a supervised learning approach requires a sufficiently large training dataset covering all the possible scenarios. And since most people exhibit facial expressions…

    Submitted 30 September, 2023; originally announced October 2023.

    Comments: In Proceedings of the 15th International Conference on Agents and Artificial Intelligence Volume 3: ICAART; ISBN 978-989-758-623-1; ISSN 2184-433X, SciTePress, pages 550-557. DOI: 10.5220/0011699400003393

    Journal ref: Volume 3: ICAART, 2023, pages - 550-557

  35. arXiv:2309.14321  [pdf, other]

    cs.RO cs.LG

    Lifelong Robot Learning with Human Assisted Language Planners

    Authors: Meenal Parakh, Alisha Fong, Anthony Simeonov, Tao Chen, Abhishek Gupta, Pulkit Agrawal

    Abstract: Large Language Models (LLMs) have been shown to act like planners that can decompose high-level instructions into a sequence of executable instructions. However, current LLM-based planners are only able to operate with a fixed set of skills. We overcome this critical limitation and present a method for using LLM-based planners to query new skills and teach robots these skills in a data and time-ef…

    Submitted 24 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

  36. arXiv:2309.08587  [pdf, other]

    cs.LG cs.AI cs.RO

    Compositional Foundation Models for Hierarchical Planning

    Authors: Anurag Ajay, Seungwook Han, Yilun Du, Shuang Li, Abhi Gupta, Tommi Jaakkola, Josh Tenenbaum, Leslie Kaelbling, Akash Srivastava, Pulkit Agrawal

    Abstract: To make effective decisions in novel environments with long-horizon goals, it is crucial to engage in hierarchical reasoning across spatial and temporal scales. This entails planning abstract subgoal sequences, visually reasoning about the underlying plans, and executing actions in accordance with the devised plan through visual-motor control. We propose Compositional Foundation Models for Hierarc…

    Submitted 21 September, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: Website: https://hierarchical-planning-foundation-model.github.io/

  37. arXiv:2309.06680  [pdf, other]

    cs.CV

    STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning

    Authors: Palaash Agrawal, Haidi Azaman, Cheston Tan

    Abstract: Understanding relations between objects is crucial for understanding the semantics of a visual scene. It is also an essential step in order to bridge visual and language models. However, current state-of-the-art computer vision models still lack the ability to perform spatial reasoning well. Existing datasets mostly cover a relatively small number of spatial relations, all of which are static rela…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: Submitted to Neurips Dataset track. 24 pages including citations and appendix

  38. arXiv:2309.06580  [pdf]

    cs.CL cs.AI

    Can humans help BERT gain "confidence"?

    Authors: Piyush Agrawal

    Abstract: The advancements in artificial intelligence over the last decade have opened a multitude of avenues for interdisciplinary research. Since the idea of artificial intelligence was inspired by the working of neurons in the brain, it seems pretty practical to combine the two fields and take the help of cognitive data to train AI models. Not only it will help to get a deeper understanding of the techno…

    Submitted 31 August, 2023; originally announced September 2023.

    Comments: Masters thesis

  39. arXiv:2308.08362  [pdf, other]

    cs.CY

    Functional Consistency across Retail Central Bank Digital Currency and Commercial Bank Money

    Authors: Lee Braine, Shreepad Shukla, Piyush Agrawal

    Abstract: Central banks are actively exploring central bank digital currencies (CBDCs) by conducting research, proofs of concept and pilots. However, adoption of a retail CBDC can risk fragmenting both payments markets and retail deposits if the retail CBDC and commercial bank money do not have common operational characteristics. In this paper, we focus on a potential UK retail CBDC, the 'digital pound', an…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: 24 pages, 3 figures, 3 tables

  40. arXiv:2308.08270  [pdf, other]

    cs.DC cs.PF

    Towards Benchmarking Power-Performance Characteristics of Federated Learning Clients

    Authors: Pratik Agrawal, Philipp Wiesner, Odej Kao

    Abstract: Federated Learning (FL) is a decentralized machine learning approach where local models are trained on distributed clients, allowing privacy-preserving collaboration by sharing model updates instead of raw data. However, the added communication overhead and increased training time caused by heterogenous data distributions results in higher energy consumption and carbon emissions for achieving simi…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: Machine Learning and Networking Workshop, NetSys 2023

  41. arXiv:2307.15455  [pdf, other]

    cs.CL

    Trie-NLG: Trie Context Augmentation to Improve Personalized Query Auto-Completion for Short and Unseen Prefixes

    Authors: Kaushal Kumar Maurya, Maunendra Sankar Desarkar, Manish Gupta, Puneet Agrawal

    Abstract: Query auto-completion (QAC) aims to suggest plausible completions for a given query prefix. Traditionally, QAC systems have leveraged tries curated from historical query logs to suggest most popular completions. In this context, there are two specific scenarios that are difficult to handle for any QAC system: short prefixes (which are inherently ambiguous) and unseen prefixes. Recently, personaliz…

    Submitted 23 October, 2023; v1 submitted 28 July, 2023; originally announced July 2023.

    Comments: ECML-PKDD 2023 (Journal Track)

    Journal ref: Data Mining and Knowledge Discovery (DAMI) 2023

  42. arXiv:2307.12983  [pdf, other]

    cs.LG cs.AI cs.RO

    Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation

    Authors: Zechu Li, Tao Chen, Zhang-Wei Hong, Anurag Ajay, Pulkit Agrawal

    Abstract: Reinforcement learning is time-consuming for complex tasks due to the need for large amounts of training data. Recent advances in GPU-based simulation, such as Isaac Gym, have sped up data collection thousands of times on a commodity GPU. Most prior works used on-policy methods like PPO due to their simplicity and ease of scaling. Off-policy methods are more data efficient but challenging to scale…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: Accepted by ICML 2023

  43. arXiv:2307.11049  [pdf, other]

    cs.LG cs.AI cs.RO

    Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback

    Authors: Marcel Torne, Max Balsells, Zihan Wang, Samedh Desai, Tao Chen, Pulkit Agrawal, Abhishek Gupta

    Abstract: Exploration and reward specification are fundamental and intertwined challenges for reinforcement learning. Solving sequential decision-making tasks requiring expansive exploration requires either careful design of reward functions or the use of novelty-seeking exploration bonuses. Human supervisors can provide effective guidance in the loop to direct the exploration process, but prior methods to…

    Submitted 20 July, 2023; originally announced July 2023.

  44. arXiv:2307.06333  [pdf, other]

    cs.LG cs.AI cs.HC cs.RO

    Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation

    Authors: Andi Peng, Aviv Netanyahu, Mark Ho, Tianmin Shu, Andreea Bobu, Julie Shah, Pulkit Agrawal

    Abstract: Policies often fail due to distribution shift -- changes in the state and reward that occur when a policy is deployed in new environments. Data augmentation can increase robustness by making the model invariant to task-irrelevant changes in the agent's observation. However, designers don't know which concepts are irrelevant a priori, especially when different end users have different preferences a…

    Submitted 13 July, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

    Comments: International Conference on Machine Learning (ICML) 2023

  45. arXiv:2307.05793  [pdf, other]

    cs.AI cs.RO

    Grid Cell-Inspired Fragmentation and Recall for Efficient Map Building

    Authors: Jaedong Hwang, Zhang-Wei Hong, Eric Chen, Akhilan Boopathy, Pulkit Agrawal, Ila Fiete

    Abstract: Animals and robots navigate through environments by building and refining maps of space. These maps enable functions including navigation back to home, planning, search and foraging. Here, we use observations from neuroscience, specifically the observed fragmentation of grid cell map in compartmentalized spaces, to propose and apply the concept of Fragmentation-and-Recall (FARMap) in the mapping o…

    Submitted 8 July, 2024; v1 submitted 11 July, 2023; originally announced July 2023.

    Comments: TMLR (Featured Certification)

  46. arXiv:2307.04751  [pdf, other]

    cs.RO cs.CV cs.LG

    Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement

    Authors: Anthony Simeonov, Ankit Goyal, Lucas Manuelli, Lin Yen-Chen, Alina Sarmiento, Alberto Rodriguez, Pulkit Agrawal, Dieter Fox

    Abstract: We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship, such as a book inserted in an open slot of a bookshelf. The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects, and is trained from demonstrations to operate directly on 3D point clouds. Our system overcomes challenges associated with the existence of…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: Project page: https://anthonysimeonov.github.io/rpdiff-multi-modal/

  47. arXiv:2307.03186  [pdf, other]

    cs.LG

    TGRL: An Algorithm for Teacher Guided Reinforcement Learning

    Authors: Idan Shenfeld, Zhang-Wei Hong, Aviv Tamar, Pulkit Agrawal

    Abstract: Learning from rewards (i.e., reinforcement learning or RL) and learning to imitate a teacher (i.e., teacher-student learning) are two established approaches for solving sequential decision-making problems. To combine the benefits of these different forms of learning, it is common to train a policy to maximize a combination of reinforcement and teacher-student learning objectives. However, without…

    Submitted 19 February, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

    Journal ref: ICML 2023

  48. arXiv:2306.16793  [pdf, other]

    cs.CL

    Benchmarking Large Language Model Capabilities for Conditional Generation

    Authors: Joshua Maynez, Priyanka Agrawal, Sebastian Gehrmann

    Abstract: Pre-trained large language models (PLMs) underlie most new developments in natural language processing. They have shifted the field from application-specific model pipelines to a single model that is adapted to a wide range of tasks. Autoregressive PLMs like GPT-3 or PaLM, alongside techniques like few-shot learning, have additionally shifted the output modality to generation instead of classifica…

    Submitted 29 June, 2023; originally announced June 2023.

  49. arXiv:2306.13085  [pdf, other]

    cs.LG cs.AI

    Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting

    Authors: Zhang-Wei Hong, Pulkit Agrawal, Rémi Tachet des Combes, Romain Laroche

    Abstract: Most offline reinforcement learning (RL) algorithms return a target policy maximizing a trade-off between (1) the expected performance gain over the behavior policy that collected the dataset, and (2) the risk stemming from the out-of-distribution-ness of the induced state-action occupancy. It follows that the performance of the target policy is strongly related to the performance of the behavior…

    Submitted 22 June, 2023; originally announced June 2023.

    Journal ref: Conference paper at ICLR 2023

  50. arXiv:2305.17250  [pdf, other]

    cs.LG cs.AI

    Self-Supervised Reinforcement Learning that Transfers using Random Features

    Authors: Boyuan Chen, Chuning Zhu, Pulkit Agrawal, Kaiqing Zhang, Abhishek Gupta

    Abstract: Model-free reinforcement learning algorithms have exhibited great potential in solving single-task sequential decision-making problems with high-dimensional observations and long horizons, but are known to be hard to generalize across tasks. Model-based RL, on the other hand, learns task-agnostic models of the world that naturally enable transfer across different reward functions, but struggles t…

    Submitted 26 May, 2023; originally announced May 2023.