Showing 1–50 of 73 results for author: Saxena, S

  1. arXiv:2407.07860  [pdf, other]

    cs.CV

    Controlling Space and Time with Diffusion Models

    Authors: Daniel Watson, Saurabh Saxena, Lala Li, Andrea Tagliasacchi, David J. Fleet

    Abstract: We present 4DiM, a cascaded diffusion model for 4D novel view synthesis (NVS), conditioned on one or more images of a general scene, and a set of camera poses and timestamps. To overcome challenges due to limited availability of 4D training data, we advocate joint training on 3D (with camera pose), 4D (pose+time) and video (time but no pose) data and propose a new architecture that enables the sam…

    Submitted 10 July, 2024; originally announced July 2024.

  2. arXiv:2407.06345  [pdf, other]

    cs.HC cs.CE cs.CY cs.ET

    Multi-person eye tracking for real-world scene perception in social settings

    Authors: Shreshth Saxena, Areez Visram, Neil Lobo, Zahid Mirza, Mehak Rafi Khan, Biranugan Pirabaharan, Alexander Nguyen, Lauren K. Fink

    Abstract: Eye movements provide a window into human behaviour, attention, and interaction dynamics. Previous research suggests that eye movements are highly influenced by task, setting, and social others; however, most eye tracking research is conducted in single-person, in-lab settings and is yet to be validated in multi-person, naturalistic contexts. One such prevalent real-world context is the collective…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Please refer to the supplementary video illustrating the proposed approach in this paper here: https://tinyurl.com/multipersonET

    ACM Class: I.4.8; J.4; J.5; C.4; D.2.10

  3. arXiv:2405.16616  [pdf, other]

    cs.LG cs.SI

    DPHGNN: A Dual Perspective Hypergraph Neural Networks

    Authors: Siddhant Saxena, Shounak Ghatak, Raghu Kolla, Debashis Mukherjee, Tanmoy Chakraborty

    Abstract: Message passing on hypergraphs has been a standard framework for learning higher-order correlations between hypernodes. Recently-proposed hypergraph neural networks (HGNNs) can be categorized into spatial and spectral methods based on their design choices. In this work, we analyze the impact of change in hypergraph topology on the suboptimal performance of HGNNs and propose DPHGNN, a novel dual-pe…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Accepted in SIGKDD'24 -- Research Track

  4. arXiv:2405.15811  [pdf, other]

    cs.DS cs.CG

    Maximizing Weighted Dominance in the Plane

    Authors: Waseem Akram, Sanjeev Saxena

    Abstract: Let P be a set of n weighted points, Q be a set of m unweighted points in the plane, and k a non-negative integer. We consider the problem of computing a subset $Q'\subseteq Q$ with size at most k such that the sum of the weights of the points of P dominated by at least one point in the set Q' is maximized. A point q in the plane dominates another point p if and only if $x(q)\ge x(p)$ and…

    Submitted 21 May, 2024; originally announced May 2024.

  5. arXiv:2405.04333  [pdf]

    cs.AI

    A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI

    Authors: Hannah Chafetz, Sampriti Saxena, Stefaan G. Verhulst

    Abstract: Since late 2022, generative AI has taken the world by storm, with widespread use of tools including ChatGPT, Gemini, and Claude. Generative AI and large language model (LLM) applications are transforming how individuals find and access data and knowledge. However, the intricate relationship between open data and generative AI, and the vast potential it holds for driving innovation in this field re…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 58 pages

  6. arXiv:2404.13113  [pdf, other]

    quant-ph cs.ET

    Towards quantum computing for clinical trial design and optimization: A perspective on new opportunities and challenges

    Authors: Hakan Doga, M. Emre Sahin, Joao Bettencourt-Silva, Anh Pham, Eunyoung Kim, Alan Andress, Sudhir Saxena, Aritra Bose, Laxmi Parida, Jan Lukas Robertus, Hideaki Kawaguchi, Radwa Soliman, Daniel Blankenberg

    Abstract: Clinical trials are pivotal in the drug discovery process to determine the safety and efficacy of a drug candidate. The high failure rates of these trials are attributed to deficiencies in clinical model development and protocol design. Improvements in the clinical drug design process could therefore yield significant benefits for all stakeholders involved. This paper examines the current challeng…

    Submitted 19 April, 2024; originally announced April 2024.

  7. arXiv:2403.00952  [pdf, other]

    cs.CL cs.LG

    MediSwift: Efficient Sparse Pre-trained Biomedical Language Models

    Authors: Vithursan Thangarasa, Mahmoud Salem, Shreyas Saxena, Kevin Leong, Joel Hestness, Sean Lie

    Abstract: Large language models (LLMs) are typically trained on general source data for various domains, but a recent surge in domain-specific LLMs has shown their potential to outperform general-purpose models in domain-specific tasks (e.g., biomedicine). Although domain-specific pre-training enhances efficiency and leads to smaller models, the computational costs of training these LLMs remain high, posing…

    Submitted 1 March, 2024; originally announced March 2024.

  8. arXiv:2402.12531  [pdf, other]

    cs.CV cs.LG

    Improving Deep Generative Models on Many-To-One Image-to-Image Translation

    Authors: Sagar Saxena, Mohammad Nayeem Teli

    Abstract: Deep generative models have been applied to multiple applications in image-to-image translation. Generative Adversarial Networks and Diffusion Models have presented impressive results, setting new state-of-the-art results on these tasks. Most methods have symmetric setups across the different domains in a dataset. These methods assume that all domains have either multiple modalities or only one mo…

    Submitted 22 February, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: 11 pages, 6 figures; template format corrected

  9. arXiv:2401.14502  [pdf, other]

    cs.RO cs.CV cs.LG

    MResT: Multi-Resolution Sensing for Real-Time Control with Vision-Language Models

    Authors: Saumya Saxena, Mohit Sharma, Oliver Kroemer

    Abstract: Leveraging sensing modalities across diverse spatial and temporal resolutions can improve performance of robotic manipulation tasks. Multi-spatial resolution sensing provides hierarchical information captured at different spatial scales and enables both coarse and precise motions. Simultaneously, multi-temporal resolution sensing enables the agent to exhibit high reactivity and real-time control. I…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: CoRL'23, Project website: http://tinyurl.com/multi-res-realtime-control

  10. arXiv:2312.13252  [pdf, other]

    cs.CV

    Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model

    Authors: Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J. Fleet

    Abstract: While methods for monocular depth estimation have made significant strides on standard benchmarks, zero-shot metric depth estimation remains unsolved. Challenges include the joint modeling of indoor and outdoor scenes, which often exhibit significantly different distributions of RGB and depth, and the depth-scale ambiguity due to unknown camera intrinsics. Recent work has proposed specialized mult…

    Submitted 20 December, 2023; originally announced December 2023.

  11. arXiv:2312.04560  [pdf, other]

    cs.CV cs.AI cs.GR

    NeRFiller: Completing Scenes via Generative 3D Inpainting

    Authors: Ethan Weber, Aleksander Hołyński, Varun Jampani, Saurabh Saxena, Noah Snavely, Abhishek Kar, Angjoo Kanazawa

    Abstract: We propose NeRFiller, an approach that completes missing portions of a 3D capture via generative 3D inpainting using off-the-shelf 2D visual generative models. Often parts of a captured 3D scene or object are missing due to mesh reconstruction failures or a lack of observations (e.g., contact regions, such as the bottom of objects, or hard-to-reach areas). We approach this challenging 3D inpaintin…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Project page: https://ethanweber.me/nerfiller

  12. arXiv:2311.13878  [pdf, other]

    cs.CL cs.AI

    Minimizing Factual Inconsistency and Hallucination in Large Language Models

    Authors: Muneeswaran I, Shreya Saxena, Siva Prasad, M V Sai Prakash, Advaith Shankar, Varun V, Vishal Vaddina, Saisubramaniam Gopalakrishnan

    Abstract: Large Language Models (LLMs) are widely used in critical fields such as healthcare, education, and finance due to their remarkable proficiency in various language-related tasks. However, LLMs are prone to generating factually incorrect responses or "hallucinations," which can lead to a loss of credibility and trust among users. To address this issue, we propose a multi-stage framework that generat…

    Submitted 23 November, 2023; originally announced November 2023.

  13. arXiv:2306.01923  [pdf, other]

    cs.CV

    The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation

    Authors: Saurabh Saxena, Charles Herrmann, Junhwa Hur, Abhishek Kar, Mohammad Norouzi, Deqing Sun, David J. Fleet

    Abstract: Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity. We show that they also excel in estimating optical flow and monocular depth, surprisingly, without task-specific architectures and loss functions that are predominant for these tasks. Compared to the point estimates of conventional regression-based methods, diffusion models also…

    Submitted 5 December, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 (Oral)

  14. arXiv:2305.07639  [pdf, other]

    cs.CV cs.LG

    Efficient Neural Network based Classification and Outlier Detection for Image Moderation using Compressed Sensing and Group Testing

    Authors: Sabyasachi Ghosh, Sanyam Saxena, Ajit Rajwade

    Abstract: Popular social media platforms employ neural network based image moderation engines to classify images uploaded on them as having potentially objectionable content. Such moderation engines must answer a large number of queries with heavy computational cost, even though the actual number of images with objectionable content is usually a tiny fraction. Inspired by recent work on Neural Group Testing…

    Submitted 12 May, 2023; originally announced May 2023.

  15. arXiv:2303.16088  [pdf, other]

    physics.comp-ph cond-mat.stat-mech cs.LG

    GNN-Assisted Phase Space Integration with Application to Atomistics

    Authors: Shashank Saxena, Jan-Hendrik Bastek, Miguel Spinola, Prateek Gupta, Dennis M. Kochmann

    Abstract: Overcoming the time scale limitations of atomistics can be achieved by switching from the state-space representation of Molecular Dynamics (MD) to a statistical-mechanics-based representation in phase space, where approximations such as maximum-entropy or Gaussian phase packets (GPP) evolve the atomistic ensemble in a time-coarsened fashion. In practice, this requires the computation of expensive…

    Submitted 20 March, 2023; originally announced March 2023.

  16. arXiv:2303.11525  [pdf, other]

    cs.LG cs.CL cs.CV

    Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency

    Authors: Vithursan Thangarasa, Shreyas Saxena, Abhay Gupta, Sean Lie

    Abstract: Recent research has focused on weight sparsity in deep neural network training to reduce FLOPs, aiming for improved efficiency (test accuracy w.r.t training FLOPs). However, sparse weight training often compromises accuracy, requiring extended training schedules to attain the accuracy of dense models. In contrast, our approach, Sparse Iso-FLOP Transformations (Sparse-IFT), uses sparsity to improve…

    Submitted 17 July, 2024; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: 14 pages, 5 figures, 6 Tables (Main Paper) + 8 pages (Supplementary Material). Published at ICML 2024

  17. arXiv:2303.10464  [pdf, other]

    cs.LG cs.CL

    SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models

    Authors: Vithursan Thangarasa, Abhay Gupta, William Marshall, Tianda Li, Kevin Leong, Dennis DeCoste, Sean Lie, Shreyas Saxena

    Abstract: The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural Language Processing (NLP). Instead of directly training on a downstream task, language models are first pre-trained on large datasets with cross-domain knowledge (e.g., Pile, MassiveText, etc.) and then fine-tuned on task-specific data (e.g., natural language generation, text summarization, etc.). Sca…

    Submitted 29 July, 2023; v1 submitted 18 March, 2023; originally announced March 2023.

    Comments: Accepted to Uncertainty in Artificial Intelligence (UAI) 2023 Conference; 13 pages, 4 figures (Main Paper) + 5 pages (Supplementary Material)

  18. arXiv:2302.14816  [pdf, other]

    cs.CV

    Monocular Depth Estimation using Diffusion Models

    Authors: Saurabh Saxena, Abhishek Kar, Mohammad Norouzi, David J. Fleet

    Abstract: We formulate monocular depth estimation using denoising diffusion models, inspired by their recent successes in high fidelity image generation. To that end, we introduce innovations to address problems arising due to noisy, incomplete depth maps in training data, including step-unrolled denoising diffusion, an $L_1$ loss, and depth infilling during training. To cope with the limited availability o…

    Submitted 28 February, 2023; originally announced February 2023.

  19. arXiv:2302.11821  [pdf, ps, other]

    cs.CG cs.DS

    Storage in Computational Geometry

    Authors: Yijie Han, Sanjeev Saxena

    Abstract: We show that $n$ real numbers can be stored in a constant number of real numbers such that each original real number can be fetched in $O(\log n)$ time. Although our result has implications for many computational geometry problems, we show here that, combined with Han's $O(n\sqrt{\log n})$ time real number sorting algorithm [3, arXiv:1801.00776], we can improve the complexity of Kirkpatrick's point l…

    Submitted 23 February, 2023; originally announced February 2023.

    Comments: This is an interesting result, especially when read together with paper [3]

  20. Large-Scale Knowledge Synthesis and Complex Information Retrieval from Biomedical Documents

    Authors: Shreya Saxena, Raj Sangani, Siva Prasad, Shubham Kumar, Mihir Athale, Rohan Awhad, Vishal Vaddina

    Abstract: Recent advances in the healthcare industry have led to an abundance of unstructured data, making it challenging to perform tasks such as efficient and accurate information retrieval at scale. Our work offers an all-in-one scalable solution for extracting and exploring complex information from large-scale research documents, which would otherwise be tedious. First, we briefly explain our knowledge…

    Submitted 14 February, 2023; originally announced February 2023.

  21. arXiv:2212.10247  [pdf, other]

    cs.DS cs.CG

    Dominance for Containment Problems

    Authors: Waseem Akram, Sanjeev Saxena

    Abstract: In a containment problem, the goal is to preprocess a set of geometric objects so that, given a geometric query object, we can report all the objects containing the query object. We consider the containment problem where input objects are homothetic triangles and the query objects considered are line segments, circles, and trapezoids with bases parallel to either axis. We show that this problem ca…

    Submitted 20 December, 2022; originally announced December 2022.

  22. arXiv:2210.06366  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    A Generalist Framework for Panoptic Segmentation of Images and Videos

    Authors: Ting Chen, Lala Li, Saurabh Saxena, Geoffrey Hinton, David J. Fleet

    Abstract: Panoptic segmentation assigns semantic and instance ID labels to every pixel of an image. As permutations of instance IDs are also valid solutions, the task requires learning of a high-dimensional one-to-many mapping. As a result, state-of-the-art approaches use customized architectures and task-specific loss functions. We formulate panoptic segmentation as a discrete data generation problem, withou…

    Submitted 12 October, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: ICCV'23. Code at https://github.com/google-research/pix2seq

  23. arXiv:2209.15132  [pdf, other]

    cs.RO

    Dynamic Inference on Graphs using Structured Transition Models

    Authors: Saumya Saxena, Oliver Kroemer

    Abstract: Enabling robots to perform complex dynamic tasks such as picking up an object in one sweeping motion or pushing off a wall to quickly turn a corner is a challenging problem. The dynamic interactions implicit in these tasks are critical towards the successful execution of such tasks. Graph neural networks (GNNs) provide a principled way of learning the dynamics of interactive systems but can suffer…

    Submitted 29 September, 2022; originally announced September 2022.

  24. arXiv:2208.02186  [pdf, ps, other]

    cs.DM math.CO

    On Brooks' Theorem

    Authors: Gopalan Sajith, Sanjeev Saxena

    Abstract: In this note we give two proofs of Brooks' Theorem. The first is obtained by modifying an earlier proof and the second by combining two earlier proofs. We believe these proofs are easier to teach in Computer Science courses.

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: 5 pages

    MSC Class: 05C15 ACM Class: G.2.2

  25. arXiv:2207.11954  [pdf, ps, other]

    cs.DS

    Simpler O(1) Query Algorithm for Level Ancestors

    Authors: Sanjeev Saxena

    Abstract: This note describes a very simple O(1) query time algorithm for finding level ancestors. This is basically a serial (re)-implementation of the parallel algorithm of Berkman and Vishkin (O.Berkman and U.Vishkin, Finding level-ancestors in trees, JCSS, 48, 214--230, 1994). Although the basic algorithm has preprocessing time of O(n log n), by having additional levels or using table lookup, the prep…

    Submitted 29 July, 2024; v1 submitted 25 July, 2022; originally announced July 2022.

  26. arXiv:2207.02419  [pdf, other]

    cs.CL cs.AI cs.LG

    BioTABQA: Instruction Learning for Biomedical Table Question Answering

    Authors: Man Luo, Sharad Saxena, Swaroop Mishra, Mihir Parmar, Chitta Baral

    Abstract: Table Question Answering (TQA) is an important but under-explored task. Most of the existing QA datasets are in unstructured text format and only a few of them use tables as the context. To the best of our knowledge, no TQA datasets exist in the biomedical domain, where tables are frequently used to present information. In this paper, we first curate a table question answering dataset, BioTABQA,…

    Submitted 5 July, 2022; originally announced July 2022.

    Comments: BioASQ10 Workshop

  27. arXiv:2206.07669  [pdf, other]

    cs.CV cs.CL cs.LG

    A Unified Sequence Interface for Vision Tasks

    Authors: Ting Chen, Saurabh Saxena, Lala Li, Tsung-Yi Lin, David J. Fleet, Geoffrey Hinton

    Abstract: While language tasks are naturally expressed in a single, unified, modeling framework, i.e., generating sequences of tokens, this has not been the case in computer vision. As a result, there is a proliferation of distinct architectures and loss functions for different vision tasks. In this work we show that a diverse set of "core" computer vision tasks can also be unified if formulated in terms of…

    Submitted 15 October, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: The first three authors contributed equally

  28. arXiv:2206.02518  [pdf]

    cs.CE

    A Model for Predicting Ignition Potential of Complex Fuel in Diurnally Variable Environment

    Authors: Saurabh Saxena, Ritambhara Dubey, Neda Yaghoobian

    Abstract: Fuel ignition potential is one of the primary drivers influencing the extent of damage in wildland and wildland-urban interface fires. Determining fire and ember exposure of fuels that vary spatially and temporally will help to recognize necessary defensive actions and reduce damages. In this paper, the development of a new computational model, Temperature And Moisture Evolution predictor for comp…

    Submitted 16 January, 2023; v1 submitted 8 May, 2022; originally announced June 2022.

  29. arXiv:2205.11487  [pdf, other]

    cs.CV cs.LG

    Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

    Authors: Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi

    Abstract: We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g. T5), pretrained on text-only c…

    Submitted 23 May, 2022; originally announced May 2022.

  30. arXiv:2112.12625  [pdf, other]

    cs.CV

    Comparison and Analysis of Image-to-Image Generative Adversarial Networks: A Survey

    Authors: Sagar Saxena, Mohammad Nayeem Teli

    Abstract: Generative Adversarial Networks (GANs) have recently introduced effective methods of performing Image-to-Image translations. These models can be applied and generalized to a variety of domains in Image-to-Image translation without changing any parameters. In this paper, we survey and analyze eight Image-to-Image Generative Adversarial Networks: Pix2Pix, CycleGAN, CoGAN, StarGAN, MUNIT, StarGAN2, D…

    Submitted 26 August, 2022; v1 submitted 23 December, 2021; originally announced December 2021.

    Comments: 36 pages, 22 figures, Preprint; format changed, typos corrected

  31. arXiv:2112.02530  [pdf, other]

    cs.IR

    Exploring and Mitigating Gender Bias in Recommender Systems with Explicit Feedback

    Authors: Shrikant Saxena, Shweta Jain

    Abstract: Recommender systems are indispensable because they influence our day-to-day behavior and decisions by giving us personalized suggestions. Services like Kindle, YouTube, and Netflix depend heavily on the performance of their recommender systems to ensure that their users have a good experience and to increase revenues. Despite their popularity, it has been shown that recommender systems reproduce a…

    Submitted 5 December, 2021; originally announced December 2021.

    Comments: 19 pages, 13 figures

  32. Point Enclosure Problem for Homothetic Polygons

    Authors: Waseem Akram, Sanjeev Saxena

    Abstract: In this paper, we investigate the homothetic point enclosure problem: given a set $S$ of $n$ triangles with sides parallel to three fixed directions, find a data structure for $S$ that can report all the triangles of $S$ that contain a query point efficiently. The problem is "inverse" of the homothetic range search problem. We present an $O(n\log n)$ space solution that supports the queries in…

    Submitted 3 December, 2021; originally announced December 2021.

    Journal ref: In: Combinatorial Algorithms. IWOCA 2023. Lecture Notes in Computer Science, vol 13889

  33. arXiv:2109.10852  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Pix2seq: A Language Modeling Framework for Object Detection

    Authors: Ting Chen, Saurabh Saxena, Lala Li, David J. Fleet, Geoffrey Hinton

    Abstract: We present Pix2Seq, a simple and generic framework for object detection. Unlike existing approaches that explicitly integrate prior knowledge about the task, we cast object detection as a language modeling task conditioned on the observed pixel inputs. Object descriptions (e.g., bounding boxes and class labels) are expressed as sequences of discrete tokens, and we train a neural network to perceiv…

    Submitted 27 March, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

    Comments: ICLR'22. Code and pretrained models at https://github.com/google-research/pix2seq

  34. arXiv:2109.08771  [pdf, other]

    cs.RO

    Search-Based Task Planning with Learned Skill Effect Models for Lifelong Robotic Manipulation

    Authors: Jacky Liang, Mohit Sharma, Alex LaGrassa, Shivam Vats, Saumya Saxena, Oliver Kroemer

    Abstract: Robots deployed in many real-world settings need to be able to acquire new skills and solve new tasks over time. Prior works on planning with skills often make assumptions on the structure of skills and tasks, such as subgoal skills, shared skill implementations, or task-specific plan skeletons, which limit adaptation to new skills and tasks. By contrast, we propose doing task planning by jointly…

    Submitted 13 April, 2022; v1 submitted 17 September, 2021; originally announced September 2021.

    Comments: To appear in the International Conference on Robotics and Automation (ICRA) 2022

  35. arXiv:2108.11554  [pdf, other]

    cs.CV cs.AI

    XCI-Sketch: Extraction of Color Information from Images for Generation of Colored Outlines and Sketches

    Authors: V Manushree, Sameer Saxena, Parna Chowdhury, Manisimha Varma, Harsh Rathod, Ankita Ghosh, Sahil Khose

    Abstract: Sketches are a medium to convey a visual scene from an individual's creative perspective. The addition of color substantially enhances the overall expressivity of a sketch. This paper proposes two methods to mimic human-drawn colored sketches by utilizing the Contour Drawing Dataset. Our first approach renders colored outline sketches by applying image processing techniques aided by k-means color…

    Submitted 7 January, 2022; v1 submitted 25 August, 2021; originally announced August 2021.

    Comments: ML for Creativity and Design workshop at NeurIPS 2021

  36. arXiv:2106.06129  [pdf, other]

    cs.CV cs.LG

    Instance-Level Task Parameters: A Robust Multi-task Weighting Framework

    Authors: Pavan Kumar Anasosalu Vasu, Shreyas Saxena, Oncel Tuzel

    Abstract: Recent works have shown that deep neural networks benefit from multi-task learning by learning a shared representation across several related tasks. However, performance of such systems depends on relative weighting between various losses involved during training. Prior works on loss weighting schemes assume that instances are equally easy or hard for all tasks. In order to break this assumption, w…

    Submitted 10 June, 2021; originally announced June 2021.

  37. arXiv:2105.13464  [pdf, other]

    cs.LG cs.AI cs.CV

    Training With Data Dependent Dynamic Learning Rates

    Authors: Shreyas Saxena, Nidhi Vyas, Dennis DeCoste

    Abstract: Recently many first and second order variants of SGD have been proposed to facilitate training of Deep Neural Networks (DNNs). A common limitation of these works stems from the fact that they use the same learning rate across all instances present in the dataset. This setting is widely adopted under the assumption that loss functions for each instance are similar in nature, and hence, a common lear…

    Submitted 27 May, 2021; originally announced May 2021.

  38. arXiv:2104.02461  [pdf, ps, other]

    cs.DS

    Sorted Range Reporting and Range Minima Queries

    Authors: Waseem Akram, Sanjeev Saxena

    Abstract: Given an array A[1: n] of n elements drawn from an ordered set, the sorted range selection problem is to build a data structure that can be used to answer the following type of queries efficiently: Given a pair of indices i, j $(1\le i\le j\le n)$, and a positive integer k, report the k smallest elements from the sub-array A[i: j] in order. Brodal et al. (Brodal, G.S., Fagerberg, R., Greve, M.,…

    Submitted 19 September, 2023; v1 submitted 6 April, 2021; originally announced April 2021.

  39. arXiv:2103.14256  [pdf, other]

    cs.RO cs.AI cs.LG

    Learning Reactive and Predictive Differentiable Controllers for Switching Linear Dynamical Models

    Authors: Saumya Saxena, Alex LaGrassa, Oliver Kroemer

    Abstract: Humans leverage the dynamics of the environment and their own bodies to accomplish challenging tasks such as grasping an object while walking past it or pushing off a wall to turn a corner. Such tasks often involve switching dynamics as the robot makes and breaks contact. Learning these dynamics is a challenging problem and prone to model inaccuracies, especially near contact regions. In this work…

    Submitted 26 March, 2021; originally announced March 2021.

  40. arXiv:2103.11596  [pdf]

    cs.CL

    Monolingual and Parallel Corpora for Kangri Low Resource Language

    Authors: Shweta Chauhan, Shefali Saxena, Philemon Daniel

    Abstract: In this paper we present a dataset for Kangri (ISO 639-3: xnr), a low-resource, endangered Himachali language listed by the United Nations Educational, Scientific and Cultural Organization (UNESCO). The compilation of the Kangri corpus has been a challenging task due to the non-availability of digitized resources. The corpus contains 1,81,552 Monolingual and 27,362 Hindi-Kangri Parallel corpora. We…

    Submitted 22 March, 2021; originally announced March 2021.

    Comments: 7 pages, 6 Tables, 1 Figure

  41. arXiv:2102.09666  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    Dynamic curriculum learning via data parameters for noise robust keyword spotting

    Authors: Takuya Higuchi, Shreyas Saxena, Mehrez Souden, Tien Dung Tran, Masood Delfarah, Chandra Dhir

    Abstract: We propose dynamic curriculum learning via data parameters for noise robust keyword spotting. Data parameter learning has recently been introduced for image processing, where weight parameters, so-called data parameters, for target classes and instances are introduced and optimized along with model parameters. The data parameters scale logits and control importance over classes and instances durin…

    Submitted 18 February, 2021; originally announced February 2021.

    Comments: Accepted at ICASSP 2021

  42. arXiv:2102.00837  [pdf, other]

    cs.LG

    Machine learning pipeline for battery state of health estimation

    Authors: Darius Roman, Saurabh Saxena, Valentin Robu, Michael Pecht, David Flynn

    Abstract: Lithium-ion batteries are ubiquitous in modern day applications ranging from portable electronics to electric vehicles. Irrespective of the application, reliable real-time estimation of battery state of health (SOH) by on-board computers is crucial to the safe operation of the battery, ultimately safeguarding asset integrity. In this paper, we design and evaluate a machine learning pipeline for es…

    Submitted 1 February, 2021; originally announced February 2021.

    Comments: Peer review, pre-print to be published in Nature Machine Intelligence - 32 pages and 24 figures (including supplementary material)

    ACM Class: C.4; I.5.1; I.2.6

  43. arXiv:2009.09496  [pdf, other]

    cs.LG cs.CV stat.ML

    Learning Soft Labels via Meta Learning

    Authors: Nidhi Vyas, Shreyas Saxena, Thomas Voice

    Abstract: One-hot labels do not represent soft decision boundaries among concepts, and hence, models trained on them are prone to overfitting. Using soft labels as targets provides regularization, but different soft labels might be optimal at different stages of optimization. Also, training with fixed labels in the presence of noisy annotations leads to worse generalization. To address these limitations, we…

    Submitted 20 September, 2020; originally announced September 2020.

  44. arXiv:2008.06860  [pdf, ps, other]

    cs.CL cs.CR cs.LG

    TextDecepter: Hard Label Black Box Attack on Text Classifiers

    Authors: Sachin Saxena

    Abstract: Machine learning has been proven to be susceptible to carefully crafted samples, known as adversarial examples. The generation of these adversarial examples helps to make the models more robust and gives us an insight into the underlying decision-making of these models. Over the years, researchers have successfully attacked image classifiers in both, white and black-box settings. However, these me…

    Submitted 27 December, 2020; v1 submitted 16 August, 2020; originally announced August 2020.

    Comments: 10 pages, 11 tables

  45. arXiv:2008.05844  [pdf, ps, other]

    cs.DS

    On seat allocation problem with multiple merit lists

    Authors: Rahul Kumar Singh, Sanjeev Saxena

    Abstract: In this note, we present a simpler algorithm for joint seat allocation problem in case there are two or more merit lists. In case of two lists (the current situation for Engineering seats in India), the running time of the algorithm is proportional to sum of running time for two separate (delinked) allocations. The algorithm is straight forward and natural and is not (at least directly) based on d…

    Submitted 13 August, 2020; originally announced August 2020.

  46. arXiv:2006.04859  [pdf, other]

    cs.CV cs.RO

    Novel Perception Algorithmic Framework For Object Identification and Tracking In Autonomous Navigation

    Authors: Suryansh Saxena, Isaac K Isukapati

    Abstract: This paper introduces a novel perception framework that has the ability to identify and track objects in autonomous vehicle's field of view. The proposed algorithms don't require any training for achieving this goal. The framework makes use of ego-vehicle's pose estimation and a KD-Tree-based segmentation algorithm to generate object clusters. In turn, using a VFH technique, the geometry of each i…

    Submitted 8 June, 2020; originally announced June 2020.

  47. arXiv:2006.01952  [pdf, other]

    cs.RO

    Learning Active Task-Oriented Exploration Policies for Bridging the Sim-to-Real Gap

    Authors: Jacky Liang, Saumya Saxena, Oliver Kroemer

    Abstract: Training robotic policies in simulation suffers from the sim-to-real gap, as simulated dynamics can be different from real-world dynamics. Past works tackled this problem through domain randomization and online system-identification. The former is sensitive to the manually-specified training distribution of dynamics parameters and can result in behaviors that are overly conservative. The latter re…

    Submitted 5 November, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

    Comments: Published at Robotics: Science and Systems 2020

  48. arXiv:2006.01428  [pdf, ps, other]

    cs.CG cs.DS math.CO

    Zone Theorem for Arrangements in three dimensions

    Authors: Sanjeev Saxena

    Abstract: In this note, a simple description of zone theorem in three dimensions is given.

    Submitted 2 June, 2020; originally announced June 2020.

    Journal ref: Information Processing Letters Volume 172, December 2021, 106161

  49. arXiv:2004.07437  [pdf, ps, other]

    cs.CL cs.LG

    Non-Autoregressive Machine Translation with Latent Alignments

    Authors: Chitwan Saharia, William Chan, Saurabh Saxena, Mohammad Norouzi

    Abstract: This paper presents two strong methods, CTC and Imputer, for non-autoregressive machine translation that model latent alignments with dynamic programming. We revisit CTC for machine translation and demonstrate that a simple CTC model can achieve state-of-the-art for single-step non-autoregressive machine translation, contrary to what prior work indicates. In addition, we adapt the Imputer model fo…

    Submitted 16 November, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

  50. arXiv:2003.12602  [pdf, other]

    cs.CV eess.IV

    Source Printer Identification from Document Images Acquired using Smartphone

    Authors: Sharad Joshi, Suraj Saxena, Nitin Khanna

    Abstract: Vast volumes of printed documents continue to be used for various important as well as trivial applications. Such applications often rely on the information provided in the form of printed text documents whose integrity verification poses a challenge due to time constraints and lack of resources. Source printer identification provides essential information about the origin and integrity of a print…

    Submitted 27 March, 2020; originally announced March 2020.

    Comments: 10 pages