-
Time Synchronization of TESLA-enabled GNSS Receivers
Authors:
Jason Anderson,
Sherman Lo,
Todd Walter
Abstract:
As TESLA-enabled GNSS for authenticated positioning reaches ubiquity, receivers must use an onboard, GNSS-independent clock and carefully constructed time synchronization algorithms to assert the authenticity afforded. This work provides the necessary checks and synchronization protocols needed in the broadcast-only GNSS context. We provide proof of security for each of our algorithms under a dela…
▽ More
As TESLA-enabled GNSS for authenticated positioning reaches ubiquity, receivers must use an onboard, GNSS-independent clock and carefully constructed time synchronization algorithms to assert the authenticity afforded. This work provides the necessary checks and synchronization protocols needed in the broadcast-only GNSS context. We provide proof of security for each of our algorithms under a delay-capable adversary. The algorithms included herein enable a GNSS receiver to use its onboard, GNSS-independent clock to determine whether a message arrived at the correct time, to determine whether its onboard, GNSS-independent clock is safe to use and when the clock will no longer be safe in the future due to predicted clock drift, and to resynchronize its onboard, GNSS-independent clock. Each algorithm is safe to use even when an adversary induces delays within the protocol. Moreover, we discuss the implications of GNSS authentication schemes that use two simultaneous TESLA instances of different authentication cadences. To a receiver implementer or standards author, this work provides the necessary implementation algorithms to assert security and provides a comprehensive guide on why these methods are required.
△ Less
Submitted 18 July, 2024;
originally announced July 2024.
-
Uncovering Layer-Dependent Activation Sparsity Patterns in ReLU Transformers
Authors:
Cody Wild,
Jesper Anderson
Abstract:
Previous work has demonstrated that MLPs within ReLU Transformers exhibit high levels of sparsity, with many of their activations equal to zero for any given token. We build on that work to more deeply explore how token-level sparsity evolves over the course of training, and how it connects to broader sparsity patterns over the course of a sequence or batch, demonstrating that the different layers…
▽ More
Previous work has demonstrated that MLPs within ReLU Transformers exhibit high levels of sparsity, with many of their activations equal to zero for any given token. We build on that work to more deeply explore how token-level sparsity evolves over the course of training, and how it connects to broader sparsity patterns over the course of a sequence or batch, demonstrating that the different layers within small transformers exhibit distinctly layer-specific patterns on both of these fronts. In particular, we demonstrate that the first and last layer of the network have distinctive and in many ways inverted relationships to sparsity, and explore implications for the structure of feature representations being learned at different depths of the model. We additionally explore the phenomenon of ReLU dimensions "turning off", and show evidence suggesting that "neuron death" is being primarily driven by the dynamics of training, rather than simply occurring randomly or accidentally as a result of outliers.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
Regret Analysis of Multi-task Representation Learning for Linear-Quadratic Adaptive Control
Authors:
Bruce D. Lee,
Leonardo F. Toso,
Thomas T. Zhang,
James Anderson,
Nikolai Matni
Abstract:
Representation learning is a powerful tool that enables learning over large multitudes of agents or domains by enforcing that all agents operate on a shared set of learned features. However, many robotics or controls applications that would benefit from collaboration operate in settings with changing environments and goals, whereas most guarantees for representation learning are stated for static…
▽ More
Representation learning is a powerful tool that enables learning over large multitudes of agents or domains by enforcing that all agents operate on a shared set of learned features. However, many robotics or controls applications that would benefit from collaboration operate in settings with changing environments and goals, whereas most guarantees for representation learning are stated for static settings. Toward rigorously establishing the benefit of representation learning in dynamic settings, we analyze the regret of multi-task representation learning for linear-quadratic control. This setting introduces unique challenges. Firstly, we must account for and balance the $\textit{misspecification}$ introduced by an approximate representation. Secondly, we cannot rely on the parameter update schemes of single-task online LQR, for which least-squares often suffices, and must devise a novel scheme to ensure sufficient improvement. We demonstrate that for settings where exploration is "benign", the regret of any agent after $T$ timesteps scales as $\tilde O(\sqrt{T/H})$, where $H$ is the number of agents. In settings with "difficult" exploration, the regret scales as $\tilde O(\sqrt{d_u d_θ} \sqrt{T} + T^{3/4}/H^{1/5})$, where $d_x$ is the state-space dimension, $d_u$ is the input dimension, and $d_θ$ is the task-specific parameter count. In both cases, by comparing to the minimax single-task regret $O(\sqrt{d_x d_u^2}\sqrt{T})$, we see a benefit of a large number of agents. Notably, in the difficult exploration case, by sharing a representation across tasks, the effective task-specific parameter count can often be small $d_θ< d_x d_u$. Lastly, we provide numerical validation of the trends we predict.
△ Less
Submitted 27 July, 2024; v1 submitted 8 July, 2024;
originally announced July 2024.
-
SoK: Web Authentication in the Age of End-to-End Encryption
Authors:
Jenny Blessing,
Daniel Hugenroth,
Ross J. Anderson,
Alastair R. Beresford
Abstract:
The advent of end-to-end encrypted (E2EE) messaging and backup services has brought new challenges for usable authentication. Compared to regular web services, the nature of E2EE implies that the provider cannot recover data for users who have forgotten passwords or lost devices. Therefore, new forms of robustness and recoverability are required, leading to a plethora of solutions ranging from ran…
▽ More
The advent of end-to-end encrypted (E2EE) messaging and backup services has brought new challenges for usable authentication. Compared to regular web services, the nature of E2EE implies that the provider cannot recover data for users who have forgotten passwords or lost devices. Therefore, new forms of robustness and recoverability are required, leading to a plethora of solutions ranging from randomly-generated recovery codes to threshold-based social verification. These implications also spread to new forms of authentication and legacy web services: passwordless authentication ("passkeys") has become a promising candidate to replace passwords altogether, but are inherently device-bound. However, users expect that they can login from multiple devices and recover their passwords in case of device loss--prompting providers to sync credentials to cloud storage using E2EE, resulting in the very same authentication challenges of regular E2EE services. Hence, E2EE authentication quickly becomes relevant not only for a niche group of dedicated E2EE enthusiasts but for the general public using the passwordless authentication techniques promoted by their device vendors. In this paper we systematize existing research literature and industry practice relating to security, privacy, usability, and recoverability of E2EE authentication. We investigate authentication and recovery schemes in all widely-used E2EE web services and survey passwordless authentication deployment in the top-200 most popular websites. Finally, we present concrete research directions based on observed gaps between industry deployment and academic literature.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments
Authors:
Han Wang,
Sihong He,
Zhili Zhang,
Fei Miao,
James Anderson
Abstract:
We explore a Federated Reinforcement Learning (FRL) problem where $N$ agents collaboratively learn a common policy without sharing their trajectory data. To date, existing FRL work has primarily focused on agents operating in the same or ``similar" environments. In contrast, our problem setup allows for arbitrarily large levels of environment heterogeneity. To obtain the optimal policy which maxim…
▽ More
We explore a Federated Reinforcement Learning (FRL) problem where $N$ agents collaboratively learn a common policy without sharing their trajectory data. To date, existing FRL work has primarily focused on agents operating in the same or ``similar" environments. In contrast, our problem setup allows for arbitrarily large levels of environment heterogeneity. To obtain the optimal policy which maximizes the average performance across all potentially completely different environments, we propose two algorithms: FedSVRPG-M and FedHAPG-M. In contrast to existing results, we demonstrate that both FedSVRPG-M and FedHAPG-M, both of which leverage momentum mechanisms, can exactly converge to a stationary point of the average performance function, regardless of the magnitude of environment heterogeneity. Furthermore, by incorporating the benefits of variance-reduction techniques or Hessian approximation, both algorithms achieve state-of-the-art convergence results, characterized by a sample complexity of $\mathcal{O}\left(ε^{-\frac{3}{2}}/N\right)$. Notably, our algorithms enjoy linear convergence speedups with respect to the number of agents, highlighting the benefit of collaboration among agents in finding a common policy.
△ Less
Submitted 29 May, 2024;
originally announced May 2024.
-
Data-Efficient and Robust Task Selection for Meta-Learning
Authors:
Donglin Zhan,
James Anderson
Abstract:
Meta-learning methods typically learn tasks under the assumption that all tasks are equally important. However, this assumption is often not valid. In real-world applications, tasks can vary both in their importance during different training stages and in whether they contain noisy labeled data or not, making a uniform approach suboptimal. To address these issues, we propose the Data-Efficient and…
▽ More
Meta-learning methods typically learn tasks under the assumption that all tasks are equally important. However, this assumption is often not valid. In real-world applications, tasks can vary both in their importance during different training stages and in whether they contain noisy labeled data or not, making a uniform approach suboptimal. To address these issues, we propose the Data-Efficient and Robust Task Selection (DERTS) algorithm, which can be incorporated into both gradient and metric-based meta-learning algorithms. DERTS selects weighted subsets of tasks from task pools by minimizing the approximation error of the full gradient of task pools in the meta-training stage. The selected tasks are efficient for rapid training and robust towards noisy label scenarios. Unlike existing algorithms, DERTS does not require any architecture modification for training and can handle noisy label data in both the support and query sets. Analysis of DERTS shows that the algorithm follows similar training dynamics as learning on the full task pools. Experiments show that DERTS outperforms existing sampling strategies for meta-learning on both gradient-based and metric-based meta-learning algorithms in limited data budget and noisy task settings.
△ Less
Submitted 11 May, 2024;
originally announced May 2024.
-
Capabilities of Gemini Models in Medicine
Authors:
Khaled Saab,
Tao Tu,
Wei-Hung Weng,
Ryutaro Tanno,
David Stutz,
Ellery Wulczyn,
Fan Zhang,
Tim Strother,
Chunjong Park,
Elahe Vedadi,
Juanma Zambrano Chaves,
Szu-Yeu Hu,
Mike Schaekermann,
Aishwarya Kamath,
Yong Cheng,
David G. T. Barrett,
Cathy Cheung,
Basil Mustafa,
Anil Palepu,
Daniel McDuff,
Le Hou,
Tomer Golany,
Luyang Liu,
Jean-baptiste Alayrac,
Neil Houlsby
, et al. (42 additional authors not shown)
Abstract:
Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-G…
▽ More
Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpass the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health & medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and medical video question answering, surpassing prior bespoke methods using only in-context learning. Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain.
△ Less
Submitted 1 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
UPSS: a User-centric Private Storage System with its applications
Authors:
Arastoo Bozorgi,
Mahya Soleimani Jadidi,
Jonathan Anderson
Abstract:
Strong confidentiality, integrity, user control, reliability and performance are critical requirements in privacy-sensitive applications. Such applications would benefit from a data storage and sharing infrastructure that provides these properties even in decentralized topologies with untrusted storage backends, but users today are forced to choose between systemic security properties and system r…
▽ More
Strong confidentiality, integrity, user control, reliability and performance are critical requirements in privacy-sensitive applications. Such applications would benefit from a data storage and sharing infrastructure that provides these properties even in decentralized topologies with untrusted storage backends, but users today are forced to choose between systemic security properties and system reliability or performance. As an alternative to this status quo we present UPSS: the user-centric private sharing system, a cryptographic storage system that can be used as a conventional filesystem or as the foundation for security-sensitive applications such as redaction with integrity and private revision control. We demonstrate that both the security and performance properties of UPSS exceed that of existing cryptographic filesystems and that its performance is comparable to mature conventional filesystems - in some cases, even superior. Whether used directly via its Rust API or as a conventional filesystem, UPSS provides strong security and practical performance on untrusted storage.
△ Less
Submitted 23 March, 2024;
originally announced March 2024.
-
StarCoder 2 and The Stack v2: The Next Generation
Authors:
Anton Lozhkov,
Raymond Li,
Loubna Ben Allal,
Federico Cassano,
Joel Lamy-Poirier,
Nouamane Tazi,
Ao Tang,
Dmytro Pykhtar,
Jiawei Liu,
Yuxiang Wei,
Tianyang Liu,
Max Tian,
Denis Kocetkov,
Arthur Zucker,
Younes Belkada,
Zijian Wang,
Qian Liu,
Dmitry Abulkhanov,
Indraneil Paul,
Zhuang Li,
Wen-Ding Li,
Megan Risdal,
Jia Li,
Jian Zhu,
Terry Yue Zhuo
, et al. (41 additional authors not shown)
Abstract:
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data…
▽ More
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2- 15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder- 33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data.
△ Less
Submitted 29 February, 2024;
originally announced February 2024.
-
Examining the Unique Online Risk Experiences and Mental Health Outcomes of LGBTQ+ versus Heterosexual Youth
Authors:
Tangila Tanni,
Mamtaj Akter,
Joshua Anderson,
Mary Amon,
Pamela Wisniewski
Abstract:
We collected and analyzed Instagram direct messages (DMs) from 173 youth aged 13-21 (including 86 LGBTQ+ youth). We examined youth's risk-flagged social media trace data with their self-reported mental health outcomes to examine how the differing online experiences of LGBTQ+ youth compare with their heterosexual counterparts. We found that LGBTQ+ youth experienced significantly more high-risk onli…
▽ More
We collected and analyzed Instagram direct messages (DMs) from 173 youth aged 13-21 (including 86 LGBTQ+ youth). We examined youth's risk-flagged social media trace data with their self-reported mental health outcomes to examine how the differing online experiences of LGBTQ+ youth compare with their heterosexual counterparts. We found that LGBTQ+ youth experienced significantly more high-risk online interactions compared to heterosexual youth. LGBTQ+ youth reported overall poorer mental health, with online harassment specifically amplifying Self-Harm and Injury. LGBTQ+ youth's mental well-being linked positively to sexual messages, unlike heterosexual youth. Qualitatively, we found that most of the risk-flagged messages of LGBTQ+ youth were sexually motivated; however, a silver lining was that they sought support for their sexual identity from peers on the platform. The study highlights the importance of tailored online safety and inclusive design for LGBTQ+ youth, with implications for CHI community advancements in fostering a supportive online environments.
△ Less
Submitted 14 February, 2024;
originally announced February 2024.
-
Finite-Time Analysis of On-Policy Heterogeneous Federated Reinforcement Learning
Authors:
Chenyu Zhang,
Han Wang,
Aritra Mitra,
James Anderson
Abstract:
Federated reinforcement learning (FRL) has emerged as a promising paradigm for reducing the sample complexity of reinforcement learning tasks by exploiting information from different agents. However, when each agent interacts with a potentially different environment, little to nothing is known theoretically about the non-asymptotic performance of FRL algorithms. The lack of such results can be att…
▽ More
Federated reinforcement learning (FRL) has emerged as a promising paradigm for reducing the sample complexity of reinforcement learning tasks by exploiting information from different agents. However, when each agent interacts with a potentially different environment, little to nothing is known theoretically about the non-asymptotic performance of FRL algorithms. The lack of such results can be attributed to various technical challenges and their intricate interplay: Markovian sampling, linear function approximation, multiple local updates to save communication, heterogeneity in the reward functions and transition kernels of the agents' MDPs, and continuous state-action spaces. Moreover, in the on-policy setting, the behavior policies vary with time, further complicating the analysis. In response, we introduce FedSARSA, a novel federated on-policy reinforcement learning scheme, equipped with linear function approximation, to address these challenges and provide a comprehensive finite-time error analysis. Notably, we establish that FedSARSA converges to a policy that is near-optimal for all agents, with the extent of near-optimality proportional to the level of heterogeneity. Furthermore, we prove that FedSARSA leverages agent collaboration to enable linear speedups as the number of agents increases, which holds for both fixed and adaptive step-size configurations.
△ Less
Submitted 14 April, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
How Beginning Programmers and Code LLMs (Mis)read Each Other
Authors:
Sydney Nguyen,
Hannah McLean Babe,
Yangtian Zi,
Arjun Guha,
Carolyn Jane Anderson,
Molly Q Feldman
Abstract:
Generative AI models, specifically large language models (LLMs), have made strides towards the long-standing goal of text-to-code generation. This progress has invited numerous studies of user interaction. However, less is known about the struggles and strategies of non-experts, for whom each step of the text-to-code problem presents challenges: describing their intent in natural language, evaluat…
▽ More
Generative AI models, specifically large language models (LLMs), have made strides towards the long-standing goal of text-to-code generation. This progress has invited numerous studies of user interaction. However, less is known about the struggles and strategies of non-experts, for whom each step of the text-to-code problem presents challenges: describing their intent in natural language, evaluating the correctness of generated code, and editing prompts when the generated code is incorrect. This paper presents a large-scale controlled study of how 120 beginning coders across three academic institutions approach writing and editing prompts. A novel experimental design allows us to target specific steps in the text-to-code process and reveals that beginners struggle with writing and editing prompts, even for problems at their skill level and when correctness is automatically determined. Our mixed-methods evaluation provides insight into student processes and perceptions with key implications for non-expert Code LLM use within and outside of education.
△ Less
Submitted 7 July, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
Machine Learning for Shipwreck Segmentation from Side Scan Sonar Imagery: Dataset and Benchmark
Authors:
Advaith V. Sethuraman,
Anja Sheppard,
Onur Bagoren,
Christopher Pinnow,
Jamey Anderson,
Timothy C. Havens,
Katherine A. Skinner
Abstract:
Open-source benchmark datasets have been a critical component for advancing machine learning for robot perception in terrestrial applications. Benchmark datasets enable the widespread development of state-of-the-art machine learning methods, which require large datasets for training, validation, and thorough comparison to competing approaches. Underwater environments impose several operational cha…
▽ More
Open-source benchmark datasets have been a critical component for advancing machine learning for robot perception in terrestrial applications. Benchmark datasets enable the widespread development of state-of-the-art machine learning methods, which require large datasets for training, validation, and thorough comparison to competing approaches. Underwater environments impose several operational challenges that hinder efforts to collect large benchmark datasets for marine robot perception. Furthermore, a low abundance of targets of interest relative to the size of the search space leads to increased time and cost required to collect useful datasets for a specific task. As a result, there is limited availability of labeled benchmark datasets for underwater applications. We present the AI4Shipwrecks dataset, which consists of 24 distinct shipwreck sites totaling 286 high-resolution labeled side scan sonar images to advance the state-of-the-art in autonomous sonar image understanding. We leverage the unique abundance of targets in Thunder Bay National Marine Sanctuary in Lake Huron, MI, to collect and compile a sonar imagery benchmark dataset through surveys with an autonomous underwater vehicle (AUV). We consulted with expert marine archaeologists for the labeling of robotically gathered data. We then leverage this dataset to perform benchmark experiments for comparison of state-of-the-art supervised segmentation methods, and we present insights on opportunities and open challenges for the field. The dataset and benchmarking tools will be released as an open-source benchmark dataset to spur innovation in machine learning for Great Lakes and ocean exploration. The dataset and accompanying software are available at https://umfieldrobotics.github.io/ai4shipwrecks/.
△ Less
Submitted 25 January, 2024;
originally announced January 2024.
-
Meta-Learning Linear Quadratic Regulators: A Policy Gradient MAML Approach for Model-free LQR
Authors:
Leonardo F. Toso,
Donglin Zhan,
James Anderson,
Han Wang
Abstract:
We investigate the problem of learning linear quadratic regulators (LQR) in a multi-task, heterogeneous, and model-free setting. We characterize the stability and personalization guarantees of a policy gradient-based (PG) model-agnostic meta-learning (MAML) (Finn et al., 2017) approach for the LQR problem under different task-heterogeneity settings. We show that our MAML-LQR algorithm produces a s…
▽ More
We investigate the problem of learning linear quadratic regulators (LQR) in a multi-task, heterogeneous, and model-free setting. We characterize the stability and personalization guarantees of a policy gradient-based (PG) model-agnostic meta-learning (MAML) (Finn et al., 2017) approach for the LQR problem under different task-heterogeneity settings. We show that our MAML-LQR algorithm produces a stabilizing controller close to each task-specific optimal controller up to a task-heterogeneity bias in both model-based and model-free learning scenarios. Moreover, in the model-based setting, we show that such a controller is achieved with a linear convergence rate, which improves upon sub-linear rates from existing work. Our theoretical guarantees demonstrate that the learned controller can efficiently adapt to unseen LQR tasks.
△ Less
Submitted 31 May, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions
Authors:
Federico Cassano,
Luisa Li,
Akul Sethi,
Noah Shinn,
Abby Brennan-Jones,
Jacob Ginesin,
Edward Berman,
George Chakhnashvili,
Anton Lozhkov,
Carolyn Jane Anderson,
Arjun Guha
Abstract:
A significant amount of research is focused on developing and evaluating large language models for a variety of code synthesis tasks. These include synthesizing code from natural language, synthesizing tests from code, and synthesizing explanations of code. In contrast, the behavior of instructional code editing with LLMs is understudied. These are tasks in which the model is provided a block of c…
▽ More
A significant amount of research is focused on developing and evaluating large language models for a variety of code synthesis tasks. These include synthesizing code from natural language, synthesizing tests from code, and synthesizing explanations of code. In contrast, the behavior of instructional code editing with LLMs is understudied. These are tasks in which the model is provided a block of code and an instruction to modify the code. The editing instruction may ask for a feature to be added or removed, describe a bug and ask for a fix, or ask for a different kind of solution. We introduce a carefully crafted benchmark of code editing tasks and use it to evaluate several cutting edge LLMs. Our evaluation exposes a significant gap between the capabilities of state-of-the-art open and closed models. For example, even GPT-3.5-Turbo is better than the best open model at code editing tasks. We also introduce a new, carefully curated, permissively licensed training dataset of code editing tasks coupled with natural language instructions. Using this training dataset, we show that we can fine-tune open Code LLMs to significantly improve their code editing capabilities, closing the gap between open and closed models. All code, data, and models are available at https://github.com/nuprl/CanItEdit.
△ Less
Submitted 19 March, 2024; v1 submitted 10 December, 2023;
originally announced December 2023.
-
Target-Free Compound Activity Prediction via Few-Shot Learning
Authors:
Peter Eckmann,
Jake Anderson,
Michael K. Gilson,
Rose Yu
Abstract:
Predicting the activities of compounds against protein-based or phenotypic assays using only a few known compounds and their activities is a common task in target-free drug discovery. Existing few-shot learning approaches are limited to predicting binary labels (active/inactive). However, in real-world drug discovery, degrees of compound activity are highly relevant. We study Few-Shot Compound Act…
▽ More
Predicting the activities of compounds against protein-based or phenotypic assays using only a few known compounds and their activities is a common task in target-free drug discovery. Existing few-shot learning approaches are limited to predicting binary labels (active/inactive). However, in real-world drug discovery, degrees of compound activity are highly relevant. We study Few-Shot Compound Activity Prediction (FS-CAP) and design a novel neural architecture to meta-learn continuous compound activities across large bioactivity datasets. Our model aggregates encodings generated from the known compounds and their activities to capture assay information. We also introduce a separate encoder for the unknown compound. We show that FS-CAP surpasses traditional similarity-based techniques as well as other state of the art few-shot learning methods on a variety of target-free drug discovery settings and datasets.
△ Less
Submitted 27 November, 2023;
originally announced November 2023.
-
Improved Communication Efficiency in Federated Natural Policy Gradient via ADMM-based Gradient Updates
Authors:
Guangchen Lan,
Han Wang,
James Anderson,
Christopher Brinton,
Vaneet Aggarwal
Abstract:
Federated reinforcement learning (FedRL) enables agents to collaboratively train a global policy without sharing their individual data. However, high communication overhead remains a critical bottleneck, particularly for natural policy gradient (NPG) methods, which are second-order. To address this issue, we propose the FedNPG-ADMM framework, which leverages the alternating direction method of mul…
▽ More
Federated reinforcement learning (FedRL) enables agents to collaboratively train a global policy without sharing their individual data. However, high communication overhead remains a critical bottleneck, particularly for natural policy gradient (NPG) methods, which are second-order. To address this issue, we propose the FedNPG-ADMM framework, which leverages the alternating direction method of multipliers (ADMM) to approximate global NPG directions efficiently. We theoretically demonstrate that using ADMM-based gradient updates reduces communication complexity from ${O}({d^{2}})$ to ${O}({d})$ at each iteration, where $d$ is the number of model parameters. Furthermore, we show that achieving an $ε$-error stationary convergence requires ${O}(\frac{1}{(1-γ)^{2}ε})$ iterations for discount factor $γ$, demonstrating that FedNPG-ADMM maintains the same convergence rate as the standard FedNPG. Through evaluation of the proposed algorithms in MuJoCo environments, we demonstrate that FedNPG-ADMM maintains the reward performance of standard FedNPG, and that its convergence rate improves when the number of federated agents increases.
△ Less
Submitted 9 October, 2023;
originally announced October 2023.
-
Oracle Complexity Reduction for Model-free LQR: A Stochastic Variance-Reduced Policy Gradient Approach
Authors:
Leonardo F. Toso,
Han Wang,
James Anderson
Abstract:
We investigate the problem of learning an $ε$-approximate solution for the discrete-time Linear Quadratic Regulator (LQR) problem via a Stochastic Variance-Reduced Policy Gradient (SVRPG) approach. Whilst policy gradient methods have proven to converge linearly to the optimal solution of the model-free LQR problem, the substantial requirement for two-point cost queries in gradient estimations may…
▽ More
We investigate the problem of learning an $ε$-approximate solution for the discrete-time Linear Quadratic Regulator (LQR) problem via a Stochastic Variance-Reduced Policy Gradient (SVRPG) approach. Whilst policy gradient methods have proven to converge linearly to the optimal solution of the model-free LQR problem, the substantial requirement for two-point cost queries in gradient estimations may be intractable, particularly in applications where obtaining cost function evaluations at two distinct control input configurations is exceptionally costly. To this end, we propose an oracle-efficient approach. Our method combines both one-point and two-point estimations in a dual-loop variance-reduced algorithm. It achieves an approximate optimal solution with only $O\left(\log\left(1/ε\right)^β\right)$ two-point cost information for $β\in (0,1)$.
△ Less
Submitted 19 September, 2023;
originally announced September 2023.
-
Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs
Authors:
Federico Cassano,
John Gouwar,
Francesca Lucchetti,
Claire Schlesinger,
Anders Freeman,
Carolyn Jane Anderson,
Molly Q Feldman,
Michael Greenberg,
Abhinav Jangda,
Arjun Guha
Abstract:
Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, Code LLMs produce impressive results on programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript)…
▽ More
Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, Code LLMs produce impressive results on programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages that have limited training data available. Low resource languages include OCaml, Racket, and several others.
This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach, MultiPL-T, translates training data from high-resource languages into training data for low-resource languages in the following way. 1) We use a Code LLM to synthesize tests for commented code from a high-resource language, filtering out faulty tests and code with low test coverage. 2) We use a Code LLM to translate Python code to a target low-resource language, and use tests to validate the translation. We apply this approach to generate tens of thousands of validated training items for Julia, Lua, OCaml, R, and Racket. Furthermore, we use an open model (StarCoderBase) with open training data (The Stack), which allows us to decontaminate benchmarks, train models without violating licenses, and run experiments that could not otherwise be done.
With MultiPL-T generated data, we present fine-tuned versions of StarCoderBase and Code Llama for Julia, Lua, OCaml, R, and Racket. On established benchmarks (MultiPL-E), these models outperform other open Code LLMs. The MultiPL-T approach is easy to apply to new languages, and is significantly more efficient and effective than alternatives such as training longer.
△ Less
Submitted 10 February, 2024; v1 submitted 18 August, 2023;
originally announced August 2023.
-
Sample-Efficient Linear Representation Learning from Non-IID Non-Isotropic Data
Authors:
Thomas T. C. K. Zhang,
Leonardo F. Toso,
James Anderson,
Nikolai Matni
Abstract:
A powerful concept behind much of the recent progress in machine learning is the extraction of common features across data from heterogeneous sources or tasks. Intuitively, using all of one's data to learn a common representation function benefits both computational effort and statistical generalization by leaving a smaller number of parameters to fine-tune on a given task. Toward theoretically gr…
▽ More
A powerful concept behind much of the recent progress in machine learning is the extraction of common features across data from heterogeneous sources or tasks. Intuitively, using all of one's data to learn a common representation function benefits both computational effort and statistical generalization by leaving a smaller number of parameters to fine-tune on a given task. Toward theoretically grounding these merits, we propose a general setting of recovering linear operators $M$ from noisy vector measurements $y = Mx + w$, where the covariates $x$ may be both non-i.i.d. and non-isotropic. We demonstrate that existing isotropy-agnostic representation learning approaches incur biases on the representation update, which causes the scaling of the noise terms to lose favorable dependence on the number of source tasks. This in turn can cause the sample complexity of representation learning to be bottlenecked by the single-task data size. We introduce an adaptation, $\texttt{De-bias & Feature-Whiten}$ ($\texttt{DFW}$), of the popular alternating minimization-descent scheme proposed independently in Collins et al., (2021) and Nayer and Vaswani (2022), and establish linear convergence to the optimal representation with noise level scaling down with the $\textit{total}$ source data size. This leads to generalization bounds on the same order as an oracle empirical risk minimizer. We verify the vital importance of $\texttt{DFW}$ on various numerical simulations. In particular, we show that vanilla alternating-minimization descent fails catastrophically even for iid, but mildly non-isotropic data. Our analysis unifies and generalizes prior work, and provides a flexible framework for a wider range of applications, such as in controls and dynamical systems.
△ Less
Submitted 27 July, 2024; v1 submitted 8 August, 2023;
originally announced August 2023.
-
An R package for parametric estimation of causal effects
Authors:
Joshua Wolff Anderson,
Cyril Rakovski
Abstract:
This article explains the usage of R package CausalModels, which is publicly available on the Comprehensive R Archive Network. While packages are available for sufficiently estimating causal effects, there lacks a package that provides a collection of structural models using the conventional statistical approach developed by Hernan and Robins (2020). CausalModels addresses this deficiency of softw…
▽ More
This article explains the usage of R package CausalModels, which is publicly available on the Comprehensive R Archive Network. While packages are available for sufficiently estimating causal effects, there lacks a package that provides a collection of structural models using the conventional statistical approach developed by Hernan and Robins (2020). CausalModels addresses this deficiency of software in R concerning causal inference by offering tools for methods that account for biases in observational data without requiring extensive statistical knowledge. These methods should not be ignored and may be more appropriate or efficient in solving particular problems. While implementations of these statistical models are distributed among a number of causal packages, CausalModels introduces a simple and accessible framework for a consistent modeling pipeline among a variety of statistical methods for estimating causal effects in a single R package. It consists of common methods including standardization, IP weighting, G-estimation, outcome regression, instrumental variables and propensity matching.
△ Less
Submitted 17 July, 2023; v1 submitted 17 July, 2023;
originally announced July 2023.
-
Universal Session Protocol: A Novel Approach to Session Management
Authors:
Jonathon Anderson
Abstract:
Currently, the TCP/IP model enables exploitation of vulnerabilities anonymously by unconditionally fulfilling every request for a connection into an application; the model only incorporates authentication within applications themselves, rather than as a precondition for access into applications. I am proposing the Universal Session Protocol as a change to the architecture of the TCP/IP model to in…
▽ More
Currently, the TCP/IP model enables exploitation of vulnerabilities anonymously by unconditionally fulfilling every request for a connection into an application; the model only incorporates authentication within applications themselves, rather than as a precondition for access into applications. I am proposing the Universal Session Protocol as a change to the architecture of the TCP/IP model to include a session layer featuring a structured generalized process for authentication negotiation and fulfillment. The Universal Session Protocol addresses an urgent and vital need to eliminate unauthenticated data processing on security critical systems. Previous work regarding TCP/IP security has focused on the application design and implementation and existing protocol layers, but has failed to posit the addition of a session layer as a mitigating control. Failing to implement a distinct authentication layer leaves every resource connected to the global Internet, including life and security critical infrastructure, vulnerable to attacks from anonymous and untraceable sources. The Universal Session Protocol provides a solution by establishing a TCP/IP Session Layer that explicitly provides authentication before a data stream is accessible within an application. After authentication, an identity is associated with the data stream so that all data may be related back to that identity for forensic purposes. If authentication fails, the application will never process user data, rendering the service safe from anonymous bad actors.
△ Less
Submitted 25 June, 2023;
originally announced June 2023.
-
SEEDS: Emulation of Weather Forecast Ensembles with Diffusion Models
Authors:
Lizao Li,
Rob Carver,
Ignacio Lopez-Gomez,
Fei Sha,
John Anderson
Abstract:
Uncertainty quantification is crucial to decision-making. A prominent example is probabilistic forecasting in numerical weather prediction. The dominant approach to representing uncertainty in weather forecasting is to generate an ensemble of forecasts. This is done by running many physics-based simulations under different conditions, which is a computationally costly process. We propose to amorti…
▽ More
Uncertainty quantification is crucial to decision-making. A prominent example is probabilistic forecasting in numerical weather prediction. The dominant approach to representing uncertainty in weather forecasting is to generate an ensemble of forecasts. This is done by running many physics-based simulations under different conditions, which is a computationally costly process. We propose to amortize the computational cost by emulating these forecasts with deep generative diffusion models learned from historical data. The learned models are highly scalable with respect to high-performance computing accelerators and can sample hundreds to tens of thousands of realistic weather forecasts at low cost. When designed to emulate operational ensemble forecasts, the generated ones are similar to physics-based ensembles in important statistical properties and predictive skill. When designed to correct biases present in the operational forecasting system, the generated ensembles show improved probabilistic forecast metrics. They are more reliable and forecast probabilities of extreme weather events more accurately. While this work demonstrates the utility of the methodology by focusing on weather forecasting, the generative artificial intelligence methodology can be extended for uncertainty quantification in climate modeling, where we believe the generation of very large ensembles of climate projections will play an increasingly important role in climate risk assessment.
△ Less
Submitted 8 October, 2023; v1 submitted 24 June, 2023;
originally announced June 2023.
-
Solving and Generating NPR Sunday Puzzles with Large Language Models
Authors:
Jingmiao Zhao,
Carolyn Jane Anderson
Abstract:
We explore the ability of large language models to solve and generate puzzles from the NPR Sunday Puzzle game show using PUZZLEQA, a dataset comprising 15 years of on-air puzzles. We evaluate four large language models using PUZZLEQA, in both multiple choice and free response formats, and explore two prompt engineering techniques to improve free response performance: chain-of-thought reasoning and…
▽ More
We explore the ability of large language models to solve and generate puzzles from the NPR Sunday Puzzle game show using PUZZLEQA, a dataset comprising 15 years of on-air puzzles. We evaluate four large language models using PUZZLEQA, in both multiple choice and free response formats, and explore two prompt engineering techniques to improve free response performance: chain-of-thought reasoning and prompt summarization. We find that state-of-the-art large language models can solve many PUZZLEQA puzzles: the best model, GPT-3.5, achieves 50.2% loose accuracy. However, in our few-shot puzzle generation experiment, we find no evidence that models can generate puzzles: GPT-3.5 generates puzzles with answers that do not conform to the generated rules. Puzzle generation remains a challenging task for future work.
△ Less
Submitted 21 June, 2023;
originally announced June 2023.
-
StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code
Authors:
Hannah McLean Babe,
Sydney Nguyen,
Yangtian Zi,
Arjun Guha,
Molly Q Feldman,
Carolyn Jane Anderson
Abstract:
Code LLMs are being rapidly deployed and there is evidence that they can make professional programmers more productive. Current benchmarks for code generation measure whether models generate correct programs given an expert prompt. In this paper, we present a new benchmark containing multiple prompts per problem, written by a specific population of non-expert prompters: beginning programmers. Stud…
▽ More
Code LLMs are being rapidly deployed and there is evidence that they can make professional programmers more productive. Current benchmarks for code generation measure whether models generate correct programs given an expert prompt. In this paper, we present a new benchmark containing multiple prompts per problem, written by a specific population of non-expert prompters: beginning programmers. StudentEval contains 1,749 prompts for 48 problems, written by 80 students who have only completed one semester of Python programming. Our students wrote these prompts while working interactively with a Code LLM, and we observed very mixed success rates. We use StudentEval to evaluate 5 Code LLMs and find that StudentEval is a better discriminator of model performance than existing benchmarks. We analyze the prompts and find significant variation in students' prompting techniques. We also find that nondeterministic LLM sampling could mislead students into thinking that their prompts are more (or less) effective than they actually are, which has implications for how to teach with Code LLMs.
△ Less
Submitted 7 June, 2023;
originally announced June 2023.
-
Neural Ideal Large Eddy Simulation: Modeling Turbulence with Neural Stochastic Differential Equations
Authors:
Anudhyan Boral,
Zhong Yi Wan,
Leonardo Zepeda-Núñez,
James Lottes,
Qing Wang,
Yi-fan Chen,
John Roberts Anderson,
Fei Sha
Abstract:
We introduce a data-driven learning framework that assimilates two powerful ideas: ideal large eddy simulation (LES) from turbulence closure modeling and neural stochastic differential equations (SDE) for stochastic modeling. The ideal LES models the LES flow by treating each full-order trajectory as a random realization of the underlying dynamics, as such, the effect of small-scales is marginaliz…
▽ More
We introduce a data-driven learning framework that assimilates two powerful ideas: ideal large eddy simulation (LES) from turbulence closure modeling and neural stochastic differential equations (SDE) for stochastic modeling. The ideal LES models the LES flow by treating each full-order trajectory as a random realization of the underlying dynamics, as such, the effect of small-scales is marginalized to obtain the deterministic evolution of the LES state. However, ideal LES is analytically intractable. In our work, we use a latent neural SDE to model the evolution of the stochastic process and an encoder-decoder pair for transforming between the latent space and the desired ideal flow field. This stands in sharp contrast to other types of neural parameterization of closure models where each trajectory is treated as a deterministic realization of the dynamics. We show the effectiveness of our approach (niLES - neural ideal LES) on a challenging chaotic dynamical system: Kolmogorov flow at a Reynolds number of 20,000. Compared to competing methods, our method can handle non-uniform geometries using unstructured meshes seamlessly. In particular, niLES leads to trajectories with more accurate statistics and enhances stability, particularly for long-horizon rollouts.
△ Less
Submitted 1 June, 2023;
originally announced June 2023.
-
Debias Coarsely, Sample Conditionally: Statistical Downscaling through Optimal Transport and Probabilistic Diffusion Models
Authors:
Zhong Yi Wan,
Ricardo Baptista,
Yi-fan Chen,
John Anderson,
Anudhyan Boral,
Fei Sha,
Leonardo Zepeda-Núñez
Abstract:
We introduce a two-stage probabilistic framework for statistical downscaling using unpaired data. Statistical downscaling seeks a probabilistic map to transform low-resolution data from a biased coarse-grained numerical scheme to high-resolution data that is consistent with a high-fidelity scheme. Our framework tackles the problem by composing two transformations: (i) a debiasing step via an optim…
▽ More
We introduce a two-stage probabilistic framework for statistical downscaling using unpaired data. Statistical downscaling seeks a probabilistic map to transform low-resolution data from a biased coarse-grained numerical scheme to high-resolution data that is consistent with a high-fidelity scheme. Our framework tackles the problem by composing two transformations: (i) a debiasing step via an optimal transport map, and (ii) an upsampling step achieved by a probabilistic diffusion model with a posteriori conditional sampling. This approach characterizes a conditional distribution without needing paired data, and faithfully recovers relevant physical statistics from biased samples. We demonstrate the utility of the proposed approach on one- and two-dimensional fluid flow problems, which are representative of the core difficulties present in numerical simulations of weather and climate. Our method produces realistic high-resolution outputs from low-resolution inputs, by upsampling resolutions of 8x and 16x. Moreover, our procedure correctly matches the statistics of physical quantities, even when the low-frequency content of the inputs and outputs do not match, a crucial but difficult-to-satisfy assumption needed by current state-of-the-art alternatives. Code for this work is available at: https://github.com/google-research/swirl-dynamics/tree/main/swirl_dynamics/projects/probabilistic_diffusion.
△ Less
Submitted 30 October, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
StarCoder: may the source be with you!
Authors:
Raymond Li,
Loubna Ben Allal,
Yangtian Zi,
Niklas Muennighoff,
Denis Kocetkov,
Chenghao Mou,
Marc Marone,
Christopher Akiki,
Jia Li,
Jenny Chim,
Qian Liu,
Evgenii Zheltonozhskii,
Terry Yue Zhuo,
Thomas Wang,
Olivier Dehaene,
Mishig Davaadorj,
Joel Lamy-Poirier,
João Monteiro,
Oleh Shliazhko,
Nicolas Gontier,
Nicholas Meade,
Armel Zebaze,
Ming-Ho Yee,
Logesh Kumar Umapathi,
Jian Zhu
, et al. (42 additional authors not shown)
Abstract:
The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large colle…
▽ More
The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python, can be prompted to achieve 40\% pass@1 on HumanEval, and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.
△ Less
Submitted 13 December, 2023; v1 submitted 9 May, 2023;
originally announced May 2023.
-
Learning Personalized Models with Clustered System Identification
Authors:
Leonardo F. Toso,
Han Wang,
James Anderson
Abstract:
We address the problem of learning linear system models from observing multiple trajectories from different system dynamics. This framework encompasses a collaborative scenario where several systems seeking to estimate their dynamics are partitioned into clusters according to their system similarity. Thus, the systems within the same cluster can benefit from the observations made by the others. Co…
▽ More
We address the problem of learning linear system models from observing multiple trajectories from different system dynamics. This framework encompasses a collaborative scenario where several systems seeking to estimate their dynamics are partitioned into clusters according to their system similarity. Thus, the systems within the same cluster can benefit from the observations made by the others. Considering this framework, we present an algorithm where each system alternately estimates its cluster identity and performs an estimation of its dynamics. This is then aggregated to update the model of each cluster. We show that under mild assumptions, our algorithm correctly estimates the cluster identities and achieves an approximate sample complexity that scales inversely with the number of systems in the cluster, thus facilitating a more efficient and personalized system identification process.
△ Less
Submitted 10 September, 2023; v1 submitted 3 April, 2023;
originally announced April 2023.
-
Federated Temporal Difference Learning with Linear Function Approximation under Environmental Heterogeneity
Authors:
Han Wang,
Aritra Mitra,
Hamed Hassani,
George J. Pappas,
James Anderson
Abstract:
We initiate the study of federated reinforcement learning under environmental heterogeneity by considering a policy evaluation problem. Our setup involves $N$ agents interacting with environments that share the same state and action space but differ in their reward functions and state transition kernels. Assuming agents can communicate via a central server, we ask: Does exchanging information expe…
▽ More
We initiate the study of federated reinforcement learning under environmental heterogeneity by considering a policy evaluation problem. Our setup involves $N$ agents interacting with environments that share the same state and action space but differ in their reward functions and state transition kernels. Assuming agents can communicate via a central server, we ask: Does exchanging information expedite the process of evaluating a common policy? To answer this question, we provide the first comprehensive finite-time analysis of a federated temporal difference (TD) learning algorithm with linear function approximation, while accounting for Markovian sampling, heterogeneity in the agents' environments, and multiple local updates to save communication. Our analysis crucially relies on several novel ingredients: (i) deriving perturbation bounds on TD fixed points as a function of the heterogeneity in the agents' underlying Markov decision processes (MDPs); (ii) introducing a virtual MDP to closely approximate the dynamics of the federated TD algorithm; and (iii) using the virtual MDP to make explicit connections to federated optimization. Putting these pieces together, we rigorously prove that in a low-heterogeneity regime, exchanging model estimates leads to linear convergence speedups in the number of agents.
△ Less
Submitted 1 July, 2024; v1 submitted 4 February, 2023;
originally announced February 2023.
-
Using natural language processing and structured medical data to phenotype patients hospitalized due to COVID-19
Authors:
Feier Chang,
Jay Krishnan,
Jillian H Hurst,
Michael E Yarrington,
Deverick J Anderson,
Emily C O'Brien,
Benjamin A Goldstein
Abstract:
To identify patients who are hospitalized because of COVID-19 as opposed to those who were admitted for other indications, we compared the performance of different computable phenotype definitions for COVID-19 hospitalizations that use different types of data from the electronic health records (EHR), including structured EHR data elements, provider notes, or a combination of both data types. And c…
▽ More
To identify patients who are hospitalized because of COVID-19 as opposed to those who were admitted for other indications, we compared the performance of different computable phenotype definitions for COVID-19 hospitalizations that use different types of data from the electronic health records (EHR), including structured EHR data elements, provider notes, or a combination of both data types. And conduct a retrospective data analysis utilizing chart review-based validation. Participants are 586 hospitalized individuals who tested positive for SARS-CoV-2 during January 2022. We used natural language processing to incorporate data from provider notes and LASSO regression and Random Forests to fit classification algorithms that incorporated structured EHR data elements, provider notes, or a combination of structured data and provider notes. Results: Based on a chart review, 38% of 586 patients were determined to be hospitalized for reasons other than COVID-19 despite having tested positive for SARS-CoV-2. A classification algorithm that used provider notes had significantly better discrimination than one that used structured EHR data elements (AUROC: 0.894 vs 0.841, p < 0.001), and performed similarly to a model that combined provider notes with structured data elements (AUROC: 0.894 vs 0.893). Assessments of hospital outcome metrics significantly differed based on whether the population included all hospitalized patients who tested positive for SARS-CoV-2 versus those who were determined to have been hospitalized due to COVID-19. This work demonstrates the utility of natural language processing approaches to derive information related to patient hospitalizations in cases where there may be multiple conditions that could serve as the primary indication for hospitalization.
△ Less
Submitted 2 February, 2023;
originally announced February 2023.
-
SantaCoder: don't reach for the stars!
Authors:
Loubna Ben Allal,
Raymond Li,
Denis Kocetkov,
Chenghao Mou,
Christopher Akiki,
Carlos Munoz Ferrandis,
Niklas Muennighoff,
Mayank Mishra,
Alex Gu,
Manan Dey,
Logesh Kumar Umapathi,
Carolyn Jane Anderson,
Yangtian Zi,
Joel Lamy Poirier,
Hailey Schoelkopf,
Sergey Troshin,
Dmitry Abulkhanov,
Manuel Romero,
Michael Lappert,
Francesco De Toni,
Bernardo García del Río,
Qian Liu,
Shamik Bose,
Urvashi Bhattacharyya,
Terry Yue Zhuo
, et al. (16 additional authors not shown)
Abstract:
The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes the progress of the collaboration until December 2022, outlining the current state of the Personally Identifiable Information (PII) redaction pipeline, the experiments conducted to de-risk the model architecture, and the experiments investigat…
▽ More
The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes the progress of the collaboration until December 2022, outlining the current state of the Personally Identifiable Information (PII) redaction pipeline, the experiments conducted to de-risk the model architecture, and the experiments investigating better preprocessing methods for the training data. We train 1.1B parameter models on the Java, JavaScript, and Python subsets of The Stack and evaluate them on the MultiPL-E text-to-code benchmark. We find that more aggressive filtering of near-duplicates can further boost performance and, surprisingly, that selecting files from repositories with 5+ GitHub stars deteriorates performance significantly. Our best model outperforms previous open-source multilingual code generation models (InCoder-6.7B and CodeGen-Multi-2.7B) in both left-to-right generation and infilling on the Java, JavaScript, and Python portions of MultiPL-E, despite being a substantially smaller model. All models are released under an OpenRAIL license at https://hf.co/bigcode.
△ Less
Submitted 24 February, 2023; v1 submitted 9 January, 2023;
originally announced January 2023.
-
A Topic Modeling Approach to Classifying Open Street Map Health Clinics and Schools in Sub-Saharan Africa
Authors:
Joshua W. Anderson,
Luis Iñaki Alberro Encina,
Tina George Karippacheril,
Jonathan Hersh,
Cadence Stringer
Abstract:
Data deprivation, or the lack of easily available and actionable information on the well-being of individuals, is a significant challenge for the developing world and an impediment to the design and operationalization of policies intended to alleviate poverty. In this paper we explore the suitability of data derived from OpenStreetMap to proxy for the location of two crucial public services: schoo…
▽ More
Data deprivation, or the lack of easily available and actionable information on the well-being of individuals, is a significant challenge for the developing world and an impediment to the design and operationalization of policies intended to alleviate poverty. In this paper we explore the suitability of data derived from OpenStreetMap to proxy for the location of two crucial public services: schools and health clinics. Thanks to the efforts of thousands of digital humanitarians, online mapping repositories such as OpenStreetMap contain millions of records on buildings and other structures, delineating both their location and often their use. Unfortunately much of this data is locked in complex, unstructured text rendering it seemingly unsuitable for classifying schools or clinics. We apply a scalable, unsupervised learning method to unlabeled OpenStreetMap building data to extract the location of schools and health clinics in ten countries in Africa. We find the topic modeling approach greatly improves performance versus reliance on structured keys alone. We validate our results by comparing schools and clinics identified by our OSM method versus those identified by the WHO, and describe OSM coverage gaps more broadly.
△ Less
Submitted 22 December, 2022;
originally announced December 2022.
-
Synthetic Image Data for Deep Learning
Authors:
Jason W. Anderson,
Marcin Ziolkowski,
Ken Kennedy,
Amy W. Apon
Abstract:
Realistic synthetic image data rendered from 3D models can be used to augment image sets and train image classification semantic segmentation models. In this work, we explore how high quality physically-based rendering and domain randomization can efficiently create a large synthetic dataset based on production 3D CAD models of a real vehicle. We use this dataset to quantify the effectiveness of s…
▽ More
Realistic synthetic image data rendered from 3D models can be used to augment image sets and train image classification semantic segmentation models. In this work, we explore how high quality physically-based rendering and domain randomization can efficiently create a large synthetic dataset based on production 3D CAD models of a real vehicle. We use this dataset to quantify the effectiveness of synthetic augmentation using U-net and Double-U-net models. We found that, for this domain, synthetic images were an effective technique for augmenting limited sets of real training data. We observed that models trained on purely synthetic images had a very low mean prediction IoU on real validation images. We also observed that adding even very small amounts of real images to a synthetic dataset greatly improved accuracy, and that models trained on datasets augmented with synthetic images were more accurate than those trained on real images alone. Finally, we found that in use cases that benefit from incremental training or model specialization, pretraining a base model on synthetic images provided a sizeable reduction in the training cost of transfer learning, allowing up to 90\% of the model training to be front-loaded.
△ Less
Submitted 12 December, 2022;
originally announced December 2022.
-
FedSysID: A Federated Approach to Sample-Efficient System Identification
Authors:
Han Wang,
Leonardo F. Toso,
James Anderson
Abstract:
We study the problem of learning a linear system model from the observations of $M$ clients. The catch: Each client is observing data from a different dynamical system. This work addresses the question of how multiple clients collaboratively learn dynamical models in the presence of heterogeneity. We pose this problem as a federated learning problem and characterize the tension between achievable…
▽ More
We study the problem of learning a linear system model from the observations of $M$ clients. The catch: Each client is observing data from a different dynamical system. This work addresses the question of how multiple clients collaboratively learn dynamical models in the presence of heterogeneity. We pose this problem as a federated learning problem and characterize the tension between achievable performance and system heterogeneity. Furthermore, our federated sample complexity result provides a constant factor improvement over the single agent setting. Finally, we describe a meta federated learning algorithm, FedSysID, that leverages existing federated algorithms at the client level.
△ Less
Submitted 25 November, 2022;
originally announced November 2022.
-
MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation
Authors:
Federico Cassano,
John Gouwar,
Daniel Nguyen,
Sydney Nguyen,
Luna Phipps-Costin,
Donald Pinckney,
Ming-Ho Yee,
Yangtian Zi,
Carolyn Jane Anderson,
Molly Q Feldman,
Arjun Guha,
Michael Greenberg,
Abhinav Jangda
Abstract:
Large language models have demonstrated the ability to generate both natural language and programming language text. Such models open up the possibility of multi-language code generation: could code generation models generalize knowledge from one language to another? Although contemporary code generation models can generate semantically correct Python code, little is known about their abilities wi…
▽ More
Large language models have demonstrated the ability to generate both natural language and programming language text. Such models open up the possibility of multi-language code generation: could code generation models generalize knowledge from one language to another? Although contemporary code generation models can generate semantically correct Python code, little is known about their abilities with other languages. We propose MultiPL-E, a system for translating unit test-driven code generation benchmarks to new languages. We create the first massively multilingual code generation benchmark by using MultiPL-E to translate two popular Python code generation benchmarks to 18 additional programming languages.
We use MultiPL-E to extend the HumanEval benchmark and MBPP benchmark to 18 languages that encompass a range of programming paradigms and popularity. Using these new parallel benchmarks, we evaluate the multi-language performance of three state-of-the-art code generation models: Codex, CodeGen, and InCoder. We find that Codex matches or even exceeds its performance on Python for several other languages. The range of programming languages represented in MultiPL-E allow us to explore the impact of language frequency and language features on model performance. Finally, the MultiPL-E approach of compiling code generation benchmarks to new programming languages is both scalable and extensible, making it straightforward to evaluate new models, benchmarks, and languages.
△ Less
Submitted 19 December, 2022; v1 submitted 17 August, 2022;
originally announced August 2022.
-
Fundamental Limits of Thermal-noise Lossy Bosonic Multiple Access Channel
Authors:
Evan J. D. Anderson,
Boulat A. Bash
Abstract:
Bosonic channels describe quantum-mechanically many practical communication links such as optical, microwave, and radiofrequency. We investigate the maximum rates for the bosonic multiple access channel (MAC) in the presence of thermal noise added by the environment and when the transmitters utilize Gaussian state inputs. We develop an outer bound for the capacity region for the thermal-noise loss…
▽ More
Bosonic channels describe quantum-mechanically many practical communication links such as optical, microwave, and radiofrequency. We investigate the maximum rates for the bosonic multiple access channel (MAC) in the presence of thermal noise added by the environment and when the transmitters utilize Gaussian state inputs. We develop an outer bound for the capacity region for the thermal-noise lossy bosonic MAC. We additionally find that the use of coherent states at the transmitters is capacity-achieving in the limits of high and low mean input photon numbers. Furthermore, we verify that coherent states are capacity-achieving for the sum rate of the channel. In the non-asymptotic regime, when a global mean photon-number constraint is imposed on the transmitters, coherent states are the optimal Gaussian state. Surprisingly however, the use of single-mode squeezed states can increase the capacity over that afforded by coherent state encoding when each transmitter is photon number constrained individually.
△ Less
Submitted 17 July, 2022; v1 submitted 30 June, 2022;
originally announced July 2022.
-
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Authors:
Yizhong Wang,
Swaroop Mishra,
Pegah Alipoormolabashi,
Yeganeh Kordi,
Amirreza Mirzaei,
Anjana Arunkumar,
Arjun Ashok,
Arut Selvan Dhanasekaran,
Atharva Naik,
David Stap,
Eshaan Pathak,
Giannis Karamanolakis,
Haizhi Gary Lai,
Ishan Purohit,
Ishani Mondal,
Jacob Anderson,
Kirby Kuznia,
Krima Doshi,
Maitreya Patel,
Kuntal Kumar Pal,
Mehrad Moradshahi,
Mihir Parmar,
Mirali Purohit,
Neeraj Varshney,
Phani Rohitha Kaza
, et al. (15 additional authors not shown)
Abstract:
How well can NLP models generalize to a variety of unseen tasks when provided with task instructions? To address this question, we first introduce Super-NaturalInstructions, a benchmark of 1,616 diverse NLP tasks and their expert-written instructions. Our collection covers 76 distinct task types, including but not limited to classification, extraction, infilling, sequence tagging, text rewriting,…
▽ More
How well can NLP models generalize to a variety of unseen tasks when provided with task instructions? To address this question, we first introduce Super-NaturalInstructions, a benchmark of 1,616 diverse NLP tasks and their expert-written instructions. Our collection covers 76 distinct task types, including but not limited to classification, extraction, infilling, sequence tagging, text rewriting, and text composition. This large and diverse collection of tasks enables rigorous benchmarking of cross-task generalization under instructions -- training models to follow instructions on a subset of tasks and evaluating them on the remaining unseen ones. Furthermore, we build Tk-Instruct, a transformer model trained to follow a variety of in-context instructions (plain language task definitions or k-shot examples). Our experiments show that Tk-Instruct outperforms existing instruction-following models such as InstructGPT by over 9% on our benchmark despite being an order of magnitude smaller. We further analyze generalization as a function of various scaling parameters, such as the number of observed tasks, the number of instances per task, and model sizes. We hope our dataset and model facilitate future progress towards more general-purpose NLP models.
△ Less
Submitted 24 October, 2022; v1 submitted 15 April, 2022;
originally announced April 2022.
-
VALUE: Understanding Dialect Disparity in NLU
Authors:
Caleb Ziems,
Jiaao Chen,
Camille Harris,
Jessica Anderson,
Diyi Yang
Abstract:
English Natural Language Understanding (NLU) systems have achieved great performances and even outperformed humans on benchmarks like GLUE and SuperGLUE. However, these benchmarks contain only textbook Standard American English (SAE). Other dialects have been largely overlooked in the NLP community. This leads to biased and inequitable NLU systems that serve only a sub-population of speakers. To u…
▽ More
English Natural Language Understanding (NLU) systems have achieved great performances and even outperformed humans on benchmarks like GLUE and SuperGLUE. However, these benchmarks contain only textbook Standard American English (SAE). Other dialects have been largely overlooked in the NLP community. This leads to biased and inequitable NLU systems that serve only a sub-population of speakers. To understand disparities in current models and to facilitate more dialect-competent NLU systems, we introduce the VernAcular Language Understanding Evaluation (VALUE) benchmark, a challenging variant of GLUE that we created with a set of lexical and morphosyntactic transformation rules. In this initial release (V.1), we construct rules for 11 features of African American Vernacular English (AAVE), and we recruit fluent AAVE speakers to validate each feature transformation via linguistic acceptability judgments in a participatory design manner. Experiments show that these new dialectal features can lead to a drop in model performance. To run the transformation code and download both synthetic and gold-standard dialectal GLUE benchmarks, see https://github.com/SALT-NLP/value
△ Less
Submitted 13 September, 2022; v1 submitted 6 April, 2022;
originally announced April 2022.
-
FedADMM: A Federated Primal-Dual Algorithm Allowing Partial Participation
Authors:
Han Wang,
Siddartha Marella,
James Anderson
Abstract:
Federated learning is a framework for distributed optimization that places emphasis on communication efficiency. In particular, it follows a client-server broadcast model and is particularly appealing because of its ability to accommodate heterogeneity in client compute and storage resources, non-i.i.d. data assumptions, and data privacy. Our contribution is to offer a new federated learning algor…
▽ More
Federated learning is a framework for distributed optimization that places emphasis on communication efficiency. In particular, it follows a client-server broadcast model and is particularly appealing because of its ability to accommodate heterogeneity in client compute and storage resources, non-i.i.d. data assumptions, and data privacy. Our contribution is to offer a new federated learning algorithm, FedADMM, for solving non-convex composite optimization problems with non-smooth regularizers. We prove converges of FedADMM for the case when not all clients are able to participate in a given communication round under a very general sampling model.
△ Less
Submitted 28 March, 2022;
originally announced March 2022.
-
Finding the optimal human strategy for Wordle using maximum correct letter probabilities and reinforcement learning
Authors:
Benton J. Anderson,
Jesse G. Meyer
Abstract:
Wordle is an online word puzzle game that gained viral popularity in January 2022. The goal is to guess a hidden five letter word. After each guess, the player gains information about whether the letters they guessed are present in the word, and whether they are in the correct position. Numerous blogs have suggested guessing strategies and starting word lists that improve the chance of winning. Op…
▽ More
Wordle is an online word puzzle game that gained viral popularity in January 2022. The goal is to guess a hidden five letter word. After each guess, the player gains information about whether the letters they guessed are present in the word, and whether they are in the correct position. Numerous blogs have suggested guessing strategies and starting word lists that improve the chance of winning. Optimized algorithms can win 100% of games within five of the six allowed trials. However, it is infeasible for human players to use these algorithms due to an inability to perfectly recall all known 5-letter words and perform complex calculations that optimize information gain. Here, we present two different methods for choosing starting words along with a framework for discovering the optimal human strategy based on reinforcement learning. Human Wordle players can use the rules we discover to optimize their chance of winning.
△ Less
Submitted 1 February, 2022;
originally announced February 2022.
-
Self-Supervised Keypoint Discovery in Behavioral Videos
Authors:
Jennifer J. Sun,
Serim Ryou,
Roni Goldshmid,
Brandon Weissbourd,
John Dabiri,
David J. Anderson,
Ann Kennedy,
Yisong Yue,
Pietro Perona
Abstract:
We propose a method for learning the posture and structure of agents from unlabelled behavioral videos. Starting from the observation that behaving agents are generally the main sources of movement in behavioral videos, our method, Behavioral Keypoint Discovery (B-KinD), uses an encoder-decoder architecture with a geometric bottleneck to reconstruct the spatiotemporal difference between video fram…
▽ More
We propose a method for learning the posture and structure of agents from unlabelled behavioral videos. Starting from the observation that behaving agents are generally the main sources of movement in behavioral videos, our method, Behavioral Keypoint Discovery (B-KinD), uses an encoder-decoder architecture with a geometric bottleneck to reconstruct the spatiotemporal difference between video frames. By focusing only on regions of movement, our approach works directly on input videos without requiring manual annotations. Experiments on a variety of agent types (mouse, fly, human, jellyfish, and trees) demonstrate the generality of our approach and reveal that our discovered keypoints represent semantically meaningful body parts, which achieve state-of-the-art performance on keypoint regression among self-supervised methods. Additionally, B-KinD achieve comparable performance to supervised keypoints on downstream tasks, such as behavior classification, suggesting that our method can dramatically reduce model training costs vis-a-vis supervised methods.
△ Less
Submitted 27 April, 2022; v1 submitted 9 December, 2021;
originally announced December 2021.
-
Learning Linear Models Using Distributed Iterative Hessian Sketching
Authors:
Han Wang,
James Anderson
Abstract:
This work considers the problem of learning the Markov parameters of a linear system from observed data. Recent non-asymptotic system identification results have characterized the sample complexity of this problem in the single and multi-rollout setting. In both instances, the number of samples required in order to obtain acceptable estimates can produce optimization problems with an intractably l…
▽ More
This work considers the problem of learning the Markov parameters of a linear system from observed data. Recent non-asymptotic system identification results have characterized the sample complexity of this problem in the single and multi-rollout setting. In both instances, the number of samples required in order to obtain acceptable estimates can produce optimization problems with an intractably large number of decision variables for a second-order algorithm. We show that a randomized and distributed Newton algorithm based on Hessian-sketching can produce $ε$-optimal solutions and converges geometrically. Moreover, the algorithm is trivially parallelizable. Our results hold for a variety of sketching matrices and we illustrate the theory with numerical examples.
△ Less
Submitted 7 December, 2021;
originally announced December 2021.
-
Measurement and Analysis of GPU-accelerated Applications with HPCToolkit
Authors:
Keren Zhou,
Laksono Adhianto,
Jonathon Anderson,
Aaron Cherian,
Dejan Grubisic,
Mark Krentel,
Yumeng Liu,
Xiaozhu Meng,
John Mellor-Crummey
Abstract:
To address the challenge of performance analysis on the US DOE's forthcoming exascale supercomputers, Rice University has been extending its HPCToolkit performance tools to support measurement and analysis of GPU-accelerated applications. To help developers understand the performance of accelerated applications as a whole, HPCToolkit's measurement and analysis tools attribute metrics to calling co…
▽ More
To address the challenge of performance analysis on the US DOE's forthcoming exascale supercomputers, Rice University has been extending its HPCToolkit performance tools to support measurement and analysis of GPU-accelerated applications. To help developers understand the performance of accelerated applications as a whole, HPCToolkit's measurement and analysis tools attribute metrics to calling contexts that span both CPUs and GPUs. To measure GPU-accelerated applications efficiently, HPCToolkit employs a novel wait-free data structure to coordinate monitoring and attribution of GPU performance. To help developers understand the performance of complex GPU code generated from high-level programming models, HPCToolkit constructs sophisticated approximations of call path profiles for GPU computations. To support fine-grained analysis and tuning, HPCToolkit uses PC sampling and instrumentation to measure and attribute GPU performance metrics to source lines, loops, and inlined code. To supplement fine-grained measurements, HPCToolkit can measure GPU kernel executions using hardware performance counters. To provide a view of how an execution evolves over time, HPCToolkit can collect, analyze, and visualize call path traces within and across nodes. Finally, on NVIDIA GPUs, HPCToolkit can derive and attribute a collection of useful performance metrics based on measurements using GPU PC samples. We illustrate HPCToolkit's new capabilities for analyzing GPU-accelerated applications with several codes developed as part of the Exascale Computing Project.
△ Less
Submitted 14 September, 2021;
originally announced September 2021.
-
Solver-based Gradual Type Migration
Authors:
Luna Phipps-Costin,
Carolyn Jane Anderson,
Michael Greenberg,
Arjun Guha
Abstract:
Gradually typed languages allow programmers to mix statically and dynamically typed code, enabling them to incrementally reap the benefits of static typing as they add type annotations to their code. However, this type migration process is typically a manual effort with limited tool support. This paper examines the problem of \emph{automated type migration}: given a dynamic program, infer addition…
▽ More
Gradually typed languages allow programmers to mix statically and dynamically typed code, enabling them to incrementally reap the benefits of static typing as they add type annotations to their code. However, this type migration process is typically a manual effort with limited tool support. This paper examines the problem of \emph{automated type migration}: given a dynamic program, infer additional or improved type annotations.
Existing type migration algorithms prioritize different goals, such as maximizing type precision, maintaining compatibility with unmigrated code, and preserving the semantics of the original program. We argue that the type migration problem involves fundamental compromises: optimizing for a single goal often comes at the expense of others. Ideally, a type migration tool would flexibly accommodate a range of user priorities.
We present TypeWhich, a new approach to automated type migration for the gradually-typed lambda calculus with some extensions. Unlike prior work, which relies on custom solvers, TypeWhich produces constraints for an off-the-shelf MaxSMT solver. This allows us to easily express objectives, such as minimizing the number of necessary syntactic coercions, and constraining the type of the migration to be compatible with unmigrated code.
We present the first comprehensive evaluation of GTLC type migration algorithms, and compare TypeWhich to four other tools from the literature. Our evaluation uses prior benchmarks, and a new set of ``challenge problems.'' Moreover, we design a new evaluation methodology that highlights the subtleties of gradual type migration. In addition, we apply TypeWhich to a suite of benchmarks for Grift, a programming language based on the GTLC. TypeWhich is able to reconstruct all human-written annotations on all but one program.
△ Less
Submitted 10 September, 2021;
originally announced September 2021.
-
Large-Scale System Identification Using a Randomized SVD
Authors:
Han Wang,
James Anderson
Abstract:
Learning a dynamical system from input/output data is a fundamental task in the control design pipeline. In the partially observed setting there are two components to identification: parameter estimation to learn the Markov parameters, and system realization to obtain a state space model. In both sub-problems it is implicitly assumed that standard numerical algorithms such as the singular value de…
▽ More
Learning a dynamical system from input/output data is a fundamental task in the control design pipeline. In the partially observed setting there are two components to identification: parameter estimation to learn the Markov parameters, and system realization to obtain a state space model. In both sub-problems it is implicitly assumed that standard numerical algorithms such as the singular value decomposition (SVD) can be easily and reliably computed. When trying to fit a high-dimensional model to data, for example in the cyber-physical system setting, even computing an SVD is intractable. In this work we show that an approximate matrix factorization obtained using randomized methods can replace the standard SVD in the realization algorithm while maintaining the non-asymptotic (in data-set size) performance and robustness guarantees of classical methods. Numerical examples illustrate that for large system models, this is the only method capable of producing a model.
△ Less
Submitted 6 September, 2021;
originally announced September 2021.
-
Preparing for Performance Analysis at Exascale
Authors:
Jonathon Anderson,
Yumeng Liu,
John Mellor-Crummey
Abstract:
Performance tools for emerging heterogeneous exascale platforms must address two principal challenges when analyzing execution measurements. First, measurement of large-scale executions may record mountains of performance data. Second, performance measurements for parallel programs are sparse in two ways: the set of metrics present for any context and the set of contexts present in different threa…
▽ More
Performance tools for emerging heterogeneous exascale platforms must address two principal challenges when analyzing execution measurements. First, measurement of large-scale executions may record mountains of performance data. Second, performance measurements for parallel programs are sparse in two ways: the set of metrics present for any context and the set of contexts present in different threads. For GPU-accelerated applications, an important source of sparsity is that none of the myriad of GPU metrics apply to any of the many CPU contexts. To address these challenges, we developed a novel streaming aggregation approach to postmortem analysis that employs both shared and distributed memory parallelism to aggregate sparse performance measurements from every rank, thread, and GPU stream of an application, and attributes heterogeneous call path profiles and traces to source code. Using the same amount of resources, our approach analyzes large-scale performance measurements of GPU-accelerated applications over an order of magnitude faster than HPCToolkit and its sparse analysis results are as much as three orders of magnitude smaller than HPCToolkit's dense representation of metrics.
△ Less
Submitted 10 March, 2022; v1 submitted 9 August, 2021;
originally announced August 2021.
-
Coloring graphs with forbidden bipartite subgraphs
Authors:
James Anderson,
Anton Bernshteyn,
Abhishek Dhawan
Abstract:
A conjecture of Alon, Krivelevich, and Sudakov states that, for any graph $F$, there is a constant $c_F > 0$ such that if $G$ is an $F$-free graph of maximum degree $Δ$, then $χ(G) \leq c_F Δ/ \logΔ$. Alon, Krivelevich, and Sudakov verified this conjecture for a class of graphs $F$ that includes all bipartite graphs. Moreover, it follows from recent work by Davies, Kang, Pirot, and Sereni that if…
▽ More
A conjecture of Alon, Krivelevich, and Sudakov states that, for any graph $F$, there is a constant $c_F > 0$ such that if $G$ is an $F$-free graph of maximum degree $Δ$, then $χ(G) \leq c_F Δ/ \logΔ$. Alon, Krivelevich, and Sudakov verified this conjecture for a class of graphs $F$ that includes all bipartite graphs. Moreover, it follows from recent work by Davies, Kang, Pirot, and Sereni that if $G$ is $K_{t,t}$-free, then $χ(G) \leq (t + o(1)) Δ/ \logΔ$ as $Δ\to \infty$. We improve this bound to $(1+o(1)) Δ/\log Δ$, making the constant factor independent of $t$. We further extend our result to the DP-coloring setting (also known as correspondence coloring), introduced by Dvořák and Postle.
△ Less
Submitted 21 January, 2022; v1 submitted 12 July, 2021;
originally announced July 2021.
-
PSY-TaLiRo: A Python Toolbox for Search-Based Test Generation for Cyber-Physical Systems
Authors:
Quinn Thibeault,
Jacob Anderson,
Aniruddh Chandratre,
Giulia Pedrielli,
Georgios Fainekos
Abstract:
In this paper, we present the Python package PSY-TaLiRo which is a toolbox for temporal logic robustness guided falsification of Cyber-Physical Systems (CPS). PSY-TaLiRo is a completely modular toolbox supporting multiple temporal logic offline monitors as well as optimization engines for test case generation. Among the benefits of PSY-TaLiRo is that it supports search-based test generation for ma…
▽ More
In this paper, we present the Python package PSY-TaLiRo which is a toolbox for temporal logic robustness guided falsification of Cyber-Physical Systems (CPS). PSY-TaLiRo is a completely modular toolbox supporting multiple temporal logic offline monitors as well as optimization engines for test case generation. Among the benefits of PSY-TaLiRo is that it supports search-based test generation for many different types of systems under test. All PSY-TaLiRo modules can be fully modified by the users to support new optimization and robustness computation engines as well as any System under Test (SUT).
△ Less
Submitted 3 June, 2021;
originally announced June 2021.
-
Assessing the Acceptability of a Humanoid Robot for Alzheimer's Disease and Related Dementia Care Using an Online Survey
Authors:
Fengpei Yuan,
Joel G. Anderson,
Tami Wyatt,
Ruth Palan Lopez,
Monica Crane,
Austin Montgomery,
Xiaopeng Zhao
Abstract:
In this work, an online survey was used to understand the acceptability of humanoid robots and users' needs in using these robots to assist with care among people with Alzheimer's disease and related dementias (ADRD), their family caregivers, health care professionals, and the general public. From November 12, 2020 to March 13, 2021, a total of 631 complete responses were collected, including 80 r…
▽ More
In this work, an online survey was used to understand the acceptability of humanoid robots and users' needs in using these robots to assist with care among people with Alzheimer's disease and related dementias (ADRD), their family caregivers, health care professionals, and the general public. From November 12, 2020 to March 13, 2021, a total of 631 complete responses were collected, including 80 responses from people with mild cognitive impairment or ADRD, 245 responses from caregivers and health care professionals, and 306 responses from the general public. Overall, people with ADRD, caregivers, and the general public showed positive attitudes towards using the robot to assist with care for people with ADRD. The top three functions of robots required by the group of people with ADRD were reminders to take medicine, emergency call service, and helping contact medical services. Additional comments, suggestions, and concerns provided by caregivers and the general public are also discussed.
△ Less
Submitted 26 April, 2021;
originally announced April 2021.