Skip to main content

Showing 1–23 of 23 results for author: Moura, L

  1. arXiv:2402.06619  [pdf, other

    cs.CL cs.AI

    Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning

    Authors: Shivalika Singh, Freddie Vargus, Daniel Dsouza, Börje F. Karlsson, Abinaya Mahendiran, Wei-Yin Ko, Herumb Shandilya, Jay Patel, Deividas Mataciunas, Laura OMahony, Mike Zhang, Ramith Hettiarachchi, Joseph Wilson, Marina Machado, Luisa Souza Moura, Dominik Krzemiński, Hakimeh Fadaei, Irem Ergün, Ifeoma Okoh, Aisha Alaagib, Oshan Mudannayake, Zaid Alyafeai, Vu Minh Chien, Sebastian Ruder, Surya Guthikonda , et al. (8 additional authors not shown)

    Abstract: Datasets are foundational to many breakthroughs in modern artificial intelligence. Many recent achievements in the space of natural language processing (NLP) can be attributed to the finetuning of pre-trained models on a diverse set of tasks that enables a large language model (LLM) to respond to instructions. Instruction fine-tuning (IFT) requires specifically constructed and annotated datasets.… ▽ More

    Submitted 9 February, 2024; originally announced February 2024.

  2. A Formalized Extension of the Substitution Lemma in Coq

    Authors: Maria J. D. Lima, Flávio L. C. de Moura

    Abstract: The substitution lemma is a renowned theorem within the realm of lambda-calculus theory and concerns the interactional behaviour of the metasubstitution operation. In this work, we augment the lambda-calculus's grammar with an uninterpreted explicit substitution operator, which allows the use of our framework for different calculi with explicit substitutions. Our primary contribution lies in verif… ▽ More

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: In Proceedings FROM 2023, arXiv:2309.12959

    Journal ref: EPTCS 389, 2023, pp. 80-95

  3. arXiv:2305.00109  [pdf, other

    cs.CV cs.AI

    Zero-shot performance of the Segment Anything Model (SAM) in 2D medical imaging: A comprehensive evaluation and practical guidelines

    Authors: Christian Mattjie, Luis Vinicius de Moura, Rafaela Cappelari Ravazio, Lucas Silveira Kupssinskü, Otávio Parraga, Marcelo Mussi Delucis, Rodrigo Coelho Barros

    Abstract: Segmentation in medical imaging is a critical component for the diagnosis, monitoring, and treatment of various diseases and medical conditions. Presently, the medical segmentation landscape is dominated by numerous specialized deep learning models, each fine-tuned for specific segmentation tasks and image modalities. The recently-introduced Segment Anything Model (SAM) employs the ViT neural arch… ▽ More

    Submitted 5 May, 2023; v1 submitted 28 April, 2023; originally announced May 2023.

    Comments: 18 pages, 3 Tables, 10 Figures with additional supplementary material with 1 Table

  4. arXiv:2211.05617  [pdf, other

    cs.LG cs.AI cs.CL cs.CV cs.CY

    Debiasing Methods for Fairer Neural Models in Vision and Language Research: A Survey

    Authors: Otávio Parraga, Martin D. More, Christian M. Oliveira, Nathan S. Gavenski, Lucas S. Kupssinskü, Adilson Medronha, Luis V. Moura, Gabriel S. Simões, Rodrigo C. Barros

    Abstract: Despite being responsible for state-of-the-art results in several computer vision and natural language processing tasks, neural networks have faced harsh criticism due to some of their current shortcomings. One of them is that neural networks are correlation machines prone to model biases within the data instead of focusing on actual useful causal relationships. This problem is particularly seriou… ▽ More

    Submitted 10 November, 2022; originally announced November 2022.

    Comments: Submitted to ACM Computing Surveys - Special Issue on Trustworthy AI

  5. arXiv:2208.01412  [pdf, ps, other

    cs.DM cs.IT math.CO

    Bounds on Covering Codes in RT spaces using Ordered Covering Arrays

    Authors: André Guerino Castoldi, Emerson Luiz do Monte Carmelo, Lucia Moura, Daniel Panario, Brett Stevens

    Abstract: In this work, constructions of ordered covering arrays are discussed and applied to obtain new upper bounds on covering codes in Rosenbloom-Tsfasman spaces (RT spaces), improving or extending some previous results.

    Submitted 30 July, 2022; originally announced August 2022.

    Comments: 12 pages

    Journal ref: CAI 2019. Lecture Notes in Computer Science, vol 11545. Springer

  6. Locating modifications in signed data for partial data integrity

    Authors: Thaís Bardini Idalino, Lucia Moura, Ricardo Felipe Custódio, Daniel Panario

    Abstract: We consider the problem of detecting and locating modifications in signed data to ensure partial data integrity. We assume that the data is divided into $n$ blocks (not necessarily of the same size) and that a threshold $d$ is given for the maximum amount of modified blocks that the scheme can support. We propose efficient algorithms for signature and verification steps which provide a reasonably… ▽ More

    Submitted 31 July, 2022; originally announced August 2022.

    Comments: 14 pages

    Journal ref: Information Processing Letters 115 (2015) 731-737

  7. arXiv:2208.01410  [pdf, other

    math.CO cs.IT

    Ordered Covering Arrays and Upper Bounds on Covering Codes in NRT spaces

    Authors: André Guerino Castoldi, Emerson L. Monte Carmelo, Lucia Moura, Daniel Panario, Brett Stevens

    Abstract: This work shows several direct and recursive constructions of ordered covering arrays using projection, fusion, column augmentation, derivation, concatenation and cartesian product. Upper bounds on covering codes in NRT spaces are also obtained by improving a general upper bound. We explore the connection between ordered covering arrays and covering codes in NRT spaces, which generalize similar re… ▽ More

    Submitted 31 July, 2022; originally announced August 2022.

    Comments: 27 pages

  8. Modification tolerant signature schemes: location and correction

    Authors: Thais Bardini Idalino, Lucia Moura, Carlisle Adams

    Abstract: This paper considers malleable digital signatures, for situations where data is modified after it is signed. They can be used in applications where either the data can be modified (collaborative work), or the data must be modified (redactable and content extraction signatures) or we need to know which parts of the data have been modified (data forensics). A \new{classical} digital signature is val… ▽ More

    Submitted 31 July, 2022; originally announced August 2022.

    Comments: 22 pages

    Journal ref: In: Progress in Cryptology - INDOCRYPT 2019. Lecture Notes in Computer Science, vol 11898. Springer, Cham (2019)

  9. Nested Cover-Free Families for Unbounded Fault-Tolerant Aggregate Signatures

    Authors: Thais Bardini Idalino, Lucia Moura

    Abstract: Aggregate signatures are used to create one short proof of authenticity and integrity from a set of digital signatures. However, one invalid signature in the set invalidates the entire aggregate, giving no information on which signatures are valid. Hartung et al. (2016) propose a fault-tolerant aggregate signature scheme based on combinatorial group testing. Given a bound $d$ on the number of inva… ▽ More

    Submitted 31 July, 2022; originally announced August 2022.

    Comments: 30 pages

    Journal ref: Theoretical Computer Science 854 (2021) 116-130

  10. Ordered Orthogonal Array Construction Using LFSR Sequences

    Authors: André Guerino Castoldi, Lucia Moura, Daniel Panario, Brett Stevens

    Abstract: We present a new construction of ordered orthogonal arrays (OOA) of strength $t$ with $(q + 1)t$ columns over a finite field $\mathbb{F}_{q}$ using linear feedback shift register sequences (LFSRs). OOAs are naturally related to $(t, m, s)$-nets, linear codes, and MDS codes. Our construction selects suitable columns from the array formed by all subintervals of length $\frac{q^{t}-1}{q-1}$ of an LFS… ▽ More

    Submitted 30 July, 2022; originally announced August 2022.

    Comments: 12 pages

    Journal ref: IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 63, NO. 2, FEBRUARY 2017

  11. arXiv:2202.09264  [pdf, other

    cs.DM math.CO stat.ME

    Structure-aware combinatorial group testing: a new method for pandemic screening

    Authors: Thais Bardini Idalino, Lucia Moura

    Abstract: Combinatorial group testing (CGT) is used to identify defective items from a set of items by grouping them together and performing a small number of tests on the groups. Recently, group testing has been used to design efficient COVID-19 testing, so that resources are saved while still identifying all infected individuals. Due to test waiting times, a focus is given to non-adaptive CGT, where group… ▽ More

    Submitted 16 February, 2022; originally announced February 2022.

  12. arXiv:2202.00478  [pdf

    cs.CL

    NeuraHealth: An Automated Screening Pipeline to Detect Undiagnosed Cognitive Impairment in Electronic Health Records with Deep Learning and Natural Language Processing

    Authors: Tanish Tyagi, Colin G. Magdamo, Ayush Noori, Zhaozhi Li, Xiao Liu, Mayuresh Deodhar, Zhuoqiao Hong, Wendong Ge, Elissa M. Ye, Yi-han Sheu, Haitham Alabsi, Laura Brenner, Gregory K. Robbins, Sahar Zafar, Nicole Benson, Lidia Moura, John Hsu, Alberto Serrano-Pozo, Dimitry Prokopenko, Rudolph E. Tanzi, Bradley T. Hyman, Deborah Blacker, Shibani S. Mukerji, M. Brandon Westover, Sudeshna Das

    Abstract: Dementia related cognitive impairment (CI) is a neurodegenerative disorder, affecting over 55 million people worldwide and growing rapidly at the rate of one new case every 3 seconds. 75% cases go undiagnosed globally with up to 90% in low-and-middle-income countries, leading to an estimated annual worldwide cost of USD 1.3 trillion, forecasted to reach 2.8 trillion by 2030. With no cure, a recurr… ▽ More

    Submitted 20 June, 2022; v1 submitted 12 January, 2022; originally announced February 2022.

  13. arXiv:2111.09115  [pdf, other

    cs.CL cs.LG

    Using Deep Learning to Identify Patients with Cognitive Impairment in Electronic Health Records

    Authors: Tanish Tyagi, Colin G. Magdamo, Ayush Noori, Zhaozhi Li, Xiao Liu, Mayuresh Deodhar, Zhuoqiao Hong, Wendong Ge, Elissa M. Ye, Yi-han Sheu, Haitham Alabsi, Laura Brenner, Gregory K. Robbins, Sahar Zafar, Nicole Benson, Lidia Moura, John Hsu, Alberto Serrano-Pozo, Dimitry Prokopenko, Rudolph E. Tanzi, Bradley T. Hyman, Deborah Blacker, Shibani S. Mukerji, M. Brandon Westover, Sudeshna Das

    Abstract: Dementia is a neurodegenerative disorder that causes cognitive decline and affects more than 50 million people worldwide. Dementia is under-diagnosed by healthcare professionals - only one in four people who suffer from dementia are diagnosed. Even when a diagnosis is made, it may not be entered as a structured International Classification of Diseases (ICD) diagnosis code in a patient's charts. In… ▽ More

    Submitted 12 November, 2021; originally announced November 2021.

    Comments: Machine Learning for Health (ML4H) - Extended Abstract

  14. arXiv:2012.11401  [pdf, other

    cs.AI cs.PL

    Universal Policies for Software-Defined MDPs

    Authors: Daniel Selsam, Jesse Michael Han, Leonardo de Moura, Patrice Godefroid

    Abstract: We introduce a new programming paradigm called oracle-guided decision programming in which a program specifies a Markov Decision Process (MDP) and the language provides a universal policy. We prototype a new programming language, Dodona, that manifests this paradigm using a primitive 'choose' representing nondeterministic choice. The Dodona interpreter returns either a value or a choicepoint that… ▽ More

    Submitted 21 December, 2020; originally announced December 2020.

  15. arXiv:2011.06489  [pdf, other

    cs.CL

    Natural Language Processing to Detect Cognitive Concerns in Electronic Health Records Using Deep Learning

    Authors: Zhuoqiao Hong, Colin G. Magdamo, Yi-han Sheu, Prathamesh Mohite, Ayush Noori, Elissa M. Ye, Wendong Ge, Haoqi Sun, Laura Brenner, Gregory Robbins, Shibani Mukerji, Sahar Zafar, Nicole Benson, Lidia Moura, John Hsu, Bradley T. Hyman, Michael B. Westover, Deborah Blacker, Sudeshna Das

    Abstract: Dementia is under-recognized in the community, under-diagnosed by healthcare professionals, and under-coded in claims data. Information on cognitive dysfunction, however, is often found in unstructured clinician notes within medical records but manual review by experts is time consuming and often prone to errors. Automated mining of these notes presents a potential opportunity to label patients wi… ▽ More

    Submitted 12 November, 2020; originally announced November 2020.

    Comments: Machine Learning for Health (ML4H) at NeurIPS 2020 - Extended Abstract

    MSC Class: I.2.7

  16. arXiv:2003.01685  [pdf, other

    cs.PL

    Sealing Pointer-Based Optimizations Behind Pure Functions

    Authors: Daniel Selsam, Simon Hudon, Leonardo de Moura

    Abstract: Functional programming languages are particularly well-suited for building automated reasoning systems, since (among other reasons) a logical term is well modeled by an inductive type, traversing a term can be implemented generically as a higher-order combinator, and backtracking is dramatically simplified by persistent datastructures. However, existing pure functional programming languages all su… ▽ More

    Submitted 30 May, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

  17. Beyond Notations: Hygienic Macro Expansion for Theorem Proving Languages

    Authors: Sebastian Ullrich, Leonardo de Moura

    Abstract: In interactive theorem provers (ITPs), extensible syntax is not only crucial to lower the cognitive burden of manipulating complex mathematical objects, but plays a critical role in developing reusable abstractions in libraries. Most ITPs support such extensions in the form of restrictive "syntax sugar" substitutions and other ad hoc mechanisms, which are too rudimentary to support many desirable… ▽ More

    Submitted 12 April, 2022; v1 submitted 28 January, 2020; originally announced January 2020.

    Journal ref: Logical Methods in Computer Science, Volume 18, Issue 2 (April 13, 2022) lmcs:7421

  18. arXiv:2001.04301  [pdf, other

    cs.PL cs.LO

    Tabled Typeclass Resolution

    Authors: Daniel Selsam, Sebastian Ullrich, Leonardo de Moura

    Abstract: Typeclasses provide an elegant and effective way of managing ad-hoc polymorphism in both programming languages and interactive proof assistants. However, the increasingly sophisticated uses of typeclasses within proof assistants, especially within Lean's burgeoning mathematics library, mathlib, have elevated once-theoretical limitations of existing typeclass resolution procedures into major impedi… ▽ More

    Submitted 21 January, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

  19. arXiv:1908.05647  [pdf, other

    cs.PL

    Counting Immutable Beans: Reference Counting Optimized for Purely Functional Programming

    Authors: Sebastian Ullrich, Leonardo de Moura

    Abstract: Most functional languages rely on some garbage collection for automatic memory management. They usually eschew reference counting in favor of a tracing garbage collector, which has less bookkeeping overhead at runtime. On the other hand, having an exact reference count of each value can enable optimizations, such as destructive updates. We explore these optimization opportunities in the context of… ▽ More

    Submitted 5 March, 2020; v1 submitted 15 August, 2019; originally announced August 2019.

    Journal ref: IFL 2019

  20. arXiv:1802.03685  [pdf, other

    cs.AI cs.LG cs.LO

    Learning a SAT Solver from Single-Bit Supervision

    Authors: Daniel Selsam, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, David L. Dill

    Abstract: We present NeuroSAT, a message passing neural network that learns to solve SAT problems after only being trained as a classifier to predict satisfiability. Although it is not competitive with state-of-the-art SAT solvers, NeuroSAT can solve problems that are substantially larger and more difficult than it ever saw during training by simply running for more iterations. Moreover, NeuroSAT generalize… ▽ More

    Submitted 11 March, 2019; v1 submitted 10 February, 2018; originally announced February 2018.

  21. arXiv:1701.04391  [pdf, ps, other

    cs.LO

    Congruence Closure in Intensional Type Theory

    Authors: Daniel Selsam, Leonardo de Moura

    Abstract: Congruence closure procedures are used extensively in automated reasoning and are a core component of most satisfiability modulo theories solvers. However, no known congruence closure algorithms can support any of the expressive logics based on intensional type theory (ITT), which form the basis of many interactive theorem provers. The main source of expressiveness in these logics is dependent typ… ▽ More

    Submitted 9 May, 2017; v1 submitted 16 January, 2017; originally announced January 2017.

    Comments: Appeared at International Joint Conference on Automated Reasoning (IJCAR) 2016

  22. arXiv:1505.04324  [pdf, ps, other

    cs.LO

    Elaboration in Dependent Type Theory

    Authors: Leonardo de Moura, Jeremy Avigad, Soonho Kong, Cody Roux

    Abstract: To be usable in practice, interactive theorem provers need to provide convenient and efficient means of writing expressions, definitions, and proofs. This involves inferring information that is often left implicit in an ordinary mathematical text, and resolving ambiguities in mathematical expressions. We refer to the process of passing from a quasi-formal and partially-specified expression to a co… ▽ More

    Submitted 17 December, 2015; v1 submitted 16 May, 2015; originally announced May 2015.

  23. A Formalization of the Theorem of Existence of First-Order Most General Unifiers

    Authors: Andréia B Avelar, André L Galdino, Flávio LC de Moura, Mauricio Ayala-Rincón

    Abstract: This work presents a formalization of the theorem of existence of most general unifiers in first-order signatures in the higher-order proof assistant PVS. The distinguishing feature of this formalization is that it remains close to the textbook proofs that are based on proving the correctness of the well-known Robinson's first-order unification algorithm. The formalization was applied inside… ▽ More

    Submitted 28 March, 2012; originally announced March 2012.

    Comments: In Proceedings LSFA 2011, arXiv:1203.5423

    Journal ref: EPTCS 81, 2012, pp. 63-78