Showing 1–50 of 7,510 results for author: Li, Y

  1. arXiv:2407.20175  [pdf, other]

    cs.CV

    Towards Localized Fine-Grained Control for Facial Expression Generation

    Authors: Tuomas Varanka, Huai-Qian Khor, Yante Li, Mengting Wei, Hanwei Kung, Nicu Sebe, Guoying Zhao

    Abstract: Generative models have surged in popularity recently due to their ability to produce high-quality images and video. However, steering these models to produce images with specific attributes and precise control remains challenging. Humans, particularly their faces, are central to content generation due to their ability to convey rich expressions and intent. Current generative models mostly generate…

    Submitted 25 July, 2024; originally announced July 2024.

  2. arXiv:2407.19863  [pdf, other]

    cs.DC

    Before and After Blockchain: Development and Principle of Distributed Fault Tolerance Consensus

    Authors: Huanyu Wu, Chentao Yue, Yixuan Fan, Yonghui Li, Lei Zhang

    Abstract: The concept of distributed consensus gained widespread attention following the publication of "Byzantine Generals Problem" by Leslie Lamport in the 1980s. This research topic has been active and extensively studied over the last four decades, particularly since the advent of blockchain technology in 2009. Blockchain technology employs Proof-of-X (PoX) or Byzantine-fault-tolerant (BFT) systems, w…

    Submitted 29 July, 2024; originally announced July 2024.

  3. arXiv:2407.19843  [pdf, other]

    cs.HC

    The Second Joint Workshop on Cross Reality

    Authors: Nanjia Wang, Yue Li, Francesco Chiossi, Fabian Pointecker, Lixiang Zhao, Daniel Zielasko

    Abstract: The 2nd Joint Workshop on Cross Reality (JWCR'24), organized as part of ISMAR 2024, seeks to explore the burgeoning field of Cross Reality (CR), which encompasses the seamless integration and transition between various points on the reality-virtuality continuum (RVC) such as Virtual Reality (VR), Augmented Virtuality (AV), and Augmented Reality (AR). This hybrid workshop aims to build upon the fou…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 5 pages

    Journal ref: 2024 IEEE International Symposium on Mixed and Augmented Reality

  4. arXiv:2407.19784  [pdf, other]

    cs.LG cs.AI

    Survey and Taxonomy: The Role of Data-Centric AI in Transformer-Based Time Series Forecasting

    Authors: Jingjing Xu, Caesar Wu, Yuan-Fang Li, Gregoire Danoy, Pascal Bouvry

    Abstract: Alongside the continuous process of improving AI performance through the development of more sophisticated models, researchers have also focused their attention on the emerging concept of data-centric AI, which emphasizes the important role of data in a systematic machine learning training process. Nonetheless, the development of models has also continued apace. One result of this progress is the…

    Submitted 29 July, 2024; originally announced July 2024.

  5. Causal Interventional Prediction System for Robust and Explainable Effect Forecasting

    Authors: Zhixuan Chu, Hui Ding, Guang Zeng, Shiyu Wang, Yiming Li

    Abstract: Although the widespread use of AI systems in today's world is growing, many current AI systems are found vulnerable due to hidden bias and missing information, especially in the most commonly used forecasting system. In this work, we explore the robustness and explainability of AI-based forecasting systems. We provide an in-depth analysis of the underlying causality involved in the effect predicti…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM '24), October 21–25, 2024, Boise, ID, USA

  6. arXiv:2407.19666  [pdf, other]

    cs.CV

    Take A Step Back: Rethinking the Two Stages in Visual Reasoning

    Authors: Mingyu Zhang, Jiting Cai, Mingyu Liu, Yue Xu, Cewu Lu, Yong-Lu Li

    Abstract: Visual reasoning, as a prominent research area, plays a crucial role in AI by facilitating concept formation and interaction with the world. However, current works are usually carried out separately on small datasets thus lacking generalization ability. Through rigorous evaluation of diverse benchmarks, we demonstrate the shortcomings of existing ad-hoc methods in achieving cross-domain reasoning…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: ECCV 2024, Project page: https://mybearyzhang.github.io/projects/TwoStageReason/

  7. arXiv:2407.19547  [pdf, other]

    cs.CV

    Temporal Feature Matters: A Framework for Diffusion Model Quantization

    Authors: Yushi Huang, Ruihao Gong, Xianglong Liu, Jing Liu, Yuhang Li, Jiwen Lu, Dacheng Tao

    Abstract: Diffusion models, widely used for image generation, face significant challenges related to their broad applicability due to prolonged inference times and high memory demands. Efficient Post-Training Quantization (PTQ) is crucial to address these issues in traditional models. Unlike those models, diffusion models critically rely on the time-step $t$ for effective multi-round denoising. Typicall…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2311.16503

  8. arXiv:2407.19497  [pdf, other]

    cs.CV

    Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph

    Authors: Zhengcen Li, Xinle Chang, Yueran Li, Jingyong Su

    Abstract: Group Activity Recognition aims to understand collective activities from videos. Existing solutions primarily rely on the RGB modality, which encounters challenges such as background variations, occlusions, motion blurs, and significant computational overhead. Meanwhile, current keypoint-based methods offer a lightweight and informative representation of human motions but necessitate accurate indi…

    Submitted 28 July, 2024; originally announced July 2024.

  9. arXiv:2407.19453  [pdf, other]

    cs.CV

    FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models

    Authors: Changgu Chen, Libing Yang, Xiaoyan Yang, Lianggangxu Chen, Gaoqi He, Changbo Wang, Yang Li

    Abstract: In recent years, large-scale pre-trained diffusion models have demonstrated their outstanding capabilities in image and video generation tasks. However, existing models tend to produce visual objects commonly found in the training dataset, which diverges from user input prompts. The underlying reason behind the inaccurate generated results lies in the model's difficulty in sampling from specific i…

    Submitted 28 July, 2024; originally announced July 2024.

  10. arXiv:2407.19342  [pdf, other]

    cs.LG cs.CL

    Parameter-Efficient Fine-Tuning via Circular Convolution

    Authors: Aochuan Chen, Ziqi Gao, Zijing Liu, Yu Li, Jia Li

    Abstract: Low-Rank Adaptation (LoRA) has gained popularity for fine-tuning large foundation models, leveraging low-rank matrices $\mathbf{A}$ and $\mathbf{B}$ to represent weight changes (i.e., $\Delta\mathbf{W} = \mathbf{B} \mathbf{A}$). This method reduces trainable parameters and mitigates heavy memory consumption associated with full delta matrices by sequentially multiplying $\mathbf{A}$ and…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: Work in progress

  11. arXiv:2407.19280  [pdf, other]

    cs.AI cs.RO

    Large Language Models for Human-like Autonomous Driving: A Survey

    Authors: Yun Li, Kai Katsumata, Ehsan Javanmardi, Manabu Tsukada

    Abstract: Large Language Models (LLMs), AI models trained on massive text corpora with remarkable language understanding and generation capabilities, are transforming the field of Autonomous Driving (AD). As AD systems evolve from rule-based and optimization-based methods to learning-based techniques like deep reinforcement learning, they are now poised to embrace a third and more advanced category: knowled…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 8 pages, 2 figures, accepted at IEEE Intelligent Transportation Systems Conference (ITSC) 2024

  12. arXiv:2407.19259  [pdf, other]

    cs.CV cs.AI

    Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction

    Authors: Yansheng Li, Tingzhu Wang, Kang Wu, Linlin Wang, Xin Guo, Wenbin Wang

    Abstract: Scene Graph Generation (SGG) aims to explore the relationships between objects in images and obtain scene summary graphs, thereby better serving downstream tasks. However, the long-tailed problem has adversely affected the scene graph's quality. The predictions are dominated by coarse-grained relationships, lacking more informative fine-grained ones. The union region of one object pair (i.e., one…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 24 pages, 10 figures, ECCV2024

  13. arXiv:2407.19256  [pdf]

    cs.AI cs.CL cs.LG

    Stochastic Parrots or ICU Experts? Large Language Models in Critical Care Medicine: A Scoping Review

    Authors: Tongyue Shi, Jun Ma, Zihan Yu, Haowei Xu, Minqi Xiong, Meirong Xiao, Yilin Li, Huiying Zhao, Guilan Kong

    Abstract: With the rapid development of artificial intelligence (AI), large language models (LLMs) have shown strong capabilities in natural language understanding, reasoning, and generation, attracting considerable research interest in applying LLMs to health and medicine. Critical care medicine (CCM) provides diagnosis and treatment for critically ill patients who often require intensive monitoring and inte…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 28 pages, 5 figures

  14. arXiv:2407.19180  [pdf, other]

    cs.CV

    Data Processing Techniques for Modern Multimodal Models

    Authors: Yinheng Li, Han Ding, Hang Chen

    Abstract: Data processing plays a significant role in current multimodal model training. In this paper, we provide a comprehensive review of common data processing techniques used in modern multimodal model training with a focus on diffusion models and multimodal large language models (MLLMs). We summarized all techniques into four categories: data quality, data quantity, data distribution and data safety…

    Submitted 27 July, 2024; originally announced July 2024.

  15. arXiv:2407.19174  [pdf, other]

    cs.CV

    Reducing Spurious Correlation for Federated Domain Generalization

    Authors: Shuran Ma, Weiying Xie, Daixun Li, Haowei Li, Yunsong Li

    Abstract: The rapid development of multimedia has provided a large amount of data with different distributions for visual tasks, forming different domains. Federated Learning (FL) can efficiently use this diverse data distributed on different client media in a decentralized manner through model sharing. However, in open-world scenarios, there is a challenge: global models may struggle to predict well on ent…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 10 pages, 4 figures

  16. arXiv:2407.18848  [pdf, other]

    cs.AI cs.LO

    Repairing Networks of $\mathcal{EL_\perp}$ Ontologies using Weakening and Completing -- Extended version

    Authors: Ying Li, Patrick Lambrix

    Abstract: The quality of ontologies and their alignments is crucial for developing high-quality semantics-based applications. Traditional debugging techniques repair ontology networks by removing unwanted axioms and mappings, but may thereby remove consequences that are correct in the domain of the ontology network. In this paper we propose a framework for repairing ontology networks that deals with this is…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: This is a slightly revised and extended version of a paper published at ISWC 2024. arXiv admin note: text overlap with arXiv:2208.00486

  17. arXiv:2407.18625  [pdf, other]

    cs.ET cs.AI cs.NE

    Topology Optimization of Random Memristors for Input-Aware Dynamic SNN

    Authors: Bo Wang, Shaocong Wang, Ning Lin, Yi Li, Yifei Yu, Yue Zhang, Jichang Yang, Xiaoshan Wu, Yangu He, Songqi Wang, Rui Chen, Guoqi Li, Xiaojuan Qi, Zhongrui Wang, Dashan Shang

    Abstract: There is unprecedented development in machine learning, exemplified by recent large language models and world simulators, which are artificial neural networks running on digital computers. However, they still cannot parallel human brains in terms of energy efficiency and the streamlined adaptability to inputs of different difficulties, due to differences in signal representation, optimization, run…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 15 pages, 5 figures

  18. Learning Spectral-Decomposed Tokens for Domain Generalized Semantic Segmentation

    Authors: Jingjun Yi, Qi Bi, Hao Zheng, Haolan Zhan, Wei Ji, Yawen Huang, Yuexiang Li, Yefeng Zheng

    Abstract: The rapid development of Vision Foundation Model (VFM) brings inherent out-domain generalization for a variety of down-stream tasks. Among them, domain generalized semantic segmentation (DGSS) holds unique challenges as the cross-domain images share common pixel-wise content information but vary greatly in terms of the style. In this paper, we present a novel Spectral-dEcomposed Token (SET) learni…

    Submitted 28 July, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

    Comments: accepted by ACM MM 2024

  19. arXiv:2407.18369  [pdf, other]

    cs.CY cs.CL

    AI Safety in Generative AI Large Language Models: A Survey

    Authors: Jaymari Chua, Yun Li, Shiyi Yang, Chen Wang, Lina Yao

    Abstract: Large Language Models (LLMs) such as ChatGPT that exhibit generative AI capabilities are facing accelerated adoption and innovation. The increased presence of Generative AI (GAI) inevitably raises concerns about the risks and safety associated with these models. This article provides an up-to-date survey of recent trends in AI safety research of GAI-LLMs from a computer scientist's perspective: spe…

    Submitted 6 July, 2024; originally announced July 2024.

  20. arXiv:2407.18110  [pdf, other]

    cs.AR cs.AI

    MapTune: Advancing ASIC Technology Mapping via Reinforcement Learning Guided Library Tuning

    Authors: Mingju Liu, Daniel Robinson, Yingjie Li, Cunxi Yu

    Abstract: Technology mapping involves mapping logical circuits to a library of cells. Traditionally, the full technology library is used, leading to a large search space and potential overhead. Motivated by randomly sampled technology mapping case studies, we propose the MapTune framework that addresses this challenge by utilizing reinforcement learning to make design-specific choices during cell selection. By…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: IEEE/ACM International Conference on Computer-Aided Design (ICCAD '24), October 27–31, 2024

  21. arXiv:2407.17904  [pdf, other]

    cs.CV

    Exploring the Effect of Dataset Diversity in Self-Supervised Learning for Surgical Computer Vision

    Authors: Tim J. M. Jaspers, Ronald L. P. D. de Jong, Yasmina Al Khalil, Tijn Zeelenberg, Carolus H. J. Kusters, Yiping Li, Romy C. van Jaarsveld, Franciscus H. A. Bakker, Jelle P. Ruurda, Willem M. Brinkman, Peter H. N. De With, Fons van der Sommen

    Abstract: Over the past decade, computer vision applications in minimally invasive surgery have rapidly increased. Despite this growth, the impact of surgical computer vision remains limited compared to other medical fields like pathology and radiology, primarily due to the scarcity of representative annotated data. Whereas transfer learning from large annotated datasets such as ImageNet has been convention…

    Submitted 26 July, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: accepted - Data Engineering in Medical Imaging (DEMI) Workshop @ MICCAI2024

  22. arXiv:2407.17827  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Unified Lexical Representation for Interpretable Visual-Language Alignment

    Authors: Yifan Li, Yikai Wang, Yanwei Fu, Dongyu Ru, Zheng Zhang, Tong He

    Abstract: Visual-Language Alignment (VLA) has gained a lot of attention since CLIP's groundbreaking work. Although CLIP performs well, the typical direct latent feature alignment lacks clarity in its representation and similarity scores. On the other hand, lexical representation, a vector whose element represents the similarity between the sample and a word from the vocabulary, is a natural sparse represent…

    Submitted 25 July, 2024; originally announced July 2024.

  23. arXiv:2407.17789  [pdf, other]

    cs.MA cs.AI

    Very Large-Scale Multi-Agent Simulation in AgentScope

    Authors: Xuchen Pan, Dawei Gao, Yuexiang Xie, Zhewei Wei, Yaliang Li, Bolin Ding, Ji-Rong Wen, Jingren Zhou

    Abstract: Recent advances in large language models (LLMs) have opened new avenues for applying multi-agent systems in very large-scale simulations. However, there remain several challenges when conducting multi-agent simulations with existing platforms, such as limited scalability and low efficiency, unsatisfied agent diversity, and effort-intensive management processes. To address these challenges, we deve…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: We have released code on https://github.com/modelscope/agentscope

  24. DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction

    Authors: Chaofan Gan, Yuanpeng Tu, Yuxi Li, Weiyao Lin

    Abstract: With the recent burst of 2D and 3D data, cross-modal retrieval has attracted increasing attention recently. However, manual labeling by non-experts will inevitably introduce corrupted annotations given ambiguous 2D/3D content. Though previous works have addressed this issue by designing a naive division strategy with hand-crafted thresholds, their performance generally exhibits great sensitivity t…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: accepted by ACM MM 2024

  25. arXiv:2407.17572  [pdf, other]

    cs.CV cs.AI

    CityX: Controllable Procedural Content Generation for Unbounded 3D Cities

    Authors: Shougao Zhang, Mengqi Zhou, Yuxi Wang, Chuanchen Luo, Rongyu Wang, Yiwei Li, Xucheng Yin, Zhaoxiang Zhang, Junran Peng

    Abstract: Generating a realistic, large-scale 3D virtual city remains a complex challenge due to the involvement of numerous 3D assets, various city styles, and strict layout constraints. Existing approaches provide promising attempts at procedural content generation to create large-scale scenes using Blender agents. However, they face crucial issues such as difficulties in scaling up generation capability…

    Submitted 29 July, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

  26. arXiv:2407.17438  [pdf, other]

    cs.CV cs.AI cs.LG

    HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation

    Authors: Zhenzhi Wang, Yixuan Li, Yanhong Zeng, Youqing Fang, Yuwei Guo, Wenran Liu, Jing Tan, Kai Chen, Tianfan Xue, Bo Dai, Dahua Lin

    Abstract: Human image animation involves generating videos from a character photo, allowing user control and unlocking potential for video and movie production. While recent approaches yield impressive results using high-quality training data, the inaccessibility of these datasets hampers fair and transparent benchmarking. Moreover, these approaches prioritize 2D human motion and overlook the significance o…

    Submitted 28 July, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: camera controllable human image animation, a dataset and a baseline

  27. arXiv:2407.17418  [pdf, other]

    cs.CV

    3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities

    Authors: Yanqi Bao, Tianyu Ding, Jing Huo, Yaoli Liu, Yuxin Li, Wenbin Li, Yang Gao, Jiebo Luo

    Abstract: 3D Gaussian Splatting (3DGS) has emerged as a prominent technique with the potential to become a mainstream method for 3D representations. It can effectively transform multi-view images into explicit 3D Gaussian representations through efficient training, and achieve real-time rendering of novel views. This survey aims to analyze existing 3DGS-related works from multiple intersecting perspectives,…

    Submitted 24 July, 2024; originally announced July 2024.

  28. arXiv:2407.17379  [pdf, other]

    cs.CV cs.CL

    MMRA: A Benchmark for Multi-granularity Multi-image Relational Association

    Authors: Siwei Wu, Kang Zhu, Yu Bai, Yiming Liang, Yizhi Li, Haoning Wu, Jiaheng Liu, Ruibo Liu, Xingwei Qu, Xuxin Cheng, Ge Zhang, Wenhao Huang, Chenghua Lin

    Abstract: Given the remarkable success that large visual language models (LVLMs) have achieved in image perception tasks, the endeavor to make LVLMs perceive the world like humans is drawing increasing attention. Current multi-modal benchmarks mainly focus on the objective fact or certain topic related potential knowledge within an image, but overlook the associative relations between multiple images. Theref…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: VLMS, Multi-Image Association

  29. arXiv:2407.17312  [pdf, other]

    cs.CV

    Physical Adversarial Attack on Monocular Depth Estimation via Shape-Varying Patches

    Authors: Chenxing Zhao, Yang Li, Shihao Wu, Wenyi Tan, Shuangju Zhou, Quan Pan

    Abstract: Adversarial attacks against monocular depth estimation (MDE) systems pose significant challenges, particularly in safety-critical applications such as autonomous driving. Existing patch-based adversarial attacks for MDE are confined to the vicinity of the patch, making it difficult to affect the entire target. To address this limitation, we propose a physics-based adversarial attack on monocular d…

    Submitted 24 July, 2024; originally announced July 2024.

  30. arXiv:2407.17274  [pdf, other]

    cs.MM cs.AI cs.CV

    Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation

    Authors: Yongqi Li, Hongru Cai, Wenjie Wang, Leigang Qu, Yinwei Wei, Wenjie Li, Liqiang Nie, Tat-Seng Chua

    Abstract: Text-to-image retrieval is a fundamental task in multimedia processing, aiming to retrieve semantically relevant cross-modal content. Traditional studies have typically approached this task as a discriminative problem, matching the text and image via the cross-attention mechanism (one-tower framework) or in a common embedding space (two-tower framework). Recently, generative cross-modal retrieval…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Work in progress

  31. arXiv:2407.16990  [pdf, other]

    cs.NI

    Region-based Content Enhancement for Efficient Video Analytics at the Edge

    Authors: Weijun Wang, Liang Mi, Shaowei Cen, Haipeng Dai, Yuanchun Li, Xiaoming Fu, Yunxin Liu

    Abstract: Video analytics is widespread in various applications serving our society. Recent advances of content enhancement in video analytics offer significant benefits for the bandwidth saving and accuracy improvement. However, existing content-enhanced video analytics systems are excessively computationally expensive and provide extremely low throughput. In this paper, we present region-based content enh…

    Submitted 24 July, 2024; originally announced July 2024.

  32. arXiv:2407.16729  [pdf, other]

    cs.LG cs.AI

    PateGail: A Privacy-Preserving Mobility Trajectory Generator with Imitation Learning

    Authors: Huandong Wang, Changzheng Gao, Yuchen Wu, Depeng Jin, Lina Yao, Yong Li

    Abstract: Generating human mobility trajectories is of great importance to solve the lack of large-scale trajectory data in numerous applications, which is caused by privacy concerns. However, existing mobility trajectory generation methods still require real-world human trajectories centrally collected as the training data, where there exists an inescapable risk of privacy leakage. To overcome this limitat…

    Submitted 23 July, 2024; originally announced July 2024.

  33. arXiv:2407.16719  [pdf, other]

    cs.OH

    A Brief Discussion on the Philosophical Principles and Development Directions of Data Circulation

    Authors: Zhi Li, Lei Zhang, Junyi Xin, Jianfei He, Yan Li, Zhenjun Ma, Qi Sun

    Abstract: The data circulation is a complex scenario involving a large number of participants and different types of requirements, which not only has to comply with the laws and regulations, but also faces multiple challenges in technical and business areas. In order to systematically and comprehensively address these issues, it is essential to have a comprehensive and profound understanding of 'data circul…

    Submitted 23 July, 2024; originally announced July 2024.

  34. arXiv:2407.16715  [pdf]

    q-bio.QM cs.AI cs.LG

    Research on Adverse Drug Reaction Prediction Model Combining Knowledge Graph Embedding and Deep Learning

    Authors: Yufeng Li, Wenchao Zhao, Bo Dang, Xu Yan, Weimin Wang, Min Gao, Mingxuan Xiao

    Abstract: In clinical treatment, identifying potential adverse reactions of drugs can help assist doctors in making medication decisions. In response to the problems in previous studies that features are high-dimensional and sparse, independent prediction models need to be constructed for each adverse reaction of drugs, and the prediction accuracy is low, this paper develops an adverse drug reaction predict…

    Submitted 27 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: 12 pages, 4 figures, 9 tables

  35. arXiv:2407.16684  [pdf, other]

    eess.IV cs.CV q-bio.NC

    AutoRG-Brain: Grounded Report Generation for Brain MRI

    Authors: Jiayu Lei, Xiaoman Zhang, Chaoyi Wu, Lisong Dai, Ya Zhang, Yanyong Zhang, Yanfeng Wang, Weidi Xie, Yuehua Li

    Abstract: Radiologists are tasked with interpreting a large number of images on a daily basis, with the responsibility of generating corresponding reports. This demanding workload elevates the risk of human error, potentially leading to treatment delays, increased healthcare costs, revenue loss, and operational inefficiencies. To address these challenges, we initiate a series of work on grounded Automatic Re…

    Submitted 26 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

  36. arXiv:2407.16634  [pdf, other]

    eess.IV cs.AI cs.CV cs.HC

    Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses

    Authors: Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, Liwei Wang

    Abstract: Data-driven deep learning models have shown great capabilities to assist radiologists in breast ultrasound (US) diagnoses. However, their effectiveness is limited by the long-tail distribution of training data, which leads to inaccuracies in rare cases. In this study, we address a long-standing challenge of improving the diagnostic model performance on rare cases using long-tailed data. Specifical…

    Submitted 23 July, 2024; originally announced July 2024.

  37. arXiv:2407.16277  [pdf, other]

    cs.CV cs.HC

    When, Where, and What? A Novel Benchmark for Accident Anticipation and Localization with Large Language Models

    Authors: Haicheng Liao, Yongkang Li, Chengyue Wang, Yanchen Guan, KaHou Tam, Chunlin Tian, Li Li, Chengzhong Xu, Zhenning Li

    Abstract: As autonomous driving systems increasingly become part of daily transportation, the ability to accurately anticipate and mitigate potential traffic accidents is paramount. Traditional accident anticipation models primarily utilizing dashcam videos are adept at predicting when an accident may occur but fall short in localizing the incident and identifying involved entities. Addressing this gap, thi…

    Submitted 26 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

  38. arXiv:2407.16252  [pdf, other]

    cs.CL cs.AI cs.CV

    LawLuo: A Chinese Law Firm Co-run by LLM Agents

    Authors: Jingyun Sun, Chengxiao Dai, Zhongze Luo, Yangbo Chang, Yang Li

    Abstract: Large Language Models (LLMs) demonstrate substantial potential in delivering legal consultation services to users without a legal background, attributed to their superior text comprehension and generation capabilities. Nonetheless, existing Chinese legal LLMs limit interaction to a single model-user dialogue, unlike the collaborative consultations typical of law firms, where multiple staff members…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 11 pages, 13 figures, 2 tables

    ACM Class: I.2.1

  39. arXiv:2407.16248  [pdf, other]

    cs.CV cs.MM

    Spatiotemporal Graph Guided Multi-modal Network for Livestreaming Product Retrieval

    Authors: Xiaowan Hu, Yiyi Chen, Yan Li, Minquan Wang, Haoqian Wang, Quan Chen, Han Li, Peng Jiang

    Abstract: With the rapid expansion of e-commerce, more consumers have become accustomed to making purchases via livestreaming. Accurately identifying the products being sold by salespeople, i.e., livestreaming product retrieval (LPR), poses a fundamental and daunting challenge. The LPR task encompasses three primary dilemmas in real-world scenarios: 1) the recognition of intended products from distractor pr…

    Submitted 24 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: 9 pages, 12 figures

  40. arXiv:2407.16129  [pdf, other]

    cs.CV cs.AI

    FoRA: Low-Rank Adaptation Model beyond Multimodal Siamese Network

    Authors: Weiying Xie, Yusi Zhang, Tianlin Hui, Jiaqing Zhang, Jie Lei, Yunsong Li

    Abstract: Multimodal object detection offers a promising prospect to facilitate robust detection in various visual conditions. However, existing two-stream backbone networks are challenged by complex fusion and substantial parameter increments. This is primarily due to large data distribution biases of multimodal homogeneous information. In this paper, we propose a novel multimodal object detector, named Lo…

    Submitted 22 July, 2024; originally announced July 2024.

  41. arXiv:2407.15875  [pdf, other]

    cs.LG cs.CV

    Shapley Pruning for Neural Network Compression

    Authors: Kamil Adamczewski, Yawei Li, Luc van Gool

    Abstract: Neural network pruning is a rich field with a variety of approaches. In this work, we propose to connect the existing pruning concepts such as leave-one-out pruning and oracle pruning and develop them into a more general Shapley value-based framework that targets the compression of convolutional neural networks. To allow for practical applications in utilizing the Shapley value, this work presents…

    Submitted 19 July, 2024; originally announced July 2024.

  42. arXiv:2407.15823  [pdf, other]

    cs.SI

    A Large-scale Benchmark Dataset for Commuting Origin-destination Matrix Generation

    Authors: Can Rong, Jingtao Ding, Yan Liu, Yong Li

    Abstract: The commuting origin-destination (OD) matrix is a critical input for urban planning and transportation, providing crucial information about the population residing in one region and working in another within an interested area. Despite its importance, obtaining and updating the matrix is challenging due to high costs and privacy concerns. This has spurred research into generating commuting OD matr…

    Submitted 23 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: 16 pages, 9 figures

  43. arXiv:2407.15762  [pdf, other]

    cs.LG cs.AI cs.CL

    Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning

    Authors: Kaiwen Wang, Rahul Kidambi, Ryan Sullivan, Alekh Agarwal, Christoph Dann, Andrea Michi, Marco Gelmi, Yunxuan Li, Raghav Gupta, Avinava Dubey, Alexandre Ramé, Johan Ferret, Geoffrey Cideron, Le Hou, Hongkun Yu, Amr Ahmed, Aranyak Mehta, Léonard Hussenot, Olivier Bachem, Edouard Leurent

    Abstract: Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge here is to develop steerable language models that trade-off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditioned Language Policy (CLP), a general framework for finetuning language models on multiple objectives. Bui…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 40 pages

  44. arXiv:2407.15642  [pdf, other]

    cs.CV

    Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models

    Authors: Xin Ma, Yaohui Wang, Gengyun Jia, Xinyuan Chen, Yuan-Fang Li, Cunjian Chen, Yu Qiao

    Abstract: Diffusion models have achieved great progress in image animation due to powerful generative capabilities. However, maintaining spatio-temporal consistency with detailed information from the input static image over time (e.g., style, background, and object of the input static image) and ensuring smoothness in animated video narratives guided by textual prompts still remains challenging. In this pap…

    Submitted 22 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: Project webpage: https://maxin-cn.github.io/cinemo_project/

  45. arXiv:2407.15325  [pdf, other]

    cs.AI

    Odyssey: Empowering Agents with Open-World Skills

    Authors: Shunyu Liu, Yaoru Li, Kongcheng Zhang, Zhenyu Cui, Wenkai Fang, Yuxuan Zheng, Tongya Zheng, Mingli Song

    Abstract: Recent studies have delved into constructing generalist agents for open-world embodied environments like Minecraft. Despite the encouraging results, existing efforts mainly focus on solving basic programmatic tasks, e.g., material collection and tool-crafting following the Minecraft tech-tree, treating the ObtainDiamond task as the ultimate goal. This limitation stems from the narrowly defined set…

    Submitted 21 July, 2024; originally announced July 2024.

  46. arXiv:2407.15301  [pdf, other]

    stat.ML cs.LG math.ST q-bio.QM

    U-learning for Prediction Inference via Combinatory Multi-Subsampling: With Applications to LASSO and Neural Networks

    Authors: Zhe Fei, Yi Li

    Abstract: Epigenetic aging clocks play a pivotal role in estimating an individual's biological age through the examination of DNA methylation patterns at numerous CpG (Cytosine-phosphate-Guanine) sites within their genome. However, making valid inferences on predicted epigenetic ages, or more broadly, on predictions derived from high-dimensional inputs, presents challenges. We introduce a novel U-learning a…

    Submitted 21 July, 2024; originally announced July 2024.

  47. arXiv:2407.15233  [pdf, other]

    cs.CV

    CGB-DM: Content and Graphic Balance Layout Generation with Transformer-based Diffusion Model

    Authors: Yu Li, Yifan Chen, Gongye Liu, Jie Wu, Yujiu Yang

    Abstract: Layout generation is the foundation task of intelligent design, which requires the integration of visual aesthetics and harmonious expression of content delivery. However, existing methods still face challenges in generating precise and visually appealing layouts, including blocking, overlap, or spatial misalignment between layouts, which are closely related to the spatial structure of graphic lay…

    Submitted 22 July, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

  48. arXiv:2407.15031  [pdf]

    cs.NI

    Schedulability Analysis in Time-Sensitive Networking: A Systematic Literature Review

    Authors: Zitong Wang, Feng Luo, Yunpeng Li, Haotian Gan, Lei Zhu

    Abstract: Time-Sensitive Networking (TSN) is a set of standards that provide low-latency, high-reliability guarantees for the transmission of traffic in networks, and it is becoming an accepted solution for complex time-critical systems such as those in industrial automation and the automotive. In time-critical systems, it is essential to verify the timing predictability of the system, and the application o…

    Submitted 20 July, 2024; originally announced July 2024.

  49. arXiv:2407.14933  [pdf, other]

    cs.CL cs.AI cs.LG

    Consent in Crisis: The Rapid Decline of the AI Data Commons

    Authors: Shayne Longpre, Robert Mahari, Ariel Lee, Campbell Lund, Hamidah Oderinwale, William Brannon, Nayan Saxena, Naana Obeng-Marnu, Tobin South, Cole Hunter, Kevin Klyman, Christopher Klamm, Hailey Schoelkopf, Nikhil Singh, Manuel Cherep, Ahmad Anis, An Dinh, Caroline Chitongo, Da Yin, Damien Sileo, Deividas Mataciunas, Diganta Misra, Emad Alghamdi, Enrico Shippole, Jianguo Zhang, et al. (24 additional authors not shown)

    Abstract: General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge, we conduct the first, large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora. Our audit of 14,000 web domains provides an expansive view of crawlable web data and how co…

    Submitted 24 July, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

    Comments: 41 pages (13 main), 5 figures, 9 tables

  50. arXiv:2407.14923  [pdf, other]

    cs.CV

    RayFormer: Improving Query-Based Multi-Camera 3D Object Detection via Ray-Centric Strategies

    Authors: Xiaomeng Chu, Jiajun Deng, Guoliang You, Yifan Duan, Yao Li, Yanyong Zhang

    Abstract: The recent advances in query-based multi-camera 3D object detection are featured by initializing object queries in the 3D space, and then sampling features from perspective-view images to perform multi-round query refinement. In such a framework, query points near the same camera ray are likely to sample similar features from very close pixels, resulting in ambiguous query features and degraded de…

    Submitted 27 July, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM Multimedia 2024