Showing 1–50 of 2,243 results for author: Wu, J

  1. arXiv:2407.20119

    cs.LG cs.AI

    Adaptive Self-supervised Robust Clustering for Unstructured Data with Unknown Cluster Number

    Authors: Chen-Lu Ding, Jiancan Wu, Wei Lin, Shiyang Shen, Xiang Wang, Yancheng Yuan

    Abstract: We introduce a novel self-supervised deep clustering approach tailored for unstructured data without requiring prior knowledge of the number of clusters, termed Adaptive Self-supervised Robust Clustering (ASRC). In particular, ASRC adaptively learns the graph structure and edge weights to capture both local and global structural information. The obtained graph enables us to learn clustering-friend…

    Submitted 29 July, 2024; originally announced July 2024.

  2. arXiv:2407.19976

    cs.HC cs.MM

    MambaGesture: Enhancing Co-Speech Gesture Generation with Mamba and Disentangled Multi-Modality Fusion

    Authors: Chencan Fu, Yabiao Wang, Jiangning Zhang, Zhengkai Jiang, Xiaofeng Mao, Jiafu Wu, Weijian Cao, Chengjie Wang, Yanhao Ge, Yong Liu

    Abstract: Co-speech gesture generation is crucial for producing synchronized and realistic human gestures that accompany speech, enhancing the animation of lifelike avatars in virtual environments. While diffusion models have shown impressive capabilities, current approaches often overlook a wide range of modalities and their interactions, resulting in less dynamic and contextually varied gestures. To addre…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted to ACM MM 2024

  3. arXiv:2407.19763

    eess.IV cs.CV

    TeleOR: Real-time Telemedicine System for Full-Scene Operating Room

    Authors: Yixuan Wu, Kaiyuan Hu, Qian Shao, Jintai Chen, Danny Z. Chen, Jian Wu

    Abstract: The advent of telemedicine represents a transformative development in leveraging technology to extend the reach of specialized medical expertise to remote surgeries, a field where the immediacy of expert guidance is paramount. However, the intricate dynamics of the Operating Room (OR) scene pose unique challenges for telemedicine, particularly in achieving high-fidelity, real-time scene reconstruction…

    Submitted 29 July, 2024; originally announced July 2024.

  4. arXiv:2407.19507

    cs.CV cs.AI

    WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting

    Authors: Jingjing Wu, Zhengyao Fang, Pengyuan Lyu, Chengquan Zhang, Fanglin Chen, Guangming Lu, Wenjie Pei

    Abstract: Transcription-only Supervised Text Spotting aims to learn text spotters relying only on transcriptions but no text boundaries for supervision, thus eliminating expensive boundary annotation. The crux of this task lies in locating each transcription in scene text images without location annotations. In this work, we formulate this challenging problem as a Weakly Supervised Cross-modality Contrastiv…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  5. arXiv:2407.19435

    cs.CV cs.AI cs.CL cs.HC cs.RO

    ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding

    Authors: Zhen Chen, Zongming Zhang, Wenwu Guo, Xingjian Luo, Long Bai, Jinlin Wu, Hongliang Ren, Hongbin Liu

    Abstract: Surgical instrument segmentation is crucial in surgical scene understanding, thereby facilitating surgical safety. Existing algorithms directly detect all instruments of pre-defined categories in the input image, lacking the capability to segment specific instruments according to the surgeon's intention. During different stages of surgery, surgeons exhibit varying preferences and focus toward di…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: This work is accepted by IROS 2024 (Oral)

  6. arXiv:2407.19316

    eess.IV cs.AI cs.CV

    AResNet-ViT: A Hybrid CNN-Transformer Network for Benign and Malignant Breast Nodule Classification in Ultrasound Images

    Authors: Xin Zhao, Qianqian Zhu, Jialing Wu

    Abstract: To address the challenges of similarity between lesions and surrounding tissues, overlapping appearances of partially benign and malignant nodules, and difficulty in classification, a deep learning network that integrates CNN and Transformer is proposed for the classification of benign and malignant breast lesions in ultrasound images. This network adopts a dual-branch architecture for local-globa…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 12 pages, 3 figures

  7. arXiv:2407.19296

    cs.AI

    Multi-Modal CLIP-Informed Protein Editing

    Authors: Mingze Yin, Hanjing Zhou, Yiheng Zhu, Miao Lin, Yixuan Wu, Jialu Wu, Hongxia Xu, Chang-Yu Hsieh, Tingjun Hou, Jintai Chen, Jian Wu

    Abstract: Proteins govern most biological functions essential for life, but achieving controllable protein discovery and optimization remains challenging. Recently, machine learning-assisted protein editing (MLPE) has shown promise in accelerating optimization cycles and reducing experimental workloads. However, current methods struggle with the vast combinatorial space of potential protein edits and cannot…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 13 pages, 7 figures, 5 tables

  8. arXiv:2407.18716

    cs.CL

    ChatSchema: A pipeline of extracting structured information with Large Multimodal Models based on schema

    Authors: Fei Wang, Yuewen Zheng, Qin Li, Jingyi Wu, Pengfei Li, Luxia Zhang

    Abstract: Objective: This study introduces ChatSchema, an effective method for extracting and structuring information from unstructured data in medical paper reports using a combination of Large Multimodal Models (LMMs) and Optical Character Recognition (OCR) based on the schema. By integrating predefined schema, we intend to enable LMMs to directly extract and standardize information according to the schem…

    Submitted 26 July, 2024; originally announced July 2024.

  9. arXiv:2407.18613

    cs.CV eess.IV

    Dilated Strip Attention Network for Image Restoration

    Authors: Fangwei Hao, Jiesheng Wu, Ji Du, Yinjie Wang, Jing Xu

    Abstract: Image restoration is a long-standing task that seeks to recover the latent sharp image from its deteriorated counterpart. Due to the robust capacity of self-attention to capture long-range dependencies, transformer-based methods or some attention-based convolutional neural networks have demonstrated promising results on many image restoration tasks in recent years. However, existing attention modu…

    Submitted 26 July, 2024; originally announced July 2024.

  10. arXiv:2407.17911

    cs.MM cs.AI cs.CV

    ReCorD: Reasoning and Correcting Diffusion for HOI Generation

    Authors: Jian-Yu Jiang-Lin, Kang-Yang Huang, Ling Lo, Yi-Ning Huang, Terence Lin, Jhih-Ciang Wu, Hong-Han Shuai, Wen-Huang Cheng

    Abstract: Diffusion models revolutionize image generation by leveraging natural language to guide the creation of multimedia content. Despite significant advancements in such generative models, challenges persist in depicting detailed human-object interactions, especially regarding pose and object placement accuracy. We introduce a training-free method named Reasoning and Correcting Diffusion (ReCorD) to ad…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024. Project website: https://alberthkyhky.github.io/ReCorD/

  11. arXiv:2407.17816

    cs.LG cs.AI

    NC-NCD: Novel Class Discovery for Node Classification

    Authors: Yue Hou, Xueyuan Chen, He Zhu, Romei Liu, Bowen Shi, Jiaheng Liu, Junran Wu, Ke Xu

    Abstract: Novel Class Discovery (NCD) involves identifying new categories within unlabeled data by utilizing knowledge acquired from previously established categories. However, existing NCD methods often struggle to maintain a balance between the performance of old and new categories. Discovering unlabeled new categories in a class-incremental way is more practical but also more challenging, as it is freque…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by CIKM'24

  12. arXiv:2407.17115

    cs.IR

    Reinforced Prompt Personalization for Recommendation with Large Language Models

    Authors: Wenyu Mao, Jiancan Wu, Weijian Chen, Chongming Gao, Xiang Wang, Xiangnan He

    Abstract: Designing effective prompts can empower LLMs to understand user preferences and provide recommendations by leveraging LLMs' intent comprehension and knowledge utilization capabilities. However, existing research predominantly concentrates on task-wise prompting, developing fixed prompt templates composed of four patterns (i.e., role-playing, history records, reasoning guidance, and output format)…

    Submitted 24 July, 2024; originally announced July 2024.

  13. arXiv:2407.16696

    cs.CV

    PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects

    Authors: Junyi Li, Junfeng Wu, Weizhi Zhao, Song Bai, Xiang Bai

    Abstract: We present PartGLEE, a part-level foundation model for locating and identifying both objects and parts in images. Through a unified framework, PartGLEE accomplishes detection, segmentation, and grounding of instances at any granularity in the open world scenario. Specifically, we propose a Q-Former to construct the hierarchical relationship between objects and parts, parsing every object into corr…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024, homepage: https://provencestar.github.io/PartGLEE-Vision/

  14. arXiv:2407.16639

    cs.SD eess.AS

    Distortion Recovery: A Two-Stage Method for Guitar Effect Removal

    Authors: Ying-Shuo Lee, Yueh-Po Peng, Jui-Te Wu, Ming Cheng, Li Su, Yi-Hsuan Yang

    Abstract: Removing audio effects from electric guitar recordings makes it easier for post-production and sound editing. An audio distortion recovery model not only improves the clarity of the guitar sounds but also opens up new opportunities for creative adjustments in mixing and mastering. While progress has been made in creating such models, previous efforts have largely focused on synthetic distortions…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: DAFx 2024

  15. arXiv:2407.16554

    cs.MM cs.CV cs.SD eess.AS

    Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization

    Authors: Junyan Wu, Wei Lu, Xiangyang Luo, Rui Yang, Qian Wang, Xiaochun Cao

    Abstract: Recently, a novel form of audio partial forgery has posed challenges to its forensics, requiring advanced countermeasures to detect subtle forgery manipulations within long-duration audio. However, existing countermeasures still serve a classification purpose and fail to perform meaningful analysis of the start and end timestamps of partial forgery segments. To address this challenge, we introduce…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 9 pages, 3 figures. This paper has been accepted for ACM MM 2024

    MSC Class: 68T07; 68T10 ACM Class: I.2; I.5

  16. arXiv:2407.16151

    cs.RO

    Optimal camera-robot pose estimation in linear time from points and lines

    Authors: Guangyang Zeng, Biqiang Mu, Qingcheng Zeng, Yuchen Song, Chulin Dai, Guodong Shi, Junfeng Wu

    Abstract: Camera pose estimation is a fundamental problem in robotics. This paper focuses on two issues of interest: First, point and line features have complementary advantages, and it is of great value to design a uniform algorithm that can fuse them effectively; Second, with the development of modern front-end techniques, a large number of features can exist in a single image, which presents a potential…

    Submitted 22 July, 2024; originally announced July 2024.

  17. arXiv:2407.15508

    cs.CL cs.AI

    Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners

    Authors: Yifei Gao, Jie Ou, Lei Wang, Fanhua Shang, Jaji Wu, Jun Cheng

    Abstract: Large Language Models (LLMs) showcase remarkable performance and robust deductive capabilities, yet their expansive size complicates deployment and raises environmental concerns due to substantial resource consumption. The recent development of a quantization technique known as Learnable Singular-value Increment (LSI) has addressed some of these quantization challenges. Leveraging insights from LS…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Efficient Quantization Methods for LLMs

    MSC Class: I.2.7

  18. arXiv:2407.15233

    cs.CV

    CGB-DM: Content and Graphic Balance Layout Generation with Transformer-based Diffusion Model

    Authors: Yu Li, Yifan Chen, Gongye Liu, Jie Wu, Yujiu Yang

    Abstract: Layout generation is the foundation task of intelligent design, which requires the integration of visual aesthetics and harmonious expression of content delivery. However, existing methods still face challenges in generating precise and visually appealing layouts, including blocking, overlap, or spatial misalignment between layouts, which are closely related to the spatial structure of graphic lay…

    Submitted 22 July, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

  19. arXiv:2407.14872

    cs.CV cs.RO

    Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts

    Authors: Yanting Yang, Minghao Chen, Qibo Qiu, Jiahao Wu, Wenxiao Wang, Binbin Lin, Ziyu Guan, Xiaofei He

    Abstract: For a general-purpose robot to operate in reality, executing a broad range of instructions across various environments is imperative. Central to the reinforcement learning and planning for such robotic agents is a generalizable reward function. Recent advances in vision-language models, such as CLIP, have shown remarkable performance in the domain of deep learning, paving the way for open-domain v…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 camera-ready

  20. arXiv:2407.14829

    cs.CL

    Overview of AI-Debater 2023: The Challenges of Argument Generation Tasks

    Authors: Jiayu Lin, Guanrong Chen, Bojun Jin, Chenyang Li, Shutong Jia, Wancong Lin, Yang Sun, Yuhang He, Caihua Yang, Jianzhu Bao, Jipeng Wu, Wen Su, Jinglu Chen, Xinyi Li, Tianyu Chen, Mingjie Han, Shuaiwen Du, Zijian Wang, Jiyin Li, Fuzhong Suo, Hao Wang, Nuanchen Lin, Xuanjing Huang, Changjian Jiang, RuiFeng Xu, et al. (4 additional authors not shown)

    Abstract: In this paper we present the results of the AI-Debater 2023 Challenge held by the Chinese Conference on Affect Computing (CCAC 2023), and introduce the related datasets. We organize two tracks to handle the argumentative generation tasks in different scenarios, namely, Counter-Argument Generation (Track 1) and Claim-based Argument Generation (Track 2). Each track is equipped with its distinct data…

    Submitted 24 July, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

  21. arXiv:2407.14177

    cs.CV

    EVLM: An Efficient Vision-Language Model for Visual Understanding

    Authors: Kaibing Chen, Dong Shen, Hanwen Zhong, Huasong Zhong, Kui Xia, Di Xu, Wei Yuan, Yifei Hu, Bin Wen, Tianke Zhang, Changyi Liu, Dewen Fan, Huihui Xiao, Jiahong Wu, Fan Yang, Size Li, Di Zhang

    Abstract: In the field of multi-modal language models, the majority of methods are built on an architecture similar to LLaVA. These models use a single-layer ViT feature as a visual prompt, directly feeding it into the language models alongside textual tokens. However, when dealing with long sequences of visual signals or inputs such as videos, the self-attention mechanism of language models can lead to sig…

    Submitted 19 July, 2024; originally announced July 2024.

  22. arXiv:2407.12588

    cs.CV cs.AI

    Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks

    Authors: Antoni Kowalczuk, Jan Dubiński, Atiyeh Ashari Ghomi, Yi Sui, George Stein, Jiapeng Wu, Jesse C. Cresswell, Franziska Boenisch, Adam Dziedzic

    Abstract: Large-scale vision models have become integral in many applications due to their unprecedented performance and versatility across downstream tasks. However, the robustness of these foundation models has primarily been explored for a single task, namely image classification. The vulnerability of other common vision tasks, such as semantic segmentation and depth estimation, remains largely unknown.…

    Submitted 18 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted at the ICML 2024 Workshop on Foundation Models in the Wild

  23. arXiv:2407.12341

    cs.MM

    LLM-based query paraphrasing for video search

    Authors: Jiaxin Wu, Chong-Wah Ngo, Wing-Kwong Chan, Sheng-Hua Zhong

    Abstract: Text-to-video retrieval answers user queries through search by concepts and embeddings. Limited by the size of the concept bank and the amount of training data, answering queries in the wild is not always effective due to the out-of-vocabulary problem. Furthermore, neither concept-based nor embedding-based search can perform reasoning to consolidate the search results for complex queries mixed wit…

    Submitted 17 July, 2024; originally announced July 2024.

  24. arXiv:2407.11712

    cs.IR

    Harnessing Large Language Models for Multimodal Product Bundling

    Authors: Xiaohao Liu, Jie Wu, Zhulin Tao, Yunshan Ma, Yinwei Wei, Tat-seng Chua

    Abstract: Product bundling provides clients with a strategic combination of individual items, and it has gained significant attention in recent years as a fundamental prerequisite for online services. Recent methods utilize multimodal information through sophisticated extractors for bundling, but remain limited by inferior semantic understanding, the restricted scope of knowledge, and an inability to handle…

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: under review

  25. arXiv:2407.11638

    cs.CL cs.IR

    A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting

    Authors: He Chang, Chenchen Ye, Zhulin Tao, Jie Wu, Zhengmao Yang, Yunshan Ma, Xianglin Huang, Tat-Seng Chua

    Abstract: Recently, Large Language Models (LLMs) have demonstrated great potential in various data mining tasks, such as knowledge question answering, mathematical reasoning, and commonsense reasoning. However, the reasoning capability of LLMs on temporal event forecasting has been under-explored. To systematically investigate their abilities in temporal event forecasting, we conduct a comprehensive evaluat…

    Submitted 16 July, 2024; originally announced July 2024.

  26. arXiv:2407.11419

    cs.CV

    TeethDreamer: 3D Teeth Reconstruction from Five Intra-oral Photographs

    Authors: Chenfan Xu, Zhentao Liu, Yuan Liu, Yulong Dou, Jiamin Wu, Jiepeng Wang, Minjiao Wang, Dinggang Shen, Zhiming Cui

    Abstract: Orthodontic treatment usually requires regular face-to-face examinations to monitor dental conditions of the patients. When in-person diagnosis is not feasible, an alternative is to utilize five intra-oral photographs for remote dental monitoring. However, these photographs lack 3D information, and how to reconstruct 3D dental models from such sparse-view photographs is a challenging problem. In this study,…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024

  27. arXiv:2407.11343

    cs.CV

    Ev-GS: Event-based Gaussian splatting for Efficient and Accurate Radiance Field Rendering

    Authors: Jingqian Wu, Shuo Zhu, Chutian Wang, Edmund Y. Lam

    Abstract: Computational neuromorphic imaging (CNI) with event cameras offers advantages such as minimal motion blur and enhanced dynamic range, compared to conventional frame-based methods. Existing event-based radiance field rendering methods are built on neural radiance fields, which are computationally heavy and slow in reconstruction speed. Motivated by these two aspects, we introduce Ev-GS, the first CNI-i…

    Submitted 15 July, 2024; originally announced July 2024.

  28. arXiv:2407.11075

    cs.LG cs.AI

    A Comprehensive Survey on Kolmogorov Arnold Networks (KAN)

    Authors: Yuntian Hou, Di zhang, Jinheng Wu, Xiaohang Feng

    Abstract: Through this comprehensive survey of Kolmogorov-Arnold Networks (KAN), we have gained a thorough understanding of its theoretical foundation, architectural design, application scenarios, and current research progress. KAN, with its unique architecture and flexible activation functions, excels in handling complex data patterns and nonlinear relationships, demonstrating wide-ranging application poten…

    Submitted 13 July, 2024; originally announced July 2024.

  29. arXiv:2407.11025

    cs.LG cs.AI cs.CR

    Backdoor Graph Condensation

    Authors: Jiahao Wu, Ning Lu, Zeiyu Dai, Wenqi Fan, Shengcai Liu, Qing Li, Ke Tang

    Abstract: Recently, graph condensation has emerged as a prevalent technique to improve the training efficiency for graph neural networks (GNNs). It condenses a large graph into a small one such that a GNN trained on this small synthetic graph can achieve comparable performance to a GNN trained on a large graph. However, while existing graph condensation studies mainly focus on the best trade-off between gra…

    Submitted 16 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  30. arXiv:2407.10646

    cs.SD eess.AS

    Towards zero-shot amplifier modeling: One-to-many amplifier modeling via tone embedding control

    Authors: Yu-Hua Chen, Yen-Tung Yeh, Yuan-Chiao Cheng, Jui-Te Wu, Yu-Hsiang Ho, Jyh-Shing Roger Jang, Yi-Hsuan Yang

    Abstract: Replicating analog device circuits through neural audio effect modeling has garnered increasing interest in recent years. Existing work has predominantly focused on a one-to-one emulation strategy, modeling specific devices individually. In this paper, we tackle the less-explored scenario of one-to-many emulation, utilizing conditioning mechanisms to emulate multiple guitar amplifiers through a si…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ISMIR 2024

  31. arXiv:2407.10228

    cs.CV

    Efficient Facial Landmark Detection for Embedded Systems

    Authors: Ji-Jia Wu

    Abstract: This paper introduces the Efficient Facial Landmark Detection (EFLD) model, specifically designed for edge devices confronted with the challenges related to power consumption and time latency. EFLD features a lightweight backbone and a flexible detection head, each significantly enhancing operational efficiency on resource-constrained devices. To improve the model's robustness, we propose a cross-…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: technical report. 6th/165 in IEEE ICME 2024 PAIR competition

  32. arXiv:2407.09790

    cs.LG

    Team up GBDTs and DNNs: Advancing Efficient and Effective Tabular Prediction with Tree-hybrid MLPs

    Authors: Jiahuan Yan, Jintai Chen, Qianxing Wang, Danny Z. Chen, Jian Wu

    Abstract: Tabular datasets play a crucial role in various applications. Thus, developing efficient, effective, and widely compatible prediction algorithms for tabular data is important. Currently, two prominent model types, Gradient Boosted Decision Trees (GBDTs) and Deep Neural Networks (DNNs), have demonstrated performance advantages on distinct tabular prediction tasks. However, selecting an effective mo…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted at KDD 2024 Research Track, codes will be available at https://github.com/jyansir/tmlp

  33. arXiv:2407.09059

    cs.CV

    Domain-adaptive Video Deblurring via Test-time Blurring

    Authors: Jin-Ting He, Fu-Jen Tsai, Jia-Hao Wu, Yan-Tsung Peng, Chung-Chi Tsai, Chia-Wen Lin, Yen-Yu Lin

    Abstract: Dynamic scene video deblurring aims to remove undesirable blurry artifacts captured during the exposure process. Although previous video deblurring methods have achieved impressive results, they suffer from significant performance drops due to the domain gap between training and testing videos, especially for those captured in real-world scenarios. To address this issue, we propose a domain adapta…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  34. arXiv:2407.08664

    cs.CE eess.SY

    MBD-NODE: Physics-informed data-driven modeling and simulation of constrained multibody systems

    Authors: Jingquan Wang, Shu Wang, Huzaifa Mustafa Unjhawala, Jinlong Wu, Dan Negrut

    Abstract: We describe a framework that can integrate prior physical information, e.g., the presence of kinematic constraints, to support data-driven simulation in multi-body dynamics. Unlike other approaches, e.g., Fully-connected Neural Network (FCNN) or Recurrent Neural Network (RNN)-based methods that are used to model the system states directly, the proposed approach embraces a Neural Ordinary Different…

    Submitted 11 July, 2024; originally announced July 2024.

  35. arXiv:2407.08662

    cs.CL cs.AI

    Uncertainty Estimation of Large Language Models in Medical Question Answering

    Authors: Jiaxin Wu, Yizhou Yu, Hong-Yu Zhou

    Abstract: Large Language Models (LLMs) show promise for natural language generation in healthcare, but risk hallucinating factually incorrect information. Deploying LLMs for medical question answering necessitates reliable uncertainty estimation (UE) methods to detect hallucinations. In this work, we benchmark popular UE methods with different model sizes on medical question-answering datasets. Our results…

    Submitted 11 July, 2024; originally announced July 2024.

  36. arXiv:2407.08639

    cs.AI cs.LG

    $β$-DPO: Direct Preference Optimization with Dynamic $β$

    Authors: Junkang Wu, Yuexiang Xie, Zhengyi Yang, Jiancan Wu, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He

    Abstract: Direct Preference Optimization (DPO) has emerged as a compelling approach for training Large Language Models (LLMs) to adhere to human preferences. However, the performance of DPO is sensitive to the fine-tuning of its trade-off parameter $β$, as well as to the quality of the preference data. We analyze the impact of $β$ and data quality on DPO, uncovering that optimal $β$ values vary with the inf…

    Submitted 11 July, 2024; originally announced July 2024.

  37. arXiv:2407.07999

    cs.CV

    Fusion of Short-term and Long-term Attention for Video Mirror Detection

    Authors: Mingchen Xu, Jing Wu, Yukun Lai, Ze Ji

    Abstract: Techniques for detecting mirrors from static images have witnessed rapid growth in recent years. However, these methods detect mirrors from single input images. Detecting mirrors from video requires further consideration of temporal consistency between frames. We observe that humans can recognize mirror candidates, from just one or two frames, based on their appearance (e.g. shape, color). However…

    Submitted 10 July, 2024; originally announced July 2024.

  38. arXiv:2407.07880

    cs.LG cs.AI cs.CL

    Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

    Authors: Junkang Wu, Yuexiang Xie, Zhengyi Yang, Jiancan Wu, Jiawei Chen, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He

    Abstract: This study addresses the challenge of noise in training datasets for Direct Preference Optimization (DPO), a method for aligning Large Language Models (LLMs) with human preferences. We categorize noise into pointwise noise, which includes low-quality data points, and pairwise noise, which encompasses erroneous data pair associations that affect preference rankings. Utilizing Distributionally Robus…

    Submitted 10 July, 2024; originally announced July 2024.

  39. arXiv:2407.07577

    cs.CV cs.AI

    IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

    Authors: Yatai Ji, Shilong Zhang, Jie Wu, Peize Sun, Weifeng Chen, Xuefeng Xiao, Sidi Yang, Yujiu Yang, Ping Luo

    Abstract: The rapid advancement of Large Vision-Language models (LVLMs) has demonstrated a spectrum of emergent capabilities. Nevertheless, current models only focus on the visual content of a single scenario, while their ability to associate instances across different scenes has not yet been explored, which is essential for understanding complex visual content, such as movies with multiple characters and i…

    Submitted 10 July, 2024; originally announced July 2024.

  40. arXiv:2407.07328

    cs.LG

    CATP: Context-Aware Trajectory Prediction with Competition Symbiosis

    Authors: Jiang Wu, Dongyu Liu, Yuchen Lin, Yingcai Wu

    Abstract: Contextual information is vital for accurate trajectory prediction. For instance, the intricate flying behavior of migratory birds hinges on their analysis of environmental cues such as wind direction and air pressure. However, the diverse and dynamic nature of contextual information renders it an arduous task for AI models to comprehend its impact on trajectories and consequently predict them acc…

    Submitted 9 July, 2024; originally announced July 2024.

  41. arXiv:2407.07268

    cs.CV

    Dataset Quantization with Active Learning based Adaptive Sampling

    Authors: Zhenghao Zhao, Yuzhang Shang, Junyi Wu, Yan Yan

    Abstract: Deep learning has made remarkable progress recently, largely due to the availability of large, well-labeled datasets. However, the training on such datasets elevates costs and computational demands. To address this, various techniques like coreset selection, dataset distillation, and dataset quantization have been explored in the literature. Unlike traditional techniques that depend on uniform sam…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  42. arXiv:2407.06714

    cs.CV

    Improving the Transferability of Adversarial Examples by Feature Augmentation

    Authors: Donghua Wang, Wen Yao, Tingsong Jiang, Xiaohu Zheng, Junqi Wu, Xiaoqian Chen

    Abstract: Despite the success of input transformation-based attacks on boosting adversarial transferability, the performance is unsatisfying due to the ignorance of the discrepancy across models. In this paper, we propose a simple but effective feature augmentation attack (FAUG) method, which improves adversarial transferability without introducing extra computation costs. Specifically, we inject the random…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 19 pages, 4 figures, 4 tables

  43. arXiv:2407.06136

    cs.CV

    Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning

    Authors: Xiaojie Li, Yibo Yang, Jianlong Wu, Bernard Ghanem, Liqiang Nie, Min Zhang

    Abstract: Few-shot class-incremental learning (FSCIL) confronts the challenge of integrating new classes into a model with minimal training samples while preserving the knowledge of previously learned classes. Traditional methods widely adopt static adaptation relying on a fixed parameter space to learn from data that arrive sequentially, prone to overfitting to the current session. Existing dynamic strateg…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Code: https://github.com/xiaojieli0903/Mamba-FSCIL

  44. arXiv:2407.05319  [pdf, other]

    cs.CL

    Rethinking Targeted Adversarial Attacks For Neural Machine Translation

    Authors: Junjie Wu, Lemao Liu, Wei Bi, Dit-Yan Yeung

    Abstract: Targeted adversarial attacks are widely used to evaluate the robustness of neural machine translation systems. Unfortunately, this paper first identifies a critical issue in the existing settings of NMT targeted adversarial attacks, where their attacking results are largely overestimated. To this end, this paper presents a new setting for NMT targeted adversarial attacks that could lead to reliabl…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 5 pages, 2 figures, accepted by ICASSP 2024

  45. arXiv:2407.05108  [pdf, other]

    cs.LG stat.ML

    The Role of Depth, Width, and Tree Size in Expressiveness of Deep Forest

    Authors: Shen-Huan Lyu, Jin-Hui Wu, Qin-Cheng Zheng, Baoliu Ye

    Abstract: Random forests are classical ensemble algorithms that construct multiple randomized decision trees and aggregate their predictions using naive averaging. \citet{zhou2019deep} further propose a deep forest algorithm with multi-layer forests, which outperforms random forests in various tasks. The performance of deep forests is related to three hyperparameters in practice: depth, width, and tree size…

    Submitted 6 July, 2024; originally announced July 2024.

    Journal ref: In: Proceedings of the 27th European Conference on Artificial Intelligence, 2024

  46. arXiv:2407.04888  [pdf, other]

    eess.IV cs.CV

    Unraveling Radiomics Complexity: Strategies for Optimal Simplicity in Predictive Modeling

    Authors: Mahdi Ait Lhaj Loutfi, Teodora Boblea Podasca, Alex Zwanenburg, Taman Upadhaya, Jorge Barrios, David R. Raleigh, William C. Chen, Dante P. I. Capaldi, Hong Zheng, Olivier Gevaert, Jing Wu, Alvin C. Silva, Paul J. Zhang, Harrison X. Bai, Jan Seuntjens, Steffen Löck, Patrick O. Richard, Olivier Morin, Caroline Reinhold, Martin Lepage, Martin Vallières

    Abstract: Background: The high dimensionality of radiomic feature sets, the variability in radiomic feature types and potentially high computational requirements all underscore the need for an effective method to identify the smallest set of predictive features for a given clinical problem. Purpose: Develop a methodology and tools to identify and explain the smallest set of predictive radiomic features. Mat…

    Submitted 5 July, 2024; originally announced July 2024.

  47. arXiv:2407.04055  [pdf, other]

    q-bio.QM cs.AI cs.LG

    Benchmark on Drug Target Interaction Modeling from a Structure Perspective

    Authors: Xinnan Zhang, Jialin Wu, Junyi Xie, Tianlong Chen, Kaixiong Zhou

    Abstract: The prediction modeling of drug-target interactions is crucial to drug discovery and design, which has seen rapid advancements owing to deep learning technologies. Recently developed methods, such as those based on graph neural networks (GNNs) and Transformers, demonstrate exceptional performance across various datasets by effectively extracting structural information. However, the benchmarking of…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Submitted to NIPS 2024 Dataset and Benchmark

  48. arXiv:2407.03590  [pdf, other]

    cs.RO

    A Fast Dynamic Point Detection Method for LiDAR-Inertial Odometry in Driving Scenarios

    Authors: Zikang Yuan, Xiaoxiang Wang, Jingying Wu, Junda Cheng, Xin Yang

    Abstract: Existing 3D point-based dynamic point detection and removal methods have a significant time overhead, making them difficult to adapt to LiDAR-inertial odometry systems. This paper proposes a label consistency based dynamic point detection and removal method for handling moving vehicles and pedestrians in autonomous driving scenarios, and embeds the proposed dynamic point detection and removal meth…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 8 pages, submitted to RA-L

  49. arXiv:2407.03243  [pdf, other]

    cs.CV

    Visual Grounding with Attention-Driven Constraint Balancing

    Authors: Weitai Kang, Luowei Zhou, Junyi Wu, Changchang Sun, Yan Yan

    Abstract: Unlike Object Detection, Visual Grounding task necessitates the detection of an object described by complex free-form language. To simultaneously model such complex semantic and visual representations, recent state-of-the-art studies adopt transformer-based models to fuse features from both modalities, further introducing various modules that modulate visual features to align with the language exp…

    Submitted 6 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  50. arXiv:2407.03130  [pdf, other]

    cs.CV

    Towards Efficient Pixel Labeling for Industrial Anomaly Detection and Localization

    Authors: Hanxi Li, Jingqi Wu, Lin Yuanbo Wu, Hao Chen, Deyin Liu, Chunhua Shen

    Abstract: In the realm of practical Anomaly Detection (AD) tasks, manual labeling of anomalous pixels proves to be a costly endeavor. Consequently, many AD methods are crafted as one-class classifiers, tailored for training sets completely devoid of anomalies, ensuring a more cost-effective approach. While some pioneering work has demonstrated heightened AD accuracy by incorporating real anomaly samples in…

    Submitted 4 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: 18 pages, 5 figures