Showing 1–50 of 9,743 results for author: Wang, Y

  1. arXiv:2407.20224  [pdf, other]

    cs.CL

    Can Editing LLMs Inject Harm?

    Authors: Canyu Chen, Baixiang Huang, Zekun Li, Zhaorun Chen, Shiyang Lai, Xiongxiao Xu, Jia-Chen Gu, Jindong Gu, Huaxiu Yao, Chaowei Xiao, Xifeng Yan, William Yang Wang, Philip Torr, Dawn Song, Kai Shu

    Abstract: Knowledge editing techniques have been increasingly adopted to efficiently correct the false or outdated knowledge in Large Language Models (LLMs), due to the high cost of retraining from scratch. Meanwhile, one critical but under-explored question is: can knowledge editing be used to inject harm into LLMs? In this paper, we propose to reformulate knowledge editing as a new type of safety threat f…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally. 9 pages for main paper, 36 pages including appendix. The code, results, dataset for this paper and more resources are on the project website: https://llm-editing.github.io

  2. arXiv:2407.20111  [pdf, other]

    cs.SD eess.AS eess.SP

    Enhancing Anti-spoofing Countermeasures Robustness through Joint Optimization and Transfer Learning

    Authors: Yikang Wang, Xingming Wang, Hiromitsu Nishizaki, Ming Li

    Abstract: Current research in synthesized speech detection primarily focuses on the generalization of detection systems to unknown spoofing methods of noise-free speech. However, anti-spoofing countermeasure (CM) systems often do not work as well in more challenging scenarios, such as those involving noise and reverberation. To address the problem of enhancing the robustness of CM syste…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 29 pages, 4 figures, Journal Papers

  3. arXiv:2407.20080  [pdf, other]

    cs.CV cs.LG

    UniTTA: Unified Benchmark and Versatile Framework Towards Realistic Test-Time Adaptation

    Authors: Chaoqun Du, Yulin Wang, Jiayi Guo, Yizeng Han, Jie Zhou, Gao Huang

    Abstract: Test-Time Adaptation (TTA) aims to adapt pre-trained models to the target domain during testing. In reality, this adaptability can be influenced by multiple factors. Researchers have identified various challenging scenarios and developed diverse methods to address these challenges, such as dealing with continual domain shifts, mixed domains, and temporally correlated or imbalanced class distributi…

    Submitted 29 July, 2024; originally announced July 2024.

  4. arXiv:2407.20042  [pdf, other]

    cs.SE

    When to Stop? Towards Efficient Code Generation in LLMs with Excess Token Prevention

    Authors: Lianghong Guo, Yanlin Wang, Ensheng Shi, Wanjun Zhong, Hongyu Zhang, Jiachi Chen, Ruikai Zhang, Yuchi Ma, Zibin Zheng

    Abstract: Code generation aims to automatically generate code snippets that meet given natural language requirements and plays an important role in software development. Although Code LLMs have shown excellent performance in this domain, their long generation time poses a significant limitation in practical use. In this paper, we first conduct an in-depth preliminary study with different Code LLMs on code…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: To appear at ISSTA 2024

  5. arXiv:2407.19976  [pdf, other]

    cs.HC cs.MM

    MambaGesture: Enhancing Co-Speech Gesture Generation with Mamba and Disentangled Multi-Modality Fusion

    Authors: Chencan Fu, Yabiao Wang, Jiangning Zhang, Zhengkai Jiang, Xiaofeng Mao, Jiafu Wu, Weijian Cao, Chengjie Wang, Yanhao Ge, Yong Liu

    Abstract: Co-speech gesture generation is crucial for producing synchronized and realistic human gestures that accompany speech, enhancing the animation of lifelike avatars in virtual environments. While diffusion models have shown impressive capabilities, current approaches often overlook a wide range of modalities and their interactions, resulting in less dynamic and contextually varied gestures. To addre…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted to ACM MM 2024

  6. arXiv:2407.19643  [pdf]

    cs.AI

    Prometheus Chatbot: Knowledge Graph Collaborative Large Language Model for Computer Components Recommendation

    Authors: Yunsheng Wang, Songhao Chen, Kevin Jin

    Abstract: Knowledge graphs (KGs) are essential in applications such as network alignment, question-answering, and recommender systems (RSs) since they offer structured relational data that facilitate the inference of indirect relationships. However, the development of KG-based RSs capable of processing user inputs in natural language faces significant challenges. Firstly, natural language processing units m…

    Submitted 28 July, 2024; originally announced July 2024.

  7. arXiv:2407.19493  [pdf, ps, other]

    cs.CV cs.AI cs.MM

    Official-NV: A News Video Dataset for Multimodal Fake News Detection

    Authors: Yihao Wang, Lizhi Chen, Zhong Qian, Peifeng Li

    Abstract: News media, especially video news media, have penetrated into every aspect of daily life, which also brings the risk of fake news. Therefore, multimodal fake news detection has recently received more attention. However, the number of fake news detection data sets for the video modality is small, and these data sets are composed of unofficial videos uploaded by users, so there is too much useless data. To…

    Submitted 28 July, 2024; originally announced July 2024.

  8. arXiv:2407.19491  [pdf, other]

    cs.CV

    Multi-modal Crowd Counting via Modal Emulation

    Authors: Chenhao Wang, Xiaopeng Hong, Zhiheng Ma, Yupeng Wei, Yabin Wang, Xiaopeng Fan

    Abstract: Multi-modal crowd counting is a crucial task that uses multi-modal cues to estimate the number of people in crowded scenes. To overcome the gap between different modalities, we propose a modal emulation-based two-pass multi-modal crowd-counting framework that enables efficient modal emulation, alignment, and fusion. The framework consists of two key components: a \emph{multi-modal inference} pass…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: This is the preprint version of the paper to appear in BMVC 2024. Please cite the final published version. Code is available at https://github.com/Mr-Monday/Multi-modal-Crowd-Counting-via-Modal-Emulation

  9. arXiv:2407.19487  [pdf, other]

    cs.SE

    RLCoder: Reinforcement Learning for Repository-Level Code Completion

    Authors: Yanlin Wang, Yanli Wang, Daya Guo, Jiachi Chen, Ruikai Zhang, Yuchi Ma, Zibin Zheng

    Abstract: Repository-level code completion aims to generate code for unfinished code snippets within the context of a specified repository. Existing approaches mainly rely on retrieval-augmented generation strategies due to limitations in input sequence length. However, traditional lexical-based retrieval methods like BM25 struggle to capture code semantics, while model-based retrieval methods face challeng…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: To appear at ICSE 2025

    Journal ref: 47th International Conference on Software Engineering (ICSE 2025)

  10. arXiv:2407.19456  [pdf, other]

    cs.MM

    An Inverse Partial Optimal Transport Framework for Music-guided Movie Trailer Generation

    Authors: Yutong Wang, Sidan Zhu, Hongteng Xu, Dixin Luo

    Abstract: Trailer generation is a challenging video clipping task that aims to select highlighting shots from long videos like movies and re-organize them in an attractive way. In this study, we propose an inverse partial optimal transport (IPOT) framework to achieve music-guided movie trailer generation. In particular, we formulate the trailer generation task as selecting and sorting key movie shots based…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: ACM MM 2024

  11. arXiv:2407.19451  [pdf, other]

    cs.CV cs.GR

    \textsc{Perm}: A Parametric Representation for Multi-Style 3D Hair Modeling

    Authors: Chengan He, Xin Sun, Zhixin Shu, Fujun Luan, Sören Pirk, Jorge Alejandro Amador Herrera, Dominik L. Michels, Tuanfeng Y. Wang, Meng Zhang, Holly Rushmeier, Yi Zhou

    Abstract: We present \textsc{Perm}, a learned parametric model of human 3D hair designed to facilitate various hair-related applications. Unlike previous work that jointly models the global hair shape and local strand details, we propose to disentangle them using a PCA-based strand representation in the frequency domain, thereby allowing more precise editing and output control. Specifically, we leverage our…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Rejected by SIGGRAPH and SIGGRAPH Asia. Project page: https://cs.yale.edu/homes/che/projects/perm/

  12. arXiv:2407.19389  [pdf, other]

    cs.DC cs.LG math.OC

    FIARSE: Model-Heterogeneous Federated Learning via Importance-Aware Submodel Extraction

    Authors: Feijie Wu, Xingchen Wang, Yaqing Wang, Tianci Liu, Lu Su, Jing Gao

    Abstract: In federated learning (FL), accommodating clients' varied computational capacities poses a challenge, often limiting the participation of those with constrained resources in global model training. To address this issue, the concept of model heterogeneity through submodel extraction has emerged, offering a tailored solution that aligns the model's complexity with each client's computational capacit…

    Submitted 28 July, 2024; originally announced July 2024.

  13. arXiv:2407.19054  [pdf, other]

    stat.ML cs.LG q-bio.PE stat.AP

    Flusion: Integrating multiple data sources for accurate influenza predictions

    Authors: Evan L. Ray, Yijin Wang, Russell D. Wolfinger, Nicholas G. Reich

    Abstract: Over the last ten years, the US Centers for Disease Control and Prevention (CDC) has organized an annual influenza forecasting challenge with the motivation that accurate probabilistic forecasts could improve situational awareness and yield more effective public health actions. Starting with the 2021/22 influenza season, the forecasting targets for this challenge have been based on hospital admiss…

    Submitted 26 July, 2024; originally announced July 2024.

  14. arXiv:2407.18999  [pdf, other]

    cs.CV cs.LG

    Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models

    Authors: Baao Xie, Qiuyu Chen, Yunnan Wang, Zequn Zhang, Xin Jin, Wenjun Zeng

    Abstract: Disentangled representation learning (DRL) aims to identify and decompose underlying factors behind observations, thus facilitating data perception and generation. However, current DRL approaches often rely on the unrealistic assumption that semantic factors are statistically independent. In reality, these factors may exhibit correlations, which off-the-shelf solutions have yet to properly address…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 9 pages, 7 figures

  15. arXiv:2407.18914  [pdf, other]

    cs.CV

    Floating No More: Object-Ground Reconstruction from a Single Image

    Authors: Yunze Man, Yichen Sheng, Jianming Zhang, Liang-Yan Gui, Yu-Xiong Wang

    Abstract: Recent advancements in 3D object reconstruction from single images have primarily focused on improving the accuracy of object shapes. Yet, these techniques often fail to accurately capture the inter-relation between the object, ground, and camera. As a result, the reconstructed objects often appear floating or tilted when placed on flat surfaces. This limitation significantly affects 3D-aware imag…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Project Page: https://yunzeman.github.io/ORG/

  16. arXiv:2407.18854  [pdf, other]

    cs.CV cs.AI

    Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment

    Authors: Yuze Zheng, Zixuan Li, Xiangxian Li, Jinxing Liu, Yuqing Wang, Xiangxu Meng, Lei Meng

    Abstract: Image classification models often demonstrate unstable performance in real-world applications due to variations in image information, driven by differing visual perspectives of subject objects and lighting discrepancies. To mitigate these challenges, existing studies commonly incorporate additional modal information matching the visual data to regularize the model's learning process, enabling the…

    Submitted 26 July, 2024; originally announced July 2024.

  17. arXiv:2407.18613  [pdf]

    cs.CV eess.IV

    Dilated Strip Attention Network for Image Restoration

    Authors: Fangwei Hao, Jiesheng Wu, Ji Du, Yinjie Wang, Jing Xu

    Abstract: Image restoration is a long-standing task that seeks to recover the latent sharp image from its deteriorated counterpart. Due to the robust capacity of self-attention to capture long-range dependencies, transformer-based methods or some attention-based convolutional neural networks have demonstrated promising results on many image restoration tasks in recent years. However, existing attention modu…

    Submitted 26 July, 2024; originally announced July 2024.

  18. arXiv:2407.18611  [pdf, other]

    cs.CV

    IOVS4NeRF: Incremental Optimal View Selection for Large-Scale NeRFs

    Authors: Jingpeng Xie, Shiyu Tan, Yuanlei Wang, Yizhen Lao

    Abstract: Urban-level three-dimensional reconstruction for modern applications demands high rendering fidelity while minimizing computational costs. The advent of Neural Radiance Fields (NeRF) has enhanced 3D reconstruction, yet it exhibits artifacts under multiple viewpoints. In this paper, we propose a new NeRF framework method to address these issues. Our method uses image content and pose data to iterat…

    Submitted 26 July, 2024; originally announced July 2024.

  19. arXiv:2407.18534  [pdf, other]

    cs.CV

    Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers

    Authors: Longkun Zou, Wanru Zhu, Ke Chen, Lihua Guo, Kailing Guo, Kui Jia, Yaowei Wang

    Abstract: The semantic pattern of an object point cloud is determined by its topological configuration of local geometries. Learning discriminative representations can be challenging due to large shape variations of point sets in local regions and incomplete surfaces in a global perspective, which can be made even more severe in the context of unsupervised domain adaptation (UDA). Specifically, traditional 3D net…

    Submitted 26 July, 2024; originally announced July 2024.

  20. arXiv:2407.18525  [pdf, other]

    cs.CL cs.AI cs.LG

    Is larger always better? Evaluating and prompting large language models for non-generative medical tasks

    Authors: Yinghao Zhu, Junyi Gao, Zixiang Wang, Weibin Liao, Xiaochen Zheng, Lifang Liang, Yasha Wang, Chengwei Pan, Ewen M. Harrison, Liantao Ma

    Abstract: The use of Large Language Models (LLMs) in medicine is growing, but their ability to handle both structured Electronic Health Record (EHR) data and unstructured clinical notes is not well-studied. This study benchmarks various models, including GPT-based LLMs, BERT-based models, and traditional clinical predictive models, for non-generative medical tasks utilizing renowned datasets. We assessed 14…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2402.01713

  21. arXiv:2407.18492  [pdf]

    cs.CV

    Neural Modulation Alteration to Positive and Negative Emotions in Depressed Patients: Insights from fMRI Using Positive/Negative Emotion Atlas

    Authors: Yu Feng, Weiming Zeng, Yifan Xie, Hongyu Chen, Lei Wang, Yingying Wang, Hongjie Yan, Kaile Zhang, Ran Tao, Wai Ting Siok, Nizhuan Wang

    Abstract: Background: Although it has been noticed that depressed patients show differences in processing emotions, the precise neural modulation mechanisms of positive and negative emotions remain elusive. fMRI is a cutting-edge medical imaging technology renowned for its high spatial resolution and dynamic temporal information, making it particularly suitable for the neural dynamics of depression research…

    Submitted 25 July, 2024; originally announced July 2024.

  22. arXiv:2407.18487  [pdf, other]

    cs.CV

    SMPISD-MTPNet: Scene Semantic Prior-Assisted Infrared Ship Detection Using Multi-Task Perception Networks

    Authors: Chen Hu, Xiaogang Dong, Yian Huang, Lele Wang, Liang Xu, Tian Pu, Zhenming Peng

    Abstract: Infrared ship detection (IRSD) has received increasing attention in recent years due to the robustness of infrared images to adverse weather. However, a large number of false alarms may occur in complex scenes. To address these challenges, we propose the Scene Semantic Prior-Assisted Multi-Task Perception Network (SMPISD-MTPNet), which includes three stages: scene semantic extraction, deep feature…

    Submitted 25 July, 2024; originally announced July 2024.

  23. arXiv:2407.18479  [pdf, other]

    cs.CL

    Multi-turn Response Selection with Commonsense-enhanced Language Models

    Authors: Yuandong Wang, Xuhui Ren, Tong Chen, Yuxiao Dong, Nguyen Quoc Viet Hung, Jie Tang

    Abstract: As a branch of advanced artificial intelligence, dialogue systems are prospering. Multi-turn response selection is a general research problem in dialogue systems. With the assistance of background information and pre-trained language models, the performance of state-of-the-art methods on this problem gains impressive improvement. However, existing studies neglect the importance of external commons…

    Submitted 25 July, 2024; originally announced July 2024.

  24. arXiv:2407.18449  [pdf, other]

    eess.IV cs.CV cs.LG

    Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation

    Authors: Jiabo Ma, Zhengrui Guo, Fengtao Zhou, Yihui Wang, Yingxue Xu, Yu Cai, Zhengjie Zhu, Cheng Jin, Yi Lin, Xinrui Jiang, Anjia Han, Li Liang, Ronald Cheong Kin Chan, Jiguang Wang, Kwang-Ting Cheng, Hao Chen

    Abstract: Foundation models pretrained on large-scale datasets are revolutionizing the field of computational pathology (CPath). The generalization ability of foundation models is crucial for the success in various downstream clinical tasks. However, current foundation models have only been evaluated on a limited type and number of tasks, leaving their generalization ability and overall performance unclear.…

    Submitted 25 July, 2024; originally announced July 2024.

    Report number: I.2.10

  25. arXiv:2407.18390  [pdf, other]

    eess.IV cs.CV

    Adapting Mouse Pathological Model to Human Glomerular Lesion Segmentation

    Authors: Lining Yu, Mengmeng Yin, Ruining Deng, Quan Liu, Tianyuan Yao, Can Cui, Yu Wang, Yaohong Wang, Shilin Zhao, Haichun Yang, Yuankai Huo

    Abstract: Moving from animal models to human applications in preclinical research encompasses a broad spectrum of disciplines in medical science. A fundamental element in the development of new drugs, treatments, diagnostic methods, and in deepening our understanding of disease processes is the accurate measurement of kidney tissues. Past studies have demonstrated the viability of translating glomeruli segm…

    Submitted 25 July, 2024; originally announced July 2024.

  26. arXiv:2407.18365  [pdf, other]

    cs.LG cs.AI cs.DC math.OC

    FADAS: Towards Federated Adaptive Asynchronous Optimization

    Authors: Yujia Wang, Shiqiang Wang, Songtao Lu, Jinghui Chen

    Abstract: Federated learning (FL) has emerged as a widely adopted training paradigm for privacy-preserving machine learning. While the SGD-based FL algorithms have demonstrated considerable success in the past, there is a growing trend towards adopting adaptive federated optimization methods, particularly for training large-scale models. However, the conventional synchronous aggregation design poses a signi…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by ICML 2024

  27. arXiv:2407.18243  [pdf, other]

    cs.CV

    BIV-Priv-Seg: Locating Private Content in Images Taken by People With Visual Impairments

    Authors: Yu-Yun Tseng, Tanusree Sharma, Lotus Zhang, Abigale Stangl, Leah Findlater, Yang Wang, Danna Gurari

    Abstract: Individuals who are blind or have low vision (BLV) are at a heightened risk of sharing private information if they share photographs they have taken. To facilitate developing technologies that can help preserve privacy, we introduce BIV-Priv-Seg, the first localization dataset originating from people with visual impairments that shows private content. It contains 1,028 images with segmentation ann…

    Submitted 25 July, 2024; originally announced July 2024.

  28. arXiv:2407.18209  [pdf, other]

    cs.ET cs.AR

    SuperFlow: A Fully-Customized RTL-to-GDS Design Automation Flow for Adiabatic Quantum-Flux-Parametron Superconducting Circuits

    Authors: Yanyue Xie, Peiyan Dong, Geng Yuan, Zhengang Li, Masoud Zabihi, Chao Wu, Sung-En Chang, Xufeng Zhang, Xue Lin, Caiwen Ding, Nobuyuki Yoshikawa, Olivia Chen, Yanzhi Wang

    Abstract: Superconducting circuits, like Adiabatic Quantum-Flux-Parametron (AQFP), offer exceptional energy efficiency but face challenges in physical design due to sophisticated spacing and timing constraints. Current design tools often neglect the importance of constraint adherence throughout the entire design flow. In this paper, we propose SuperFlow, a fully-customized RTL-to-GDS design flow tailored fo…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by DATE 2024

  29. arXiv:2407.18181  [pdf, other]

    cs.LG cs.AI

    Gene Regulatory Network Inference from Pre-trained Single-Cell Transcriptomics Transformer with Joint Graph Learning

    Authors: Sindhura Kommu, Yizhi Wang, Yue Wang, Xuan Wang

    Abstract: Inferring gene regulatory networks (GRNs) from single-cell RNA sequencing (scRNA-seq) data is a complex challenge that requires capturing the intricate relationships between genes and their regulatory interactions. In this study, we tackle this challenge by leveraging the single-cell BERT-based pre-trained transformer model (scBERT), trained on extensive unlabeled scRNA-seq data, to augment struct…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted into the ICML 2024 AI for Science workshop

  30. arXiv:2407.18175  [pdf, other]

    cs.LG cs.AI cs.CV

    Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers

    Authors: Zhengang Li, Alec Lu, Yanyue Xie, Zhenglun Kong, Mengshu Sun, Hao Tang, Zhong Jia Xue, Peiyan Dong, Caiwen Ding, Yanzhi Wang, Xue Lin, Zhenman Fang

    Abstract: Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs). However, ViT models are often computation-intensive for efficient deployment on resource-limited edge devices. This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs, to design efficient ViT models for…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by ICS 2024

  31. arXiv:2407.18039  [pdf, other]

    cs.LG cs.AI

    Peak-Controlled Logits Poisoning Attack in Federated Distillation

    Authors: Yuhan Tang, Aoxu Zhang, Zhiyuan Wu, Bo Gao, Tian Wen, Yuwei Wang, Sheng Sun

    Abstract: Federated Distillation (FD) offers an innovative approach to distributed machine learning, leveraging knowledge distillation for efficient and flexible cross-device knowledge transfer without necessitating the upload of extensive model parameters to a central server. While FD has gained popularity, its vulnerability to poisoning attacks remains underexplored. To address this gap, we previously int…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.03685

  32. arXiv:2407.17996  [pdf, other]

    cs.CV

    Joint RGB-Spectral Decomposition Model Guided Image Enhancement in Mobile Photography

    Authors: Kailai Zhou, Lijing Cai, Yibo Wang, Mengya Zhang, Bihan Wen, Qiu Shen, Xun Cao

    Abstract: The integration of miniaturized spectrometers into mobile devices offers new avenues for image quality enhancement and facilitates novel downstream tasks. However, the broader application of spectral sensors in mobile photography is hindered by the inherent complexity of spectral images and the constraints of spectral imaging capabilities. To overcome these challenges, we propose a joint RGB-Spect…

    Submitted 25 July, 2024; originally announced July 2024.

  33. arXiv:2407.17879  [pdf, other]

    cs.AR cs.AI

    HG-PIPE: Vision Transformer Acceleration with Hybrid-Grained Pipeline

    Authors: Qingyu Guo, Jiayong Wan, Songqiang Xu, Meng Li, Yuan Wang

    Abstract: Vision Transformer (ViT) acceleration with field programmable gate array (FPGA) is promising but challenging. Existing FPGA-based ViT accelerators mainly rely on temporal architectures, which process different operators by reusing the same hardware blocks and suffer from extensive memory access overhead. Pipelined architectures, either coarse-grained or fine-grained, unroll the ViT computation spa…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by ICCAD 2024

    MSC Class: 68T07

  34. arXiv:2407.17827  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Unified Lexical Representation for Interpretable Visual-Language Alignment

    Authors: Yifan Li, Yikai Wang, Yanwei Fu, Dongyu Ru, Zheng Zhang, Tong He

    Abstract: Visual-Language Alignment (VLA) has gained a lot of attention since CLIP's groundbreaking work. Although CLIP performs well, the typical direct latent feature alignment lacks clarity in its representation and similarity scores. On the other hand, lexical representation, a vector whose element represents the similarity between the sample and a word from the vocabulary, is a natural sparse represent…

    Submitted 25 July, 2024; originally announced July 2024.

  35. arXiv:2407.17703  [pdf, other]

    cs.LG physics.soc-ph

    Context-aware knowledge graph framework for traffic speed forecasting using graph neural network

    Authors: Yatao Zhang, Yi Wang, Song Gao, Martin Raubal

    Abstract: Human mobility is intricately influenced by urban contexts spatially and temporally, constituting essential domain knowledge in understanding traffic systems. While existing traffic forecasting models primarily rely on raw traffic data and advanced deep learning techniques, incorporating contextual information remains underexplored due to the lack of effective integration frameworks and the comple…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 13 pages, 4 figures

  36. arXiv:2407.17572  [pdf, other]

    cs.CV cs.AI

    CityX: Controllable Procedural Content Generation for Unbounded 3D Cities

    Authors: Shougao Zhang, Mengqi Zhou, Yuxi Wang, Chuanchen Luo, Rongyu Wang, Yiwei Li, Xucheng Yin, Zhaoxiang Zhang, Junran Peng

    Abstract: Generating a realistic, large-scale 3D virtual city remains a complex challenge due to the involvement of numerous 3D assets, various city styles, and strict layout constraints. Existing approaches provide promising attempts at procedural content generation to create large-scale scenes using Blender agents. However, they face crucial issues such as difficulties in scaling up generation capability…

    Submitted 29 July, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

  37. arXiv:2407.17453  [pdf, other]

    cs.CV

    $VILA^2$: VILA Augmented VILA

    Authors: Yunhao Fang, Ligeng Zhu, Yao Lu, Yan Wang, Pavlo Molchanov, Jang Hyun Cho, Marco Pavone, Song Han, Hongxu Yin

    Abstract: Visual language models (VLMs) have rapidly progressed, driven by the success of large language models (LLMs). While model architectures and training infrastructures advance rapidly, data curation remains under-explored. When data quantity and quality become a bottleneck, existing work either directly crawls more raw data from the Internet that does not have a guarantee of data quality or distills…

    Submitted 24 July, 2024; originally announced July 2024.

  38. arXiv:2407.17267  [pdf, other]

    cs.CV

    M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis

    Authors: Junyu Li, Ye Zhang, Wen Shu, Xiaobing Feng, Yingchun Wang, Pengju Yan, Xiaolin Li, Chulin Sha, Min He

    Abstract: Multiple instance learning (MIL) has been successfully applied for whole slide images (WSIs) analysis in computational pathology, enabling a wide range of prediction tasks from tumor subtyping to inferring genetic mutations and multi-omics biomarkers. However, existing MIL methods predominantly focus on single-task learning, resulting in not only overall low efficiency but also the overlook of int…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 25 pages, 5 figures

  39. arXiv:2407.17234  [pdf, other]

    cs.IR

    Intent-guided Heterogeneous Graph Contrastive Learning for Recommendation

    Authors: Lei Sang, Yu Wang, Yi Zhang, Yiwen Zhang, Xindong Wu

    Abstract: Contrastive Learning (CL)-based recommender systems have gained prominence in the context of Heterogeneous Graph (HG) due to their capacity to enhance the consistency of representations across different views. However, existing frameworks often neglect the fact that user-item interactions within HG are governed by diverse latent intents (e.g., brand preferences or demographic characteristics of it…

    Submitted 28 July, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: 14 pages, 11 figures

  40. arXiv:2407.17213  [pdf, other]

    cs.LG

    Spectrum-Informed Multistage Neural Networks: Multiscale Function Approximators of Machine Precision

    Authors: Jakin Ng, Yongji Wang, Ching-Yao Lai

    Abstract: Deep learning frameworks have become powerful tools for approaching scientific problems such as turbulent flow, which has wide-ranging applications. In practice, however, existing scientific machine learning approaches have difficulty fitting complex, multi-scale dynamical systems to very high precision, as required in scientific contexts. We propose using the novel multistage neural network appro…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 8 pages, 3 figures, ICML 2024 workshop (AI for Science: Scaling in AI for Scientific Discovery)

  41. arXiv:2407.17126  [pdf]

    cs.CL cs.AI

    SDoH-GPT: Using Large Language Models to Extract Social Determinants of Health (SDoH)

    Authors: Bernardo Consoli, Xizhi Wu, Song Wang, Xinyu Zhao, Yanshan Wang, Justin Rousseau, Tom Hartvigsen, Li Shen, Huanmei Wu, Yifan Peng, Qi Long, Tianlong Chen, Ying Ding

    Abstract: Extracting social determinants of health (SDoH) from unstructured medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. In this study we introduced SDoH-GPT, a simple and effective few-shot Large Language Model (LLM) method leveraging contrastive examples and concise instructions to extract SDoH without relying…

    Submitted 24 July, 2024; originally announced July 2024.

  42. arXiv:2407.16993  [pdf, other]

    cs.CV

    LoFormer: Local Frequency Transformer for Image Deblurring

    Authors: Xintian Mao, Jiansheng Wang, Xingran Xie, Qingli Li, Yan Wang

    Abstract: Due to the computational complexity of self-attention (SA), prevalent techniques for image deblurring often resort to either adopting localized SA or employing coarse-grained global SA methods, both of which exhibit drawbacks such as compromising global modeling or lacking fine-grained correlation. In order to address this issue by effectively modeling long-range dependencies without sacrificing f…

    Submitted 24 July, 2024; originally announced July 2024.

  43. arXiv:2407.16958  [pdf, other]

    cs.LG cs.AI

    Cheems: Wonderful Matrices More Efficient and More Effective Architecture

    Authors: Jingze Shi, Lu He, Yuhan Wang, Tianyu He, Bingheng Wu, Mingkun Hou

    Abstract: Recent studies have shown that relative position encoding performs well in selective state space model scanning algorithms, and the architecture that balances SSM and Attention enhances the efficiency and effectiveness of the algorithm, while the sparse activation of the mixture of experts reduces the training cost. I studied the effectiveness of using different position encodings in structured s…

    Submitted 24 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

  44. arXiv:2407.16822  [pdf, other]

    cs.CV cs.AI

    AI-Enhanced 7-Point Checklist for Melanoma Detection Using Clinical Knowledge Graphs and Data-Driven Quantification

    Authors: Yuheng Wang, Tianze Yu, Jiayue Cai, Sunil Kalia, Harvey Lui, Z. Jane Wang, Tim K. Lee

    Abstract: The 7-point checklist (7PCL) is widely used in dermoscopy to identify malignant melanoma lesions needing urgent medical attention. It assigns point values to seven attributes: major attributes are worth two points each, and minor ones are worth one point each. A total score of three or higher prompts further evaluation, often including a biopsy. However, a significant limitation of current methods…

    Submitted 23 July, 2024; originally announced July 2024.

  45. arXiv:2407.16732  [pdf, other]

    cs.SE cs.AI

    PyBench: Evaluating LLM Agent on various real-world coding tasks

    Authors: Yaolun Zhang, Yinxu Pan, Yudong Wang, Jie Cai, Zhi Zheng, Guoyang Zeng, Zhiyuan Liu

    Abstract: The LLM Agent, equipped with a code interpreter, is capable of automatically solving real-world coding tasks, such as data analysis and image editing. However, existing benchmarks primarily focus on either simplistic tasks, such as completing a few lines of code, or on extremely complex and specific tasks at the repository level, neither of which is representative of various daily coding tasks.…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 9 pages

  46. arXiv:2407.16711  [pdf, other]

    cs.SE cs.CL

    Benchmarks as Microscopes: A Call for Model Metrology

    Authors: Michael Saxon, Ari Holtzman, Peter West, William Yang Wang, Naomi Saphra

    Abstract: Modern language models (LMs) pose a new challenge in capability assessment. Static benchmarks inevitably saturate without providing confidence in the deployment tolerances of LM-based systems, but developers nonetheless claim that their models have generalized traits such as reasoning or open-domain language understanding based on these flawed metrics. The science and practice of LMs requires a ne…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Conference paper at COLM 2024

  47. arXiv:2407.16684  [pdf, other]

    eess.IV cs.CV q-bio.NC

    AutoRG-Brain: Grounded Report Generation for Brain MRI

    Authors: Jiayu Lei, Xiaoman Zhang, Chaoyi Wu, Lisong Dai, Ya Zhang, Yanyong Zhang, Yanfeng Wang, Weidi Xie, Yuehua Li

    Abstract: Radiologists are tasked with interpreting a large number of images on a daily basis, with the responsibility of generating corresponding reports. This demanding workload elevates the risk of human error, potentially leading to treatment delays, increased healthcare costs, revenue loss, and operational inefficiencies. To address these challenges, we initiate a series of work on grounded Automatic Re…

    Submitted 26 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

  48. arXiv:2407.16634  [pdf, other]

    eess.IV cs.AI cs.CV cs.HC

    Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses

    Authors: Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, Liwei Wang

    Abstract: Data-driven deep learning models have shown great capabilities to assist radiologists in breast ultrasound (US) diagnoses. However, their effectiveness is limited by the long-tail distribution of training data, which leads to inaccuracies in rare cases. In this study, we address a long-standing challenge of improving the diagnostic model performance on rare cases using long-tailed data. Specifical…

    Submitted 23 July, 2024; originally announced July 2024.

  49. arXiv:2407.16406  [pdf, other]

    cs.CV cs.LG

    Hi-EF: Benchmarking Emotion Forecasting in Human-interaction

    Authors: Haoran Wang, Xinji Mai, Zeng Tao, Yan Wang, Jiawen Yu, Ziheng Zhou, Xuan Tong, Shaoqi Yan, Qing Zhao, Shuyong Gao, Wenqiang Zhang

    Abstract: Affective Forecasting, a research direction in psychology that predicts individuals' future emotions, is often constrained by numerous external factors like social influence and temporal distance. To address this, we transform Affective Forecasting into a Deep Learning problem by designing an Emotion Forecasting paradigm based on two-party interactions. We propose a novel Emotion Forecasting (EF) t…

    Submitted 23 July, 2024; originally announced July 2024.

  50. arXiv:2407.16165  [pdf, other]

    eess.IV cs.CV cs.LG

    Advanced AI Framework for Enhanced Detection and Assessment of Abdominal Trauma: Integrating 3D Segmentation with 2D CNN and RNN Models

    Authors: Liheng Jiang, Xuechun Yang, Chang Yu, Zhizhong Wu, Yuting Wang

    Abstract: Trauma is a significant cause of mortality and disability, particularly among individuals under forty. Traditional diagnostic methods for traumatic injuries, such as X-rays, CT scans, and MRI, are often time-consuming and dependent on medical expertise, which can delay critical interventions. This study explores the application of artificial intelligence (AI) and machine learning (ML) to improve t…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 6 pages