
Showing 1–50 of 8,279 results for author: Zhang, Y

  1. arXiv:2407.19692 [pdf, other]

    cs.IR

    High-Order Fusion Graph Contrastive Learning for Recommendation

    Authors: Yu Zhang, Lei Sang, Yi Zhang, Yiwen Zhang

    Abstract: Self-supervised learning (SSL) has recently attracted significant attention in the field of recommender systems. Contrastive learning (CL) stands out as a major SSL paradigm due to its robust ability to generate self-supervised signals. Mainstream graph contrastive learning (GCL)-based methods typically implement CL by creating contrastive views through various data augmentation techniques. Despit…

    Submitted 29 July, 2024; originally announced July 2024.

  2. GradCraft: Elevating Multi-task Recommendations through Holistic Gradient Crafting

    Authors: Yimeng Bai, Yang Zhang, Fuli Feng, Jing Lu, Xiaoxue Zang, Chenyi Lei, Yang Song

    Abstract: Recommender systems require the simultaneous optimization of multiple objectives to accurately model user interests, necessitating the application of multi-task learning methods. However, existing multi-task learning methods in recommendation overlook the specific characteristics of recommendation scenarios and fall short of achieving proper gradient balance. To address this challenge, we set the…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted by KDD'24

    ACM Class: H.3.3; H.3.5

  3. arXiv:2407.19669 [pdf, other]

    cs.CL cs.IR

    mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval

    Authors: Xin Zhang, Yanzhao Zhang, Dingkun Long, Wen Xie, Ziqi Dai, Jialong Tang, Huan Lin, Baosong Yang, Pengjun Xie, Fei Huang, Meishan Zhang, Wenjie Li, Min Zhang

    Abstract: We present systematic efforts in building a long-context multilingual text representation model (TRM) and reranker from scratch for text retrieval. We first introduce a text encoder (base size) enhanced with RoPE and unpadding, pre-trained in a native 8192-token context (longer than the 512-token context of previous multilingual encoders). Then we construct a hybrid TRM and a cross-encoder reranker by contrastive lea…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: 20 pages, 5 figures

  4. arXiv:2407.19621 [pdf, other]

    cs.GR

    Structure-Aware Simplification for Hypergraph Visualization

    Authors: Peter Oliver, Eugene Zhang, Yue Zhang

    Abstract: Hypergraphs provide a natural way to represent polyadic relationships in network data. For large hypergraphs, it is often difficult to visually detect structures within the data. Recently, a scalable polygon-based visualization approach was developed allowing hypergraphs with thousands of hyperedges to be simplified and examined at different levels of detail. However, this approach is not guarante…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: 15 pages, 14 figures, to be published in VIS 2024

  5. arXiv:2407.19512 [pdf, other]

    cs.CV

    Large-scale cervical precancerous screening via AI-assisted cytology whole slide image analysis

    Authors: Honglin Li, Yusuan Sun, Chenglu Zhu, Yunlong Zhang, Shichuan Zhang, Zhongyi Shui, Pingyi Chen, Jingxiong Li, Sunyi Zheng, Can Cui, Lin Yang

    Abstract: Cervical cancer continues to be the leading gynecological malignancy, posing a persistent threat to women's health on a global scale. Early screening via cytology Whole Slide Image (WSI) diagnosis is critical to prevent cancer progression and improve survival rates, but a pathologist's single test inevitably suffers from false negatives due to the immense number of cells that need to be reviewed withi…

    Submitted 28 July, 2024; originally announced July 2024.

  6. arXiv:2407.19467 [pdf, other]

    cs.IR cs.LG

    Enhancing Taobao Display Advertising with Multimodal Representations: Challenges, Approaches and Insights

    Authors: Xiang-Rong Sheng, Feifan Yang, Litong Gong, Biao Wang, Zhangming Chan, Yujing Zhang, Yueyao Cheng, Yong-Nan Zhu, Tiezheng Ge, Han Zhu, Yuning Jiang, Jian Xu, Bo Zheng

    Abstract: Despite the recognized potential of multimodal data to improve model accuracy, many large-scale industrial recommendation systems, including the Taobao display advertising system, predominantly depend on sparse ID features in their models. In this work, we explore approaches to leverage multimodal data to enhance recommendation accuracy. We start by identifying the key challenges in adopting mul…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted at CIKM 2024

  7. arXiv:2407.19415 [pdf, other]

    cs.MM cs.AI

    Start from Video-Music Retrieval: An Inter-Intra Modal Loss for Cross Modal Retrieval

    Authors: Zeyu Chen, Pengfei Zhang, Kai Ye, Wei Dong, Xin Feng, Yana Zhang

    Abstract: The burgeoning short-video industry has accelerated the advancement of video-music retrieval technology, assisting content creators in selecting appropriate music for their videos. In self-supervised training for video-to-music retrieval, the video and music samples in the dataset are separated from the same video work, so they are all one-to-one matches. This does not reflect real-world conditions. In…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: 10 pages, 7 figures

    ACM Class: I.2; I.4

  8. arXiv:2407.18961 [pdf, other]

    cs.AI

    MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains

    Authors: Guoli Yin, Haoping Bai, Shuang Ma, Feng Nan, Yanchao Sun, Zhaoyang Xu, Shen Ma, Jiarui Lu, Xiang Kong, Aonan Zhang, Dian Ang Yap, Yizhe Zhang, Karsten Ahnert, Vik Kamath, Mathias Berglund, Dominic Walsh, Tobias Gindele, Juergen Wiest, Zhengfeng Lai, Xiaoming Wang, Jiulong Shan, Meng Cao, Ruoming Pang, Zirui Wang

    Abstract: Recent advances in large language models (LLMs) have increased the demand for comprehensive benchmarks to evaluate their capabilities as human-like agents. Existing benchmarks, while useful, often focus on specific application scenarios, emphasizing task completion but failing to dissect the underlying skills that drive these outcomes. This lack of granularity makes it difficult to deeply discern…

    Submitted 17 July, 2024; originally announced July 2024.

  9. arXiv:2407.18957 [pdf, other]

    q-fin.TR cs.AI cs.MA

    When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments

    Authors: Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang, Lingyao Li, Zhengting Wang, Wenyue Hua, Dong Shu, Suiyuan Zhu, Xiaobo Jin, Sujian Li, Mengnan Du, Yongfeng Zhang

    Abstract: Can AI agents simulate real-world trading environments to investigate the impact of external factors (e.g., macroeconomics, policy changes, company fundamentals, and global events) on stock trading activities? These factors, which frequently influence trading behaviors, are critical elements in the quest for maximizing investors' profits. Our work attempts to solve this problem through large langu…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 33 pages, 10 figures

  10. arXiv:2407.18903 [pdf, other]

    cs.CE

    Using high-fidelity discrete element simulation to calibrate an expeditious terramechanics model in a multibody dynamics framework

    Authors: Yuemin Zhang, Junpeng Dai, Wei Hu, Dan Negrut

    Abstract: The wheel-soil interaction has a great impact on the dynamics of off-road vehicles in terramechanics applications. The Soil Contact Model (SCM), which anchors an empirical method to characterize the frictional contact between a wheel and soil, has been widely used in off-road vehicle dynamics simulations because it quickly produces adequate results for many terramechanics applications. The SCM appro…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: This version includes an appendix

    MSC Class: 70-10

  11. arXiv:2407.18843 [pdf]

    cs.RO physics.bio-ph physics.flu-dyn

    Morphing median fin enhances untethered bionic robotic tuna's linear acceleration and turning maneuverability

    Authors: Hongbin Huang, Zhonglu Lin, Wei Zheng, Jinhu Zhang, Zhibin Liu, Wei Zhou, Yu Zhang

    Abstract: Median fins of fish-like swimmers play a crucial role in linear acceleration and maneuvering processes. However, little research has focused on untethered robotic fish experiments. Imitating the behaviour of real tuna, we developed a free-swimming bionic tuna with a foldable dorsal fin. Erecting the dorsal fin under proper conditions can reduce head heave by 50%, enhance linear acceleration by 15.7%, i…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 7 pages, 5 figures

  12. arXiv:2407.18667 [pdf, other]

    cs.CV

    A Labeled Ophthalmic Ultrasound Dataset with Medical Report Generation Based on Cross-modal Deep Learning

    Authors: Jing Wang, Junyan Fan, Meng Zhou, Yanzhu Zhang, Mingyu Shi

    Abstract: Ultrasound imaging reveals eye morphology and aids in diagnosing and treating eye diseases. However, interpreting diagnostic reports requires specialized physicians. We present a labeled ophthalmic dataset for precise analysis and automated exploration of medical images along with their associated reports. It collects data in three modalities, including ultrasound images, blood flow informatio…

    Submitted 26 July, 2024; originally announced July 2024.

  13. arXiv:2407.18632 [pdf, other]

    cs.LG

    Robust VAEs via Generating Process of Noise Augmented Data

    Authors: Hiroo Irobe, Wataru Aoki, Kimihiro Yamazaki, Yuhui Zhang, Takumi Nakagawa, Hiroki Waida, Yuichiro Wada, Takafumi Kanamori

    Abstract: Advancing defensive mechanisms against adversarial attacks in generative models is a critical research topic in machine learning. Our study focuses on a specific type of generative model: Variational Auto-Encoders (VAEs). Contrary to common beliefs and existing literature which suggest that noise injection into training data can make models more robust, our preliminary experiments revealed th…

    Submitted 26 July, 2024; originally announced July 2024.

  14. arXiv:2407.18625 [pdf, other]

    cs.ET cs.AI cs.NE

    Topology Optimization of Random Memristors for Input-Aware Dynamic SNN

    Authors: Bo Wang, Shaocong Wang, Ning Lin, Yi Li, Yifei Yu, Yue Zhang, Jichang Yang, Xiaoshan Wu, Yangu He, Songqi Wang, Rui Chen, Guoqi Li, Xiaojuan Qi, Zhongrui Wang, Dashan Shang

    Abstract: Machine learning is developing at an unprecedented pace, exemplified by recent large language models and world simulators, which are artificial neural networks running on digital computers. However, they still cannot match human brains in terms of energy efficiency and streamlined adaptability to inputs of different difficulties, due to differences in signal representation, optimization, run…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 15 pages, 5 figures

  15. arXiv:2407.18523 [pdf, other]

    cs.LG

    DTFormer: A Transformer-Based Method for Discrete-Time Dynamic Graph Representation Learning

    Authors: Xi Chen, Yun Xiong, Siwei Zhang, Jiawei Zhang, Yao Zhang, Shiyang Zhou, Xixi Wu, Mingyang Zhang, Tengfei Liu, Weiqiang Wang

    Abstract: Discrete-Time Dynamic Graphs (DTDGs), which are prevalent in real-world implementations and notable for their ease of data acquisition, have garnered considerable attention from both academic researchers and industry practitioners. The representation learning of DTDGs has been extensively applied to model the dynamics of temporally changing entities and their evolving connections. Currently, DTDG…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures

  16. arXiv:2407.18466 [pdf, other]

    cs.CV

    A Progressive Single-Modality to Multi-Modality Classification Framework for Alzheimer's Disease Sub-type Diagnosis

    Authors: Yuxiao Liu, Mianxin Liu, Yuanwang Zhang, Kaicong Sun, Dinggang Shen

    Abstract: The current clinical diagnosis framework of Alzheimer's disease (AD) involves multiple modalities acquired from multiple diagnosis stages, each with distinct usage and cost. Previous AD diagnosis research has predominantly focused on how to directly fuse multiple modalities for an end-to-end one-stage diagnosis, which in practice incurs a high data-acquisition cost. Moreover, a significant pa…

    Submitted 25 July, 2024; originally announced July 2024.

  17. arXiv:2407.18170 [pdf, other]

    cs.LG

    RIDA: A Robust Attack Framework on Incomplete Graphs

    Authors: Jianke Yu, Hanchen Wang, Chen Chen, Xiaoyang Wang, Wenjie Zhang, Ying Zhang

    Abstract: Graph Neural Networks (GNNs) are vital in data science but are increasingly susceptible to adversarial attacks. To help researchers develop more robust GNN models, it is essential to focus on designing strong attack models as foundational benchmarks and guiding references. Among adversarial attacks, gray-box poisoning attacks are noteworthy due to their effectiveness and fewer constraints. These at…

    Submitted 25 July, 2024; originally announced July 2024.

  18. arXiv:2407.17721 [pdf, other]

    cs.LG physics.comp-ph

    A Two-Stage Imaging Framework Combining CNN and Physics-Informed Neural Networks for Full-Inverse Tomography: A Case Study in Electrical Impedance Tomography (EIT)

    Authors: Xuanxuan Yang, Yangming Zhang, Haofeng Chen, Gang Ma, Xiaojie Wang

    Abstract: Physics-Informed Neural Networks (PINNs) are a machine learning technique for solving partial differential equations (PDEs) by incorporating PDEs as loss terms in neural networks and minimizing the loss function during training. Tomographic imaging, a method for reconstructing internal properties from external measurement data, is an inverse problem that is highly complex and ill-posed. Recently, P…

    Submitted 24 July, 2024; originally announced July 2024.

  19. arXiv:2407.17709 [pdf, other]

    cs.RO

    PGD-VIO: An Accurate Plane-Aided Visual-Inertial Odometry with Graph-Based Drift Suppression

    Authors: Yidi Zhang, Fulin Tang, Zewen Xu, Yihong Wu, Pengju Ma

    Abstract: Generally, high-level features provide more geometric information than point features, which can be exploited to further constrain motions. Planes are commonplace in man-made environments, offering an active means to reduce drift due to their extensive spatial and temporal observability. To make full use of planar information, we propose a novel visual-inertial odometry (VIO) using an RG…

    Submitted 24 July, 2024; originally announced July 2024.

  20. arXiv:2407.17703 [pdf, other]

    cs.LG physics.soc-ph

    Context-aware knowledge graph framework for traffic speed forecasting using graph neural network

    Authors: Yatao Zhang, Yi Wang, Song Gao, Martin Raubal

    Abstract: Human mobility is intricately influenced by urban contexts spatially and temporally, constituting essential domain knowledge in understanding traffic systems. While existing traffic forecasting models primarily rely on raw traffic data and advanced deep learning techniques, incorporating contextual information remains underexplored due to the lack of effective integration frameworks and the comple…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 13 pages, 4 figures

  21. arXiv:2407.17678 [pdf, other]

    cs.CL

    Efficient LLM Training and Serving with Heterogeneous Context Sharding among Attention Heads

    Authors: Xihui Lin, Yunan Zhang, Suyu Ge, Barun Patra, Vishrav Chaudhary, Xia Song

    Abstract: Existing LLM training and inference frameworks struggle to boost efficiency with sparsity while maintaining the integrity of context and model architecture. Inspired by the sharding concept in databases and the fact that attention parallelizes over heads on accelerators, we propose Sparsely-Sharded (S2) Attention, an attention algorithm that allocates heterogeneous context partitions for differe…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 10 pages

  22. arXiv:2407.17267 [pdf, other]

    cs.CV

    M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis

    Authors: Junyu Li, Ye Zhang, Wen Shu, Xiaobing Feng, Yingchun Wang, Pengju Yan, Xiaolin Li, Chulin Sha, Min He

    Abstract: Multiple instance learning (MIL) has been successfully applied to whole slide image (WSI) analysis in computational pathology, enabling a wide range of prediction tasks from tumor subtyping to inferring genetic mutations and multi-omics biomarkers. However, existing MIL methods predominantly focus on single-task learning, resulting not only in low overall efficiency but also in overlooking int…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 25 pages, 5 figures

  23. arXiv:2407.17234 [pdf, other]

    cs.IR

    Intent-guided Heterogeneous Graph Contrastive Learning for Recommendation

    Authors: Lei Sang, Yu Wang, Yi Zhang, Yiwen Zhang, Xindong Wu

    Abstract: Contrastive Learning (CL)-based recommender systems have gained prominence in the context of Heterogeneous Graphs (HG) due to their capacity to enhance the consistency of representations across different views. However, existing frameworks often neglect the fact that user-item interactions within HG are governed by diverse latent intents (e.g., brand preferences or demographic characteristics of it…

    Submitted 28 July, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: 14 pages, 11 figures

  24. arXiv:2407.17104 [pdf, ps, other]

    cs.CE

    A simple hybrid linear and non-linear interpolation finite element for adaptive cracking elements method

    Authors: Xueya Wang, Yiming Zhang, Minjie Wen, Herbert Mang

    Abstract: The Cracking Elements Method (CEM) is a numerical tool for simulating quasi-brittle fractures that does not need remeshing, nodal enrichment, or complicated crack-tracking strategies. The cracking elements used in the CEM can be considered a special type of finite element implemented in standard finite element frameworks. One disadvantage of CEM is that it uses nonlinear interpolation of the displ…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: It is very useful for FEM researchers

  25. arXiv:2407.17030 [pdf, other]

    cs.NI

    Applications of Multi-Agent Deep Reinforcement Learning Communication in Network Management: A Survey

    Authors: Yue Pi, Wang Zhang, Yong Zhang, Hairong Huang, Baoquan Rao, Yulong Ding, Shuanghua Yang

    Abstract: With the advancement of artificial intelligence technology, the automation of network management, also known as Autonomous Driving Networks (ADN), is gaining widespread attention. Network management has shifted from traditional homogeneity and centralization to heterogeneity and decentralization. Multi-agent deep reinforcement learning (MADRL) allows agents to make decisions based on local obs…

    Submitted 24 July, 2024; originally announced July 2024.

  26. arXiv:2407.16997 [pdf, other]

    cs.CL

    Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective

    Authors: Yujian Liu, Yang Zhang, Tommi Jaakkola, Shiyu Chang

    Abstract: This paper investigates Who's Harry Potter (WHP), a pioneering yet insufficiently understood method for LLM unlearning. We explore it in two steps. First, we introduce a new task of LLM targeted unlearning, where given an unlearning target (e.g., a person) and some unlearning documents, we aim to unlearn only the information about the target, rather than everything in the unlearning documents. We…

    Submitted 24 July, 2024; originally announced July 2024.

  27. arXiv:2407.16982 [pdf, other]

    cs.CV cs.AI

    Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

    Authors: Lirui Zhao, Tianshuo Yang, Wenqi Shao, Yuxin Zhang, Yu Qiao, Ping Luo, Kaipeng Zhang, Rongrong Ji

    Abstract: This paper addresses the important problem of adding objects to images using only text guidance. It is challenging because the new object must be integrated seamlessly into the image with consistent visual context, such as lighting, texture, and spatial location. While existing text-guided image inpainting methods can add objects, they either fail to preserve the background consistency or involve…

    Submitted 23 July, 2024; originally announced July 2024.

  28. arXiv:2407.16957 [pdf, other]

    cs.CV

    Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal

    Authors: Yeying Jin, Xin Li, Jiadong Wang, Yan Zhang, Malu Zhang

    Abstract: Existing raindrop removal datasets have two shortcomings. First, they consist of images captured by cameras with a focus on the background, leading to the presence of blurry raindrops. To our knowledge, none of these datasets include images where the focus is specifically on raindrops, which results in a blurry background. Second, these datasets predominantly consist of daytime images, thereby lac…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024, dataset and benchmark at: https://github.com/jinyeying/RaindropClarity

  29. arXiv:2407.16942 [pdf]

    cs.GR

    EUFormer: Learning Driven 3D Spine Deformity Assessment with Orthogonal Optical Images

    Authors: Nan Meng, Jason P. Y. Cheung, Tao Huang, Moxin Zhao, Yue Zhang, Chenxi Yu, Chang Shi, Teng Zhang

    Abstract: In clinical settings, the screening, diagnosis, and monitoring of adolescent idiopathic scoliosis (AIS) typically involve physical or radiographic examinations. However, physical examinations are subjective, while radiographic examinations expose patients to harmful radiation. Consequently, we propose a pipeline that can accurately determine scoliosis severity. This pipeline utilizes posteroanteri…

    Submitted 23 July, 2024; originally announced July 2024.

  30. arXiv:2407.16875 [pdf, other]

    cs.CV

    PathwayBench: Assessing Routability of Pedestrian Pathway Networks Inferred from Multi-City Imagery

    Authors: Yuxiang Zhang, Bill Howe, Sachin Mehta, Nicholas-J Bolten, Anat Caspi

    Abstract: Applications to support pedestrian mobility in urban areas require a complete and routable graph representation of the built environment. Globally available information, including aerial imagery, provides a scalable source for constructing these path networks, but the associated learning problem is challenging: Relative to road network pathways, pedestrian network pathways are narrower, more frequ…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2303.02323

  31. arXiv:2407.16741 [pdf, other]

    cs.SE cs.AI cs.CL

    OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

    Authors: Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, Graham Neubig

    Abstract: Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and effect change in their surrounding environments. In this paper, we introduce OpenD…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Code: https://github.com/OpenDevin/OpenDevin

  32. arXiv:2407.16732 [pdf, other]

    cs.SE cs.AI

    PyBench: Evaluating LLM Agent on various real-world coding tasks

    Authors: Yaolun Zhang, Yinxu Pan, Yudong Wang, Jie Cai, Zhi Zheng, Guoyang Zeng, Zhiyuan Liu

    Abstract: The LLM Agent, equipped with a code interpreter, is capable of automatically solving real-world coding tasks, such as data analysis and image editing. However, existing benchmarks primarily focus on either simplistic tasks, such as completing a few lines of code, or on extremely complex and specific tasks at the repository level, neither of which is representative of various daily coding tasks.…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 9 pages

  33. arXiv:2407.16684 [pdf, other]

    eess.IV cs.CV q-bio.NC

    AutoRG-Brain: Grounded Report Generation for Brain MRI

    Authors: Jiayu Lei, Xiaoman Zhang, Chaoyi Wu, Lisong Dai, Ya Zhang, Yanyong Zhang, Yanfeng Wang, Weidi Xie, Yuehua Li

    Abstract: Radiologists are tasked with interpreting a large number of images on a daily basis, with the responsibility of generating corresponding reports. This demanding workload elevates the risk of human error, potentially leading to treatment delays, increased healthcare costs, revenue loss, and operational inefficiencies. To address these challenges, we initiate a series of works on grounded Automatic Re…

    Submitted 26 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

  34. arXiv:2407.16575 [pdf, other]

    cs.CV

    Timeliness-Fidelity Tradeoff in 3D Scene Representations

    Authors: Xiangmin Xu, Zhen Meng, Yichi Zhang, Changyang She, Philip G. Zhao

    Abstract: Real-time three-dimensional (3D) scene representations serve as one of the building blocks that bolster various innovative applications, e.g., digital manufacturing, Virtual/Augmented/Extended/Mixed Reality (VR/AR/XR/MR), and the metaverse. Despite substantial efforts in real-time communications and computing, real-time 3D scene representation remains a challenging task. This p…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted for publication by the IEEE International Conference on Computer Communications (INFOCOM) Workshops 2024

  35. arXiv:2407.16327 [pdf, other]

    cs.CR cs.CV

    Understanding Impacts of Electromagnetic Signal Injection Attacks on Object Detection

    Authors: Youqian Zhang, Chunxi Yang, Eugene Y. Fu, Qinhong Jiang, Chen Yan, Sze-Yiu Chau, Grace Ngai, Hong-Va Leong, Xiapu Luo, Wenyuan Xu

    Abstract: Object detection can localize and identify objects in images, and it is extensively employed in critical multimedia applications such as security surveillance and autonomous driving. Despite the success of existing object detection models, they are often evaluated in ideal scenarios where captured images guarantee the accurate and complete representation of the detected scenes. However, images ca…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 2024 IEEE International Conference on Multimedia and Expo (ICME), July 15 - July 19, 2024, Niagara Falls, Ontario, Canada

  36. arXiv:2407.16234 [pdf, other]

    cs.CV cs.CL

    A Multi-view Mask Contrastive Learning Graph Convolutional Neural Network for Age Estimation

    Authors: Yiping Zhang, Yuntao Shou, Tao Meng, Wei Ai, Keqin Li

    Abstract: The age estimation task aims to use facial features to predict the age of people and is widely used in public security, marketing, identification, and other fields. However, the features are mainly concentrated in facial keypoints, and existing CNN and Transformer-based methods are inflexible and redundant when modeling complex irregular structures. Therefore, this paper proposes a Multi-view…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 20 pages, 9 figures

  37. arXiv:2407.16154 [pdf, other]

    cs.CL

    DDK: Distilling Domain Knowledge for Efficient Large Language Models

    Authors: Jiaheng Liu, Chenchen Zhang, Jinyang Guo, Yuanxing Zhang, Haoran Que, Ken Deng, Zhiqi Bai, Jie Liu, Ge Zhang, Jiakai Wang, Yanan Wu, Congnan Liu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

    Abstract: Despite the advanced intelligence abilities of large language models (LLMs) in various applications, they still face significant computational and storage demands. Knowledge Distillation (KD) has emerged as an effective strategy to improve the performance of a smaller LLM (i.e., the student model) by transferring knowledge from a high-performing LLM (i.e., the teacher model). Prevailing techniques…

    Submitted 22 July, 2024; originally announced July 2024.

  38. arXiv:2407.16142 [pdf, other]

    cs.LG cs.AI cs.CV cs.RO

    Diffusion Models as Optimizers for Efficient Planning in Offline RL

    Authors: Renming Huang, Yunqiang Pei, Guoqing Wang, Yangming Zhang, Yang Yang, Peng Wang, Hengtao Shen

    Abstract: Diffusion models have shown strong competitiveness in offline reinforcement learning tasks by formulating decision-making as sequential generation. However, the practicality of these methods is limited due to the lengthy inference processes they require. In this paper, we address this problem by decomposing the sampling process of diffusion models into two decoupled subprocesses: 1) generating a f…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: The paper was accepted at ECCV 2024

  39. arXiv:2407.16129 [pdf, other]

    cs.CV cs.AI

    FoRA: Low-Rank Adaptation Model beyond Multimodal Siamese Network

    Authors: Weiying Xie, Yusi Zhang, Tianlin Hui, Jiaqing Zhang, Jie Lei, Yunsong Li

    Abstract: Multimodal object detection offers a promising prospect to facilitate robust detection in various visual conditions. However, existing two-stream backbone networks are challenged by complex fusion and substantial parameter increments. This is primarily due to large data distribution biases of multimodal homogeneous information. In this paper, we propose a novel multimodal object detector, named Lo…

    Submitted 22 July, 2024; originally announced July 2024.

  40. arXiv:2407.15888 [pdf, other]

    q-bio.GN cs.LG

    A Benchmark Dataset for Multimodal Prediction of Enzymatic Function Coupling DNA Sequences and Natural Language

    Authors: Yuchen Zhang, Ratish Kumar Chandrakant Jha, Soumya Bharadwaj, Vatsal Sanjaykumar Thakkar, Adrienne Hoarfrost, Jin Sun

    Abstract: Predicting gene function from its DNA sequence is a fundamental challenge in biology. Many deep learning models have been proposed to embed DNA sequences and predict their enzymatic function, leveraging information in public databases linking DNA sequences to an enzymatic function label. However, much of the scientific community's knowledge of biological function is not represented in these catego…

    Submitted 21 July, 2024; originally announced July 2024.

  41. arXiv:2407.15880 [pdf, other]

    cs.LG cs.AI q-bio.QM

    Diff4VS: HIV-inhibiting Molecules Generation with Classifier Guidance Diffusion for Virtual Screening

    Authors: Jiaqing Lyu, Changjie Chen, Bing Liang, Yijia Zhang

    Abstract: The AIDS epidemic has killed 40 million people and caused serious global problems. The identification of new HIV-inhibiting molecules is of great importance for combating the AIDS epidemic. Here, the Classifier Guidance Diffusion model and ligand-based virtual screening strategy are combined to discover potential HIV-inhibiting molecules for the first time. We call it Diff4VS. An extra classifier…

    Submitted 20 July, 2024; originally announced July 2024.

  42. arXiv:2407.15787 [pdf, other]

    cs.CV

    Unsupervised Mastoidectomy for Cochlear CT Mesh Reconstruction Using Highly Noisy Data

    Authors: Yike Zhang, Dingjie Su, Eduardo Davalos, Jack H. Noble

    Abstract: Cochlear Implant (CI) procedures involve inserting an array of electrodes into the cochlea located inside the inner ear. Mastoidectomy is a surgical procedure that uses a high-speed drill to remove part of the mastoid region of the temporal bone, providing safe access to the cochlea through the middle and inner ear. We aim to develop an intraoperative navigation system that registers plans created…

    Submitted 22 July, 2024; originally announced July 2024.

  43. arXiv:2407.15247

    cs.LG stat.ML

    TimeInf: Time Series Data Contribution via Influence Functions

    Authors: Yizi Zhang, Jingyan Shen, Xiaoxue Xiong, Yongchan Kwon

    Abstract: Evaluating the contribution of individual data points to a model's prediction is critical for interpreting model predictions and improving model performance. Existing data contribution methods have been applied to various data types, including tabular data, images, and texts; however, their primary focus has been on i.i.d. settings. Despite the pressing need for principled approaches tailored to t…

    Submitted 23 July, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

  44. arXiv:2407.15221

    cs.NI cs.DC

    Secure Web Objects: Building Blocks for Metaverse Interoperability and Decentralization

    Authors: Tianyuan Yu, Xinyu Ma, Varun Patil, Yekta Kocaogullar, Yulong Zhang, Jeff Burke, Dirk Kutscher, Lixia Zhang

    Abstract: This position paper explores how to support the Web's evolution through an underlying data-centric approach that better matches the data-orientedness of modern and emerging applications. We revisit the original vision of the Web as a hypermedia system that supports document composability and application interoperability via name-based data access. We propose the use of secure web objects (SWO), a…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: 9 pages

    ACM Class: H.3.5

  45. arXiv:2407.15141

    cs.AI cs.LG physics.chem-ph

    Text-Augmented Multimodal LLMs for Chemical Reaction Condition Recommendation

    Authors: Yu Zhang, Ruijie Yu, Kaipeng Zeng, Ding Li, Feng Zhu, Xiaokang Yang, Yaohui Jin, Yanyan Xu

    Abstract: High-throughput reaction condition (RC) screening is fundamental to chemical synthesis. However, current RC screening suffers from laborious and costly trial-and-error workflows. Traditional computer-aided synthesis planning (CASP) tools fail to find suitable RCs due to data sparsity and inadequate reaction representations. Nowadays, large language models (LLMs) are capable of tackling chemistry-r…

    Submitted 21 July, 2024; originally announced July 2024.

  46. arXiv:2407.15098

    cs.CR cs.LG

    SeqMIA: Sequential-Metric Based Membership Inference Attack

    Authors: Hao Li, Zheng Li, Siyuan Wu, Chengrui Hu, Yutong Ye, Min Zhang, Dengguo Feng, Yang Zhang

    Abstract: Most existing membership inference attacks (MIAs) utilize metrics (e.g., loss) calculated on the model's final state, while recent advanced attacks leverage metrics computed at various stages, including both intermediate and final stages, throughout the model training. Nevertheless, these attacks often process multiple intermediate states of the metric independently, ignoring their time-dependent…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM CCS 2024

  47. arXiv:2407.15026

    cs.AR cs.AI

    Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms

    Authors: Zhihai Wang, Zijie Geng, Zhaojie Tu, Jie Wang, Yuxi Qian, Zhexuan Xu, Ziyan Liu, Siyuan Xu, Zhentao Tang, Shixiong Kai, Mingxuan Yuan, Jianye Hao, Bin Li, Yongdong Zhang, Feng Wu

    Abstract: The increasing complexity of modern very-large-scale integration (VLSI) design highlights the significance of Electronic Design Automation (EDA) technologies. Chip placement is a critical step in the EDA workflow, which positions chip modules on the canvas with the goal of optimizing performance, power, and area (PPA) metrics of final chip designs. Recent advances have demonstrated the great poten…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: A comprehensive benchmark for AI-based chip placement algorithms using end-to-end performance metrics

  48. arXiv:2407.14923

    cs.CV

    RayFormer: Improving Query-Based Multi-Camera 3D Object Detection via Ray-Centric Strategies

    Authors: Xiaomeng Chu, Jiajun Deng, Guoliang You, Yifan Duan, Yao Li, Yanyong Zhang

    Abstract: The recent advances in query-based multi-camera 3D object detection are featured by initializing object queries in the 3D space, and then sampling features from perspective-view images to perform multi-round query refinement. In such a framework, query points near the same camera ray are likely to sample similar features from very close pixels, resulting in ambiguous query features and degraded de…

    Submitted 27 July, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM Multimedia 2024

  49. arXiv:2407.14845

    cs.LG cs.CL

    Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models

    Authors: Ze Yu Zhang, Arun Verma, Finale Doshi-Velez, Bryan Kian Hsiang Low

    Abstract: Large language models (LLMs) are widely used in decision-making, but their reliability, especially in critical tasks like healthcare, is not well-established. Therefore, understanding how LLMs reason and make decisions is crucial for their safe deployment. This paper investigates how the uncertainty of responses generated by LLMs relates to the information provided in the input prompt. Leveraging…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: 27 pages, 11 figures

  50. arXiv:2407.14668

    q-bio.NC cs.LG cs.NE

    Towards a "universal translator" for neural dynamics at single-cell, single-spike resolution

    Authors: Yizi Zhang, Yanchen Wang, Donato Jimenez-Beneto, Zixuan Wang, Mehdi Azabou, Blake Richards, Olivier Winter, International Brain Laboratory, Eva Dyer, Liam Paninski, Cole Hurwitz

    Abstract: Neuroscience research has made immense progress over the last decade, but our understanding of the brain remains fragmented and piecemeal: the dream of probing an arbitrary brain region and automatically reading out the information encoded in its neural activity remains out of reach. In this work, we build towards a first foundation model for neural spiking data that can solve a diverse set of tas…

    Submitted 23 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.