Showing 1–50 of 6,084 results for author: Wang, X

  1. arXiv:2407.20171  [pdf, other]

    cs.CV

    Diffusion Feedback Helps CLIP See Better

    Authors: Wenxuan Wang, Quan Sun, Fan Zhang, Yepeng Tang, Jing Liu, Xinlong Wang

    Abstract: Contrastive Language-Image Pre-training (CLIP), which excels at abstracting open-world representations across domains and modalities, has become a foundation for a variety of vision and multimodal tasks. However, recent studies reveal that CLIP has severe visual shortcomings, such as difficulty distinguishing orientation, quantity, color, structure, etc. These visual shortcomings also limit the…

    Submitted 29 July, 2024; originally announced July 2024.

  2. arXiv:2407.20119  [pdf, ps, other]

    cs.LG cs.AI

    Adaptive Self-supervised Robust Clustering for Unstructured Data with Unknown Cluster Number

    Authors: Chen-Lu Ding, Jiancan Wu, Wei Lin, Shiyang Shen, Xiang Wang, Yancheng Yuan

    Abstract: We introduce a novel self-supervised deep clustering approach tailored for unstructured data without requiring prior knowledge of the number of clusters, termed Adaptive Self-supervised Robust Clustering (ASRC). In particular, ASRC adaptively learns the graph structure and edge weights to capture both local and global structural information. The obtained graph enables us to learn clustering-friend…

    Submitted 29 July, 2024; originally announced July 2024.

  3. arXiv:2407.20111  [pdf, other]

    cs.SD eess.AS eess.SP

    Enhancing Anti-spoofing Countermeasures Robustness through Joint Optimization and Transfer Learning

    Authors: Yikang Wang, Xingming Wang, Hiromitsu Nishizaki, Ming Li

    Abstract: Current research in synthesized speech detection primarily focuses on the generalization of detection systems to unknown spoofing methods of noise-free speech. However, anti-spoofing countermeasure (CM) systems often do not perform as well in more challenging scenarios, such as those involving noise and reverberation. To address the problem of enhancing the robustness of CM syste…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 29 pages, 4 figures, Journal Papers

  4. arXiv:2407.19875  [pdf, other]

    cs.CV

    Exploring Robust Face-Voice Matching in Multilingual Environments

    Authors: Jiehui Tang, Xiaofei Wang, Zhen Xiao, Jiayi Liu, Xueliang Liu, Richang Hong

    Abstract: This paper presents Team Xaiofei's innovative approach to exploring Face-Voice Association in Multilingual Environments (FAME) at ACM Multimedia 2024. We focus on the impact of different languages in face-voice matching by building upon Fusion and Orthogonal Projection (FOP), introducing four key components: a dual-branch structure, dynamic sample pair weighting, robust data augmentation, and scor…

    Submitted 29 July, 2024; originally announced July 2024.

  5. arXiv:2407.19841  [pdf, other]

    eess.SP cs.AR

    RRAM-Based Bio-Inspired Circuits for Mobile Epileptic Correlation Extraction and Seizure Prediction

    Authors: Hao Wang, Lingfeng Zhang, Erjia Xiao, Xin Wang, Zhongrui Wang, Renjing Xu

    Abstract: Non-invasive mobile electroencephalography (EEG) acquisition systems have been utilized for long-term monitoring of seizures, yet they suffer from limited battery life. Resistive random access memory (RRAM) is widely used in computing-in-memory (CIM) systems, which offer an ideal platform for reducing the computational energy consumption of seizure prediction algorithms, potentially solving the en…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 7 pages, 5 figures

  6. arXiv:2407.19420  [pdf, other]

    cs.LG

    UniGAP: A Universal and Adaptive Graph Upsampling Approach to Mitigate Over-Smoothing in Node Classification Tasks

    Authors: Xiaotang Wang, Yun Zhu, Haizhou Shi, Yongchao Liu, Chuntao Hong

    Abstract: In the graph domain, deep graph networks based on Message Passing Neural Networks (MPNNs) or Graph Transformers often cause over-smoothing of node features, limiting their expressive capacity. Many upsampling techniques involving node and edge manipulation have been proposed to mitigate this issue. However, these methods often require extensive manual labor, resulting in suboptimal performance and…

    Submitted 28 July, 2024; originally announced July 2024.

  7. arXiv:2407.19389  [pdf, other]

    cs.DC cs.LG math.OC

    FIARSE: Model-Heterogeneous Federated Learning via Importance-Aware Submodel Extraction

    Authors: Feijie Wu, Xingchen Wang, Yaqing Wang, Tianci Liu, Lu Su, Jing Gao

    Abstract: In federated learning (FL), accommodating clients' varied computational capacities poses a challenge, often limiting the participation of those with constrained resources in global model training. To address this issue, the concept of model heterogeneity through submodel extraction has emerged, offering a tailored solution that aligns the model's complexity with each client's computational capacit…

    Submitted 28 July, 2024; originally announced July 2024.

  8. arXiv:2407.19364  [pdf, other]

    cs.HC cs.CR

    Defogger: A Visual Analysis Approach for Data Exploration of Sensitive Data Protected by Differential Privacy

    Authors: Xumeng Wang, Shuangcheng Jiao, Chris Bryan

    Abstract: Differential privacy ensures the security of individual privacy but poses challenges to data exploration processes because the limited privacy budget incapacitates the flexibility of exploration and the noisy feedback of data requests leads to confusing uncertainty. In this study, we take the lead in describing corresponding exploration scenarios, including underlying requirements and available ex…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 11 pages, 8 figures

  9. arXiv:2407.19178  [pdf, other]

    cs.CV eess.SP

    Power-LLaVA: Large Language and Vision Assistant for Power Transmission Line Inspection

    Authors: Jiahao Wang, Mingxuan Li, Haichen Luo, Jinguo Zhu, Aijun Yang, Mingzhe Rong, Xiaohua Wang

    Abstract: The inspection of power transmission lines has made notable progress in the past few years, primarily due to the integration of deep learning technology. However, current inspection approaches continue to encounter difficulties in generalization and intelligence, which restricts their further applicability. In this paper, we introduce Power-LLaVA, the first large language and vision assista…

    Submitted 27 July, 2024; originally announced July 2024.

  10. arXiv:2407.18961  [pdf, other]

    cs.AI

    MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains

    Authors: Guoli Yin, Haoping Bai, Shuang Ma, Feng Nan, Yanchao Sun, Zhaoyang Xu, Shen Ma, Jiarui Lu, Xiang Kong, Aonan Zhang, Dian Ang Yap, Yizhe Zhang, Karsten Ahnert, Vik Kamath, Mathias Berglund, Dominic Walsh, Tobias Gindele, Juergen Wiest, Zhengfeng Lai, Xiaoming Wang, Jiulong Shan, Meng Cao, Ruoming Pang, Zirui Wang

    Abstract: Recent advances in large language models (LLMs) have increased the demand for comprehensive benchmarks to evaluate their capabilities as human-like agents. Existing benchmarks, while useful, often focus on specific application scenarios, emphasizing task completion but failing to dissect the underlying skills that drive these outcomes. This lack of granularity makes it difficult to deeply discern…

    Submitted 17 July, 2024; originally announced July 2024.

  11. arXiv:2407.18902  [pdf, other]

    cs.RO cs.AI cs.LG

    Lessons from Learning to Spin "Pens"

    Authors: Jun Wang, Ying Yuan, Haichuan Che, Haozhi Qi, Yi Ma, Jitendra Malik, Xiaolong Wang

    Abstract: In-hand manipulation of pen-like objects is an important skill in our daily lives, as many tools such as hammers and screwdrivers are similarly shaped. However, current learning-based methods struggle with this task due to a lack of high-quality demonstrations and the significant gap between simulation and the real world. In this work, we push the boundaries of learning-based in-hand manipulation…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Website: https://penspin.github.io/

  12. arXiv:2407.18715  [pdf, other]

    cs.CV

    BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation

    Authors: Peng Hao, Xiaobing Wang, Yingying Jiang, Hanchao Jia, Xiaoshuai Hao

    Abstract: Scene Graph Generation (SGG) remains a challenging task due to its compositional property. Previous approaches improve prediction efficiency by learning in an end-to-end manner. However, these methods exhibit limited performance as they assume unidirectional conditioning between entities and predicates, leading to insufficient information interaction. To address this limitation, we propose a novel…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 9 pages, 3 figures

  13. arXiv:2407.18427  [pdf, other]

    cs.HC

    Quantifying Emotional Responses to Immutable Data Characteristics and Designer Choices in Data Visualizations

    Authors: Carter Blair, Xiyao Wang, Charles Perin

    Abstract: Emotion is an important factor to consider when designing visualizations as it can impact the amount of trust viewers place in a visualization, how well they can retrieve information and understand the underlying data, and how much they engage with or connect to a visualization. We conducted five crowdsourced experiments to quantify the effects of color, chart type, data trend, data variability an…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted to IEEE VIS 2024. 11 pages, 25 figures

    ACM Class: H.1.2

  14. arXiv:2407.18271  [pdf, other]

    cs.AR cs.AI

    Large Language Model for Verilog Generation with Golden Code Feedback

    Authors: Ning Wang, Bingkun Yao, Jie Zhou, Xi Wang, Zhe Jiang, Nan Guan

    Abstract: Recent advancements in large language models (LLMs) have catalyzed significant interest in the automatic generation of Register-Transfer Level (RTL) code, particularly Verilog, from natural language instructions. While commercial LLMs like ChatGPT have dominated this domain, open-source alternatives have lagged considerably in performance, limiting the flexibility and data privacy of this emerging…

    Submitted 21 July, 2024; originally announced July 2024.

  15. arXiv:2407.18232  [pdf, other]

    cs.CV

    LION: Linear Group RNN for 3D Object Detection in Point Clouds

    Authors: Zhe Liu, Jinghua Hou, Xinyu Wang, Xiaoqing Ye, Jingdong Wang, Hengshuang Zhao, Xiang Bai

    Abstract: The benefit of transformers in large-scale 3D point cloud perception tasks, such as 3D object detection, is limited by their quadratic computation cost when modeling long-range relationships. In contrast, linear RNNs have low computational complexity and are suitable for long-range modeling. Toward this goal, we propose a simple and effective window-based framework built on LInear grOup RNN (i.e.,…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Project page: https://happinesslz.github.io/projects/LION/

  16. arXiv:2407.18181  [pdf, other]

    cs.LG cs.AI

    Gene Regulatory Network Inference from Pre-trained Single-Cell Transcriptomics Transformer with Joint Graph Learning

    Authors: Sindhura Kommu, Yizhi Wang, Yue Wang, Xuan Wang

    Abstract: Inferring gene regulatory networks (GRNs) from single-cell RNA sequencing (scRNA-seq) data is a complex challenge that requires capturing the intricate relationships between genes and their regulatory interactions. In this study, we tackle this challenge by leveraging the single-cell BERT-based pre-trained transformer model (scBERT), trained on extensive unlabeled scRNA-seq data, to augment struct…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted into the ICML 2024 AI for Science workshop

  17. arXiv:2407.18170  [pdf, other]

    cs.LG

    RIDA: A Robust Attack Framework on Incomplete Graphs

    Authors: Jianke Yu, Hanchen Wang, Chen Chen, Xiaoyang Wang, Wenjie Zhang, Ying Zhang

    Abstract: Graph Neural Networks (GNNs) are vital in data science but are increasingly susceptible to adversarial attacks. To help researchers develop more robust GNN models, it's essential to focus on designing strong attack models as foundational benchmarks and guiding references. Among adversarial attacks, gray-box poisoning attacks are noteworthy due to their effectiveness and fewer constraints. These at…

    Submitted 25 July, 2024; originally announced July 2024.

  18. arXiv:2407.18137  [pdf, other]

    cs.CV

    XS-VID: An Extremely Small Video Object Detection Dataset

    Authors: Jiahao Guo, Ziyang Xu, Lianjun Wu, Fei Gao, Wenyu Liu, Xinggang Wang

    Abstract: Small Video Object Detection (SVOD) is a crucial subfield in modern computer vision, essential for early object discovery and detection. However, existing SVOD datasets are scarce and suffer from issues such as insufficiently small objects, limited object categories, and lack of scene diversity, leading to unitary application scenarios for corresponding methods. To address this gap, we develop the…

    Submitted 25 July, 2024; originally announced July 2024.

  19. arXiv:2407.18064  [pdf, other]

    cs.HC

    ComPeer: A Generative Conversational Agent for Proactive Peer Support

    Authors: Tianjian Liu, Hongzheng Zhao, Yuheng Liu, Xingbo Wang, Zhenhui Peng

    Abstract: Conversational Agents (CAs) acting as peer supporters have been widely studied and demonstrated to be beneficial for people's mental health. However, previous peer support CAs either are user-initiated or follow predefined rules to initiate the conversations, which may discourage users from engaging and building relationships with the CAs for long-term benefits. In this paper, we develop ComPeer, a generative…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 22 pages (7 figures, 7 tables)

  20. arXiv:2407.18054  [pdf, other]

    eess.IV cs.CV

    LKCell: Efficient Cell Nuclei Instance Segmentation with Large Convolution Kernels

    Authors: Ziwei Cui, Jingfeng Yao, Lunbin Zeng, Juan Yang, Wenyu Liu, Xinggang Wang

    Abstract: The segmentation of cell nuclei in tissue images stained with the blood dye hematoxylin and eosin (H&E) is essential for various clinical applications and analyses. Due to the complex characteristics of cellular morphology, a large receptive field is considered crucial for generating high-quality segmentation. However, previous methods face challenges in achieving a balance between the receptiv…

    Submitted 25 July, 2024; originally announced July 2024.

  21. arXiv:2407.17834  [pdf, other]

    cs.CV

    Towards the Spectral bias Alleviation by Normalizations in Coordinate Networks

    Authors: Zhicheng Cai, Hao Zhu, Qiu Shen, Xinran Wang, Xun Cao

    Abstract: Representing signals using coordinate networks has recently come to dominate the area of inverse problems, and is widely applied in various scientific computing tasks. Still, there exists an issue of spectral bias in coordinate networks, limiting the capacity to learn high-frequency components. This problem is caused by the pathological distribution of the neural tangent kernel's (NTK's) eigenvalues of coord…

    Submitted 25 July, 2024; originally announced July 2024.

  22. arXiv:2407.17734  [pdf, other]

    cs.AI cs.CL cs.CV

    Cost-effective Instruction Learning for Pathology Vision and Language Analysis

    Authors: Kaitao Chen, Mianxin Liu, Fang Yan, Lei Ma, Xiaoming Shi, Lilong Wang, Xiaosong Wang, Lifeng Zhu, Zhe Wang, Mu Zhou, Shaoting Zhang

    Abstract: The advent of vision-language models fosters interactive conversations between AI-enabled models and humans. Yet applying these models in clinics must deal with daunting challenges around large-scale training data, financial, and computational resources. Here we propose a cost-effective instruction learning framework for conversational pathology named CLOVER. CLOVER only trains a lightwei…

    Submitted 24 July, 2024; originally announced July 2024.

  23. arXiv:2407.17726  [pdf, other]

    cs.LG cs.CV

    Multi-modal Data Binding for Survival Analysis Modeling with Incomplete Data and Annotations

    Authors: Linhao Qu, Dan Huang, Shaoting Zhang, Xiaosong Wang

    Abstract: Survival analysis stands as a pivotal process in cancer treatment research, crucial for predicting patient survival rates accurately. Recent advancements in data collection techniques have paved the way for enhancing survival predictions by integrating information from multiple modalities. However, real-world scenarios often present challenges with incomplete data, particularly when dealing with c…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI 2024

  24. arXiv:2407.17721  [pdf, other]

    cs.LG physics.comp-ph

    A Two-Stage Imaging Framework Combining CNN and Physics-Informed Neural Networks for Full-Inverse Tomography: A Case Study in Electrical Impedance Tomography (EIT)

    Authors: Xuanxuan Yang, Yangming Zhang, Haofeng Chen, Gang Ma, Xiaojie Wang

    Abstract: Physics-Informed Neural Networks (PINNs) are a machine learning technique for solving partial differential equations (PDEs) by incorporating PDEs as loss terms in neural networks and minimizing the loss function during training. Tomographic imaging, a method to reconstruct internal properties from external measurement data, is highly complex and ill-posed, making it an inverse problem. Recently, P…

    Submitted 24 July, 2024; originally announced July 2024.

  25. arXiv:2407.17630  [pdf, other]

    cs.CV

    Revising the Problem of Partial Labels from the Perspective of CNNs' Robustness

    Authors: Xin Zhang, Yuqi Song, Wyatt McCurdy, Xiaofeng Wang, Fei Zuo

    Abstract: Convolutional neural networks (CNNs) have gained increasing popularity and versatility in recent decades, finding applications in diverse domains. These remarkable achievements are greatly attributed to the support of extensive datasets with precise labels. However, annotating image datasets is intricate and complex, particularly in the case of multi-label datasets. Hence, the concept of partial-l…

    Submitted 24 July, 2024; originally announced July 2024.

  26. arXiv:2407.17229  [pdf, other]

    cs.CV

    LPGen: Enhancing High-Fidelity Landscape Painting Generation through Diffusion Model

    Authors: Wanggong Yang, Xiaona Wang, Yingrui Qiu, Yifei Zhao

    Abstract: Generating landscape paintings expands the possibilities of artistic creativity and imagination. Traditional landscape painting methods involve using ink or colored ink on rice paper, which requires substantial time and effort. These methods are susceptible to errors and inconsistencies and lack precise control over lines and colors. This paper presents LPGen, a high-fidelity, controllable model f…

    Submitted 25 July, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

  27. arXiv:2407.17190  [pdf, other]

    cs.CE

    Fusing LLMs and KGs for Formal Causal Reasoning behind Financial Risk Contagion

    Authors: Guanyuan Yu, Xv Wang, Qing Li, Yu Zhao

    Abstract: Financial risks tend to spread from one entity to another, ultimately leading to systemic risks. The key to preventing such risks lies in understanding the causal chains behind risk contagion. Despite this, prevailing approaches primarily emphasize identifying risks, overlooking the underlying causal analysis of risk. To address such an issue, we propose a Risk Contagion Causal Reasoning model ca…

    Submitted 24 July, 2024; originally announced July 2024.

  28. arXiv:2407.17115  [pdf, other]

    cs.IR

    Reinforced Prompt Personalization for Recommendation with Large Language Models

    Authors: Wenyu Mao, Jiancan Wu, Weijian Chen, Chongming Gao, Xiang Wang, Xiangnan He

    Abstract: Designing effective prompts can empower LLMs to understand user preferences and provide recommendations by leveraging LLMs' intent comprehension and knowledge utilization capabilities. However, existing research predominantly concentrates on task-wise prompting, developing fixed prompt templates composed of four patterns (i.e., role-playing, history records, reasoning guidance, and output format)…

    Submitted 24 July, 2024; originally announced July 2024.

  29. arXiv:2407.17104  [pdf, ps, other]

    cs.CE

    A simple hybrid linear and non-linear interpolation finite element for adaptive cracking elements method

    Authors: Xueya Wang, Yiming Zhang, Minjie Wen, Herbert Mang

    Abstract: The Cracking Elements Method (CEM) is a numerical tool to simulate quasi-brittle fractures, which does not need remeshing, nodal enrichment, or a complicated crack tracking strategy. The cracking elements used in the CEM can be considered as a special type of finite element implemented in the standard finite element frameworks. One disadvantage of CEM is that it uses nonlinear interpolation of the displ…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: It is very useful for FEM researchers

  30. arXiv:2407.16741  [pdf, other]

    cs.SE cs.AI cs.CL

    OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

    Authors: Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, Graham Neubig

    Abstract: Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and effect change in their surrounding environments. In this paper, we introduce OpenD…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Code: https://github.com/OpenDevin/OpenDevin

  31. arXiv:2407.16680  [pdf, other]

    cs.RO cs.LG

    A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data

    Authors: Adrian Remonda, Nicklas Hansen, Ayoub Raji, Nicola Musiu, Marko Bertogna, Eduardo Veas, Xiaolong Wang

    Abstract: Despite the availability of international prize-money competitions, scaled vehicles, and simulation environments, research on autonomous racing and the control of sports cars operating close to the limit of handling has been limited by the high costs of vehicle acquisition and management, as well as the limited physics accuracy of open-source simulators. In this paper, we propose a racing simulati…

    Submitted 24 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: Project page and code can be found at: https://assetto-corsa-gym.github.io/

  32. arXiv:2407.16674  [pdf, other]

    cs.LG cs.AI

    KAN or MLP: A Fairer Comparison

    Authors: Runpeng Yu, Weihao Yu, Xinchao Wang

    Abstract: This paper does not introduce a novel method. Instead, it offers a fairer and more comprehensive comparison of KAN and MLP models across various tasks, including machine learning, computer vision, audio processing, natural language processing, and symbolic formula representation. Specifically, we control the number of parameters and FLOPs to compare the performance of KAN and MLP. Our main observa…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Technical Report

  33. arXiv:2407.16295  [pdf, other]

    cs.CR

    Manifoldchain: Maximizing Blockchain Throughput via Bandwidth-Clustered Sharding

    Authors: Chunjiang Che, Songze Li, Xuechao Wang

    Abstract: Bandwidth limitation is the major bottleneck that hinders scaling the throughput of proof-of-work blockchains. To guarantee security, the mining rate of the blockchain is determined by the miners with the lowest bandwidth, resulting in inefficient bandwidth utilization among fast miners. We propose Manifoldchain, an innovative blockchain sharding protocol that alleviates the impact of slow miners t…

    Submitted 23 July, 2024; originally announced July 2024.

  34. arXiv:2407.16255  [pdf]

    cs.LG cond-mat.mes-hall cs.AI

    Self-Reasoning Assistant Learning for non-Abelian Gauge Fields Design

    Authors: Jinyang Sun, Xi Chen, Xiumei Wang, Dandan Zhu, Xingping Zhou

    Abstract: Non-Abelian braiding has attracted substantial attention because of its pivotal role in describing the exchange behaviour of anyons, in which the input and outcome of non-Abelian braiding are connected by a unitary matrix. Implementing braiding in a classical system can assist the experimental investigation of non-Abelian physics. However, the design of non-Abelian gauge fields faces numerous chal…

    Submitted 23 July, 2024; originally announced July 2024.

  35. arXiv:2407.16205  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    Figure it Out: Analyzing-based Jailbreak Attack on Large Language Models

    Authors: Shi Lin, Rongchang Li, Xun Wang, Changting Lin, Wenpeng Xing, Meng Han

    Abstract: The rapid development of Large Language Models (LLMs) has brought remarkable generative capabilities across diverse tasks. However, despite the impressive achievements, these models still have numerous security vulnerabilities, particularly when faced with jailbreak attacks. Therefore, by investigating jailbreak attacks, we can uncover hidden weaknesses in LLMs and guide the development of more robu…

    Submitted 23 July, 2024; originally announced July 2024.

  36. arXiv:2407.16131  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Crystals with Transformers on Graphs, for Prediction of Unconventional Crystal Material Properties and the Benchmark

    Authors: Hongyi Wang, Ji Sun, Jinzhe Liang, Li Zhai, Zitian Tang, Zijian Li, Wei Zhai, Xusheng Wang, Weihao Gao, Sheng Gong, Bolong Huang, Hua Zhang

    Abstract: The ionic bonding across the lattice and ordered microscopic structures endow crystals with unique symmetry and determine their macroscopic properties. Unconventional crystals, in particular, exhibit non-traditional lattice structures or possess exotic physical properties, making them intriguing subjects for investigation. Therefore, to accurately predict the physical and chemical properties of cr…

    Submitted 22 July, 2024; originally announced July 2024.

  37. arXiv:2407.15912  [pdf, other]

    cs.CR

    The Shadow of Fraud: The Emerging Danger of AI-powered Social Engineering and its Possible Cure

    Authors: Jingru Yu, Yi Yu, Xuhong Wang, Yilun Lin, Manzhi Yang, Yu Qiao, Fei-Yue Wang

    Abstract: Social engineering (SE) attacks remain a significant threat to both individuals and organizations. The advancement of Artificial Intelligence (AI), including diffusion models and large language models (LLMs), has potentially intensified these threats by enabling more personalized and convincing attacks. This survey paper categorizes SE attack mechanisms, analyzes their evolution, and explores meth…

    Submitted 22 July, 2024; originally announced July 2024.

  38. arXiv:2407.15855  [pdf]

    cs.CR cs.LG

    Data Poisoning Attacks in Intelligent Transportation Systems: A Survey

    Authors: Feilong Wang, Xin Wang, Xuegang Ban

    Abstract: Emerging technologies drive the ongoing transformation of Intelligent Transportation Systems (ITS). This transformation has given rise to cybersecurity concerns, among which data poisoning attacks emerge as a new threat as ITS increasingly relies on data. In data poisoning attacks, attackers inject malicious perturbations into datasets, potentially leading to inaccurate results in offline learning…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: This is the accepted version of a paper published in the journal Transportation Research Part C: Emerging Technologies

  39. arXiv:2407.15537  [pdf, other]

    cs.LG cs.RO

    Exterior Penalty Policy Optimization with Penalty Metric Network under Constraints

    Authors: Shiqing Gao, Jiaxin Ding, Luoyi Fu, Xinbing Wang, Chenghu Zhou

    Abstract: In Constrained Reinforcement Learning (CRL), agents explore the environment to learn the optimal policy while satisfying constraints. The penalty function method, which imposes constraint penalties on the objective to transform the constrained problem into an unconstrained one, has recently been studied as an effective approach for handling constraints. However, it is challenging to choose appropr…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: To be published in the 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024)

  40. arXiv:2407.15451  [pdf, other]

    cs.CV

    Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions

    Authors: Yihao Ai, Yifei Qi, Bo Wang, Yu Cheng, Xinchao Wang, Robby T. Tan

    Abstract: Existing 2D human pose estimation research predominantly concentrates on well-lit scenarios, with limited exploration of poor lighting conditions, which are a prevalent aspect of daily life. Recent studies on low-light pose estimation require the use of paired well-lit and low-light images with ground truths for training, which is impractical due to the inherent challenges associated with annotat…

    Submitted 23 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: 18 pages, 3 figures. Accepted by ECCV24

  41. arXiv:2407.15441  [pdf, other]

    cs.CL

    Developing a Reliable, General-Purpose Hallucination Detection and Mitigation Service: Insights and Lessons Learned

    Authors: Song Wang, Xun Wang, Jie Mei, Yujia Xie, Sean Muarray, Zhang Li, Lingfeng Wu, Si-Qing Chen, Wayne Xiong

    Abstract: Hallucination, a phenomenon where large language models (LLMs) produce output that is factually incorrect or unrelated to the input, is a major challenge for LLM applications that require accuracy and dependability. In this paper, we introduce a reliable and high-speed production system aimed at detecting and rectifying the hallucination issue within LLMs. Our system encompasses named entity recog…

    Submitted 22 July, 2024; originally announced July 2024.

  42. arXiv:2407.15362  [pdf, other]

    cs.CV cs.AI

    A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model

    Authors: Yingxue Xu, Yihui Wang, Fengtao Zhou, Jiabo Ma, Shu Yang, Huangjing Lin, Xin Wang, Jiguang Wang, Li Liang, Anjia Han, Ronald Cheong Kin Chan, Hao Chen

    Abstract: Remarkable strides in computational pathology have been made in the task-agnostic foundation model that advances the performance of a wide array of downstream clinical tasks. Despite the promising performance, there are still several challenges. First, prior works have resorted to either vision-only or vision-captions data, disregarding invaluable pathology reports and gene expression profiles whi…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 44 pages, 9 figures

  43. arXiv:2407.15352  [pdf, other]

    cs.CL

    MAVEN-Fact: A Large-scale Event Factuality Detection Dataset

    Authors: Chunyang Li, Hao Peng, Xiaozhi Wang, Yunjia Qi, Lei Hou, Bin Xu, Juanzi Li

    Abstract: The Event Factuality Detection (EFD) task determines the factuality of textual events, i.e., classifying whether an event is a fact, possibility, or impossibility, which is essential for faithfully understanding and utilizing event knowledge. However, due to the lack of high-quality large-scale data, event factuality detection is under-explored in event understanding research, which limits the develop…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Under review

  44. arXiv:2407.15341  [pdf, other]

    cs.CL

    ZZU-NLP at SIGHAN-2024 dimABSA Task: Aspect-Based Sentiment Analysis with Coarse-to-Fine In-context Learning

    Authors: Senbin Zhu, Hanjie Zhao, Xingren Wang, Shanhong Liu, Yuxiang Jia, Hongying Zan

    Abstract: The DimABSA task requires fine-grained sentiment intensity prediction for restaurant reviews, including scores for Valence and Arousal dimensions for each Aspect Term. In this study, we propose a Coarse-to-Fine In-context Learning (CFICL) method based on the Baichuan2-7B model for the DimABSA task in the SIGHAN 2024 workshop. Our method improves prediction accuracy through a two-stage optimization…

    Submitted 21 July, 2024; originally announced July 2024.

  45. arXiv:2407.15062  [pdf, other]

    cs.CR

    AGORA: Open More and Trust Less in Binary Verification Service

    Authors: Hongbo Chen, Quan Zhou, Sen Yang, Xing Han, Fan Zhang, Danfeng Zhang, Xiaofeng Wang

    Abstract: Binary verification plays a pivotal role in software security, yet building a verification service that is both open and trustworthy poses a formidable challenge. In this paper, we introduce a novel binary verification service, AGORA, scrupulously designed to overcome the challenge. At the heart of this approach lies a strategic insight: certain tasks can be delegated to untrusted entities, while…

    Submitted 21 July, 2024; originally announced July 2024.

  46. arXiv:2407.14985  [pdf, other]

    cs.CL cs.AI cs.LG

    Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data

    Authors: Antonis Antoniades, Xinyi Wang, Yanai Elazar, Alfonso Amayuelas, Alon Albalak, Kexun Zhang, William Yang Wang

    Abstract: Despite the proven utility of large language models (LLMs) in real-world applications, there remains a lack of understanding regarding how they leverage their large-scale pretraining text corpora to achieve such capabilities. In this work, we investigate the interplay between generalization and memorization in pretrained LLMs at scale, through a comprehensive $n$-gram analysis of their training da…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: ICML FM-Wild workshop version

  47. arXiv:2407.14138  [pdf, other]

    cs.CV

    Visual Text Generation in the Wild

    Authors: Yuanzhi Zhu, Jiawei Liu, Feiyu Gao, Wenyu Liu, Xinggang Wang, Peng Wang, Fei Huang, Cong Yao, Zhibo Yang

    Abstract: Recently, with the rapid advancements of generative models, the field of visual text generation has witnessed significant progress. However, it is still challenging to render high-quality text images in real-world scenarios, as three critical criteria should be satisfied: (1) Fidelity: the generated text images should be photo-realistic and the contents are expected to be the same as specified in…

    Submitted 19 July, 2024; originally announced July 2024.

  48. arXiv:2407.14117  [pdf, other]

    cs.CV

    Rethinking Visual Content Refinement in Low-Shot CLIP Adaptation

    Authors: Jinda Lu, Shuo Wang, Yanbin Hao, Haifeng Liu, Xiang Wang, Meng Wang

    Abstract: Recent adaptations can boost the low-shot capability of Contrastive Vision-Language Pre-training (CLIP) by effectively facilitating knowledge transfer. However, these adaptation methods usually operate on the global view of an input image and thus have a biased perception of local details of the image. To solve this problem, we propose a Visual Content Refinement (VCR) before the adaptation…

    Submitted 19 July, 2024; originally announced July 2024.

  49. arXiv:2407.13301  [pdf, other]

    cs.CL cs.AI cs.LG

    CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis

    Authors: Junying Chen, Chi Gui, Anningzhe Gao, Ke Ji, Xidong Wang, Xiang Wan, Benyou Wang

    Abstract: The field of medical diagnosis has undergone a significant transformation with the advent of large language models (LLMs), yet the challenges of interpretability within these models remain largely unaddressed. This study introduces Chain-of-Diagnosis (CoD) to enhance the interpretability of LLM-based medical diagnostics. CoD transforms the diagnostic process into a diagnostic chain that mirrors a…

    Submitted 18 July, 2024; originally announced July 2024.

  50. arXiv:2407.13252  [pdf, other]

    cs.CV

    Unveiling Structural Memorization: Structural Membership Inference Attack for Text-to-Image Diffusion Models

    Authors: Qiao Li, Xiaomeng Fu, Xi Wang, Jin Liu, Xingyu Gao, Jiao Dai, Jizhong Han

    Abstract: With the rapid advancements of large-scale text-to-image diffusion models, various practical applications have emerged, bringing significant convenience to society. However, model developers may misuse unauthorized data to train diffusion models. These data are at risk of being memorized by the models, thus potentially violating citizens' privacy rights. Therefore, in order to judge whether a…

    Submitted 18 July, 2024; originally announced July 2024.