Showing 1–50 of 3,147 results for author: Zhang, C

  1. arXiv:2407.20121  [pdf, other]

    cs.IR cs.AI

    EXIT: An EXplicit Interest Transfer Framework for Cross-Domain Recommendation

    Authors: Lei Huang, Weitao Li, Chenrui Zhang, Jinpeng Wang, Xianchun Yi, Sheng Chen

    Abstract: Cross-domain recommendation has attracted substantial interest in industrial apps such as Meituan, which serves multiple business domains via knowledge transfer and meets the diverse interests of users. However, existing methods typically follow an implicit modeling paradigm that blends the knowledge from both the source and target domains, and design intricate network structures to share learned…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted at CIKM 2024

  2. arXiv:2407.19984  [pdf, other]

    cs.CL

    Confidence Estimation for Automatic Detection of Depression and Alzheimer's Disease Based on Clinical Interviews

    Authors: Wen Wu, Chao Zhang, Philip C. Woodland

    Abstract: Speech-based automatic detection of Alzheimer's disease (AD) and depression has attracted increased attention. Confidence estimation is crucial for a trustworthy automatic diagnostic system which informs the clinician about the confidence of model predictions and helps reduce the risk of misdiagnosis. This paper investigates confidence estimation for automatic detection of AD and depression based…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by Interspeech 2024

  3. arXiv:2407.19507  [pdf, other]

    cs.CV cs.AI

    WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting

    Authors: Jingjing Wu, Zhengyao Fang, Pengyuan Lyu, Chengquan Zhang, Fanglin Chen, Guangming Lu, Wenjie Pei

    Abstract: Transcription-only Supervised Text Spotting aims to learn text spotters relying only on transcriptions but no text boundaries for supervision, thus eliminating expensive boundary annotation. The crux of this task lies in locating each transcription in scene text images without location annotations. In this work, we formulate this challenging problem as a Weakly Supervised Cross-modality Contrastiv…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  4. arXiv:2407.18957  [pdf, other]

    q-fin.TR cs.AI cs.MA

    When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments

    Authors: Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang, Lingyao Li, Zhengting Wang, Wenyue Hua, Dong Shu, Suiyuan Zhu, Xiaobo Jin, Sujian Li, Mengnan Du, Yongfeng Zhang

    Abstract: Can AI Agents simulate real-world trading environments to investigate the impact of external factors on stock trading activities (e.g., macroeconomics, policy changes, company fundamentals, and global events)? These factors, which frequently influence trading behaviors, are critical elements in the quest for maximizing investors' profits. Our work attempts to solve this problem through large langu…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 33 pages, 10 figures

  5. arXiv:2407.17889  [pdf]

    cs.NE

    An Error Discovery and Correction for the Family of V-Shaped BPSO Algorithms

    Authors: Qing Zhao, Chengkui Zhang, Hao Li, Ting Ke

    Abstract: The BPSO algorithm is a swarm intelligence optimization algorithm characterized by good optimization performance, high efficiency, and ease of implementation. In recent years, it has been used to optimize a variety of machine learning and deep learning models, such as CNN, LSTM, SVM, etc. However, it easily falls into local optima because it lacks exploitation ability. It is found that in the arti…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 25 pages, 11 figures

  6. arXiv:2407.17792  [pdf, other]

    cs.CV

    Harnessing Temporal Causality for Advanced Temporal Action Detection

    Authors: Shuming Liu, Lin Sui, Chen-Lin Zhang, Fangzhou Mu, Chen Zhao, Bernard Ghanem

    Abstract: As a fundamental task in long-form video understanding, temporal action detection (TAD) aims to capture inherent temporal relations in untrimmed videos and identify candidate actions with precise boundaries. Over the years, various networks, including convolutions, graphs, and transformers, have been explored for effective temporal modeling for TAD. However, these modules typically treat past and…

    Submitted 25 July, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: 1st in Moment Queries track at the Ego4D Challenge 2024; 1st in Action Recognition, Action Detection, and Audio-Based Interaction Detection tracks at the EPIC-Kitchens Challenge 2024

  7. arXiv:2407.17674  [pdf, other]

    cs.LG q-bio.BM

    Synthetic High-resolution Cryo-EM Density Maps with Generative Adversarial Networks

    Authors: Chenwei Zhang, Anne Condon, Khanh Dao Duc

    Abstract: Generating synthetic cryogenic electron microscopy (cryo-EM) 3D density maps from molecular structures has potentially important applications in structural biology. Yet existing simulation-based methods cannot mimic all the complex features present in experimental maps, such as secondary structure elements. As an alternative, we propose struc2mapGAN, a novel data-driven method that employs a generat…

    Submitted 24 July, 2024; originally announced July 2024.

  8. TWIN V2: Scaling Ultra-Long User Behavior Sequence Modeling for Enhanced CTR Prediction at Kuaishou

    Authors: Zihua Si, Lin Guan, ZhongXiang Sun, Xiaoxue Zang, Jing Lu, Yiqun Hui, Xingchao Cao, Zeyu Yang, Yichen Zheng, Dewei Leng, Kai Zheng, Chenbin Zhang, Yanan Niu, Yang Song, Kun Gai

    Abstract: The significance of modeling long-term user interests for CTR prediction tasks in large-scale recommendation systems is progressively gaining attention among researchers and practitioners. Existing work, such as SIM and TWIN, typically employs a two-stage approach to model long-term user behavior sequences for efficiency concerns. The first stage rapidly retrieves a subset of sequences related to…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by CIKM 2024

  9. arXiv:2407.16154  [pdf, other]

    cs.CL

    DDK: Distilling Domain Knowledge for Efficient Large Language Models

    Authors: Jiaheng Liu, Chenchen Zhang, Jinyang Guo, Yuanxing Zhang, Haoran Que, Ken Deng, Zhiqi Bai, Jie Liu, Ge Zhang, Jiakai Wang, Yanan Wu, Congnan Liu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

    Abstract: Despite the advanced intelligence abilities of large language models (LLMs) in various applications, they still face significant computational and storage demands. Knowledge Distillation (KD) has emerged as an effective strategy to improve the performance of a smaller LLM (i.e., the student model) by transferring knowledge from a high-performing LLM (i.e., the teacher model). Prevailing techniques…

    Submitted 22 July, 2024; originally announced July 2024.

  10. arXiv:2407.15861  [pdf, other]

    cs.CR cs.AI cs.CV

    Adversarial Attacks and Defenses on Text-to-Image Diffusion Models: A Survey

    Authors: Chenyu Zhang, Mingwang Hu, Wenhui Li, Lanjun Wang

    Abstract: Recently, the text-to-image diffusion model has gained considerable attention from the community due to its exceptional image generation capability. A representative model, Stable Diffusion, amassed more than 10 million users within just two months of its release. This surge in popularity has facilitated studies on the robustness and safety of the model, leading to the proposal of various adversar…

    Submitted 10 July, 2024; originally announced July 2024.

  11. arXiv:2407.15661  [pdf, other]

    cs.CV

    DriveDiTFit: Fine-tuning Diffusion Transformers for Autonomous Driving

    Authors: Jiahang Tu, Wei Ji, Hanbin Zhao, Chao Zhang, Roger Zimmermann, Hui Qian

    Abstract: In autonomous driving, deep models have shown remarkable performance across various visual perception tasks, but they demand high-quality and highly diverse training datasets. Such datasets are expected to cover various driving scenarios with adverse weather, lighting conditions and diverse moving objects. However, manually collecting these data is hugely challenging and expensive. With…

    Submitted 22 July, 2024; originally announced July 2024.

  12. arXiv:2407.15431  [pdf, other]

    cs.SI cs.AI cs.LG

    Pre-Training and Prompting for Few-Shot Node Classification on Text-Attributed Graphs

    Authors: Huanjing Zhao, Beining Yang, Yukuo Cen, Junyu Ren, Chenhui Zhang, Yuxiao Dong, Evgeny Kharlamov, Shu Zhao, Jie Tang

    Abstract: The text-attributed graph (TAG) is one kind of important real-world graph-structured data with each node associated with raw texts. For TAGs, traditional few-shot node classification methods directly conduct training on the pre-processed node features and do not consider the raw texts. The performance is highly dependent on the choice of the feature pre-processing method. In this paper, we propose…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted to KDD'24

  13. arXiv:2407.15360  [pdf, other]

    cs.CL

    Dissecting Multiplication in Transformers: Insights into LLMs

    Authors: Luyu Qiu, Jianing Li, Chi Su, Chen Jason Zhang, Lei Chen

    Abstract: Transformer-based large language models have achieved remarkable performance across various natural language processing tasks. However, they often struggle with seemingly easy tasks like arithmetic despite their vast capabilities. This stark disparity raises concerns about their safe and ethical use and hinders their widespread adoption. In this paper, we focus on a typical arithmetic task, inte…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 8 pages, 5 figures

  14. arXiv:2407.15083  [pdf, other]

    cs.LG

    Rocket Landing Control with Random Annealing Jump Start Reinforcement Learning

    Authors: Yuxuan Jiang, Yujie Yang, Zhiqian Lan, Guojian Zhan, Shengbo Eben Li, Qi Sun, Jian Ma, Tianwen Yu, Changwu Zhang

    Abstract: Rocket recycling is a crucial pursuit in aerospace technology, aimed at reducing costs and environmental impact in space exploration. The primary focus centers on rocket landing control, involving the guidance of a nonlinear underactuated rocket with limited fuel in real-time. This challenging task prompts the application of reinforcement learning (RL), yet the goal-oriented nature of the problem pose…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: IROS 2024 Oral

  15. arXiv:2407.14769  [pdf, other]

    cs.HC

    A Two-Phase Visualization System for Continuous Human-AI Collaboration in Sequelae Analysis and Modeling

    Authors: Yang Ouyang, Chenyang Zhang, He Wang, Tianle Ma, Chang Jiang, Yuheng Yan, Zuoqin Yan, Xiaojuan Ma, Chuhan Shi, Quan Li

    Abstract: In healthcare, AI techniques are widely used for tasks like risk assessment and anomaly detection. Despite AI's potential as a valuable assistant, its role in complex medical data analysis often oversimplifies human-AI collaboration dynamics. To address this, we collaborated with a local hospital, engaging six physicians and one data scientist in a formative study. From this collaboration, we prop…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: To appear at the IEEE VIS Conference 2024

  16. arXiv:2407.14733  [pdf, other]

    cs.LG cs.AI cs.CL

    Hard Prompts Made Interpretable: Sparse Entropy Regularization for Prompt Tuning with RL

    Authors: Yunseon Choi, Sangmin Bae, Seonghyun Ban, Minchan Jeong, Chuheng Zhang, Lei Song, Li Zhao, Jiang Bian, Kee-Eung Kim

    Abstract: With the advent of foundation models, prompt tuning has positioned itself as an important technique for directing model behaviors and eliciting desired responses. Prompt tuning involves selecting appropriate keywords to include in the input, thereby adapting to the downstream task without adjusting or fine-tuning the model parameters. There is a wide range of work in prompt tuning, from approaches…

    Submitted 19 July, 2024; originally announced July 2024.

  17. arXiv:2407.14095  [pdf, other]

    cs.GT cs.AI q-bio.NC

    People use fast, goal-directed simulation to reason about novel games

    Authors: Cedegao E. Zhang, Katherine M. Collins, Lionel Wong, Adrian Weller, Joshua B. Tenenbaum

    Abstract: We can evaluate features of problems and their potential solutions well before we can effectively solve them. When considering a game we have never played, for instance, we might infer whether it is likely to be challenging, fair, or fun simply from hearing the game rules, prior to deciding whether to invest time in learning the game or trying to play it well. Many studies of game play have focuse…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Accepted at CogSci 2024 as a talk

  18. arXiv:2407.13646  [pdf, other]

    cs.CV

    Beyond Dropout: Robust Convolutional Neural Networks Based on Local Feature Masking

    Authors: Yunpeng Gong, Chuangliang Zhang, Yongjie Hou, Lifei Chen, Min Jiang

    Abstract: In the contemporary era of deep learning, where models often grapple with the challenge of simultaneously achieving robustness against adversarial attacks and strong generalization capabilities, this study introduces an innovative Local Feature Masking (LFM) strategy aimed at fortifying the performance of Convolutional Neural Networks (CNNs) on both fronts. During the training phase, we strategically…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: It has been accepted by IJCNN 2024

  19. arXiv:2407.13640  [pdf, other]

    cs.CV

    Beyond Augmentation: Empowering Model Robustness under Extreme Capture Environments

    Authors: Yunpeng Gong, Yongjie Hou, Chuangliang Zhang, Min Jiang

    Abstract: Person Re-identification (re-ID) in computer vision aims to recognize and track individuals across different cameras. While previous research has mainly focused on challenges like pose variations and lighting changes, the impact of extreme capture conditions is often not adequately addressed. These extreme conditions, including varied lighting, camera styles, angles, and image distortions, can sig…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: It has been accepted by IJCNN 2024

  20. arXiv:2407.12777  [pdf, other]

    cs.CV cs.GR

    Generalizable Human Gaussians for Sparse View Synthesis

    Authors: Youngjoong Kwon, Baole Fang, Yixing Lu, Haoye Dong, Cheng Zhang, Francisco Vicente Carrasco, Albert Mosella-Montoro, Jianjin Xu, Shingo Takagi, Daeil Kim, Aayush Prakash, Fernando De la Torre

    Abstract: Recent progress in neural rendering has brought forth pioneering methods, such as NeRF and Gaussian Splatting, which revolutionize view rendering across various domains like AR/VR, gaming, and content creation. While these methods excel at interpolating within the training data, the challenge of generalizing to new scenes and objects from very sparse views persists. Specifically, modeling 3D…

    Submitted 17 July, 2024; originally announced July 2024.

  21. arXiv:2407.12489  [pdf, other]

    cs.CV

    Dual-level Adaptive Self-Labeling for Novel Class Discovery in Point Cloud Segmentation

    Authors: Ruijie Xu, Chuyu Zhang, Hui Ren, Xuming He

    Abstract: We tackle the novel class discovery in point cloud segmentation, which discovers novel classes based on the semantic knowledge of seen classes. Existing work proposes an online point-wise clustering method with a simplified equal class-size constraint on the novel classes to avoid degenerate solutions. However, the inherent imbalanced distribution of novel classes in point clouds typically violate…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  22. arXiv:2407.12038  [pdf, ps, other]

    eess.AS cs.AI

    ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024

    Authors: Ruibo Fu, Rui Liu, Chunyu Qiang, Yingming Gao, Yi Lu, Tao Wang, Ya Li, Zhengqi Wen, Chen Zhang, Hui Bu, Yukun Liu, Shuchen Shi, Xin Qi, Guanjun Li

    Abstract: The Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC 2024) is part of the ISCSLP 2024 Competitions and Challenges track. While current text-to-speech (TTS) technology can generate high-quality audio, its ability to convey complex emotions and controlled detail content remains limited. This constraint leads to a discrepancy between the generated audio and human subjective percept…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: ISCSLP 2024 Challenge description

  23. arXiv:2407.11743  [pdf, other]

    cs.CV cs.LG

    OAM-TCD: A globally diverse dataset of high-resolution tree cover maps

    Authors: Josh Veitch-Michaelis, Andrew Cottam, Daniella Schweizer, Eben N. Broadbent, David Dao, Ce Zhang, Angelica Almeyda Zambrano, Simeon Max

    Abstract: Accurately quantifying tree cover is an important metric for ecosystem monitoring and for assessing progress in restored sites. Recent works have shown that deep learning-based segmentation algorithms are capable of accurately mapping trees at country and continental scales using high-resolution aerial and satellite imagery. Mapping at high (ideally sub-meter) resolution is necessary to identify i…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 10 pages plus appendix/supplementary material, 3 figures in main text. 33 pages total, including references/supplementary. Code and documentation will be available at https://github.com/Restor-Foundation/tcd, the dataset will be made available at https://huggingface.co/restor

  24. arXiv:2407.11651  [pdf, other]

    cs.IT eess.SP

    Fluid Antenna Grouping Index Modulation Design for MIMO Systems

    Authors: Xinghao Guo, Yin Xu, Dazhi He, Cixiao Zhang, Wenjun Zhang, Yi-yan Wu

    Abstract: Index modulation (IM) significantly enhances the spectral efficiency of fluid antennas (FAs) enabled multiple-input multiple-output (MIMO) systems, which is named FA-IM. However, due to the dense distribution of ports on fluid antennas, the wireless channel exhibits a high spatial correlation, resulting in severe performance degradation in the existing FA-IM scheme. This paper proposes a novel flu…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: A longer and more detailed version will be submitted to an IEEE journal

  25. arXiv:2407.11477  [pdf, other]

    cs.LG cs.AI

    XTraffic: A Dataset Where Traffic Meets Incidents with Explainability and More

    Authors: Xiaochuan Gou, Ziyue Li, Tian Lan, Junpeng Lin, Zhishuai Li, Bingyu Zhao, Chen Zhang, Di Wang, Xiangliang Zhang

    Abstract: Long-separated research has been conducted on two highly correlated tracks: traffic and incidents. Traffic track witnesses complicating deep learning models, e.g., to push the prediction a few percent more accurate, and the incident track only studies the incidents alone, e.g., to infer the incident risk. We, for the first time, spatiotemporally aligned the two tracks in a large-scale region (16,9…

    Submitted 16 July, 2024; originally announced July 2024.

  26. arXiv:2407.11449  [pdf, other]

    cs.CV cs.AI

    Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights

    Authors: Shunqi Mao, Chaoyi Zhang, Hang Su, Hwanjun Song, Igor Shalyminov, Weidong Cai

    Abstract: Contextualized Image Captioning (CIC) evolves traditional image captioning into a more complex domain, necessitating the ability for multimodal reasoning. It aims to generate image captions given specific contextual information. This paper further introduces a novel domain of Controllable Contextualized Image Captioning (Ctrl-CIC). Unlike CIC, which solely relies on broad context, Ctrl-CIC accentu…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  27. arXiv:2407.10806  [pdf, other]

    cs.CV

    Enhancing Robustness to Noise Corruption for Point Cloud Model via Spatial Sorting and Set-Mixing Aggregation Module

    Authors: Dingxin Zhang, Jianhui Yu, Tengfei Xue, Chaoyi Zhang, Dongnan Liu, Weidong Cai

    Abstract: Current models for point cloud recognition demonstrate promising performance on synthetic datasets. However, real-world point cloud data inevitably contains noise, impacting model robustness. While recent efforts focus on enhancing robustness through various strategies, there still remains a gap in comprehensive analyses from the standpoint of network architecture design. Unlike traditional method…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 22 pages, 9 figures

  28. arXiv:2407.10499  [pdf, other]

    cs.CL

    CIBench: Evaluating Your LLMs with a Code Interpreter Plugin

    Authors: Songyang Zhang, Chuyu Zhang, Yingfan Hu, Haowen Shen, Kuikun Liu, Zerun Ma, Fengzhe Zhou, Wenwei Zhang, Xuming He, Dahua Lin, Kai Chen

    Abstract: While LLM-based agents, which use external tools to solve complex problems, have made significant progress, benchmarking their ability is challenging, thereby hindering a clear understanding of their limitations. In this paper, we propose an interactive evaluation framework, named CIBench, to comprehensively assess LLMs' ability to utilize code interpreters for data science tasks. Our evaluation f…

    Submitted 25 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Under review. The first three authors contribute equally, and Songyang Zhang is the project leader

  29. arXiv:2407.10386  [pdf, ps, other]

    cs.IT

    Two-Phase Channel Estimation for RIS-Aided Cell-Free Massive MIMO with Electromagnetic Interference

    Authors: Jun Qian, Chi Zhang, Khaled B. Letaief, Ross Murch

    Abstract: This work considers a reconfigurable intelligent surface (RIS)-aided cell-free massive multiple-input multiple-output (MIMO) system with RIS spatial correlation and electromagnetic interference (EMI). We propose a two-phase channel estimation scheme with fractional power control-aided pilot assignment to improve the estimation accuracy and system performance of RIS-aided cell-free massive MIMO sys…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 6 pages, 3 figures. This paper has been submitted to 2024 IEEE MeditCom

  30. arXiv:2407.09429  [pdf, other]

    cs.CL

    Open (Clinical) LLMs are Sensitive to Instruction Phrasings

    Authors: Alberto Mario Ceballos Arroyo, Monica Munnangi, Jiuding Sun, Karen Y. C. Zhang, Denis Jered McInerney, Byron C. Wallace, Silvio Amir

    Abstract: Instruction-tuned Large Language Models (LLMs) can perform a wide range of tasks given natural language instructions to do so, but they are sensitive to how such instructions are phrased. This issue is especially concerning in healthcare, as clinicians are unlikely to be experienced prompt engineers and the potential consequences of inaccurate outputs are heightened in this domain. This raises a…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: To appear at BioNLP, ACL 2024

  31. arXiv:2407.09292  [pdf, other]

    cs.CR

    Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models

    Authors: Dong Shu, Mingyu Jin, Tianle Chen, Chong Zhang, Yongfeng Zhang

    Abstract: This study sheds light on the imperative need to bolster safety and privacy measures in large language models (LLMs), such as GPT-4 and LLaMA-2, by identifying and mitigating their vulnerabilities through explainable analysis of prompt attacks. We propose Counterfactual Explainable Incremental Prompt Attack (CEIPA), a novel technique where we guide prompts in a specific manner to quantitatively me…

    Submitted 17 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: 23 pages, 6 figures

  32. arXiv:2407.08970  [pdf, other]

    cs.CR cs.AI cs.LG

    Soft Prompts Go Hard: Steering Visual Language Models with Hidden Meta-Instructions

    Authors: Tingwei Zhang, Collin Zhang, John X. Morris, Eugene Bagdasaryan, Vitaly Shmatikov

    Abstract: We introduce a new type of indirect injection vulnerability in language models that operate on images: hidden "meta-instructions" that influence how the model interprets the image and steer the model's outputs to express an adversary-chosen style, sentiment, or point of view. We explain how to create meta-instructions by generating images that act as soft prompts. Unlike jailbreaking attacks a…

    Submitted 11 July, 2024; originally announced July 2024.

  33. arXiv:2407.08914  [pdf, other]

    cs.NI eess.SP

    Multi-objective Aerial Collaborative Secure Communication Optimization via Generative Diffusion Model-enabled Deep Reinforcement Learning

    Authors: Chuang Zhang, Geng Sun, Jiahui Li, Qingqing Wu, Jiacheng Wang, Dusit Niyato, Yuanwei Liu

    Abstract: Due to their flexibility and low cost, unmanned aerial vehicles (UAVs) are increasingly crucial for enhancing coverage and functionality of wireless networks. However, incorporating UAVs into next-generation wireless communication systems poses significant challenges, particularly in sustaining high-rate and long-range secure communications against eavesdropping attacks. In this work, we consider a UAV…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to IEEE Transactions on Mobile Computing

  34. arXiv:2407.08883  [pdf]

    cs.CV

    TractGraphFormer: Anatomically Informed Hybrid Graph CNN-Transformer Network for Classification from Diffusion MRI Tractography

    Authors: Yuqian Chen, Fan Zhang, Meng Wang, Leo R. Zekelman, Suheyla Cetin-Karayumak, Tengfei Xue, Chaoyi Zhang, Yang Song, Nikos Makris, Yogesh Rathi, Weidong Cai, Lauren J. O'Donnell

    Abstract: The relationship between brain connections and non-imaging phenotypes is increasingly studied using deep neural networks. However, the local and global properties of the brain's white matter networks are often overlooked in convolutional network design. We introduce TractGraphFormer, a hybrid Graph CNN-Transformer deep learning framework tailored for diffusion MRI tractography. This model leverage…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 23 pages, 4 figures

  35. arXiv:2407.08699  [pdf, other]

    cs.LG

    Mitigating Catastrophic Forgetting in Language Transfer via Model Merging

    Authors: Anton Alexandrov, Veselin Raychev, Mark Niklas Müller, Ce Zhang, Martin Vechev, Kristina Toutanova

    Abstract: As open-weight large language models (LLMs) achieve ever more impressive performances across a wide range of tasks in English, practitioners aim to adapt these models to different languages. However, such language adaptation is often accompanied by catastrophic forgetting of the base model's capabilities, severely limiting the usefulness of the resulting model. We address this issue by proposing B…

    Submitted 16 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  36. arXiv:2407.08555  [pdf, other]

    eess.IV cs.CV

    SLoRD: Structural Low-Rank Descriptors for Shape Consistency in Vertebrae Segmentation

    Authors: Xin You, Yixin Lou, Minghui Zhang, Chuyan Zhang, Jie Yang, Yun Gu

    Abstract: Automatic and precise segmentation of vertebrae from CT images is crucial for various clinical applications. However, due to a lack of explicit and strict constraints, existing methods, especially single-stage ones, still suffer from the challenge of intra-vertebrae segmentation inconsistency, which refers to multiple label predictions inside a singular vertebra. For multi-stage methods, ver…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Under review

  37. arXiv:2407.08462  [pdf, other]

    cs.LG cs.NI

    Distributed Deep Reinforcement Learning Based Gradient Quantization for Federated Learning Enabled Vehicle Edge Computing

    Authors: Cui Zhang, Wenjun Zhang, Qiong Wu, Pingyi Fan, Qiang Fan, Jiangzhou Wang, Khaled B. Letaief

    Abstract: Federated Learning (FL) can protect the privacy of the vehicles in vehicle edge computing (VEC) to a certain extent through sharing the gradients of vehicles' local models instead of local data. The gradients of vehicles' local models are usually large for the vehicular artificial intelligence (AI) applications, thus transmitting such large gradients would cause large per-round latency. Gradient q…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to IEEE Journal. The source code has been released at: https://github.com/qiongwu86/Distributed-Deep-Reinforcement-Learning-Based-Gradient Quantization-for-Federated-Learning-Enabled-Vehicle-Edge-Computing

  38. arXiv:2407.08440  [pdf, other]

    cs.CL cs.AI

    Beyond Instruction Following: Evaluating Rule Following of Large Language Models

    Authors: Wangtao Sun, Chenxiang Zhang, Xueyou Zhang, Ziyang Huang, Haotian Xu, Pei Chen, Shizhu He, Jun Zhao, Kang Liu

    Abstract: Although Large Language Models (LLMs) have demonstrated strong instruction-following ability to be helpful, they are further supposed to be controlled and guided by rules in real-world scenarios to be safe and accurate in their responses. This demands that LLMs possess rule-following capability. However, few works have made a clear evaluation of the rule-following capability of LLMs. Previous s…

    Submitted 11 July, 2024; originally announced July 2024.

  39. arXiv:2407.08156  [pdf, other]

    cs.CV

    AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization

    Authors: Shixiong Xu, Chenghao Zhang, Lubin Fan, Gaofeng Meng, Shiming Xiang, Jieping Ye

    Abstract: In this study, we introduce a new problem raised by social media and photojournalism, named Image Address Localization (IAL), which aims to predict the readable textual address where an image was taken. Existing two-stage approaches involve predicting geographical coordinates and converting them into human-readable addresses, which can lead to ambiguity and be resource-intensive. In contrast, we p…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  40. arXiv:2407.08109  [pdf, other]

    cs.CV cs.AI cs.LG

    Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter

    Authors: Suqi Song, Chenxu Zhang, Peng Zhang, Pengkun Li, Fenglong Song, Lei Zhang

    Abstract: Urban waterlogging poses a major risk to public safety and infrastructure. Conventional methods using water-level sensors require high maintenance and can hardly achieve full coverage. Recent advances employ surveillance camera imagery and deep learning for detection, yet these struggle amidst scarce data and adverse environmental conditions. In this paper, we establish a challenging Urban Waterlogging Be…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  41. arXiv:2407.07771  [pdf, other]

    cs.CL cs.CV cs.MM

    Multi-task Prompt Words Learning for Social Media Content Generation

    Authors: Haochen Xue, Chong Zhang, Chengzhi Liu, Fangyu Wu, Xiaobo Jin

    Abstract: The rapid development of the Internet has profoundly changed human life. Humans are increasingly expressing themselves and interacting with others on social media platforms. However, although artificial intelligence technology has been widely used in many aspects of life, its application in social media content creation remains largely unexplored. To solve this problem, we propose a new prompt word generation…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 8 pages, 5 figures

    Journal ref: International Joint Conference on Neural Networks 2024

  42. arXiv:2407.07506  [pdf, other]

    eess.SP cs.AI

    Generative AI for RF Sensing in IoT systems

    Authors: Li Wang, Chao Zhang, Qiyang Zhao, Hang Zou, Samson Lasaulce, Giuseppe Valenzise, Zhuo He, Merouane Debbah

    Abstract: The development of wireless sensing technologies, using signals such as Wi-Fi, infrared, and RF to gather environmental data, has significantly advanced within Internet of Things (IoT) systems. Among these, Radio Frequency (RF) sensing stands out for its cost-effective and non-intrusive monitoring of human activities and environmental changes. However, traditional RF sensing methods face significa…

    Submitted 10 July, 2024; originally announced July 2024.

  43. arXiv:2407.07395  [pdf, other]

    cs.CV cs.MM eess.IV

    Standard compliant video coding using low complexity, switchable neural wrappers

    Authors: Yueyu Hu, Chenhao Zhang, Onur G. Guleryuz, Debargha Mukherjee, Yao Wang

    Abstract: The proliferation of high resolution videos puts great storage and bandwidth pressure on cloud video services, driving the development of next-generation video codecs. Despite great progress made in neural video coding, existing approaches are still far from economical deployment considering the complexity and rate-distortion performance tradeoff. To clear the roadblocks for neural video coding,…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by IEEE ICIP 2024

  44. arXiv:2407.06984  [pdf, other]

    cs.CV

    Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images

    Authors: Chuanrui Zhang, Yonggen Ling, Minglei Lu, Minghan Qin, Haoqian Wang

    Abstract: We study the 3D object understanding task for manipulating everyday objects with different material properties (diffuse, specular, transparent and mixed). Existing monocular and RGB-D methods suffer from scale ambiguity due to missing or imprecise depth measurements. We present CODERS, a one-stage approach for Category-level Object Detection, pose Estimation and Reconstruction from Stereo images.…

    Submitted 17 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  45. arXiv:2407.06894  [pdf, other]

    cs.IT cs.PF

    RIS-Assisted Received Adaptive Spatial Modulation for Wireless Communication

    Authors: Chaorong Zhang, Hui Xu, Benjamin K. Ng, Chan-Tong Lam

    Abstract: A novel wireless transmission scheme, named the reconfigurable intelligent surface (RIS)-assisted received adaptive spatial modulation (RASM) scheme, is proposed in this paper. In this scheme, the adaptive spatial modulation (ASM)-based antenna selection works at the receiver by employing the characteristics of the RIS in each time slot, where the signal-to-noise ratio at specific selected ant…

    Submitted 9 July, 2024; originally announced July 2024.

  46. arXiv:2407.06597  [pdf, other]

    cs.AI

    TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries

    Authors: Renjie Liang, Li Li, Chongzhi Zhang, Jing Wang, Xizhou Zhu, Aixin Sun

    Abstract: In this paper, we propose the task of Ranked Video Moment Retrieval (RVMR) to locate a ranked list of matching moments from a collection of videos, through queries in natural language. Although a few related tasks have been proposed and studied by CV, NLP, and IR communities, RVMR is the task that best reflects the practical setting of moment search. To facilitate research in RVMR, we dev…

    Submitted 23 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  47. arXiv:2407.06507  [pdf]

    cs.LG cs.AI

    Economic span selection of bridge based on deep reinforcement learning

    Authors: Leye Zhang, Xiangxiang Tian, Chengli Zhang, Hongjun Zhang

    Abstract: A deep Q-network algorithm is used to select the economic span of a bridge. The selection of bridge span has a significant impact on the total cost of a bridge, and a reasonable selection of span can reduce engineering cost. The economic span of a bridge is theoretically analyzed, and the theoretical solution formula of the economic span is deduced. The construction process of the bridge simulation environment is described in de…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 7 pages, 6 figures

  48. arXiv:2407.06460  [pdf, other]

    cs.CL cs.AI

    MUSE: Machine Unlearning Six-Way Evaluation for Language Models

    Authors: Weijia Shi, Jaechan Lee, Yangsibo Huang, Sadhika Malladi, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah A. Smith, Chiyuan Zhang

    Abstract: Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content. Data owners may request the removal of their data from a trained model due to privacy or copyright concerns. However, exactly unlearning only these datapoints (i.e., retraining with the data removed) is intractable in modern-day models. This has led to the development of many approxim…

    Submitted 14 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  49. arXiv:2407.06127  [pdf, other]

    cs.CV

    Better Sampling, towards Better End-to-end Small Object Detection

    Authors: Zile Huang, Chong Zhang, Mingyu Jin, Fangyu Wu, Chengzhi Liu, Xiaobo Jin

    Abstract: While deep learning-based general object detection has made significant strides in recent years, the effectiveness and efficiency of small object detection remain unsatisfactory. This is primarily attributed not only to the limited characteristics of such small targets but also to the high density and mutual overlap among these targets. The existing transformer-based small object detectors do not…

    Submitted 17 May, 2024; originally announced July 2024.

    Comments: 14 pages, 5 figures

  50. arXiv:2407.06083  [pdf, other]

    cs.LG cs.IR

    A Survey of Controllable Learning: Methods and Applications in Information Retrieval

    Authors: Chenglei Shen, Xiao Zhang, Teng Shi, Changshuo Zhang, Guofu Xie, Jun Xu

    Abstract: Controllable learning (CL) emerges as a critical component in trustworthy machine learning, ensuring that learners meet predefined targets and can adaptively adjust without retraining according to the changes in those targets. We provide a formal definition of CL, and discuss its applications in information retrieval (IR) where information needs are often complex and dynamic. The survey categorize…

    Submitted 4 July, 2024; originally announced July 2024.