Showing 1–50 of 2,952 results for author: Chen, H

  1. arXiv:2407.20068  [pdf, other]

    cs.CR cs.AI

    Unleash the Power of Ellipsis: Accuracy-enhanced Sparse Vector Technique with Exponential Noise

    Authors: Yuhan Liu, Sheng Wang, Yixuan Liu, Feifei Li, Hong Chen

    Abstract: The Sparse Vector Technique (SVT) is one of the most fundamental tools in differential privacy (DP). It works as a backbone for adaptive data analysis by answering a sequence of queries on a given dataset, and gleaning useful information in a privacy-preserving manner. Unlike the typical private query releases that directly publicize the noisy query results, SVT is less informative -- it keeps the… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

  2. arXiv:2407.19845  [pdf, other]

    cs.LG cs.CR

    BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning

    Authors: Baoyuan Wu, Hongrui Chen, Mingda Zhang, Zihao Zhu, Shaokui Wei, Danni Yuan, Mingli Zhu, Ruotong Wang, Li Liu, Chao Shen

    Abstract: As an emerging approach to explore the vulnerability of deep neural networks (DNNs), backdoor learning has attracted increasing interest in recent years, and many seminal backdoor attack and defense algorithms are being developed successively or concurrently, in the status of a rapid arms race. However, mainly due to the diverse settings, and the difficulties of implementation and reproducibility… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Substantial extensions based on our previous conference version "Backdoorbench: A comprehensive benchmark of backdoor learning" published at NeurIPS D&B Track 2022. 20 backdoor attack algorithms, 32 backdoor defense algorithms, 11000+ pairs of attack-against-defense evaluations, 10 analyses, 18 analysis tools

  3. arXiv:2407.19180  [pdf, other]

    cs.CV

    Data Processing Techniques for Modern Multimodal Models

    Authors: Yinheng Li, Han Ding, Hang Chen

    Abstract: Data processing plays a significant role in current multimodal model training. In this paper, we provide a comprehensive review of common data processing techniques used in modern multimodal model training, with a focus on diffusion models and multimodal large language models (MLLMs). We summarized all techniques into four categories: data quality, data quantity, data distribution and data safety…

    Submitted 27 July, 2024; originally announced July 2024.

  4. arXiv:2407.18899  [pdf, other]

    cs.CV cs.AI cs.LG

    Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence

    Authors: Mengyao Lyu, Tianxiang Hao, Xinhao Xu, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding

    Abstract: Domain Adaptation (DA) facilitates knowledge transfer from a source domain to a related target domain. This paper investigates a practical DA paradigm, namely Source data-Free Active Domain Adaptation (SFADA), where source data becomes inaccessible during adaptation, and a minimum amount of annotation budget is available in the target domain. Without referencing the source data, new challenges eme… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  5. arXiv:2407.18690  [pdf, other]

    cs.AI

    Collaborative Evolving Strategy for Automatic Data-Centric Development

    Authors: Xu Yang, Haotian Chen, Wenjun Feng, Haoxue Wang, Zeqi Ye, Xinjie Shen, Xiao Yang, Shizhao Sun, Weiqing Liu, Jiang Bian

    Abstract: Artificial Intelligence (AI) significantly influences many fields, largely thanks to the vast amounts of high-quality data for machine learning models. The emphasis is now on a data-centric AI strategy, prioritizing data development over model design progress. Automating this process is crucial. In this paper, we serve as the first work to introduce the automatic data-centric development (AD^2) ta… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 23 pages, 7 figures

  6. Content-driven Magnitude-Derivative Spectrum Complementary Learning for Hyperspectral Image Classification

    Authors: Huiyan Bai, Tingfa Xu, Huan Chen, Peifu Liu, Jianan Li

    Abstract: Extracting discriminative information from complex spectral details in hyperspectral image (HSI) for HSI classification is pivotal. While current prevailing methods rely on spectral magnitude features, they could cause confusion in certain classes, resulting in misclassification and decreased accuracy. We find that the derivative spectrum proves more adept at capturing concealed information, there… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: accepted by TGRS

  7. arXiv:2407.18492  [pdf]

    cs.CV

    Neural Modulation Alteration to Positive and Negative Emotions in Depressed Patients: Insights from fMRI Using Positive/Negative Emotion Atlas

    Authors: Yu Feng, Weiming Zeng, Yifan Xie, Hongyu Chen, Lei Wang, Yingying Wang, Hongjie Yan, Kaile Zhang, Ran Tao, Wai Ting Siok, Nizhuan Wang

    Abstract: Background: Although it has been noticed that depressed patients show differences in processing emotions, the precise neural modulation mechanisms of positive and negative emotions remain elusive. FMRI is a cutting-edge medical imaging technology renowned for its high spatial resolution and dynamic temporal information, making it particularly suitable for the neural dynamics of depression research… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

  8. arXiv:2407.18449  [pdf, other]

    eess.IV cs.CV cs.LG

    Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation

    Authors: Jiabo Ma, Zhengrui Guo, Fengtao Zhou, Yihui Wang, Yingxue Xu, Yu Cai, Zhengjie Zhu, Cheng Jin, Yi Lin, Xinrui Jiang, Anjia Han, Li Liang, Ronald Cheong Kin Chan, Jiguang Wang, Kwang-Ting Cheng, Hao Chen

    Abstract: Foundation models pretrained on large-scale datasets are revolutionizing the field of computational pathology (CPath). The generalization ability of foundation models is crucial for the success in various downstream clinical tasks. However, current foundation models have only been evaluated on a limited type and number of tasks, leaving their generalization ability and overall performance unclear.… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

    Report number: I.2.10

  9. arXiv:2407.18157  [pdf, other]

    cs.CR cs.DB

    Enhanced Privacy Bound for Shuffle Model with Personalized Privacy

    Authors: Yixuan Liu, Yuhan Liu, Li Xiong, Yujie Gu, Hong Chen

    Abstract: The shuffle model of Differential Privacy (DP) is an enhanced privacy protocol which introduces an intermediate trusted server between local users and a central data curator. It significantly amplifies the central DP guarantee by anonymizing and shuffling the local randomized data. Yet, deriving a tight privacy bound is challenging due to its complicated randomization protocol. While most existing… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

  10. arXiv:2407.18035  [pdf, other]

    cs.CV cs.AI cs.CL

    RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models

    Authors: Haoyu Chen, Wenbo Li, Jinjin Gu, Jingjing Ren, Sixiang Chen, Tian Ye, Renjing Pei, Kaiwen Zhou, Fenglong Song, Lei Zhu

    Abstract: Natural images captured by mobile devices often suffer from multiple types of degradation, such as noise, blur, and low light. Traditional image restoration methods require manual selection of specific tasks, algorithms, and execution sequences, which is time-consuming and may yield suboptimal results. All-in-one models, though capable of handling multiple tasks, typically support only a limited r… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

  11. arXiv:2407.17721  [pdf, other]

    cs.LG physics.comp-ph

    A Two-Stage Imaging Framework Combining CNN and Physics-Informed Neural Networks for Full-Inverse Tomography: A Case Study in Electrical Impedance Tomography (EIT)

    Authors: Xuanxuan Yang, Yangming Zhang, Haofeng Chen, Gang Ma, Xiaojie Wang

    Abstract: Physics-Informed Neural Networks (PINNs) are a machine learning technique for solving partial differential equations (PDEs) by incorporating PDEs as loss terms in neural networks and minimizing the loss function during training. Tomographic imaging, a method to reconstruct internal properties from external measurement data, is highly complex and ill-posed, making it an inverse problem. Recently, P… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

  12. arXiv:2407.17642  [pdf, other]

    cs.LG cs.AI

    SMA-Hyper: Spatiotemporal Multi-View Fusion Hypergraph Learning for Traffic Accident Prediction

    Authors: Xiaowei Gao, James Haworth, Ilya Ilyankou, Xianghui Zhang, Tao Cheng, Stephen Law, Huanfa Chen

    Abstract: Predicting traffic accidents is the key to sustainable city management, which requires effective address of the dynamic and complex spatiotemporal characteristics of cities. Current data-driven models often struggle with data sparsity and typically overlook the integration of diverse urban data sources and the high-order dependencies within them. Additionally, they frequently rely on predefined to… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

  13. arXiv:2407.16655  [pdf, other]

    cs.CV

    MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence

    Authors: Canyu Zhao, Mingyu Liu, Wen Wang, Jianlong Yuan, Hao Chen, Bo Zhang, Chunhua Shen

    Abstract: Recent advancements in video generation have primarily leveraged diffusion models for short-duration content. However, these approaches often fall short in modeling complex narratives and maintaining character consistency over extended periods, which is essential for long-form video production like movies. We propose MovieDreamer, a novel hierarchical framework that integrates the strengths of aut… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 23 pages, 18 figures

  14. arXiv:2407.16128  [pdf, other]

    cs.CV cs.AI

    Advancing Brain Imaging Analysis Step-by-step via Progressive Self-paced Learning

    Authors: Yanwu Yang, Hairui Chen, Jiesi Hu, Xutao Guo, Ting Ma

    Abstract: Recent advancements in deep learning have shifted the development of brain imaging analysis. However, several challenges remain, such as heterogeneity, individual variations, and the contradiction between the high dimensionality and small size of brain imaging datasets. These issues complicate the learning process, preventing models from capturing intrinsic, meaningful patterns and potentially lea… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: miccai-2024

  15. arXiv:2407.15841  [pdf, other]

    cs.CV

    SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

    Authors: Mingze Xu, Mingfei Gao, Zhe Gan, Hong-You Chen, Zhengfeng Lai, Haiming Gang, Kai Kang, Afshin Dehghan

    Abstract: We propose SlowFast-LLaVA (or SF-LLaVA for short), a training-free video large language model (LLM) that can jointly capture the detailed spatial semantics and long-range temporal context without exceeding the token budget of commonly used LLMs. This is realized by using a two-stream SlowFast design of inputs for Video LLMs to aggregate features from sampled video frames in an effective way. Speci… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Technical report

  16. arXiv:2407.15362  [pdf, other]

    cs.CV cs.AI

    A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model

    Authors: Yingxue Xu, Yihui Wang, Fengtao Zhou, Jiabo Ma, Shu Yang, Huangjing Lin, Xin Wang, Jiguang Wang, Li Liang, Anjia Han, Ronald Cheong Kin Chan, Hao Chen

    Abstract: Remarkable strides in computational pathology have been made in the task-agnostic foundation model that advances the performance of a wide array of downstream clinical tasks. Despite the promising performance, there are still several challenges. First, prior works have resorted to either vision-only or vision-captions data, disregarding invaluable pathology reports and gene expression profiles whi… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 44 pages, 9 figures

  17. arXiv:2407.15329  [pdf, other]

    eess.IV cs.CV

    Efficient Multi-disparity Transformer for Light Field Image Super-resolution

    Authors: Zeke Zexi Hu, Haodong Chen, Yuk Ying Chung, Xiaoming Chen

    Abstract: This paper presents the Multi-scale Disparity Transformer (MDT), a novel Transformer tailored for light field image super-resolution (LFSR) that addresses the issues of computational redundancy and disparity entanglement caused by the indiscriminate processing of sub-aperture images inherent in conventional methods. MDT features a multi-branch structure, with each branch utilising independent disp… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

  18. arXiv:2407.15317  [pdf, other]

    cs.CV

    Open-CD: A Comprehensive Toolbox for Change Detection

    Authors: Kaiyu Li, Jiawei Jiang, Andrea Codegoni, Chengxi Han, Yupeng Deng, Keyan Chen, Zhuo Zheng, Hao Chen, Zhengxia Zou, Zhenwei Shi, Sheng Fang, Deyu Meng, Zhi Wang, Xiangyong Cao

    Abstract: We present Open-CD, a change detection toolbox that contains a rich set of change detection methods as well as related components and modules. The toolbox started from a series of open source general vision task tools, including OpenMMLab Toolkits, PyTorch Image Models, etc. It gradually evolves into a unified platform that covers many popular change detection methods and contemporary modules. It… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: 9 pages

  19. arXiv:2407.15062  [pdf, other]

    cs.CR

    AGORA: Open More and Trust Less in Binary Verification Service

    Authors: Hongbo Chen, Quan Zhou, Sen Yang, Xing Han, Fan Zhang, Danfeng Zhang, Xiaofeng Wang

    Abstract: Binary verification plays a pivotal role in software security, yet building a verification service that is both open and trustworthy poses a formidable challenge. In this paper, we introduce a novel binary verification service, AGORA, scrupulously designed to overcome the challenge. At the heart of this approach lies a strategic insight: certain tasks can be delegated to untrusted entities, while… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

  20. arXiv:2407.15017  [pdf, other]

    cs.CL cs.AI cs.CV cs.HC cs.LG

    Knowledge Mechanisms in Large Language Models: A Survey and Perspective

    Authors: Mengru Wang, Yunzhi Yao, Ziwen Xu, Shuofei Qiao, Shumin Deng, Peng Wang, Xiang Chen, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang

    Abstract: Understanding knowledge mechanisms in Large Language Models (LLMs) is crucial for advancing towards trustworthy AGI. This paper reviews knowledge mechanism analysis from a novel taxonomy including knowledge utilization and evolution. Knowledge utilization delves into the mechanism of memorization, comprehension and application, and creation. Knowledge evolution focuses on the dynamic progression o… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Ongoing work (v1); 34 pages, 5 figures

  21. arXiv:2407.14850  [pdf, other]

    cs.CV

    A Tale of Single-channel Electroencephalogram: Devices, Datasets, Signal Processing, Applications, and Future Directions

    Authors: Yueyang Li, Weiming Zeng, Wenhao Dong, Di Han, Lei Chen, Hongyu Chen, Hongjie Yan, Wai Ting Siok, Nizhuan Wang

    Abstract: Single-channel electroencephalogram (EEG) is a cost-effective, comfortable, and non-invasive method for monitoring brain activity, widely adopted by researchers, consumers, and clinicians. The increasing number and proportion of articles on single-channel EEG underscore its growing potential. This paper provides a comprehensive review of single-channel EEG, focusing on development trends, devices,… ▽ More

    Submitted 20 July, 2024; originally announced July 2024.

  22. arXiv:2407.14844  [pdf, other]

    cs.CY cs.HC cs.SI q-fin.TR

    Political Leanings in Web3 Betting: Decoding the Interplay of Political and Profitable Motives

    Authors: Hongzhou Chen, Xiaolin Duan, Abdulmotaleb El Saddik, Wei Cai

    Abstract: Harnessing the transparent blockchain user behavior data, we construct the Political Betting Leaning Score (PBLS) to measure political leanings based on betting within Web3 prediction markets. Focusing on Polymarket and starting from the 2024 U.S. Presidential Election, we synthesize behaviors over 15,000 addresses across 4,500 events and 8,500 markets, capturing the intensity and direction of the… ▽ More

    Submitted 20 July, 2024; originally announced July 2024.

  23. arXiv:2407.14544  [pdf, other]

    cs.DC

    Fast Iterative Graph Computing with Updated Neighbor States

    Authors: Yijie Zhou, Shufeng Gong, Feng Yao, Hanzhang Chen, Song Yu, Pengxi Liu, Yanfeng Zhang, Ge Yu, Jeffrey Xu Yu

    Abstract: Enhancing the efficiency of iterative computation on graphs has garnered considerable attention in both industry and academia. Nonetheless, the majority of efforts focus on expediting iterative computation by minimizing the running time per iteration step, ignoring the optimization of the number of iteration rounds, which is a crucial aspect of iterative computation. We experimentally verified the… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 14 pages, 13 figures, 2 tables; accepted for publication in ICDE 2024

  24. arXiv:2407.14532  [pdf, other]

    cs.DC cs.LG

    A Scenario-Oriented Benchmark for Assessing AIOps Algorithms in Microservice Management

    Authors: Yongqian Sun, Jiaju Wang, Zhengdan Li, Xiaohui Nie, Minghua Ma, Shenglin Zhang, Yuhe Ji, Lu Zhang, Wen Long, Hengmao Chen, Yongnan Luo, Dan Pei

    Abstract: AIOps algorithms play a crucial role in the maintenance of microservice systems. Many previous benchmarks' performance leaderboard provides valuable guidance for selecting appropriate algorithms. However, existing AIOps benchmarks mainly utilize offline datasets to evaluate algorithms. They cannot consistently evaluate the performance of algorithms using real-time datasets, and the operation scena… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Codes are available at https://github.com/MicroServo/microservo, datasets are available at https://github.com/MicroServo/hot-plugging

  25. arXiv:2407.14302  [pdf, other]

    cs.CV

    Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition

    Authors: Yurong Zhang, Honghao Chen, Xinyu Zhang, Xiangxiang Chu, Li Song

    Abstract: Parameter-efficient transfer learning (PETL) is a promising task, aiming to adapt the large-scale pre-trained model to downstream tasks with a relatively modest cost. However, current PETL methods struggle in compressing computational complexity and bear a heavy inference burden due to the complete forward process. This paper presents an efficient visual recognition paradigm, called Dynamic Adapte… ▽ More

    Submitted 23 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  26. arXiv:2407.13775  [pdf, other]

    cs.HC cs.AI

    Lessons in Cooperation: A Qualitative Analysis of Driver Sentiments towards Real-Time Advisory Systems from a Driving Simulator User Study

    Authors: Aamir Hasan, Neeloy Chakraborty, Haonan Chen, Cathy Wu, Katherine Driggs-Campbell

    Abstract: Real-time Advisory (RTA) systems, such as navigational and eco-driving assistants, are becoming increasingly ubiquitous in vehicles due to their benefits for users and society. Until autonomous vehicles mature, such advisory systems will continue to expand their ability to cooperate with drivers, enabling safer and more eco-friendly driving practices while improving user experience. However, the i… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  27. arXiv:2407.13436  [pdf, other]

    cs.IT

    An Algorithm for Computing the Capacity of Symmetrized KL Information for Discrete Channels

    Authors: Haobo Chen, Gholamali Aminian, Yuheng Bu

    Abstract: Symmetrized Kullback-Leibler (KL) information (\(I_{\mathrm{SKL}}\)), which symmetrizes the traditional mutual information by integrating Lautum information, has been shown as a critical quantity in communication~\cite{aminian2015capacity} and learning theory~\cite{aminian2023information}. This paper considers the problem of computing the capacity in terms of \(I_{\mathrm{SKL}}\) for a fixed discr… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  28. arXiv:2407.13417  [pdf, other]

    cs.CV

    GDDS: A Single Domain Generalized Defect Detection Frame of Open World Scenario using Gather and Distribute Domain-shift Suppression Network

    Authors: Haiyong Chen, Yaxiu Zhang, Yan Zhang, Xin Zhang, Xingwei Yan

    Abstract: Efficient and intelligent surface defect detection of photovoltaic modules is crucial for improving the quality of photovoltaic modules and ensuring the reliable operation of large-scale infrastructure. However, the scenario characteristics of data distribution deviation make the construction of defect detection models for open world scenarios such as photovoltaic manufacturing and power plant ins… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 13 images

    ACM Class: I.4.9; I.5.1

  29. arXiv:2407.13219  [pdf, other]

    cs.CV

    Multi-sentence Video Grounding for Long Video Generation

    Authors: Wei Feng, Xin Wang, Hong Chen, Zeyang Zhang, Wenwu Zhu

    Abstract: Video generation has witnessed great success recently, but their application in generating long videos still remains challenging due to the difficulty in maintaining the temporal consistency of generated videos and the high memory cost during generation. To tackle the problems, in this paper, we propose a brave and new idea of Multi-sentence Video Grounding for Long Video Generation, connecting th… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  30. arXiv:2407.13179  [pdf, other]

    eess.IV cs.CV

    Learned HDR Image Compression for Perceptually Optimal Storage and Display

    Authors: Peibei Cao, Haoyu Chen, Jingzhe Ma, Yu-Chieh Yuan, Zhiyong Xie, Xin Xie, Haiqing Bai, Kede Ma

    Abstract: High dynamic range (HDR) capture and display have seen significant growth in popularity driven by the advancements in technology and increasing consumer demand for superior image quality. As a result, HDR image compression is crucial to fully realize the benefits of HDR imaging without suffering from large file sizes and inefficient data handling. Conventionally, this is achieved by introducing a… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  31. arXiv:2407.13132  [pdf, other]

    eess.IV cs.CV

    LSD3K: A Benchmark for Smoke Removal from Laparoscopic Surgery Images

    Authors: Wenhui Chang, Hongming Chen

    Abstract: Smoke generated by surgical instruments during laparoscopic surgery can obscure the visual field, impairing surgeons' ability to perform operations accurately and safely. Thus, smoke removal for laparoscopic images is highly desirable. Although laparoscopic image desmoking has attracted the attention of researchers in recent years and several algorithms have emerged, the lack of publicly avail…

    Submitted 17 July, 2024; originally announced July 2024.

  32. arXiv:2407.12810  [pdf]

    cs.NI

    A Study on the Situation of Connected Car Patent Portfolios

    Authors: Abel C. H. Chen, Chia-Shen Chang

    Abstract: In recent years, the countries of the world have drafted the specifications of connected cars; for instance, the Security Credential Management System (SCMS) has been proposed by United States Department of Transportation (USDOT), and the Cooperative Intelligent Transportation System (C-ITS) Credential Management System (CCMS) has been proposed by European Union (EU). Therefore, several companies… ▽ More

    Submitted 26 June, 2024; originally announced July 2024.

    Comments: in Chinese language

  33. arXiv:2407.12764  [pdf, other]

    cs.LG

    Jigsaw Game: Federated Clustering

    Authors: Jinxuan Xu, Hong-You Chen, Wei-Lun Chao, Yuqian Zhang

    Abstract: Federated learning has recently garnered significant attention, especially within the domain of supervised learning. However, despite the abundance of unlabeled data on end-users, unsupervised learning problems such as clustering in the federated setting remain underexplored. In this paper, we investigate the federated clustering problem, with a focus on federated k-means. We outline the challenge… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted to TMLR

  34. arXiv:2407.12592  [pdf, other]

    cs.CV

    VegeDiff: Latent Diffusion Model for Geospatial Vegetation Forecasting

    Authors: Sijie Zhao, Hao Chen, Xueliang Zhang, Pengfeng Xiao, Lei Bai, Wanli Ouyang

    Abstract: In the context of global climate change and frequent extreme weather events, forecasting future geospatial vegetation states under these conditions is of significant importance. The vegetation change process is influenced by the complex interplay between dynamic meteorological variables and static environmental variables, leading to high levels of uncertainty. Existing deterministic methods are in… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 15 pages, 8 figures

  35. arXiv:2407.12393  [pdf, other]

    cs.CL cs.AI cs.CY

    PersLLM: A Personified Training Approach for Large Language Models

    Authors: Zheni Zeng, Jiayi Chen, Huimin Chen, Yukun Yan, Yuxuan Chen, Zhenghao Liu, Zhiyuan Liu, Maosong Sun

    Abstract: Large language models exhibit aspects of human-level intelligence that catalyze their application as human-like agents in domains such as social simulations, human-machine interactions, and collaborative multi-agent systems. However, the absence of distinct personalities, such as displaying ingratiating behaviors, inconsistent opinions, and uniform response patterns, diminishes LLMs' utility in pract…

    Submitted 25 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: 10 pages for main text, 5 figures

  36. arXiv:2407.12226  [pdf, other]

    cs.LG

    Individualized Federated Learning for Traffic Prediction with Error Driven Aggregation

    Authors: Hang Chen, Collin Meese, Mark Nejad, Chien-Chung Shen

    Abstract: Low-latency traffic prediction is vital for smart city traffic management. Federated Learning has emerged as a promising technique for Traffic Prediction (FLTP), offering several advantages such as privacy preservation, reduced communication overhead, improved prediction accuracy, and enhanced adaptability to changing traffic conditions. However, the majority of the current FLTP frameworks lack a real…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 16 pages, 4 figures

  37. arXiv:2407.11585  [pdf, other]

    cs.CV cs.AI

    QVD: Post-training Quantization for Video Diffusion Models

    Authors: Shilong Tian, Hong Chen, Chengtao Lv, Yu Liu, Jinyang Guo, Xianglong Liu, Shengxi Li, Hao Yang, Tao Xie

    Abstract: Recently, video diffusion models (VDMs) have garnered significant attention due to their notable advancements in generating coherent and realistic video content. However, processing multiple frame features concurrently, coupled with the considerable model size, results in high latency and extensive memory consumption, hindering their broader application. Post-training quantization (PTQ) is an effe… ▽ More

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: accepted by ACMMM2024

  38. arXiv:2407.10704  [pdf, other]

    cs.CV

    Quantized Prompt for Efficient Generalization of Vision-Language Models

    Authors: Tianxiang Hao, Xiaohan Ding, Juexiao Feng, Yuhong Yang, Hui Chen, Guiguang Ding

    Abstract: In the past few years, large-scale pre-trained vision-language models like CLIP have achieved tremendous success in various fields. Naturally, how to transfer the rich knowledge in such huge pre-trained models to downstream tasks and datasets becomes a hot topic. During downstream adaptation, the most challenging problems are overfitting and catastrophic forgetting, which can cause the model to ov… ▽ More

    Submitted 19 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  39. arXiv:2407.10485  [pdf, other]

    cs.CV

    Effective Motion Modeling for UAV-platform Multiple Object Tracking with Re-Margin Loss

    Authors: Mufeng Yao, Jinlong Peng, Qingdong He, Bo Peng, Hao Chen, Mingmin Chi, Chao Liu, Jon Atli Benediktsson

    Abstract: Multiple object tracking (MOT) from unmanned aerial vehicle (UAV) platforms requires efficient motion modeling. This is because UAV-MOT faces tracking difficulties caused by large and irregular motion, and insufficient training due to the motion long-tailed distribution of current UAV-MOT datasets. Previous UAV-MOT methods either extract motion and detection features redundantly or supervise motio… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2308.07207

  40. arXiv:2407.10419  [pdf, other]

    cs.CV cs.LG

    Omni-Dimensional Frequency Learner for General Time Series Analysis

    Authors: Xianing Chen, Hanting Chen, Hailin Hu

    Abstract: Frequency domain representation of time series feature offers a concise representation for handling real-world time series data with inherent complexity and dynamic nature. However, current frequency-based methods with complex operations still fall short of state-of-the-art time domain methods for general time series analysis. In this work, we present Omni-Dimensional Frequency Learner (ODFL) mode… ▽ More

    Submitted 18 July, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

  41. arXiv:2407.10285  [pdf, other]

    cs.CV

    Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models

    Authors: Qinyu Yang, Haoxin Chen, Yong Zhang, Menghan Xia, Xiaodong Cun, Zhixun Su, Ying Shan

    Abstract: In order to improve the quality of synthesized videos, currently, one predominant method involves retraining an expert diffusion model and then implementing a noising-denoising process for refinement. Despite the significant training costs, maintaining consistency of content between the original and enhanced videos remains a major challenge. To tackle this challenge, we propose a novel formulation… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024, Project Page: https://yangqy1110.github.io/NC-SDEdit/, Code Repo: https://github.com/yangqy1110/NC-SDEdit/

    ACM Class: I.2; I.4.3

  42. arXiv:2407.10068  [pdf, other]

    cs.CL

    Multi-Granularity Semantic Revision for Large Language Model Distillation

    Authors: Xiaoyu Liu, Yun Zhang, Wei Li, Simiao Li, Xudong Huang, Hanting Chen, Yehui Tang, Jie Hu, Zhiwei Xiong, Yunhe Wang

    Abstract: Knowledge distillation plays a key role in compressing the Large Language Models (LLMs), which boosts a small-size student model under large teacher models' guidance. However, existing LLM distillation methods overly rely on student-generated outputs, which may introduce generation errors and misguide the distillation process. Moreover, the distillation loss functions introduced in previous art st… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  43. arXiv:2407.09911  [pdf, other]

    cs.HC cs.CV cs.LG eess.SP

    SensEmo: Enabling Affective Learning through Real-time Emotion Recognition with Smartwatches

    Authors: Kushan Choksi, Hongkai Chen, Karan Joshi, Sukrutha Jade, Shahriar Nirjon, Shan Lin

    Abstract: Recent research has demonstrated the capability of physiological signals to infer both user emotional and attention responses. This presents an opportunity for leveraging widely available physiological sensors in smartwatches, to detect real-time emotional cues in users, such as stress and excitement. In this paper, we introduce SensEmo, a smartwatch-based system designed for affective learning. S…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 7 pages, 7 figures, 2 tables. IEEE MASS 2024

    ACM Class: C.3.3; J.3.2; J.4.2

  44. arXiv:2407.09816  [pdf, other]

    cs.CL

    MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts

    Authors: Zhenpeng Su, Zijia Lin, Xue Bai, Xing Wu, Yizhe Xiong, Haoran Lian, Guangyuan Ma, Hui Chen, Guiguang Ding, Wei Zhou, Songlin Hu

    Abstract: Scaling the size of a model enhances its capabilities but significantly increases computation complexity. Mixture-of-Experts models (MoE) address the issue by allowing model size to scale up without substantially increasing training or inference costs. Despite their promising results, MoE models encounter several challenges. Primarily, for dynamic routing methods, the dispersion of training tokens…

    Submitted 28 July, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

    Comments: Work in progress

  45. arXiv:2407.09698  [pdf, other]

    cs.LG

    RIO-CPD: A Riemannian Geometric Method for Correlation-aware Online Change Point Detection

    Authors: Chengyuan Deng, Zhengzhang Chen, Xujiang Zhao, Haoyu Wang, Junxiang Wang, Haifeng Chen, Jie Gao

    Abstract: The objective of change point detection is to identify abrupt changes at potentially multiple points within a data sequence. This task is particularly challenging in the online setting where various types of changes can occur, including shifts in both the marginal and joint distributions of the data. This paper tackles these challenges by sequentially tracking correlation matrices on the Riemannia…

    Submitted 12 July, 2024; originally announced July 2024.

  46. Performance Comparison of Various Modes of Advanced Encryption Standard

    Authors: Abel C. H. Chen

    Abstract: With the maturation of quantum computing technology, many cryptographic methods are gradually facing threats from quantum computing. Although the Grover algorithm can accelerate search speeds, current research indicates that the Advanced Encryption Standard (AES) method can still enhance security by increasing the length of the secret key. However, the AES method involves multiple modes in impleme…

    Submitted 21 May, 2024; originally announced July 2024.

    Comments: in Chinese language

  47. arXiv:2407.09268  [pdf, other]

    eess.IV cs.CV

    Region Attention Transformer for Medical Image Restoration

    Authors: Zhiwen Yang, Haowei Chen, Ziniu Qian, Yang Zhou, Hui Zhang, Dan Zhao, Bingzheng Wei, Yan Xu

    Abstract: Transformer-based methods have demonstrated impressive results in medical image restoration, attributed to the multi-head self-attention (MSA) mechanism in the spatial dimension. However, the majority of existing Transformers conduct attention within fixed and coarsely partitioned regions (e.g. the entire image or fixed patches), resulting in interference from irrelevant regions and fragmen…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by MICCAI 2024

  48. arXiv:2407.09249  [pdf]

    cs.MA cs.AI

    GNN with Model-based RL for Multi-agent Systems

    Authors: Hanxiao Chen

    Abstract: Multi-agent systems (MAS) constitute a significant role in exploring machine intelligence and advanced applications. In order to deeply investigate complicated interactions within MAS scenarios, we originally propose "GNN for MBRL" model, which utilizes a state-spaced Graph Neural Networks with Model-based Reinforcement Learning to address specific MAS missions (e.g., Billiard-Avoidance, Autonomou…

    Submitted 12 July, 2024; originally announced July 2024.

  49. arXiv:2407.09024  [pdf, other]

    cs.LG

    Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control

    Authors: Huayu Chen, Kaiwen Zheng, Hang Su, Jun Zhu

    Abstract: Drawing upon recent advances in language model alignment, we formulate offline Reinforcement Learning as a two-stage optimization problem: First pretraining expressive generative policies on reward-free behavior datasets, then fine-tuning these policies to align with task-specific annotations like Q-values. This strategy allows us to leverage abundant and diverse behavior data to enhance generaliz…

    Submitted 12 July, 2024; originally announced July 2024.

  50. arXiv:2407.09019  [pdf, other]

    cs.SI cs.AI

    Heterogeneous Subgraph Network with Prompt Learning for Interpretable Depression Detection on Social Media

    Authors: Chen Chen, Mingwei Li, Fenghuan Li, Haopeng Chen, Yuankun Lin

    Abstract: Massive social media data can reflect people's authentic thoughts, emotions, communication, etc., and therefore can be analyzed for early detection of mental health problems such as depression. Existing works about early depression detection on social media lacked interpretability and neglected the heterogeneity of social media data. Furthermore, they overlooked the global interaction among users.…

    Submitted 12 July, 2024; originally announced July 2024.