Showing 1–50 of 656 results for author: Yang, Q

  1. arXiv:2407.16341  [pdf, other]

    cs.CV

    Motion Capture from Inertial and Vision Sensors

    Authors: Xiaodong Chen, Wu Liu, Qian Bao, Xinchen Liu, Quanwei Yang, Ruoli Dai, Tao Mei

    Abstract: Human motion capture is the foundation for many computer vision and graphics tasks. While industrial motion capture systems with complex camera arrays or expensive wearable sensors have been widely adopted in movie and game production, consumer-affordable and easy-to-use solutions for personal applications are still far from mature. To utilize a mixture of a monocular camera and very few inertial…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 17 pages, 9 figures

  2. arXiv:2407.15435  [pdf, other]

    cs.CV

    Enhancement of 3D Gaussian Splatting using Raw Mesh for Photorealistic Recreation of Architectures

    Authors: Ruizhe Wang, Chunliang Hua, Tomakayev Shingys, Mengyuan Niu, Qingxin Yang, Lizhong Gao, Yi Zheng, Junyan Yang, Qiao Wang

    Abstract: The photorealistic reconstruction and rendering of architectural scenes have extensive applications in industries such as film, games, and transportation. It also plays an important role in urban planning, architectural design, and the city's promotion, especially in protecting historical and cultural relics. The 3D Gaussian Splatting, due to better performance over NeRF, has become a mainstream t…

    Submitted 22 July, 2024; originally announced July 2024.

  3. arXiv:2407.14197  [pdf, other]

    cs.CV

    A Benchmark for Gaussian Splatting Compression and Quality Assessment Study

    Authors: Qi Yang, Kaifa Yang, Yuke Xing, Yiling Xu, Zhu Li

    Abstract: To fill the gap of traditional GS compression method, in this paper, we first propose a simple and effective GS data compression anchor called Graph-based GS Compression (GGSC). GGSC is inspired by graph signal processing theory and uses two branches to compress the primitive center and attributes. We split the whole GS sample via KDTree and clip the high-frequency components after the graph Fouri…

    Submitted 19 July, 2024; originally announced July 2024.

  4. arXiv:2407.14006  [pdf, other]

    eess.AS cs.SD

    MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis

    Authors: Qian Yang, Jialong Zuo, Zhe Su, Ziyue Jiang, Mingze Li, Zhou Zhao, Feiyang Chen, Zhefeng Wang, Baoxing Huai

    Abstract: We introduce an open source high-quality Mandarin TTS dataset MSceneSpeech (Multiple Scene Speech Dataset), which is intended to provide resources for expressive speech synthesis. MSceneSpeech comprises numerous audio recordings and texts performed and recorded according to daily life scenarios. Each scenario includes multiple speakers and a diverse range of prosodic styles, making it suitable for…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by INTERSPEECH 2024

  5. arXiv:2407.13117  [pdf, other]

    cs.CY cs.MM

    SOMONITOR: Explainable Marketing Data Processing and Analysis with Large Language Models

    Authors: Qi Yang, Sergey Nikolenko, Marlo Ongpin, Ilia Gossoudarev, Yu-Yi Chu-Farseeva, Aleksandr Farseev

    Abstract: Online marketing faces formidable challenges in managing and interpreting immense volumes of data necessary for competitor analysis, content research, and strategic branding. It is impossible to review hundreds to thousands of transient online content items by hand, and partial analysis often leads to suboptimal outcomes and poorly performing campaigns. We introduce an explainable AI framework SoM…

    Submitted 17 July, 2024; originally announced July 2024.

  6. arXiv:2407.12517  [pdf, other]

    cs.LG

    Evaluating the transferability potential of deep learning models for climate downscaling

    Authors: Ayush Prasad, Paula Harder, Qidong Yang, Prasanna Sattegeri, Daniela Szwarcman, Campbell Watson, David Rolnick

    Abstract: Climate downscaling, the process of generating high-resolution climate data from low-resolution simulations, is essential for understanding and adapting to climate change at regional and local scales. Deep learning approaches have proven useful in tackling this problem. However, existing studies usually focus on training models for one specific task, location and variable, which are therefore limi…

    Submitted 17 July, 2024; originally announced July 2024.

  7. arXiv:2407.12014  [pdf, other]

    cs.HC cs.CY

    Surprising Performances of Students with Autism in Classroom with NAO Robot

    Authors: Qin Yang, Huan Lu, Dandan Liang, Shengrong Gong, Huanghao Feng

    Abstract: Autism is a developmental disorder that manifests in early childhood and persists throughout life, profoundly affecting social behavior and hindering the acquisition of learning and social skills in those diagnosed. As technological advancements progress, an increasing array of technologies is being utilized to support the education of students with Autism Spectrum Disorder (ASD), aiming to improv…

    Submitted 26 June, 2024; originally announced July 2024.

  8. arXiv:2407.11536  [pdf, other]

    cs.CL cs.AI

    Fine-Tuning Medical Language Models for Enhanced Long-Contextual Understanding and Domain Expertise

    Authors: Qimin Yang, Rongsheng Wang, Jiexin Chen, Runqi Su, Tao Tan

    Abstract: Large Language Models (LLMs) have been widely applied in various professional fields. By fine-tuning the models using domain specific question and answer datasets, the professional domain knowledge and Q&A abilities of these models have significantly improved, for example, medical professional LLMs that use fine-tuning of doctor-patient Q&A data exhibit extraordinary disease diagnostic abilities…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 5 pages, 1 figure. Accepted by the Workshop on Long-Context Foundation Models (LCFM) at ICML 2024

  9. arXiv:2407.10759  [pdf, other]

    eess.AS cs.CL cs.LG

    Qwen2-Audio Technical Report

    Authors: Yunfei Chu, Jin Xu, Qian Yang, Haojie Wei, Xipin Wei, Zhifang Guo, Yichong Leng, Yuanjun Lv, Jinzheng He, Junyang Lin, Chang Zhou, Jingren Zhou

    Abstract: We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions. In contrast to complex hierarchical tags, we have simplified the pre-training process by utilizing natural language prompts for different data an…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: https://github.com/QwenLM/Qwen2-Audio. Checkpoints, code, and scripts will be open-sourced soon

  10. arXiv:2407.10416  [pdf, other]

    cs.AR

    SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling

    Authors: Huizheng Wang, Jiahao Fang, Xinru Tang, Zhiheng Yue, Jinxi Li, Yubin Qin, Sihan Guan, Qize Yang, Yang Wang, Chao Li, Yang Hu, Shouyi Yin

    Abstract: Benefiting from the self-attention mechanism, Transformer models have attained impressive contextual comprehension capabilities for lengthy texts. The requirements of high-throughput inference arise as the large language models (LLMs) become increasingly prevalent, which calls for large-scale token parallel processing (LTPP). However, existing dynamic sparse accelerators struggle to effectively ha…

    Submitted 14 July, 2024; originally announced July 2024.

  11. arXiv:2407.10285  [pdf, other]

    cs.CV

    Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models

    Authors: Qinyu Yang, Haoxin Chen, Yong Zhang, Menghan Xia, Xiaodong Cun, Zhixun Su, Ying Shan

    Abstract: In order to improve the quality of synthesized videos, currently, one predominant method involves retraining an expert diffusion model and then implementing a noising-denoising process for refinement. Despite the significant training costs, maintaining consistency of content between the original and enhanced videos remains a major challenge. To tackle this challenge, we propose a novel formulation…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024, Project Page: https://yangqy1110.github.io/NC-SDEdit/, Code Repo: https://github.com/yangqy1110/NC-SDEdit/

    ACM Class: I.2; I.4.3

  12. arXiv:2407.09806  [pdf, other]

    cs.CV

    Asynchronous Feedback Network for Perceptual Point Cloud Quality Assessment

    Authors: Yujie Zhang, Qi Yang, Ziyu Shan, Yiling Xu

    Abstract: Recent years have witnessed the success of the deep learning-based technique in research of no-reference point cloud quality assessment (NR-PCQA). For a more accurate quality prediction, many previous studies have attempted to capture global and local feature in a bottom-up manner, but ignored the interaction and promotion between them. To solve this problem, we propose a novel asynchronous feedba…

    Submitted 13 July, 2024; originally announced July 2024.

  13. arXiv:2407.08165  [pdf, other]

    eess.IV cs.CV

    Explicit-NeRF-QA: A Quality Assessment Database for Explicit NeRF Model Compression

    Authors: Yuke Xing, Qi Yang, Kaifa Yang, Yilin Xu, Zhu Li

    Abstract: In recent years, Neural Radiance Fields (NeRF) have demonstrated significant advantages in representing and synthesizing 3D scenes. Explicit NeRF models facilitate the practical NeRF applications with faster rendering speed, and also attract considerable attention in NeRF compression due to its huge storage cost. To address the challenge of the NeRF compression study, in this paper, we construct a…

    Submitted 18 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures, 2 tables, conference

  14. arXiv:2407.07840  [pdf, other]

    cs.CV cs.CL

    Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison

    Authors: Qian Yang, Weixiang Yan, Aishwarya Agrawal

    Abstract: Despite tremendous advancements, current state-of-the-art Vision-Language Models (VLMs) are still far from perfect. They tend to hallucinate and may generate biased responses. In such circumstances, having a way to assess the reliability of a given response generated by a VLM is quite useful. Existing methods, such as estimating uncertainty using answer likelihoods or prompt-based confidence gener…

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: Preprint

  15. arXiv:2407.07495  [pdf, other]

    cs.CL

    Bucket Pre-training is All You Need

    Authors: Hongtao Liu, Qiyao Peng, Qing Yang, Kai Liu, Hongyan Xu

    Abstract: Large language models (LLMs) have demonstrated exceptional performance across various natural language processing tasks. However, the conventional fixed-length data composition strategy for pretraining, which involves concatenating and splitting documents, can introduce noise and limit the model's ability to capture long-range dependencies. To address this, we first introduce three metrics for eva…

    Submitted 10 July, 2024; originally announced July 2024.

  16. arXiv:2407.07487  [pdf, other]

    cs.CL

    Review-LLM: Harnessing Large Language Models for Personalized Review Generation

    Authors: Qiyao Peng, Hongtao Liu, Hongyan Xu, Qing Yang, Minglai Shao, Wenjun Wang

    Abstract: Product review generation is an important task in recommender systems, which could provide explanation and persuasiveness for the recommendation. Recently, Large Language Models (LLMs, e.g., ChatGPT) have shown superior text modeling and generating ability, which could be applied in review generation. However, directly applying the LLMs for generating reviews might be troubled by the "polite" ph…

    Submitted 10 July, 2024; originally announced July 2024.

  17. arXiv:2407.05767  [pdf, other]

    eess.IV cs.CV

    Nonrigid Reconstruction of Freehand Ultrasound without a Tracker

    Authors: Qi Li, Ziyi Shen, Qianye Yang, Dean C. Barratt, Matthew J. Clarkson, Tom Vercauteren, Yipeng Hu

    Abstract: Reconstructing 2D freehand Ultrasound (US) frames into 3D space without using a tracker has recently seen advances with deep learning. Predicting good frame-to-frame rigid transformations is often accepted as the learning objective, especially when the ground-truth labels from spatial tracking devices are inherently rigid transformations. Motivated by a) the observed nonrigid deformation due to so…

    Submitted 14 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted at MICCAI 2024

  18. arXiv:2407.03885  [pdf, other]

    cs.CV eess.IV

    Perception-Guided Quality Metric of 3D Point Clouds Using Hybrid Strategy

    Authors: Yujie Zhang, Qi Yang, Yiling Xu, Shan Liu

    Abstract: Full-reference point cloud quality assessment (FR-PCQA) aims to infer the quality of distorted point clouds with available references. Most of the existing FR-PCQA metrics ignore the fact that the human visual system (HVS) dynamically tackles visual information according to different distortion levels (i.e., distortion detection for high-quality samples and appearance perception for low-quality sa…

    Submitted 4 July, 2024; originally announced July 2024.

  19. arXiv:2407.03876  [pdf, other]

    cs.CR cs.CL

    DART: Deep Adversarial Automated Red Teaming for LLM Safety

    Authors: Bojian Jiang, Yi Jing, Tianhao Shen, Qing Yang, Deyi Xiong

    Abstract: Manual red teaming is a commonly used method to identify vulnerabilities in large language models (LLMs), but it is costly and unscalable. In contrast, automated red teaming uses a Red LLM to automatically generate adversarial prompts to the Target LLM, offering a scalable way for safety vulnerability detection. However, the difficulty of building a powerful automated Red LLM lies in the fact that…

    Submitted 4 July, 2024; originally announced July 2024.

  20. arXiv:2407.03440  [pdf, other]

    cs.SD cs.LG eess.AS

    Advanced Framework for Animal Sound Classification With Features Optimization

    Authors: Qiang Yang, Xiuying Chen, Changsheng Ma, Carlos M. Duarte, Xiangliang Zhang

    Abstract: The automatic classification of animal sounds presents an enduring challenge in bioacoustics, owing to the diverse statistical properties of sound signals, variations in recording equipment, and prevalent low Signal-to-Noise Ratio (SNR) conditions. Deep learning models like Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) have excelled in human speech recognition but have not…

    Submitted 3 July, 2024; originally announced July 2024.

  21. arXiv:2407.00478  [pdf, other]

    cs.LG cs.AI

    Knowledge-Aware Parsimony Learning: A Perspective from Relational Graphs

    Authors: Quanming Yao, Yongqi Zhang, Yaqing Wang, Nan Yin, James Kwok, Qiang Yang

    Abstract: The scaling law, a strategy that involves the brute-force scaling of the training dataset and learnable parameters, has become a prevalent approach for developing stronger learning models. In this paper, we examine its rationale in terms of learning from relational graphs. We demonstrate that directly adhering to such a scaling law does not necessarily yield stronger models due to architectural in…

    Submitted 29 June, 2024; originally announced July 2024.

  22. arXiv:2406.19703  [pdf, other]

    cs.CV

    Vision Transformer with Key-select Routing Attention for Single Image Dehazing

    Authors: Lihan Tong, Weijia Li, Qingxia Yang, Liyuan Chen, Peng Chen

    Abstract: We present Ksformer, utilizing Multi-scale Key-select Routing Attention (MKRA) for intelligent selection of key areas through multi-channel, multi-scale windows with a top-k operator, and Lightweight Frequency Processing Module (LFPM) to enhance high-frequency features, outperforming other dehazing methods in tests.

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures, IEICE Trans. Information and Systems

    Report number: Vol.E107-D,No.11,pp.-,Nov. 2024

    MSC Class: 68U10 (Primary)

    ACM Class: I.4

  23. arXiv:2406.18862  [pdf, other]

    cs.SD eess.AS

    Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study

    Authors: Peikun Chen, Sining Sun, Changhao Shan, Qing Yang, Lei Xie

    Abstract: Unified speech-text models like SpeechGPT, VioLA, and AudioPaLM have shown impressive performance across various speech-related tasks, especially in Automatic Speech Recognition (ASR). These models typically adopt a unified method to model discrete speech and text tokens, followed by training a decoder-only transformer. However, they are all designed for non-streaming ASR tasks, where the entire s…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted for Interspeech 2024

  24. arXiv:2406.17404  [pdf, other]

    cs.CL cs.LG

    Make Some Noise: Unlocking Language Model Parallel Inference Capability through Noisy Training

    Authors: Yixuan Wang, Xianzhen Luo, Fuxuan Wei, Yijun Liu, Qingfu Zhu, Xuanyu Zhang, Qing Yang, Dongliang Xu, Wanxiang Che

    Abstract: Existing speculative decoding methods typically require additional model structure and training processes to assist the model for draft token generation. This makes the migration of acceleration methods to the new model more costly and more demanding on device memory. To address this problem, we propose the Make Some Noise (MSN) training framework as a replacement for the supervised fine-tuning st…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 11 pages, 6 figures

  25. arXiv:2406.16442  [pdf, other]

    cs.CV

    EmoLLM: Multimodal Emotional Understanding Meets Large Language Models

    Authors: Qu Yang, Mang Ye, Bo Du

    Abstract: Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks, but their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored. Thus, it impedes their ability to effectively understand and react to the intricate emotions expressed by humans through multimodal media. To bridge this gap, we introdu…

    Submitted 29 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: 9 pages

  26. arXiv:2406.16271  [pdf, other]

    cs.CV

    Feature-prompting GBMSeg: One-Shot Reference Guided Training-Free Prompt Engineering for Glomerular Basement Membrane Segmentation

    Authors: Xueyu Liu, Guangze Shi, Rui Wang, Yexin Lai, Jianan Zhang, Lele Sun, Quan Yang, Yongfei Wu, Ming Li, Weixia Han, Wen Zheng

    Abstract: Assessment of the glomerular basement membrane (GBM) in transmission electron microscopy (TEM) is crucial for diagnosing chronic kidney disease (CKD). The lack of domain-independent automatic segmentation tools for the GBM necessitates an AI-based solution to automate the process. In this study, we introduce GBMSeg, a training-free framework designed to automatically segment the GBM in TEM images…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted for MICCAI2024

  27. arXiv:2406.13007  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Night Photography Rendering

    Authors: Egor Ershov, Artyom Panshin, Oleg Karasev, Sergey Korchagin, Shepelev Lev, Alexandr Startsev, Daniil Vladimirov, Ekaterina Zaychenkova, Nikola Banić, Dmitrii Iarchuk, Maria Efimova, Radu Timofte, Arseniy Terekhin, Shuwei Yue, Yuyang Liu, Minchen Wei, Lu Xu, Chao Zhang, Yasi Wang, Furkan Kınlı, Doğa Yılmaz, Barış Özcan, Furkan Kıraç, Shuai Liu, Jingyuan Xiao, et al. (25 additional authors not shown)

    Abstract: This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions and thereby produce photo-quality output images in the standard RGB (sRGB) space. Unlike the previous year's competition, the challenge images were collected with a mobile phone and the speed of algo…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 10 figures

  28. arXiv:2406.12726  [pdf, other]

    cs.SD cs.AI eess.AS

    ED-sKWS: Early-Decision Spiking Neural Networks for Rapid,and Energy-Efficient Keyword Spotting

    Authors: Zeyang Song, Qianhui Liu, Qu Yang, Yizhou Peng, Haizhou Li

    Abstract: Keyword Spotting (KWS) is essential in edge computing requiring rapid and energy-efficient responses. Spiking Neural Networks (SNNs) are well-suited for KWS for their efficiency and temporal capacity for speech. To further reduce the latency and energy consumption, this study introduces ED-sKWS, an SNN-based KWS model with an early-decision mechanism that can stop speech processing and output the…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  29. arXiv:2406.12403  [pdf, other]

    cs.CL cs.AI

    PDSS: A Privacy-Preserving Framework for Step-by-Step Distillation of Large Language Models

    Authors: Tao Fan, Yan Kang, Weijing Chen, Hanlin Gu, Yuanfeng Song, Lixin Fan, Kai Chen, Qiang Yang

    Abstract: In the context of real-world applications, leveraging large language models (LLMs) for domain-specific tasks often faces two major challenges: domain-specific knowledge privacy and constrained resources. To address these issues, we propose PDSS, a privacy-preserving framework for step-by-step distillation of LLMs. PDSS works on a server-client architecture, wherein client transmits perturbed promp…

    Submitted 18 June, 2024; originally announced June 2024.

  30. arXiv:2406.12254  [pdf, other]

    eess.IV cs.CV

    Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation

    Authors: Xin Yu, Qi Yang, Han Liu, Ho Hin Lee, Yucheng Tang, Lucas W. Remedios, Michael E. Kim, Rendong Zhang, Shunxing Bao, Yuankai Huo, Ann Zenobia Moore, Luigi Ferrucci, Bennett A. Landman

    Abstract: 2D single-slice abdominal computed tomography (CT) enables the assessment of body habitus and organ health with low radiation exposure. However, single-slice data necessitates the use of 2D networks for segmentation, but these networks often struggle to capture contextual information effectively. Consequently, even when trained on identical datasets, 3D networks typically achieve superior segmenta…

    Submitted 12 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  31. HiFGL: A Hierarchical Framework for Cross-silo Cross-device Federated Graph Learning

    Authors: Zhuoning Guo, Duanyi Yao, Qiang Yang, Hao Liu

    Abstract: Federated Graph Learning (FGL) has emerged as a promising way to learn high-quality representations from distributed graph data with privacy preservation. Despite considerable efforts have been made for FGL under either cross-device or cross-silo paradigm, how to effectively capture graph knowledge in a more complicated cross-silo cross-device environment remains an under-explored problem. However…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted by SIGKDD 2024

  32. arXiv:2406.10540  [pdf, other]

    cs.AI cs.NE cs.RO

    Generating and Evolving Reward Functions for Highway Driving with Large Language Models

    Authors: Xu Han, Qiannan Yang, Xianda Chen, Xiaowen Chu, Meixin Zhu

    Abstract: Reinforcement Learning (RL) plays a crucial role in advancing autonomous driving technologies by maximizing reward functions to achieve the optimal policy. However, crafting these reward functions has been a complex, manual process in many practices. To reduce this complexity, we introduce a novel framework that integrates Large Language Models (LLMs) with RL to improve reward function design in a…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 7 pages, 6 figures

  33. arXiv:2406.10469  [pdf, other]

    eess.IV cs.CV cs.MM

    Object-Attribute-Relation Representation based Video Semantic Communication

    Authors: Qiyuan Du, Yiping Duan, Qianqian Yang, Xiaoming Tao, Mérouane Debbah

    Abstract: With the rapid growth of multimedia data volume, there is an increasing need for efficient video transmission in applications such as virtual reality and future video streaming services. Semantic communication is emerging as a vital technique for ensuring efficient and reliable transmission in low-bandwidth, high-noise settings. However, most current approaches focus on joint source-channel coding…

    Submitted 14 June, 2024; originally announced June 2024.

  34. arXiv:2406.04601  [pdf, other]

    cs.LG

    Enhancing Size Generalization in Graph Neural Networks through Disentangled Representation Learning

    Authors: Zheng Huang, Qihui Yang, Dawei Zhou, Yujun Yan

    Abstract: Although most graph neural networks (GNNs) can operate on graphs of any size, their classification performance often declines on graphs larger than those encountered during training. Existing methods insufficiently address the removal of size information from graph representations, resulting in sub-optimal performance and reliance on backbone models. In response, we propose DISGEN, a novel and mod…

    Submitted 11 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  35. arXiv:2406.04323  [pdf, other]

    cs.LG cs.AI cs.CV

    ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories

    Authors: Qianlan Yang, Yu-Xiong Wang

    Abstract: Training autonomous agents with sparse rewards is a long-standing problem in online reinforcement learning (RL), due to low data efficiency. Prior work overcomes this challenge by extracting useful knowledge from offline data, often accomplished through the learning of action distribution from offline data and utilizing the learned distribution to facilitate online RL. However, since the offline d…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: ICML 2024 Accepted

  36. arXiv:2406.04025  [pdf]

    cs.CL

    The syntax-semantics interface in a child's path: A study of 3- to 11-year-olds' elicited production of Mandarin recursive relative clauses

    Authors: Caimei Yang, Qihang Yang, Xingzhi Su, Chenxi Fu, Xiaoyi Wang, Ying Yan, Zaijiang Man

    Abstract: There have been apparently conflicting claims over the syntax-semantics relationship in child acquisition. However, few of them have assessed the child's path toward the acquisition of recursive relative clauses (RRCs). The authors of the current paper did experiments to investigate 3- to 11-year-olds' most-structured elicited production of eight Mandarin RRCs in a 4 (syntactic types)*2 (semantic…

    Submitted 6 June, 2024; originally announced June 2024.

  37. arXiv:2406.03868  [pdf, other]

    cs.DC

    PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training

    Authors: Jiahao Fang, Huizheng Wang, Qize Yang, Dehao Kong, Xu Dai, Jinyi Deng, Yang Hu, Shouyi Yin

    Abstract: Deep learning (DL) models are piquing high interest and scaling at an unprecedented rate. To this end, a handful of tiled accelerators have been proposed to support such large-scale training tasks. However, these accelerators often incorporate numerous cores or tiles even extending to wafer-scale, substantial on-chip bandwidth, and distributed memory systems. This results in an exceedingly complex…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 11 pages

  38. arXiv:2406.02916  [pdf, other]

    cs.RO

    Real-time Motion Planning for autonomous vehicles in dynamic environments

    Authors: Mohammad Dehghani Tezerjani, Dominic Carrillo, Deyuan Qu, Sudip Dhakal, Amir Mirzaeinia, Qing Yang

    Abstract: Recent advancements in self-driving car technologies have enabled them to navigate autonomously through various environments. However, one of the critical challenges in autonomous vehicle operation is trajectory planning, especially in dynamic environments with moving obstacles. This research aims to tackle this challenge by proposing a robust algorithm tailored for autonomous cars operating in dy…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 8 pages

  39. arXiv:2406.02224  [pdf, other]

    cs.CL cs.AI

    FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models

    Authors: Tao Fan, Guoqiang Ma, Yan Kang, Hanlin Gu, Yuanfeng Song, Lixin Fan, Kai Chen, Qiang Yang

    Abstract: Recent research in federated large language models (LLMs) has primarily focused on enabling clients to fine-tune their locally deployed homogeneous LLMs collaboratively or on transferring knowledge from server-based LLMs to small language models (SLMs) at downstream clients. However, a significant gap remains in the simultaneous mutual enhancement of both the server's LLM and clients' SLMs. To bri…

    Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  40. arXiv:2406.01956  [pdf, other]

    cs.CV

    Enhance Image-to-Image Generation with LLaVA Prompt and Negative Prompt

    Authors: Zhicheng Ding, Panfeng Li, Qikai Yang, Siyang Li

    Abstract: This paper presents a novel approach to enhance image-to-image generation by leveraging the multimodal capabilities of the Large Language and Vision Assistant (LLaVA). We propose a framework where LLaVA analyzes input images and generates textual descriptions, hereinafter LLaVA-generated prompts. These prompts, along with the original image, are fed into the image-to-image generation pipeline. Thi…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by 2024 5th International Conference on Information Science, Parallel and Distributed Systems

  41. arXiv:2406.01422  [pdf, other]

    cs.SE cs.CL

    How to Understand Whole Software Repository?

    Authors: Yingwei Ma, Qingping Yang, Rongyu Cao, Binhua Li, Fei Huang, Yongbin Li

    Abstract: Recently, Large Language Model (LLM) based agents have advanced the significant development of Automatic Software Engineering (ASE). Although verified effectiveness, the designs of the existing methods mainly focus on the local information of codes, e.g., issues, classes, and functions, leading to limitations in capturing the global context and interdependencies within the software system. From th…

    Submitted 3 June, 2024; originally announced June 2024.

  42. arXiv:2406.01085  [pdf, other]

    cs.CR cs.AI

    FedAdOb: Privacy-Preserving Federated Deep Learning with Adaptive Obfuscation

    Authors: Hanlin Gu, Jiahuan Luo, Yan Kang, Yuan Yao, Gongxi Zhu, Bowen Li, Lixin Fan, Qiang Yang

    Abstract: Federated learning (FL) has emerged as a collaborative approach that allows multiple clients to jointly learn a machine learning model without sharing their private data. The concern about privacy leakage, albeit demonstrated under specific conditions, has triggered numerous follow-up research in designing powerful attacking methods and effective defending mechanisms aiming to thwart these attacki…

    Submitted 3 June, 2024; originally announced June 2024.

  43. arXiv:2405.20681  [pdf, other]

    cs.CR cs.AI

    No Free Lunch Theorem for Privacy-Preserving LLM Inference

    Authors: Xiaojin Zhang, Yulin Fei, Yan Kang, Wei Chen, Lixin Fan, Hai Jin, Qiang Yang

    Abstract: Individuals and businesses have been significantly benefited by Large Language Models (LLMs) including PaLM, Gemini and ChatGPT in various ways. For example, LLMs enhance productivity, reduce costs, and enable us to focus on more valuable tasks. Furthermore, LLMs possess the capacity to sift through extensive datasets, uncover underlying patterns, and furnish critical insights that propel the fron…

    Submitted 31 May, 2024; originally announced May 2024.

  44. arXiv:2405.18802  [pdf, other]

    cs.CR cs.AI

    Enhancing Security and Privacy in Federated Learning using Update Digests and Voting-Based Defense

    Authors: Wenjie Li, Kai Fan, Jingyuan Zhang, Hui Li, Wei Yang Bryan Lim, Qiang Yang

    Abstract: Federated Learning (FL) is a promising privacy-preserving machine learning paradigm that allows data owners to collaboratively train models while keeping their data localized. Despite its potential, FL faces challenges related to the trustworthiness of both clients and servers, especially in the presence of curious or malicious adversaries. In this paper, we introduce a novel framework named \unde…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 14 pages

  45. arXiv:2405.18776  [pdf, other]

    cs.CR cs.CL cs.LG

    LMO-DP: Optimizing the Randomization Mechanism for Differentially Private Fine-Tuning (Large) Language Models

    Authors: Qin Yang, Meisam Mohammad, Han Wang, Ali Payani, Ashish Kundu, Kai Shu, Yan Yan, Yuan Hong

    Abstract: Differentially Private Stochastic Gradient Descent (DP-SGD) and its variants have been proposed to ensure rigorous privacy for fine-tuning large-scale pre-trained language models. However, they rely heavily on the Gaussian mechanism, which may overly perturb the gradients and degrade the accuracy, especially in stronger privacy regimes (e.g., the privacy budget $ε< 3$). To address such limitations…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 18 pages, 15 figures

  46. arXiv:2405.17660  [pdf, other]

    cs.CV

    LoReTrack: Efficient and Accurate Low-Resolution Transformer Tracking

    Authors: Shaohua Dong, Yunhe Feng, Qing Yang, Yuewei Lin, Heng Fan

    Abstract: High-performance Transformer trackers have shown excellent results, yet they often bear a heavy computational load. Observing that a smaller input can immediately and conveniently reduce computations without changing the model, an easy solution is to adopt the low-resolution input for efficient Transformer tracking. Albeit faster, this hurts tracking accuracy much due to information loss in low re…

    Submitted 27 May, 2024; originally announced May 2024.

  47. arXiv:2405.17522  [pdf, other]

    cs.LG cs.DC

    Efficient Model Compression for Hierarchical Federated Learning

    Authors: Xi Zhu, Songcan Yu, Junbo Wang, Qinglin Yang

    Abstract: Federated learning (FL), as an emerging collaborative learning paradigm, has garnered significant attention due to its capacity to preserve privacy within distributed learning systems. In these systems, clients collaboratively train a unified neural network model using their local datasets and share model parameters rather than raw data, enhancing privacy. Predominantly, FL systems are designed fo…

    Submitted 27 May, 2024; originally announced May 2024.

  48. arXiv:2405.17221  [pdf, other]

    cs.AI cs.AR

    Efficient Orchestrated AI Workflows Execution on Scale-out Spatial Architecture

    Authors: Jinyi Deng, Xinru Tang, Zhiheng Yue, Guangyang Lu, Qize Yang, Jiahao Zhang, Jinxi Li, Chao Li, Shaojun Wei, Yang Hu, Shouyi Yin

    Abstract: Given the increasing complexity of AI applications, traditional spatial architectures frequently fall short. Our analysis identifies a pattern of interconnected, multi-faceted tasks encompassing both AI and general computational processes. In response, we have conceptualized "Orchestrated AI Workflows," an approach that integrates various tasks with logic-driven decisions into dynamic, sophisticat…

    Submitted 21 May, 2024; originally announced May 2024.

  49. arXiv:2405.15474  [pdf, other]

    cs.LG cs.DC

    Unlearning during Learning: An Efficient Federated Machine Unlearning Method

    Authors: Hanlin Gu, Gongxi Zhu, Jie Zhang, Xinyuan Zhao, Yuxing Han, Lixin Fan, Qiang Yang

    Abstract: In recent years, Federated Learning (FL) has garnered significant attention as a distributed machine learning paradigm. To facilitate the implementation of the right to be forgotten, the concept of federated machine unlearning (FMU) has also emerged. However, current FMU approaches often involve additional time-consuming steps and may not offer comprehensive unlearning capabilities, which renders…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024

  50. arXiv:2405.14488  [pdf, other]

    cs.CL

    MoGU: A Framework for Enhancing Safety of Open-Sourced LLMs While Preserving Their Usability

    Authors: Yanrui Du, Sendong Zhao, Danyang Zhao, Ming Ma, Yuhan Chen, Liangyu Huo, Qing Yang, Dongliang Xu, Bing Qin

    Abstract: Large Language Models (LLMs) are increasingly deployed in various applications. As their usage grows, concerns regarding their safety are rising, especially in maintaining harmless responses when faced with malicious instructions. Many defense strategies have been developed to enhance the safety of LLMs. However, our research finds that existing defense strategies lead LLMs to predominantly adopt…

    Submitted 23 May, 2024; originally announced May 2024.