Skip to main content

Showing 1–50 of 463 results for author: Dong, X

  1. arXiv:2407.18487  [pdf, other

    cs.CV

    SMPISD-MTPNet: Scene Semantic Prior-Assisted Infrared Ship Detection Using Multi-Task Perception Networks

    Authors: Chen Hu, Xiaogang Dong, Yian Huang Lele Wang, Liang Xu, Tian Pu, Zhenming Peng

    Abstract: Infrared ship detection (IRSD) has received increasing attention in recent years due to the robustness of infrared images to adverse weather. However, a large number of false alarms may occur in complex scenes. To address these challenges, we propose the Scene Semantic Prior-Assisted Multi-Task Perception Network (SMPISD-MTPNet), which includes three stages: scene semantic extraction, deep feature… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

  2. arXiv:2407.16975  [pdf, other

    cs.LG stat.ME

    On the Parameter Identifiability of Partially Observed Linear Causal Models

    Authors: Xinshuai Dong, Ignavier Ng, Biwei Huang, Yuewen Sun, Songyao Jin, Roberto Legaspi, Peter Spirtes, Kun Zhang

    Abstract: Linear causal models are important tools for modeling causal dependencies and yet in practice, only a subset of the variables can be observed. In this paper, we examine the parameter identifiability of these models by investigating whether the edge coefficients can be recovered given the causal structure and partially observed data. Our setting is more general than that of prior research - we allo… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  3. arXiv:2407.16286  [pdf, other

    cs.LG cs.AI

    A deeper look at depth pruning of LLMs

    Authors: Shoaib Ahmed Siddiqui, Xin Dong, Greg Heinrich, Thomas Breuel, Jan Kautz, David Krueger, Pavlo Molchanov

    Abstract: Large Language Models (LLMs) are not only resource-intensive to train but even more costly to deploy in production. Therefore, recent work has attempted to prune blocks of LLMs based on cheap proxies for estimating block importance, effectively removing 10% of blocks in well-trained LLaMa-2 and Mistral 7b models without any significant degradation of downstream metrics. In this paper, we explore d… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  4. arXiv:2407.15886  [pdf, other

    cs.CV cs.AI

    CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models

    Authors: Zheng Chong, Xiao Dong, Haoxiang Li, Shiyue Zhang, Wenqing Zhang, Xujie Zhang, Hanqing Zhao, Xiaodan Liang

    Abstract: Virtual try-on methods based on diffusion models achieve realistic try-on effects but often replicate the backbone network as a ReferenceNet or use additional image encoders to process condition inputs, leading to high training and inference costs. In this work, we rethink the necessity of ReferenceNet and image encoders and innovate the interaction between garment and person by proposing CatVTON,… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: 10 pages, 9 figures, 4 tables

    MSC Class: 68T42 (Primary) 168T45 (Secondary) ACM Class: I.4.9

  5. arXiv:2407.14804  [pdf, other

    cs.CR

    WiFaKey: Generating Cryptographic Keys from Face in the Wild

    Authors: Xingbo Dong, Hui Zhang, Yen Lung Lai, Zhe Jin, Junduan Huang, Wenxiong Kang, Andrew Beng Jin Teoh

    Abstract: Deriving a unique cryptographic key from biometric measurements is a challenging task due to the existing noise gap between the biometric measurements and error correction coding. Additionally, privacy and security concerns arise as biometric measurements are inherently linked to the user. Biocryptosystems represent a key branch of solutions aimed at addressing these issues. However, many existing… ▽ More

    Submitted 20 July, 2024; originally announced July 2024.

  6. arXiv:2407.13141  [pdf, other

    cs.LG

    Out-of-Distribution Detection through Soft Clustering with Non-Negative Kernel Regression

    Authors: Aryan Gulati, Xingjian Dong, Carlos Hurtado, Sarath Shekkizhar, Swabha Swayamdipta, Antonio Ortega

    Abstract: As language models become more general purpose, increased attention needs to be paid to detecting out-of-distribution (OOD) instances, i.e., those not belonging to any of the distributions seen during training. Existing methods for detecting OOD data are computationally complex and storage-intensive. We propose a novel soft clustering approach for OOD detection based on non-negative kernel regress… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  7. arXiv:2407.11691  [pdf, other

    cs.CV

    VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

    Authors: Haodong Duan, Junming Yang, Yuxuan Qiao, Xinyu Fang, Lin Chen, Yuan Liu, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Jiaqi Wang, Dahua Lin, Kai Chen

    Abstract: We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework for researchers and developers to evaluate existing multi-modality models and publish reproducible evaluation results. In VLMEvalKit, we implement over 70 different large multi-modality models, including both proprietary… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  8. arXiv:2407.10510  [pdf, other

    cs.CL cs.AI cs.CE

    TCM-FTP: Fine-Tuning Large Language Models for Herbal Prescription Prediction

    Authors: Xingzhi Zhou, Xin Dong, Chunhao Li, Yuning Bai, Yulong Xu, Ka Chun Cheung, Simon See, Xinpeng Song, Runshun Zhang, Xuezhong Zhou, Nevin L. Zhang

    Abstract: Traditional Chinese medicine (TCM) relies on specific combinations of herbs in prescriptions to treat symptoms and signs, a practice that spans thousands of years. Predicting TCM prescriptions presents a fascinating technical challenge with practical implications. However, this task faces limitations due to the scarcity of high-quality clinical datasets and the intricate relationship between sympt… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  9. arXiv:2407.10080  [pdf, other

    cs.IT eess.SY

    Design and Optimization on Successive RIS-assisted Multi-hop Wireless Communications

    Authors: Rujing Xiong, Jialong Lu, Jianan Zhang, Minggang Liu, Xuehui Dong, Tiebin Mi, Robert Caiming Qiu

    Abstract: As an emerging wireless communication technology, reconfigurable intelligent surface (RIS) has become a basic choice for providing signal coverage services in scenarios with dense obstacles or long tunnels through multi-hop configurations. Conventional works of literature mainly focus on alternating optimization or single-beam calculation in RIS phase configuration, which is limited in considering… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  10. arXiv:2407.07844  [pdf, other

    cs.CV

    OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

    Authors: Hao Wang, Pengzhen Ren, Zequn Jie, Xiao Dong, Chengjian Feng, Yinlong Qian, Lin Ma, Dongmei Jiang, Yaowei Wang, Xiangyuan Lan, Xiaodan Liang

    Abstract: Open-vocabulary detection is a challenging task due to the requirement of detecting objects based on class names, including those not encountered during training. Existing methods have shown strong zero-shot detection capabilities through pre-training and pseudo-labeling on diverse large-scale datasets. However, these approaches encounter two main challenges: (i) how to effectively eliminate data… ▽ More

    Submitted 21 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: Technical Report

  11. arXiv:2407.07476  [pdf, other

    cs.DC

    A Transverse-Read-assisted Valid-Bit Collection to Accelerate Stochastic Computing MAC for Energy-Efficient in-RTM DNNs

    Authors: Jihe Wang, Zhiying Zhang, Xingwu Dong, Danghui Wang

    Abstract: It looks attractive to coordinate racetrack-memory(RM) and stochastic-computing (SC) jointly to build an ultra-low power neuron-architecture. However, the above combination has always been questioned in a fatal weakness that the narrow bit-view of the RM-MTJ structure, a.k.a. shift-and-access pattern, cannot physically match the great throughput of direct-stored stochastic sequences. Fortunately,… ▽ More

    Submitted 20 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  12. arXiv:2407.05023  [pdf, other

    cs.CV

    SurgicalGaussian: Deformable 3D Gaussians for High-Fidelity Surgical Scene Reconstruction

    Authors: Weixing Xie, Junfeng Yao, Xianpeng Cao, Qiqin Lin, Zerui Tang, Xiao Dong, Xiaohu Guo

    Abstract: Dynamic reconstruction of deformable tissues in endoscopic video is a key technology for robot-assisted surgery. Recent reconstruction methods based on neural radiance fields (NeRFs) have achieved remarkable results in the reconstruction of surgical scenes. However, based on implicit representation, NeRFs struggle to capture the intricate details of objects in the scene and cannot achieve real-tim… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  13. arXiv:2407.03788  [pdf, other

    cs.CV cs.CL

    Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning

    Authors: Thong Nguyen, Yi Bin, Xiaobao Wu, Xinshuai Dong, Zhiyuan Hu, Khoi Le, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

    Abstract: Data quality stands at the forefront of deciding the effectiveness of video-language representation learning. However, video-text pairs in previous data typically do not align perfectly with each other, which might lead to video-language representations that do not accurately reflect cross-modal semantics. Moreover, previous data also possess an uneven distribution of concepts, thereby hampering t… ▽ More

    Submitted 19 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  14. arXiv:2407.03320  [pdf, other

    cs.CV cs.CL

    InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

    Authors: Pan Zhang, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Rui Qian, Lin Chen, Qipeng Guo, Haodong Duan, Bin Wang, Linke Ouyang, Songyang Zhang, Wenwei Zhang, Yining Li, Yang Gao, Peng Sun, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Hang Yan, Conghui He, Xingcheng Zhang, Kai Chen, Jifeng Dai, Yu Qiao , et al. (2 additional authors not shown)

    Abstract: We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large-vision language model that supports long-contextual input and output. IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities with merely 7B LLM backend. Trained with 24K interleaved image-text contexts, it can seamlessly extend to 96K long contexts via RoPE extrapolation. Th… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Technical Report. https://github.com/InternLM/InternLM-XComposer

  15. arXiv:2407.02403  [pdf, other

    cs.CV cs.AI

    Face Reconstruction Transfer Attack as Out-of-Distribution Generalization

    Authors: Yoon Gyo Jung, Jaewoo Park, Xingbo Dong, Hojin Park, Andrew Beng Jin Teoh, Octavia Camps

    Abstract: Understanding the vulnerability of face recognition systems to malicious attacks is of critical importance. Previous works have focused on reconstructing face images that can penetrate a targeted verification system. Even in the white-box scenario, however, naively reconstructed images misrepresent the identity information, hence the attacks are easily neutralized once the face system is updated o… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV2024

  16. arXiv:2407.01523  [pdf, other

    cs.CV cs.CL

    MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

    Authors: Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, Xinze Li, Xinyuan Lu, Ziyu Liu, Yan Ma, Xiaoyi Dong, Pan Zhang, Liangming Pan, Yu-Gang Jiang, Jiaqi Wang, Yixin Cao, Aixin Sun

    Abstract: Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding (DU). However, their abilities on long-context DU remain an open problem. This work presents MMLongBench-Doc, a long-context, multi-modal benchmark co… ▽ More

    Submitted 10 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  17. arXiv:2406.16816  [pdf, other

    eess.SP cs.SI

    On the Impact of Sample Size in Reconstructing Noisy Graph Signals: A Theoretical Characterisation

    Authors: Baskaran Sripathmanathan, Xiaowen Dong, Michael Bronstein

    Abstract: Reconstructing a signal on a graph from noisy observations of a subset of the vertices is a fundamental problem in the field of graph signal processing. This paper investigates how sample size affects reconstruction error in the presence of noise via an in-depth theoretical analysis of the two most common reconstruction methods in the literature, least-squares reconstruction (LS) and graph-Laplaci… ▽ More

    Submitted 11 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: The paper arXiv:2307.00336v1 is the earlier, shorter conference version of this paper

  18. arXiv:2406.14503  [pdf, other

    cs.CL

    Overview of the CAIL 2023 Argument Mining Track

    Authors: Jingcong Liang, Junlong Wang, Xinyu Zhai, Yungui Zhuang, Yiyang Zheng, Xin Xu, Xiandong Ran, Xiaozheng Dong, Honghui Rong, Yanlun Liu, Hao Chen, Yuhan Wei, Donghai Li, Jiajie Peng, Xuanjing Huang, Chongde Shi, Yansong Feng, Yun Song, Zhongyu Wei

    Abstract: We give a detailed overview of the CAIL 2023 Argument Mining Track, one of the Chinese AI and Law Challenge (CAIL) 2023 tracks. The main goal of the track is to identify and extract interacting argument pairs in trial dialogs. It mainly uses summarized judgment documents but can also refer to trial recordings. The track consists of two stages, and we introduce the tasks designed for each stage; we… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  19. arXiv:2406.11903  [pdf, other

    q-fin.GN cs.AI q-fin.CP

    A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges

    Authors: Yuqi Nie, Yaxuan Kong, Xiaowen Dong, John M. Mulvey, H. Vincent Poor, Qingsong Wen, Stefan Zohren

    Abstract: Recent advances in large language models (LLMs) have unlocked novel opportunities for machine learning applications in the financial domain. These models have demonstrated remarkable capabilities in understanding context, processing vast amounts of data, and generating human-preferred contents. In this survey, we explore the application of LLMs on various financial tasks, focusing on their potenti… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  20. arXiv:2406.11890  [pdf, other

    cs.LG cs.AI cs.CL

    Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning

    Authors: Hui Liu, Wenya Wang, Hao Sun, Chris Xing Tian, Chenqi Kong, Xin Dong, Haoliang Li

    Abstract: Large Language Models (LLMs) have demonstrated impressive in-context learning (ICL) capabilities from few-shot demonstration exemplars. While recent learning-based demonstration selection methods have proven beneficial to ICL by choosing more useful exemplars, their underlying mechanisms are opaque, hindering efforts to address limitations such as high training costs and poor generalization across… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  21. arXiv:2406.11833  [pdf, other

    cs.CV cs.AI cs.LG

    MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

    Authors: Ziyu Liu, Tao Chu, Yuhang Zang, Xilin Wei, Xiaoyi Dong, Pan Zhang, Zijian Liang, Yuanjun Xiong, Yu Qiao, Dahua Lin, Jiaqi Wang

    Abstract: Generating natural and meaningful responses to communicate with multi-modal human inputs is a fundamental capability of Large Vision-Language Models(LVLMs). While current open-source LVLMs demonstrate promising performance in simplified scenarios such as single-turn single-image input, they fall short in real-world conversation scenarios such as following instructions in a long context history wit… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: This project is available at https://github.com/Liuziyu77/MMDU

  22. arXiv:2406.11739  [pdf, other

    cs.CV

    V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

    Authors: Jiaqi Wang, Yuhang Zang, Pan Zhang, Tao Chu, Yuhang Cao, Zeyi Sun, Ziyu Liu, Xiaoyi Dong, Tong Wu, Dahua Lin, Zeming Chen, Zhi Wang, Lingchen Meng, Wenhao Yao, Jianwei Yang, Sihong Wu, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou , et al. (9 additional authors not shown)

    Abstract: Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories, and potential encounters with previously unknown or unseen objects. The challenges necessitate the development of public benchmarks and challenges to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  23. arXiv:2406.11131  [pdf, other

    cs.CL cs.AI cs.DB

    Are Large Language Models a Good Replacement of Taxonomies?

    Authors: Yushi Sun, Hao Xin, Kai Sun, Yifan Ethan Xu, Xiao Yang, Xin Luna Dong, Nan Tang, Lei Chen

    Abstract: Large language models (LLMs) demonstrate an impressive ability to internalize knowledge and answer natural language questions. Although previous studies validate that LLMs perform well on general knowledge while presenting poor performance on long-tail nuanced knowledge, the community is still doubtful about whether the traditional knowledge graphs should be replaced by LLMs. In this paper, we ask… ▽ More

    Submitted 20 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted by VLDB 2024

  24. arXiv:2406.10098  [pdf, other

    cs.LG cs.AI

    ECGMamba: Towards Efficient ECG Classification with BiSSM

    Authors: Yupeng Qiang, Xunde Dong, Xiuling Liu, Yang Yang, Yihai Fang, Jianhong Dou

    Abstract: Electrocardiogram (ECG) signal analysis represents a pivotal technique in the diagnosis of cardiovascular diseases. Although transformer-based models have made significant progress in ECG classification, they exhibit inefficiencies in the inference phase. The issue is primarily attributable to the secondary computational complexity of Transformer's self-attention mechanism. particularly when proce… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 6 pages, 2 figures. arXiv admin note: text overlap with arXiv:2404.17858 by other authors

  25. arXiv:2406.08966  [pdf, other

    cs.LG cs.AI

    Separation Power of Equivariant Neural Networks

    Authors: Marco Pacini, Xiaowen Dong, Bruno Lepri, Gabriele Santin

    Abstract: The separation power of a machine learning model refers to its capacity to distinguish distinct inputs, and it is often employed as a proxy for its expressivity. In this paper, we propose a theoretical framework to investigate the separation power of equivariant neural networks with point-wise activations. Using the proposed framework, we can derive an explicit description of inputs indistinguisha… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 9 pages of main text, 2 figures

  26. arXiv:2406.07543  [pdf, other

    cs.CV

    Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

    Authors: Chenyu Yang, Xizhou Zhu, Jinguo Zhu, Weijie Su, Junjie Wang, Xuan Dong, Wenhai Wang, Lewei Lu, Bin Li, Jie Zhou, Yu Qiao, Jifeng Dai

    Abstract: Recently, vision model pre-training has evolved from relying on manually annotated datasets to leveraging large-scale, web-crawled image-text data. Despite these advances, there is no pre-training method that effectively exploits the interleaved image-text data, which is very prevalent on the Internet. Inspired by the recent success of compression learning in natural language processing, we propos… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  27. arXiv:2406.05338  [pdf, other

    cs.CV

    MotionClone: Training-Free Motion Cloning for Controllable Video Generation

    Authors: Pengyang Ling, Jiazi Bu, Pan Zhang, Xiaoyi Dong, Yuhang Zang, Tong Wu, Huaian Chen, Jiaqi Wang, Yi Jin

    Abstract: Motion-based controllable text-to-video generation involves motions to control the video generation. Previous methods typically require the training of models to encode motion cues or the fine-tuning of video diffusion models. However, these approaches often result in suboptimal motion generation when applied outside the trained domain. In this work, we propose MotionClone, a training-free framewo… ▽ More

    Submitted 28 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: 17 pages, 12 figures, https://bujiazi.github.io/motionclone.github.io/

  28. arXiv:2406.05039  [pdf, other

    cs.CV cs.CL

    Bootstrapping Referring Multi-Object Tracking

    Authors: Yani Zhang, Dongming Wu, Wencheng Han, Xingping Dong

    Abstract: Referring multi-object tracking (RMOT) aims at detecting and tracking multiple objects following human instruction represented by a natural language expression. Existing RMOT benchmarks are usually formulated through manual annotations, integrated with static regulations. This approach results in a dearth of notable diversity and a constrained scope of implementation. In this work, our key idea is… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  29. arXiv:2406.04744  [pdf, other

    cs.CL

    CRAG -- Comprehensive RAG Benchmark

    Authors: Xiao Yang, Kai Sun, Hao Xin, Yushi Sun, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Yifan Ethan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar , et al. (2 additional authors not shown)

    Abstract: Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Model (LLM)'s deficiency in lack of knowledge. Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering bench… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  30. arXiv:2406.04325  [pdf, other

    cs.CV

    ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

    Authors: Lin Chen, Xilin Wei, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Bin Lin, Zhenyu Tang, Li Yuan, Yu Qiao, Dahua Lin, Feng Zhao, Jiaqi Wang

    Abstract: We present the ShareGPT4Video series, aiming to facilitate the video understanding of large video-language models (LVLMs) and the video generation of text-to-video models (T2VMs) via dense and precise captions. The series comprises: 1) ShareGPT4Video, 40K GPT4V annotated dense captions of videos with various lengths and sources, developed through carefully designed data filtering and annotating st… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project Page: https://sharegpt4video.github.io/

  31. arXiv:2406.00671  [pdf, other

    cs.RO

    An Efficient Trajectory Generation for Bi-copter Flight in Tight Space

    Authors: Xin Dong, Yangjie Cui, Jingwu Xiang, Daochun Li, Zhan Tu

    Abstract: Unlike squared (or alike) quadrotors, elongated bi-copters leverage natural superiority in crossing tight spaces. To date, extensive works have focused on the design, modeling, and control of bi-copters. Besides, a proper motion planner utilizing bi-copters' shape characteristics is essential to efficiently and safely traverse tight spaces, yet it has rarely been studied. Current motion planning m… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 8 pages,8 figures

  32. arXiv:2406.00093  [pdf, other

    cs.CV cs.AI cs.GR cs.LG cs.MM

    Bootstrap3D: Improving 3D Content Creation with Synthetic Data

    Authors: Zeyi Sun, Tong Wu, Pan Zhang, Yuhang Zang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

    Abstract: Recent years have witnessed remarkable progress in multi-view diffusion models for 3D content creation. However, there remains a significant gap in image quality and prompt-following ability compared to 2D diffusion models. A critical bottleneck is the scarcity of high-quality 3D assets with detailed captions. To address this challenge, we propose Bootstrap3D, a novel framework that automatically… ▽ More

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: Project Page: https://sunzey.github.io/Bootstrap3D/

  33. arXiv:2405.20799  [pdf, other

    stat.ML cs.LG

    Rough Transformers: Lightweight Continuous-Time Sequence Modelling with Path Signatures

    Authors: Fernando Moreno-Pino, Álvaro Arroyo, Harrison Waldon, Xiaowen Dong, Álvaro Cartea

    Abstract: Time-series data in real-world settings typically exhibit long-range dependencies and are observed at non-uniform intervals. In these settings, traditional sequence-based recurrent models struggle. To overcome this, researchers often replace recurrent architectures with Neural ODE-based models to account for irregularly sampled data and use Transformer-based architectures to account for long-range… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Preprint. Under review. arXiv admin note: text overlap with arXiv:2403.10288

  34. arXiv:2405.18326  [pdf, other

    cs.CV

    VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers

    Authors: Jun Zheng, Fuwei Zhao, Youjiang Xu, Xin Dong, Xiaodan Liang

    Abstract: Video try-on stands as a promising area for its tremendous real-world potential. Prior works are limited to transferring product clothing images onto person videos with simple poses and backgrounds, while underperforming on casually captured videos. Recently, Sora revealed the scalability of Diffusion Transformer (DiT) in generating lifelike videos featuring real-world scenarios. Inspired by this,… ▽ More

    Submitted 7 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Project Page: https://zhengjun-ai.github.io/viton-dit-page/

  35. arXiv:2405.17957  [pdf, other

    cs.CL cs.AI

    Modeling Dynamic Topics in Chain-Free Fashion by Evolution-Tracking Contrastive Learning and Unassociated Word Exclusion

    Authors: Xiaobao Wu, Xinshuai Dong, Liangming Pan, Thong Nguyen, Anh Tuan Luu

    Abstract: Dynamic topic models track the evolution of topics in sequential documents, which have derived various applications like trend analysis and opinion mining. However, existing models suffer from repetitive topic and unassociated topic issues, failing to reveal the evolution and hindering further applications. To address these issues, we break the tradition of simply chaining topics in existing work… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL 2024 Findings

  36. arXiv:2405.17525  [pdf, ps, other

    cs.LG

    SmoothGNN: Smoothing-based GNN for Unsupervised Node Anomaly Detection

    Authors: Xiangyu Dong, Xingyi Zhang, Yanni Sun, Lei Chen, Mingxuan Yuan, Sibo Wang

    Abstract: The smoothing issue leads to indistinguishable node representations, which poses a significant challenge in the field of graph learning. However, this issue also presents an opportunity to reveal underlying properties behind different types of nodes, which have been overlooked in previous studies. Through empirical and theoretical analysis of real-world node anomaly detection (NAD) datasets, we ob… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  37. arXiv:2405.16009  [pdf, other

    cs.CV

    Streaming Long Video Understanding with Large Language Models

    Authors: Rui Qian, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Shuangrui Ding, Dahua Lin, Jiaqi Wang

    Abstract: This paper presents VideoStreaming, an advanced vision-language large model (VLLM) for video understanding, that capably understands arbitrary-length video with a constant number of video tokens streamingly encoded and adaptively selected. The challenge of video understanding in the vision language area mainly lies in the significant computational burden caused by the great number of tokens extrac… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  38. arXiv:2405.15540  [pdf, other

    cs.LG

    Bundle Neural Networks for message diffusion on graphs

    Authors: Jacob Bamberger, Federico Barbero, Xiaowen Dong, Michael Bronstein

    Abstract: The dominant paradigm for learning on graph-structured data is message passing. Despite being a strong inductive bias, the local message passing mechanism suffers from pathological issues such as over-smoothing, over-squashing, and limited node-level expressivity. To address these limitations we propose Bundle Neural Networks (BuNN), a new type of GNN that operates via message diffusion over flat… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  39. arXiv:2405.15119  [pdf, other

    cs.LG stat.ML

    Bayesian Optimization of Functions over Node Subsets in Graphs

    Authors: Huidong Liang, Xingchen Wan, Xiaowen Dong

    Abstract: We address the problem of optimizing over functions defined on node subsets in a graph. The optimization of such functions is often a non-trivial task given their combinatorial, black-box and expensive-to-evaluate nature. Although various algorithms have been introduced in the literature, most are either task-specific or computationally inefficient and only utilize information about the graph stru… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 26 pages with 20 figures

  40. arXiv:2405.14866  [pdf, other

    cs.CV

    Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras

    Authors: Hanzhang Tu, Ruizhi Shao, Xue Dong, Shunyuan Zheng, Hao Zhang, Lili Chen, Meili Wang, Wenyu Li, Siyan Ma, Shengping Zhang, Boyao Zhou, Yebin Liu

    Abstract: In this paper, we present a low-budget and high-authenticity bidirectional telepresence system, Tele-Aloha, targeting peer-to-peer communication scenarios. Compared to previous systems, Tele-Aloha utilizes only four sparse RGB cameras, one consumer-grade GPU, and one autostereoscopic screen to achieve high-resolution (2048x2048), real-time (30 fps), low-latency (less than 150ms) and robust distant… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Paper accepted by SIGGRAPH 2024. Project page: http://118.178.32.38/c/Tele-Aloha/

  41. arXiv:2405.13094  [pdf, other

    cs.SI cs.AI cs.LG

    KPG: Key Propagation Graph Generator for Rumor Detection based on Reinforcement Learning

    Authors: Yusong Zhang, Kun Xie, Xingyi Zhang, Xiangyu Dong, Sibo Wang

    Abstract: The proliferation of rumors on social media platforms during significant events, such as the US elections and the COVID-19 pandemic, has a profound impact on social stability and public health. Existing approaches for rumor detection primarily rely on propagation graphs to enhance model effectiveness. However, the presence of noisy and irrelevant structures during the propagation process limits th… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  42. arXiv:2405.11671  [pdf, other

    cs.DS

    BYO: A Unified Framework for Benchmarking Large-Scale Graph Containers

    Authors: Brian Wheatman, Xiaojun Dong, Zheqi Shen, Laxman Dhulipala, Jakub Łącki, Prashant Pandey, Helen Xu

    Abstract: A fundamental building block in any graph algorithm is a graph container - a data structure used to represent the graph. Ideally, a graph container enables efficient access to the underlying graph, has low space usage, and supports updating the graph efficiently. In this paper, we conduct an extensive empirical evaluation of graph containers designed to support running algorithms on large graphs.… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

  43. arXiv:2405.11190  [pdf, other

    cs.CV

    ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing

    Authors: Ying Jin, Pengyang Ling, Xiaoyi Dong, Pan Zhang, Jiaqi Wang, Dahua Lin

    Abstract: Instruction-based image editing focuses on equipping a generative model with the capacity to adhere to human-written instructions for editing images. Current approaches typically comprehend explicit and specific instructions. However, they often exhibit a deficiency in executing active reasoning capacities required to comprehend instructions that are implicit or insufficiently defined. To enhance… ▽ More

    Submitted 31 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

  44. arXiv:2405.05134  [pdf, other

    cs.CY cs.AI cs.LG

    Enhancing Deep Knowledge Tracing via Diffusion Models for Personalized Adaptive Learning

    Authors: Ming Kuo, Shouvon Sarker, Lijun Qian, Yujian Fu, Xiangfang Li, Xishuang Dong

    Abstract: In contrast to pedagogies like evidence-based teaching, personalized adaptive learning (PAL) distinguishes itself by closely monitoring the progress of individual students and tailoring the learning path to their unique knowledge and requirements. A crucial technique for effective PAL implementation is knowledge tracing, which models students' evolving knowledge to predict their future performance… ▽ More

    Submitted 24 April, 2024; originally announced May 2024.

  45. arXiv:2405.03110  [pdf, other

    cs.IR

    Vector Quantization for Recommender Systems: A Review and Outlook

    Authors: Qijiong Liu, Xiaoyu Dong, Jiaren Xiao, Nuo Chen, Hengchang Hu, Jieming Zhu, Chenxu Zhu, Tetsuya Sakai, Xiao-Ming Wu

    Abstract: Vector quantization, renowned for its unparalleled feature compression capabilities, has been a prominent topic in signal processing and machine learning research for several decades and remains widely utilized today. With the emergence of large models and generative AI, vector quantization has gained popularity in recommender systems, establishing itself as a preferred solution. This paper starts… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  46. arXiv:2405.00448  [pdf, other

    cs.CV

    MMTryon: Multi-Modal Multi-Reference Control for High-Quality Fashion Generation

    Authors: Xujie Zhang, Ente Lin, Xiu Li, Yuxuan Luo, Michael Kampffmeyer, Xin Dong, Xiaodan Liang

    Abstract: This paper introduces MMTryon, a multi-modal multi-reference VIrtual Try-ON (VITON) framework, which can generate high-quality compositional try-on results by taking a text instruction and multiple garment images as inputs. Our MMTryon addresses three problems overlooked in prior literature: 1) Support of multiple try-on items. Existing methods are commonly designed for single-item try-on tasks (e… ▽ More

    Submitted 28 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  47. arXiv:2404.19364  [pdf, other

    cs.CL

    Navigating Brain Language Representations: A Comparative Analysis of Neural Language Models and Psychologically Plausible Models

    Authors: Yunhao Zhang, Shaonan Wang, Xinyi Dong, Jiajun Yu, Chengqing Zong

    Abstract: Neural language models, particularly large-scale ones, have been consistently proven to be most effective in predicting brain neural activity across a range of studies. However, previous research overlooked the comparison of these models with psychologically plausible ones. Moreover, evaluations were reliant on limited, single-modality, and English cognitive datasets. To address these questions, w… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

  48. arXiv:2404.19048  [pdf, other

    cs.CL cs.AI

    A Framework for Real-time Safeguarding the Text Generation of Large Language Model

    Authors: Ximing Dong, Dayi Lin, Shaowei Wang, Ahmed E. Hassan

    Abstract: Large Language Models (LLMs) have significantly advanced natural language processing (NLP) tasks but also pose ethical and societal risks due to their propensity to generate harmful content. To address this, various approaches have been developed to safeguard LLMs from producing unsafe content. However, existing methods have limitations, including the need for training specific control models and… ▽ More

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  49. arXiv:2404.19019  [pdf, other

    cs.DS cs.DC

    Optimal Parallel Algorithms for Dendrogram Computation and Single-Linkage Clustering

    Authors: Laxman Dhulipala, Xiaojun Dong, Kishen N Gowda, Yan Gu

    Abstract: Computing a Single-Linkage Dendrogram (SLD) is a key step in the classic single-linkage hierarchical clustering algorithm. Given an input edge-weighted tree $T$, the SLD of $T$ is a binary dendrogram that summarizes the $n-1$ clusterings obtained by contracting the edges of $T$ in order of weight. Existing algorithms for computing the SLD all require $Ω(n\log n)$ work where $n = |T|$. Furthermore,… ▽ More

    Submitted 12 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: To appear at SPAA 2024

  50. PASGAL: Parallel And Scalable Graph Algorithm Library

    Authors: Xiaojun Dong, Yan Gu, Yihan Sun, Letong Wang

    Abstract: In this paper, we introduce PASGAL (Parallel And Scalable Graph Algorithm Library), a parallel graph library that scales to a variety of graph types, many processors, and large graph sizes. One special focus of PASGAL is the efficiency on \textit{large-diameter graphs}, which is a common challenge for many existing parallel graph processing systems: many existing graph processing systems can be ev… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.