Skip to main content

Showing 1–50 of 1,304 results for author: Ma, X

  1. arXiv:2407.19976  [pdf, other

    cs.HC cs.MM

    MambaGesture: Enhancing Co-Speech Gesture Generation with Mamba and Disentangled Multi-Modality Fusion

    Authors: Chencan Fu, Yabiao Wang, Jiangning Zhang, Zhengkai Jiang, Xiaofeng Mao, Jiafu Wu, Weijian Cao, Chengjie Wang, Yanhao Ge, Yong Liu

    Abstract: Co-speech gesture generation is crucial for producing synchronized and realistic human gestures that accompany speech, enhancing the animation of lifelike avatars in virtual environments. While diffusion models have shown impressive capabilities, current approaches often overlook a wide range of modalities and their interactions, resulting in less dynamic and contextually varied gestures. To addre… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted to ACM MM 2024

  2. arXiv:2407.19533  [pdf, other

    cs.GR

    FreeShell: A Context-Free 4D Printing Technique for Fabricating Complex 3D Triangle Mesh Shells

    Authors: Chao Yuan, Nan Cao, Xuejiao Ma, Shengqi Dang

    Abstract: Freeform thin-shell surfaces are critical in various fields, but their fabrication is complex and costly. Traditional methods are wasteful and require custom molds, while 3D printing needs extensive support structures and post-processing. Thermoshrinkage actuated 4D printing is an effective method through flat structures fabricating 3D shell. However, existing research faces issues related to prec… ▽ More

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: This paper includes 12 pages and 19 figures

  3. arXiv:2407.18937  [pdf

    cs.IR cs.LG

    Advancements in Recommender Systems: A Comprehensive Analysis Based on Data, Algorithms, and Evaluation

    Authors: Xin Ma, Mingyue Li, Xuguang Liu

    Abstract: Using 286 research papers collected from Web of Science, ScienceDirect, SpringerLink, arXiv, and Google Scholar databases, a systematic review methodology was adopted to review and summarize the current challenges and potential future developments in data, algorithms, and evaluation aspects of RSs. It was found that RSs involve five major research topics, namely algorithmic improvement, domain app… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 24 pages, 10 figures, 3 tables

  4. arXiv:2407.16993  [pdf, other

    cs.CV

    LoFormer: Local Frequency Transformer for Image Deblurring

    Authors: Xintian Mao, Jiansheng Wang, Xingran Xie, Qingli Li, Yan Wang

    Abstract: Due to the computational complexity of self-attention (SA), prevalent techniques for image deblurring often resort to either adopting localized SA or employing coarse-grained global SA methods, both of which exhibit drawbacks such as compromising global modeling or lacking fine-grained correlation. In order to address this issue by effectively modeling long-range dependencies without sacrificing f… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

  5. arXiv:2407.16216  [pdf, other

    cs.CL

    A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More

    Authors: Zhichao Wang, Bin Bi, Shiva Kumar Pentyala, Kiran Ramnath, Sougata Chaudhuri, Shubham Mehrotra, Zixu, Zhu, Xiang-Bo Mao, Sitaram Asur, Na, Cheng

    Abstract: With advancements in self-supervised learning, the availability of trillions tokens in a pre-training corpus, instruction fine-tuning, and the development of large Transformers with billions of parameters, large language models (LLMs) are now capable of generating factual and coherent responses to human queries. However, the mixed quality of training data can lead to the generation of undesired re… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  6. arXiv:2407.15642  [pdf, other

    cs.CV

    Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models

    Authors: Xin Ma, Yaohui Wang, Gengyun Jia, Xinyuan Chen, Yuan-Fang Li, Cunjian Chen, Yu Qiao

    Abstract: Diffusion models have achieved great progress in image animation due to powerful generative capabilities. However, maintaining spatio-temporal consistency with detailed information from the input static image over time (e.g., style, background, and object of the input static image) and ensuring smoothness in animated video narratives guided by textual prompts still remains challenging. In this pap… ▽ More

    Submitted 22 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: Project webpage: https://maxin-cn.github.io/cinemo_project/

  7. arXiv:2407.15234  [pdf, other

    cs.NI

    Exploring the Design of Collaborative Applications via the Lens of NDN Workspace

    Authors: Tianyuan Yu, Xinyu Ma, Varun Patil, Yekta Kocaogullar, Lixia Zhang

    Abstract: Metaverse applications desire to communicate with semantically identified objects among a diverse set of cyberspace entities, such as cameras for collecting images from, sensors for sensing environment, and users collaborating with each other, all could be nearby or far away, in a timely and secure way. However, supporting the above function faces networking challenges. Today's metaverse implement… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

  8. arXiv:2407.15221  [pdf, other

    cs.NI cs.DC

    Secure Web Objects: Building Blocks for Metaverse Interoperability and Decentralization

    Authors: Tianyuan Yu, Xinyu Ma, Varun Patil, Yekta Kocaogullar, Yulong Zhang, Jeff Burke, Dirk Kutscher, Lixia Zhang

    Abstract: This position paper explores how to support the Web's evolution through an underlying data-centric approach that better matches the data-orientedness of modern and emerging applications. We revisit the original vision of the Web as a hypermedia system that supports document composability and application interoperability via name-based data access. We propose the use of secure web objects (SWO), a… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: 9 pages

    ACM Class: H.3.5

  9. arXiv:2407.14769  [pdf, other

    cs.HC

    A Two-Phase Visualization System for Continuous Human-AI Collaboration in Sequelae Analysis and Modeling

    Authors: Yang Ouyang, Chenyang Zhang, He Wang, Tianle Ma, Chang Jiang, Yuheng Yan, Zuoqin Yan, Xiaojuan Ma, Chuhan Shi, Quan Li

    Abstract: In healthcare, AI techniques are widely used for tasks like risk assessment and anomaly detection. Despite AI's potential as a valuable assistant, its role in complex medical data analysis often oversimplifies human-AI collaboration dynamics. To address this, we collaborated with a local hospital, engaging six physicians and one data scientist in a formative study. From this collaboration, we prop… ▽ More

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: To appear at the IEEE VIS Conference 2024

  10. arXiv:2407.14211  [pdf, other

    cs.LG

    Enhanced Mortality Prediction in ICU Stroke Patients via Deep Learning

    Authors: Armin Abdollahi, Xinghong Ma, Jiahao Zhang, Daijia Wu, Tongshou Wu, Zizheng Ye, Maryam Pishgar

    Abstract: Background: Stroke is second-leading cause of disability and death among adults. Approximately 17 million people suffer from a stroke annually, with about 85% being ischemic strokes. Predicting mortality of ischemic stroke patients in intensive care unit (ICU) is crucial for optimizing treatment strategies, allocating resources, and improving survival rates. Methods: We acquired data on ICU ischem… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  11. arXiv:2407.13596  [pdf, other

    cs.CV

    EarthMarker: Visual Prompt Learning for Region-level and Point-level Remote Sensing Imagery Comprehension

    Authors: Wei Zhang, Miaoxin Cai, Tong Zhang, Jun Li, Yin Zhuang, Xuerui Mao

    Abstract: Recent advances in visual prompting in the natural image area have allowed users to interact with artificial intelligence (AI) tools through various visual marks such as box, point, and free-form shapes. However, due to the significant difference between the natural and remote sensing (RS) images, existing visual prompting models face challenges in RS scenarios. Moreover, RS MLLMs mainly focus on… ▽ More

    Submitted 20 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

  12. arXiv:2407.13268  [pdf, other

    cs.AI cs.LG

    Mixture of Experts based Multi-task Supervise Learning from Crowds

    Authors: Tao Han, Huaixuan Shi, Xinyi Ding, Xiao Ma, Huamao Gu, Yili Fang

    Abstract: Existing truth inference methods in crowdsourcing aim to map redundant labels and items to the ground truth. They treat the ground truth as hidden variables and use statistical or deep learning-based worker behavior models to infer the ground truth. However, worker behavior models that rely on ground truth hidden variables overlook workers' behavior at the item feature level, leading to imprecise… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  13. arXiv:2407.12267  [pdf, other

    cs.CV cs.GR

    Generating 3D House Wireframes with Semantics

    Authors: Xueqi Ma, Yilin Liu, Wenjun Zhou, Ruowei Wang, Hui Huang

    Abstract: We present a new approach for generating 3D house wireframes with semantic enrichment using an autoregressive model. Unlike conventional generative models that independently process vertices, edges, and faces, our approach employs a unified wire-based representation for improved coherence in learning 3D wireframe structures. By re-ordering wire sequences based on semantic meanings, we facilitate s… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: European Conference on Computer Vision (Proceedings of ECCV 2024); Project page: https://vcc.tech/research/2024/3DWire; GitHub repository: https://github.com/3d-house-wireframe/3d-house-wireframe-dataset

  14. arXiv:2407.12019  [pdf, other

    cs.CL cs.AI

    DIM: Dynamic Integration of Multimodal Entity Linking with Large Language Model

    Authors: Shezheng Song, Shasha Li, Jie Yu, Shan Zhao, Xiaopeng Li, Jun Ma, Xiaodong Liu, Zhuo Li, Xiaoguang Mao

    Abstract: Our study delves into Multimodal Entity Linking, aligning the mention in multimodal information with entities in knowledge base. Existing methods are still facing challenges like ambiguous entity representations and limited image information utilization. Thus, we propose dynamic entity extraction using ChatGPT, which dynamically extracts entities and enhances datasets. We also propose a method: Dy… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Published on PRCV24

  15. arXiv:2407.11626  [pdf

    cs.LG cs.NE

    Dynamic Dimension Wrapping (DDW) Algorithm: A Novel Approach for Efficient Cross-Dimensional Search in Dynamic Multidimensional Spaces

    Authors: Dongnan Jin, Yali Liu, Qiuzhi Song, Xunju Ma, Yue Liu, Dehao Wu

    Abstract: In the real world, as the complexity of optimization problems continues to increase, there is an urgent need to research more efficient optimization methods. Current optimization algorithms excel in solving problems with a fixed number of dimensions. However, their efficiency in searching dynamic multi-dimensional spaces is unsatisfactory. In response to the challenge of cross-dimensional search i… ▽ More

    Submitted 18 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

  16. arXiv:2407.10468  [pdf, other

    cs.SD cs.AI eess.AS

    LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis

    Authors: Zhenxiong Tan, Xinyin Ma, Gongfan Fang, Xinchao Wang

    Abstract: Latent diffusion models have shown promising results in audio generation, making notable advancements over traditional methods. However, their performance, while impressive with short audio clips, faces challenges when extended to longer audio sequences. These challenges are due to model's self-attention mechanism and training predominantly on 10-second clips, which complicates the extension to lo… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Interspeech 2024; Code: https://github.com/Yuanshi9815/LiteFocus

  17. arXiv:2407.10439  [pdf, other

    cs.CV

    PolyRoom: Room-aware Transformer for Floorplan Reconstruction

    Authors: Yuzhou Liu, Lingjie Zhu, Xiaodong Ma, Hanqiao Ye, Xiang Gao, Xianwei Zheng, Shuhan Shen

    Abstract: Reconstructing geometry and topology structures from raw unstructured data has always been an important research topic in indoor mapping research. In this paper, we aim to reconstruct the floorplan with a vectorized representation from point clouds. Despite significant advancements achieved in recent years, current methods still encounter several challenges, such as missing corners or edges, inacc… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  18. arXiv:2407.10366  [pdf, other

    cs.CV cs.AI cs.LG

    Accessing Vision Foundation Models at ImageNet-level Costs

    Authors: Yitian Zhang, Xu Ma, Yue Bai, Huan Wang, Yun Fu

    Abstract: Vision foundation models are renowned for their generalization ability due to massive training data. Nevertheless, they demand tremendous training resources, and the training data is often inaccessible, e.g., CLIP, DINOv2, posing great challenges to developing derivatives that could advance research in this field. In this work, we offer a very simple and general solution, named Proteus, to distill… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  19. arXiv:2407.09247  [pdf, other

    cs.AI

    Constrained Intrinsic Motivation for Reinforcement Learning

    Authors: Xiang Zheng, Xingjun Ma, Chao Shen, Cong Wang

    Abstract: This paper investigates two fundamental problems that arise when utilizing Intrinsic Motivation (IM) for reinforcement learning in Reward-Free Pre-Training (RFPT) tasks and Exploration with Intrinsic Motivation (EIM) tasks: 1) how to design an effective intrinsic objective in RFPT tasks, and 2) how to reduce the bias introduced by the intrinsic objective in EIM tasks. Existing IM methods suffer fr… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by IJCAI 2024

  20. arXiv:2407.09096  [pdf, other

    cs.LG cs.AI

    STD-LLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with LLMs

    Authors: Yiheng Huang, Xiaowei Mao, Shengnan Guo, Yubin Chen, Youfang Lin, Huaiyu Wan

    Abstract: Spatial-temporal forecasting and imputation are important for real-world dynamic systems such as intelligent transportation, urban planning, and public health. Most existing methods are tailored for individual forecasting or imputation tasks but are not designed for both. Additionally, they are less effective for zero-shot and few-shot learning. While large language models (LLMs) have exhibited st… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  21. arXiv:2407.09013  [pdf, ps, other

    cs.AI cs.LG

    Procedural Content Generation via Generative Artificial Intelligence

    Authors: Xinyu Mao, Wanli Yu, Kazunori D Yamada, Michael R. Zielewski

    Abstract: The attempt to utilize machine learning in PCG has been made in the past. In this survey paper, we investigate how generative artificial intelligence (AI), which saw a significant increase in interest in the mid-2010s, is being used for PCG. We review applications of generative AI for the creation of various types of content, including terrains, items, and even storylines. While generative AI is e… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  22. arXiv:2407.08978  [pdf, other

    cs.CL cs.LG

    Towards Chapter-to-Chapter Context-Aware Literary Translation via Large Language Models

    Authors: Linghao Jin, Li An, Xuezhe Ma

    Abstract: Discourse phenomena in existing document-level translation datasets are sparse, which has been a fundamental obstacle in the development of context-aware machine translation models. Moreover, most existing document-level corpora and context-aware machine translation methods rely on an unrealistic assumption on sentence-level alignments. To mitigate these issues, we first curate a novel dataset of… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Preprint

  23. arXiv:2407.08528  [pdf, other

    eess.IV cs.CV cs.MM

    Enhancing octree-based context models for point cloud geometry compression with attention-based child node number prediction

    Authors: Chang Sun, Hui Yuan, Xiaolong Mao, Xin Lu, Raouf Hamzaoui

    Abstract: In point cloud geometry compression, most octreebased context models use the cross-entropy between the onehot encoding of node occupancy and the probability distribution predicted by the context model as the loss. This approach converts the problem of predicting the number (a regression problem) and the position (a classification problem) of occupied child nodes into a 255-dimensional classificati… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 2 figures and 2 tables

    Journal ref: IEEE Signal Processing Letters, 2024

  24. arXiv:2407.07868  [pdf, other

    cs.RO cs.AI cs.CV cs.LG

    Green Screen Augmentation Enables Scene Generalisation in Robotic Manipulation

    Authors: Eugene Teoh, Sumit Patidar, Xiao Ma, Stephen James

    Abstract: Generalising vision-based manipulation policies to novel environments remains a challenging area with limited exploration. Current practices involve collecting data in one location, training imitation learning or reinforcement learning policies with this data, and deploying the policy in the same location. However, this approach lacks scalability as it necessitates data collection in multiple loca… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Project website: https://greenaug.github.io/

  25. arXiv:2407.07791  [pdf, other

    cs.CL

    Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities

    Authors: Tianjie Ju, Yiting Wang, Xinbei Ma, Pengzhou Cheng, Haodong Zhao, Yulong Wang, Lifeng Liu, Jian Xie, Zhuosheng Zhang, Gongshen Liu

    Abstract: The rapid adoption of large language models (LLMs) in multi-agent systems has highlighted their impressive capabilities in various applications, such as collaborative problem-solving and autonomous negotiation. However, the security implications of these LLM-based multi-agent systems have not been thoroughly investigated, particularly concerning the spread of manipulated knowledge. In this paper,… ▽ More

    Submitted 22 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: 18 Pages, working in progress

  26. arXiv:2407.07788  [pdf, other

    cs.RO cs.AI cs.CV cs.LG

    BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark

    Authors: Nikita Chernyadev, Nicholas Backshall, Xiao Ma, Yunfan Lu, Younggyo Seo, Stephen James

    Abstract: We introduce BiGym, a new benchmark and learning environment for mobile bi-manual demo-driven robotic manipulation. BiGym features 40 diverse tasks set in home environments, ranging from simple target reaching to complex kitchen cleaning. To capture the real-world performance accurately, we provide human-collected demonstrations for each task, reflecting the diverse modalities found in real-world… ▽ More

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: Project webpage: https://chernyadev.github.io/bigym/

  27. arXiv:2407.07325  [pdf, other

    cs.CV cs.CL cs.MM eess.IV

    HiLight: Technical Report on the Motern AI Video Language Model

    Authors: Zhiting Wang, Qiangong Zhou, Kangjie Yang, Zongyang Liu, Xin Mao

    Abstract: This technical report presents the implementation of a state-of-the-art video encoder for video-text modal alignment and a video conversation framework called HiLight, which features dual visual towers. The work is divided into two main parts: 1.alignment of video and text modalities; 2.convenient and efficient way to interact with users. Our goal is to address the task of video comprehension in t… ▽ More

    Submitted 11 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  28. arXiv:2407.06841  [pdf, other

    cs.CV

    HTD-Mamba: Efficient Hyperspectral Target Detection with Pyramid State Space Model

    Authors: Dunbin Shen, Xuanbing Zhu, Jiacheng Tian, Jianjun Liu, Zhenrong Du, Hongyu Wang, Xiaorui Ma

    Abstract: Hyperspectral target detection (HTD) identifies objects of interest from complex backgrounds at the pixel level, playing a vital role in Earth observation. However, HTD faces challenges due to limited prior knowledge and spectral variation, leading to underfitting models and unreliable performance. To address these challenges, this paper proposes an efficient self-supervised HTD method with a pyra… ▽ More

    Submitted 17 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: 13 pages,6 figures, 5 tables

  29. arXiv:2407.05282  [pdf, other

    cs.CV

    UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

    Authors: Haozhe Zhao, Xiaojian Ma, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li, Baobao Chang

    Abstract: This paper presents UltraEdit, a large-scale (approximately 4 million editing samples), automatically generated dataset for instruction-based image editing. Our key idea is to address the drawbacks in existing image editing datasets like InstructPix2Pix and MagicBrush, and provide a systematic approach to producing massive and high-quality image editing samples. UltraEdit offers several distinct a… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 32 pages, 14 figures

  30. arXiv:2407.05010  [pdf, other

    cs.CV

    PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference

    Authors: Ye Li, Chen Tang, Yuan Meng, Jiajun Fan, Zenghao Chai, Xinzhu Ma, Zhi Wang, Wenwu Zhu

    Abstract: We introduce PRANCE, a Vision Transformer compression framework that jointly optimizes the activated channels and reduces tokens, based on the characteristics of inputs. Specifically, PRANCE~ leverages adaptive token optimization strategies for a certain computational budget, aiming to accelerate ViTs' inference from a unified data and architectural perspective. However, the joint framework poses… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  31. arXiv:2407.04616  [pdf, other

    cs.CV cs.AI cs.LG

    Isomorphic Pruning for Vision Models

    Authors: Gongfan Fang, Xinyin Ma, Michael Bi Mi, Xinchao Wang

    Abstract: Structured pruning reduces the computational overhead of deep neural networks by removing redundant sub-structures. However, assessing the relative importance of different sub-structures remains a significant challenge, particularly in advanced vision models featuring novel mechanisms and architectures like self-attention, depth-wise convolutions, or residual connections. These heterogeneous subst… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  32. arXiv:2407.03695  [pdf, other

    cs.CV

    M^3:Manipulation Mask Manufacturer for Arbitrary-Scale Super-Resolution Mask

    Authors: Xinyu Yang, Xiaochen Ma, Xuekang Zhu, Bo Du, Lei Su, Bingkui Tong, Zeyu Lei, Jizhe Zhou

    Abstract: In the field of image manipulation localization (IML), the small quantity and poor quality of existing datasets have always been major issues. A dataset containing various types of manipulations will greatly help improve the accuracy of IML models. Images on the internet (such as those on Baidu Tieba's PS Bar) are manipulated using various techniques, and creating a dataset from these images will… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  33. arXiv:2407.02813  [pdf, other

    cs.CV cs.AI cs.LG

    Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design

    Authors: Gen Li, Zhihao Shu, Jie Ji, Minghai Qin, Fatemeh Afghah, Wei Niu, Xiaolong Ma

    Abstract: Deep neural networks (DNNs) are frequently employed in a variety of computer vision applications. Nowadays, an emerging trend in the current video distribution system is to take advantage of DNN's overfitting properties to perform video resolution upscaling. By splitting videos into chunks and applying a super-resolution (SR) model to overfit each chunk, this scheme of SR models plus video chunks… ▽ More

    Submitted 11 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  34. arXiv:2407.02716  [pdf, other

    cs.CV cs.LG

    Light-weight Fine-tuning Method for Defending Adversarial Noise in Pre-trained Medical Vision-Language Models

    Authors: Xu Han, Linghao Jin, Xuezhe Ma, Xiaofeng Liu

    Abstract: Fine-tuning pre-trained Vision-Language Models (VLMs) has shown remarkable capabilities in medical image and textual depiction synergy. Nevertheless, many pre-training datasets are restricted by patient privacy concerns, potentially containing noise that can adversely affect downstream performance. Moreover, the growing reliance on multi-modal generation exacerbates this issue because of its susce… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  35. Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation

    Authors: Xinyu Mao, Shengyao Zhuang, Bevan Koopman, Guido Zuccon

    Abstract: The goal of screening prioritisation in systematic reviews is to identify relevant documents with high recall and rank them in early positions for review. This saves reviewing effort if paired with a stopping criterion, and speeds up review completion if performed alongside downstream tasks. Recent studies have shown that neural models have good potential on this task, but their time-consuming fin… ▽ More

    Submitted 17 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted at SIGIR 2024;typos corrected

  36. arXiv:2407.00114  [pdf, other

    cs.LG cs.AI cs.CL

    OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

    Authors: Zihao Wang, Shaofei Cai, Zhancun Mu, Haowei Lin, Ceyao Zhang, Xuejie Liu, Qing Li, Anji Liu, Xiaojian Ma, Yitao Liang

    Abstract: We present OmniJARVIS, a novel Vision-Language-Action (VLA) model for open-world instruction-following agents in open-world Minecraft. Compared to prior works that either emit textual goals to separate controllers or produce the control command directly, OmniJARVIS seeks a different path to ensure both strong reasoning and efficient decision-making capabilities via unified tokenization of multimod… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

  37. arXiv:2406.19964  [pdf, other

    cs.CR

    Secure Outsourced Decryption for FHE-based Privacy-preserving Cloud Computing

    Authors: Xirong Ma, Chuan Li, Yuchang Hu, Yunting Tao, Yali Jiang, Yanbin Li, Fanyu Kong, Chunpeng Ge

    Abstract: The demand for processing vast volumes of data has surged dramatically due to the advancement of machine learning technology. Large-scale data processing necessitates substantial computational resources, prompting individuals and enterprises to turn to cloud services. Accompanying this trend is a growing concern regarding data leakage and misuse. Homomorphic encryption (HE) is one solution for saf… ▽ More

    Submitted 9 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: content and title updated

  38. arXiv:2406.19711  [pdf, other

    cs.LG

    CHASE: A Causal Heterogeneous Graph based Framework for Root Cause Analysis in Multimodal Microservice Systems

    Authors: Ziming Zhao, Tiehua Zhang, Zhishu Shen, Hai Dong, Xingjun Ma, Xianhui Liu, Yun Yang

    Abstract: In recent years, the widespread adoption of distributed microservice architectures within the industry has significantly increased the demand for enhanced system availability and robustness. Due to the complex service invocation paths and dependencies at enterprise-level microservice systems, it is challenging to locate the anomalies promptly during service invocations, thus causing intractable is… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  39. arXiv:2406.17923  [pdf, other

    cs.CL

    PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning

    Authors: Shiva Kumar Pentyala, Zhichao Wang, Bin Bi, Kiran Ramnath, Xiang-Bo Mao, Regunathan Radhakrishnan, Sitaram Asur, Na, Cheng

    Abstract: Large language models (LLMs) have shown remarkable abilities in diverse natural language processing (NLP) tasks. The LLMs generally undergo supervised fine-tuning (SFT) followed by preference alignment to be usable in downstream applications. However, this sequential training pipeline leads to alignment tax that degrades the LLM performance. This paper introduces PAFT, a new PArallel training pa… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  40. arXiv:2406.17608  [pdf, other

    cs.CV

    Test-Time Generative Augmentation for Medical Image Segmentation

    Authors: Xiao Ma, Yuhui Tao, Yuhan Zhang, Zexuan Ji, Yizhe Zhang, Qiang Chen

    Abstract: In this paper, we propose a novel approach to enhance medical image segmentation during test time. Instead of employing hand-crafted transforms or functions on the input test image to create multiple views for test-time augmentation, we advocate for the utilization of an advanced domain-fine-tuned generative model (GM), e.g., stable diffusion (SD), for test-time augmentation. Given that the GM has… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 12pages, 2figures

  41. arXiv:2406.17343  [pdf, other

    cs.CV cs.AI

    Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers

    Authors: Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jingyan Jiang, Xin Wang, Zhi Wang, Wenwu Zhu

    Abstract: Recent advancements in diffusion models, particularly the trend of architectural transformation from UNet-based Diffusion to Diffusion Transformer (DiT), have significantly improved the quality and scalability of image synthesis. Despite the incredible generative quality, the large computational requirements of these large-scale models significantly hinder the deployments in real-world scenarios.… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  42. arXiv:2406.16571  [pdf, other

    math.OC cs.AI cs.LG eess.SY

    Differentiable Distributionally Robust Optimization Layers

    Authors: Xutao Ma, Chao Ning, Wenli Du

    Abstract: In recent years, there has been a growing research interest in decision-focused learning, which embeds optimization problems as a layer in learning pipelines and demonstrates a superior performance than the prediction-focused approach. However, for distributionally robust optimization (DRO), a popular paradigm for decision-making under uncertainty, it is still unknown how to embed it as a layer, i… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: In Forty-first International Conference on Machine Learning (2024)

  43. arXiv:2406.16502  [pdf, other

    cs.CV

    LOGCAN++: Adaptive Local-global class-aware network for semantic segmentation of remote sensing imagery

    Authors: Xiaowen Ma, Rongrong Lian, Zhenkai Wu, Hongbo Guo, Mengting Ma, Sensen Wu, Zhenhong Du, Siyang Song, Wei Zhang

    Abstract: Remote sensing images usually characterized by complex backgrounds, scale and orientation variations, and large intra-class variance. General semantic segmentation methods usually fail to fully investigate the above issues, and thus their performances on remote sensing image segmentation are limited. In this paper, we propose our LOGCAN++, a semantic segmentation model customized for remote sensin… ▽ More

    Submitted 1 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: Under Review

  44. Placing Timely Refreshing Services at the Network Edge

    Authors: Xishuo Li, Shan Zhang, Hongbin Luo, Xiao Ma, Junyi He

    Abstract: Accommodating services at the network edge is favorable for time-sensitive applications. However, maintaining service usability is resource-consuming in terms of pulling service images to the edge, synchronizing databases of service containers, and hot updates of service modules. Accordingly, it is critical to determine which service to place based on the received user requests and service refresh… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  45. arXiv:2406.15686  [pdf, other

    cs.CR cs.NI

    The Case for Transport-Level Encryption in Datacenter Networks

    Authors: Tianyi Gao, Xinshu Ma, Suhas Narreddy, Eugenio Luo, Steven W. D. Chien, Michio Honda

    Abstract: Cloud applications need network data encryption to isolate from other tenants and protect their data from potential eavesdroppers in the network infrastructure. This paper presents SDP, a protocol design for emerging datacenter transport protocols, such as pHost, NDP, and Homa, to integrate data encryption with the use of existing NIC offloading of cryptographic operations designed for TLS over TC… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  46. arXiv:2406.15320  [pdf, other

    cs.CV

    Rethinking Remote Sensing Change Detection With A Mask View

    Authors: Xiaowen Ma, Zhenkai Wu, Rongrong Lian, Wei Zhang, Siyang Song

    Abstract: Remote sensing change detection aims to compare two or more images recorded for the same area but taken at different time stamps to quantitatively and qualitatively assess changes in geographical entities and environmental factors. Mainstream models usually built on pixel-by-pixel change detection paradigms, which cannot tolerate the diversity of changes due to complex scenes and variation in imag… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Under review

  47. arXiv:2406.15319  [pdf, other

    cs.CL cs.AI

    LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs

    Authors: Ziyan Jiang, Xueguang Ma, Wenhu Chen

    Abstract: In traditional RAG framework, the basic retrieval units are normally short. The common retrievers like DPR normally work with 100-word Wikipedia paragraphs. Such a design forces the retriever to search over a large corpus to find the `needle' unit. In contrast, the readers only need to extract answers from the short retrieved units. Such an imbalanced `heavy' retriever and `light' reader design ca… ▽ More

    Submitted 30 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Technical Report

  48. Towards Timely Video Analytics Services at the Network Edge

    Authors: Xishuo Li, Shan Zhang, Yuejiao Huang, Xiao Ma, Zhiyuan Wang, Hongbin Luo

    Abstract: Real-time video analytics services aim to provide users with accurate recognition results timely. However, existing studies usually fall into the dilemma between reducing delay and improving accuracy. The edge computing scenario imposes strict transmission and computation resource constraints, making balancing these conflicting metrics under dynamic network conditions difficult. In this regard, we… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  49. arXiv:2406.14555  [pdf, other

    cs.CV

    A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models

    Authors: Xincheng Shuai, Henghui Ding, Xingjun Ma, Rongcheng Tu, Yu-Gang Jiang, Dacheng Tao

    Abstract: Image editing aims to edit the given synthetic or real image to meet the specific requirements from users. It is widely studied in recent years as a promising and challenging field of Artificial Intelligence Generative Content (AIGC). Recent significant advancement in this field is based on the development of text-to-image (T2I) diffusion models, which generate images according to text prompts. Th… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Project Page: https://github.com/xinchengshuai/Awesome-Image-Editing

  50. arXiv:2406.14075  [pdf, other

    cs.CL

    EXCEEDS: Extracting Complex Events as Connecting the Dots to Graphs in Scientific Domain

    Authors: Yi-Fan Lu, Xian-Ling Mao, Bo Wang, Xiao Liu, Heyan Huang

    Abstract: It is crucial to utilize events to understand a specific domain. There are lots of research on event extraction in many domains such as news, finance and biology domain. However, scientific domain still lacks event extraction research, including comprehensive datasets and corresponding methods. Compared to other domains, scientific domain presents two characteristics: denser nuggets and more compl… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: This paper is working in process