
Showing 1–50 of 4,878 results for author: Wang, H

  1. arXiv:2407.20078  [pdf, other]

    cs.CV

    Background Semantics Matter: Cross-Task Feature Exchange Network for Clustered Infrared Small Target Detection With Sky-Annotated Dataset

    Authors: Yimian Dai, Mengxuan Xiao, Yiming Zhu, Huan Wang, Kehua Guo, Jian Yang

    Abstract: Infrared small target detection poses unique challenges due to the scarcity of intrinsic target features and the abundance of similar background distractors. We argue that background semantics play a pivotal role in distinguishing visually similar objects for this task. To address this, we introduce a new task, clustered infrared small target detection, and present DenseSIRST, a novel benchmark…

    Submitted 29 July, 2024; originally announced July 2024.

  2. arXiv:2407.19841  [pdf, other]

    eess.SP cs.AR

    RRAM-Based Bio-Inspired Circuits for Mobile Epileptic Correlation Extraction and Seizure Prediction

    Authors: Hao Wang, Lingfeng Zhang, Erjia Xiao, Xin Wang, Zhongrui Wang, Renjing Xu

    Abstract: Non-invasive mobile electroencephalography (EEG) acquisition systems have been utilized for long-term monitoring of seizures, yet they suffer from limited battery life. Resistive random access memory (RRAM) is widely used in computing-in-memory (CIM) systems, which offers an ideal platform for reducing the computational energy consumption of seizure prediction algorithms, potentially solving the en…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 7 pages, 5 figures

  3. arXiv:2407.19829  [pdf, other]

    cs.IR cs.AI

    Generative Retrieval with Preference Optimization for E-commerce Search

    Authors: Mingming Li, Huimu Wang, Zuxu Chen, Guangtao Nie, Yiming Qiu, Binbin Wang, Guoyu Tang, Lin Liu, Jingwei Zhuo

    Abstract: Generative retrieval introduces a groundbreaking paradigm to document retrieval by directly generating the identifier of a pertinent document in response to a specific query. This paradigm has demonstrated considerable benefits and potential, particularly in representation and generalization capabilities, within the context of large language models. However, it faces significant challenges in E-co…

    Submitted 29 July, 2024; originally announced July 2024.

  4. arXiv:2407.19787  [pdf, other]

    cs.CV

    SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters

    Authors: Shohei Tanaka, Hao Wang, Yoshitaka Ushiku

    Abstract: Scientific posters are used to present the contributions of scientific papers effectively in a graphical format. However, creating a well-designed poster that efficiently summarizes the core of a paper is both labor-intensive and time-consuming. A system that can automatically generate well-designed posters from scientific papers would reduce the workload of authors and help readers understand the…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by BMVC2024

  5. arXiv:2407.19721  [pdf, other]

    cs.NI cs.AI cs.DC

    Rina: Enhancing Ring-AllReduce with In-network Aggregation in Distributed Model Training

    Authors: Zixuan Chen, Xuandong Liu, Minglin Li, Yinfan Hu, Hao Mei, Huifeng Xing, Hao Wang, Wanxin Shi, Sen Liu, Yang Xu

    Abstract: Parameter Server (PS) and Ring-AllReduce (RAR) are two widely utilized synchronization architectures in multi-worker Deep Learning (DL), also referred to as Distributed Deep Learning (DDL). However, PS encounters challenges with the "incast" issue, while RAR struggles with problems caused by the long dependency chain. The emerging In-network Aggregation (INA) has been proposed to integrate with…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: To appear in ICNP 2024. Preview version only

  6. arXiv:2407.19514  [pdf, other]

    cs.CV cs.MM

    Detached and Interactive Multimodal Learning

    Authors: Yunfeng Fan, Wenchao Xu, Haozhao Wang, Junhong Liu, Song Guo

    Abstract: Recently, Multimodal Learning (MML) has gained significant interest as it compensates for single-modality limitations through comprehensive complementary information within multimodal data. However, traditional MML methods generally use the joint learning framework with a uniform learning objective that can lead to the modality competition issue, where feedback predominantly comes from certain mod…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 24

  7. arXiv:2407.19365  [pdf, other]

    cs.CR

    Seamless Website Fingerprinting in Multiple Environments

    Authors: Chuxu Song, Zining Fan, Hao Wang, Richard Martin

    Abstract: Website fingerprinting (WF) attacks identify the websites visited over anonymized connections by analyzing patterns in network traffic flows, such as packet sizes, directions, or interval times using a machine learning classifier. Previous studies showed WF attacks achieve high classification accuracy. However, several issues call into question whether existing WF approaches are realizable in prac…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 16 pages

  8. arXiv:2407.18690  [pdf, other]

    cs.AI

    Collaborative Evolving Strategy for Automatic Data-Centric Development

    Authors: Xu Yang, Haotian Chen, Wenjun Feng, Haoxue Wang, Zeqi Ye, Xinjie Shen, Xiao Yang, Shizhao Sun, Weiqing Liu, Jiang Bian

    Abstract: Artificial Intelligence (AI) significantly influences many fields, largely thanks to the vast amounts of high-quality data for machine learning models. The emphasis is now on a data-centric AI strategy, prioritizing data development over model design progress. Automating this process is crucial. In this paper, we are the first to introduce the automatic data-centric development (AD^2) ta…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 23 pages, 7 figures

  9. arXiv:2407.18512  [pdf, other]

    cs.SE

    SPOLRE: Semantic Preserving Object Layout Reconstruction for Image Captioning System Testing

    Authors: Yi Liu, Guanyu Wang, Xinyi Zheng, Gelei Deng, Kailong Wang, Yang Liu, Haoyu Wang

    Abstract: Image captioning (IC) systems, such as Microsoft Azure Cognitive Service, translate image content into descriptive language but can generate inaccuracies leading to misinterpretations. Advanced testing techniques like MetaIC and ROME aim to address these issues but face significant challenges. These methods require intensive manual labor for detailed annotations and often produce unrealistic image…

    Submitted 26 July, 2024; originally announced July 2024.

  10. arXiv:2407.18498  [pdf, other]

    cs.CL cs.AI cs.LO

    A Reliable Common-Sense Reasoning Socialbot Built Using LLMs and Goal-Directed ASP

    Authors: Yankai Zeng, Abhiramon Rajashekharan, Kinjal Basu, Huaduo Wang, Joaquín Arias, Gopal Gupta

    Abstract: The development of large language models (LLMs), such as GPT, has enabled the construction of several socialbots, like ChatGPT, that are receiving a lot of attention for their ability to simulate a human conversation. However, the conversation is not guided by a goal and is hard to control. In addition, because LLMs rely more on pattern recognition than deductive reasoning, they can give confusing…

    Submitted 26 July, 2024; originally announced July 2024.

  11. arXiv:2407.18488  [pdf, other]

    cs.LG cs.IT stat.ML

    Conversational Dueling Bandits in Generalized Linear Models

    Authors: Shuhua Yang, Hui Yuan, Xiaoying Zhang, Mengdi Wang, Hong Zhang, Huazheng Wang

    Abstract: Conversational recommendation systems elicit user preferences by interacting with users to obtain their feedback on recommended commodities. Such systems utilize a multi-armed bandit framework to learn user preferences in an online manner and have achieved great success in recent years. However, existing conversational bandit methods have several limitations. First, they only enable users to provi…

    Submitted 25 July, 2024; originally announced July 2024.

  12. arXiv:2407.18487  [pdf, other]

    cs.CV

    SMPISD-MTPNet: Scene Semantic Prior-Assisted Infrared Ship Detection Using Multi-Task Perception Networks

    Authors: Chen Hu, Xiaogang Dong, Yian Huang, Lele Wang, Liang Xu, Tian Pu, Zhenming Peng

    Abstract: Infrared ship detection (IRSD) has received increasing attention in recent years due to the robustness of infrared images to adverse weather. However, a large number of false alarms may occur in complex scenes. To address these challenges, we propose the Scene Semantic Prior-Assisted Multi-Task Perception Network (SMPISD-MTPNet), which includes three stages: scene semantic extraction, deep feature…

    Submitted 25 July, 2024; originally announced July 2024.

  13. arXiv:2407.18170  [pdf, other]

    cs.LG

    RIDA: A Robust Attack Framework on Incomplete Graphs

    Authors: Jianke Yu, Hanchen Wang, Chen Chen, Xiaoyang Wang, Wenjie Zhang, Ying Zhang

    Abstract: Graph Neural Networks (GNNs) are vital in data science but are increasingly susceptible to adversarial attacks. To help researchers develop more robust GNN models, it's essential to focus on designing strong attack models as foundational benchmarks and guiding references. Among adversarial attacks, gray-box poisoning attacks are noteworthy due to their effectiveness and fewer constraints. These at…

    Submitted 25 July, 2024; originally announced July 2024.

  14. arXiv:2407.17216  [pdf, other]

    math.OC cs.LG

    An Adaptive Second-order Method for a Class of Nonconvex Nonsmooth Composite Optimization

    Authors: Hao Wang, Xiangyu Yang, Yichen Zhu

    Abstract: This paper explores a specific type of nonconvex sparsity-promoting regularization problems, namely those involving $\ell_p$-norm regularization, in conjunction with a twice continuously differentiable loss function. We propose a novel second-order algorithm designed to effectively address this class of challenging nonconvex and nonsmooth problems, showcasing several innovative features: (i) The u…

    Submitted 24 July, 2024; originally announced July 2024.

    MSC Class: 90C26; 49M15; 90C53

  15. arXiv:2407.16826  [pdf, other]

    cs.CV

    SINDER: Repairing the Singular Defects of DINOv2

    Authors: Haoqi Wang, Tong Zhang, Mathieu Salzmann

    Abstract: Vision Transformer models trained on large-scale datasets, although effective, often exhibit artifacts in the patch tokens they extract. While such defects can be alleviated by re-training the entire model with additional classification tokens, the underlying reasons for the presence of these tokens remain unclear. In this paper, we conduct a thorough investigation of this phenomenon, combining the…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  16. arXiv:2407.16729  [pdf, other]

    cs.LG cs.AI

    PateGail: A Privacy-Preserving Mobility Trajectory Generator with Imitation Learning

    Authors: Huandong Wang, Changzheng Gao, Yuchen Wu, Depeng Jin, Lina Yao, Yong Li

    Abstract: Generating human mobility trajectories is of great importance to solve the lack of large-scale trajectory data in numerous applications, which is caused by privacy concerns. However, existing mobility trajectory generation methods still require real-world human trajectories centrally collected as the training data, where there exists an inescapable risk of privacy leakage. To overcome this limitat…

    Submitted 23 July, 2024; originally announced July 2024.

  17. arXiv:2407.16406  [pdf, other]

    cs.CV cs.LG

    Hi-EF: Benchmarking Emotion Forecasting in Human-interaction

    Authors: Haoran Wang, Xinji Mai, Zeng Tao, Yan Wang, Jiawen Yu, Ziheng Zhou, Xuan Tong, Shaoqi Yan, Qing Zhao, Shuyong Gao, Wenqiang Zhang

    Abstract: Affective Forecasting, a research direction in psychology that predicts individuals' future emotions, is often constrained by numerous external factors like social influence and temporal distance. To address this, we transform Affective Forecasting into a Deep Learning problem by designing an Emotion Forecasting paradigm based on two-party interactions. We propose a novel Emotion Forecasting (EF) t…

    Submitted 23 July, 2024; originally announced July 2024.

  18. arXiv:2407.16248  [pdf, other]

    cs.CV cs.MM

    Spatiotemporal Graph Guided Multi-modal Network for Livestreaming Product Retrieval

    Authors: Xiaowan Hu, Yiyi Chen, Yan Li, Minquan Wang, Haoqian Wang, Quan Chen, Han Li, Peng Jiang

    Abstract: With the rapid expansion of e-commerce, more consumers have become accustomed to making purchases via livestreaming. Accurately identifying the products being sold by salespeople, i.e., livestreaming product retrieval (LPR), poses a fundamental and daunting challenge. The LPR task encompasses three primary dilemmas in real-world scenarios: 1) the recognition of intended products from distractor pr…

    Submitted 24 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: 9 pages, 12 figures

  19. arXiv:2407.16244  [pdf, other]

    cs.CV cs.AI cs.MM

    HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification

    Authors: Shuyi Ouyang, Hongyi Wang, Ziwei Niu, Zhenjia Bai, Shiao Xie, Yingying Xu, Ruofeng Tong, Yen-Wei Chen, Lanfen Lin

    Abstract: The task of multi-label image classification involves recognizing multiple objects within a single image. Considering both valuable semantic information contained in the labels and essential visual features presented in the image, tight visual-linguistic interactions play a vital role in improving classification performance. Moreover, given the potential variance in object size and appearance with…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 10 pages, 6 figures

    Journal ref: Proceedings of the 31st ACM International Conference on Multimedia. 2023: 4768-4777

  20. arXiv:2407.16131  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Crystals with Transformers on Graphs, for Prediction of Unconventional Crystal Material Properties and the Benchmark

    Authors: Hongyi Wang, Ji Sun, Jinzhe Liang, Li Zhai, Zitian Tang, Zijian Li, Wei Zhai, Xusheng Wang, Weihao Gao, Sheng Gong, Bolong Huang, Hua Zhang

    Abstract: The ionic bonding across the lattice and ordered microscopic structures endow crystals with unique symmetry and determine their macroscopic properties. Unconventional crystals, in particular, exhibit non-traditional lattice structures or possess exotic physical properties, making them intriguing subjects for investigation. Therefore, to accurately predict the physical and chemical properties of cr…

    Submitted 22 July, 2024; originally announced July 2024.

  21. On Flange-based 3D Hand-Eye Calibration for Soft Robotic Tactile Welding

    Authors: Xudong Han, Ning Guo, Yu Jie, He Wang, Fang Wan, Chaoyang Song

    Abstract: This paper investigates the direct application of standardized designs on the robot for conducting robot hand-eye calibration by employing 3D scanners with collaborative robots. The well-established geometric features of the robot flange are exploited by directly capturing its point cloud data. In particular, an iterative method is proposed to facilitate point cloud processing toward a refined cal…

    Submitted 27 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: 25 pages, 14 figures, 2 tables, Accepted by Measurement

  22. arXiv:2407.15590  [pdf, other]

    cs.CV

    All rivers run into the sea: Unified Modality Brain-like Emotional Central Mechanism

    Authors: Xinji Mai, Junxiong Lin, Haoran Wang, Zeng Tao, Yan Wang, Shaoqi Yan, Xuan Tong, Jiawen Yu, Boyang Wang, Ziheng Zhou, Qing Zhao, Shuyong Gao, Wenqiang Zhang

    Abstract: In the field of affective computing, fully leveraging information from a variety of sensory modalities is essential for the comprehensive understanding and processing of human emotions. Inspired by the process through which the human brain handles emotions and the theory of cross-modal plasticity, we propose UMBEnet, a brain-like unified modal affective processing network. The primary design of UM…

    Submitted 22 July, 2024; originally announced July 2024.

  23. arXiv:2407.15524  [pdf, other]

    cs.CR

    Towards Efficient Transferable Preemptive Adversarial Defense

    Authors: Hanrui Wang, Ching-Chun Chang, Chun-Shien Lu, Isao Echizen

    Abstract: Deep learning technology has brought convenience and advanced developments but has become untrustworthy because of its sensitivity to inconspicuous perturbations (i.e., adversarial attacks). Attackers utilize this sensitivity to slightly manipulate transmitted messages. To defend against such attacks, we have devised a strategy for "attacking" the message before it is attacked. This strategy, dubb…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Under Review

  24. arXiv:2407.15476  [pdf, other]

    cs.LG cs.IR

    MODRL-TA: A Multi-Objective Deep Reinforcement Learning Framework for Traffic Allocation in E-Commerce Search

    Authors: Peng Cheng, Huimu Wang, Jinyuan Zhao, Yihao Wang, Enqiang Xu, Yu Zhao, Zhuojian Xiao, Songlin Wang, Guoyu Tang, Lin Liu, Sulong Xu

    Abstract: Traffic allocation is a process of redistributing natural traffic to products by adjusting their positions in the post-search phase, aimed at effectively fostering merchant growth, precisely meeting customer demands, and ensuring the maximization of interests across various parties within e-commerce platforms. Existing methods based on learning to rank neglect the long-term value of traffic alloca…

    Submitted 22 July, 2024; originally announced July 2024.

  25. arXiv:2407.15389  [pdf, other]

    cs.LG cs.CR cs.DC

    Poisoning with A Pill: Circumventing Detection in Federated Learning

    Authors: Hanxi Guo, Hao Wang, Tao Song, Tianhang Zheng, Yang Hua, Haibing Guan, Xiangyu Zhang

    Abstract: Without direct access to the client's data, federated learning (FL) is well-known for its unique strength in data privacy protection among existing distributed machine learning techniques. However, its distributive and iterative nature makes FL inherently vulnerable to various poisoning attacks. To counteract these threats, extensive defenses have been proposed to filter out malicious clients, usi…

    Submitted 22 July, 2024; originally announced July 2024.

  26. arXiv:2407.15355  [pdf, other]

    cs.CV

    Attention Beats Linear for Fast Implicit Neural Representation Generation

    Authors: Shuyi Zhang, Ke Liu, Jingjun Gu, Xiaoxu Cai, Zhihua Wang, Jiajun Bu, Haishuai Wang

    Abstract: Implicit Neural Representation (INR) has gained increasing popularity as a data representation method, serving as a prerequisite for innovative generation models. Unlike gradient-based methods, which exhibit lower efficiency in inference, the adoption of hyper-network for generating parameters in Multi-Layer Perceptrons (MLP), responsible for executing INR functions, has surfaced as a promising an…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  27. arXiv:2407.15036  [pdf, other]

    cs.LG cs.AI cs.CV

    AsyCo: An Asymmetric Dual-task Co-training Model for Partial-label Learning

    Authors: Beibei Li, Yiyuan Zheng, Beihong Jin, Tao Xiang, Haobo Wang, Lei Feng

    Abstract: Partial-Label Learning (PLL) is a typical problem of weakly supervised learning, where each training instance is annotated with a set of candidate labels. Self-training PLL models achieve state-of-the-art performance but suffer from the error accumulation problem caused by mistakenly disambiguated instances. Although co-training can alleviate this issue by training two networks simultaneously and allo…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: 15 pages, accepted by Science China, Information Science

  28. arXiv:2407.14829  [pdf, other]

    cs.CL

    Overview of AI-Debater 2023: The Challenges of Argument Generation Tasks

    Authors: Jiayu Lin, Guanrong Chen, Bojun Jin, Chenyang Li, Shutong Jia, Wancong Lin, Yang Sun, Yuhang He, Caihua Yang, Jianzhu Bao, Jipeng Wu, Wen Su, Jinglu Chen, Xinyi Li, Tianyu Chen, Mingjie Han, Shuaiwen Du, Zijian Wang, Jiyin Li, Fuzhong Suo, Hao Wang, Nuanchen Lin, Xuanjing Huang, Changjian Jiang, RuiFeng Xu, et al. (4 additional authors not shown)

    Abstract: In this paper we present the results of the AI-Debater 2023 Challenge held by the Chinese Conference on Affect Computing (CCAC 2023), and introduce the related datasets. We organize two tracks to handle the argumentative generation tasks in different scenarios, namely, Counter-Argument Generation (Track 1) and Claim-based Argument Generation (Track 2). Each track is equipped with its distinct data…

    Submitted 24 July, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

  29. arXiv:2407.14816  [pdf, other]

    cs.CV

    Blind Image Deconvolution by Generative-based Kernel Prior and Initializer via Latent Encoding

    Authors: Jiangtao Zhang, Zongsheng Yue, Hui Wang, Qian Zhao, Deyu Meng

    Abstract: Blind image deconvolution (BID) is a classic yet challenging problem in the field of image processing. Recent advances in deep image prior (DIP) have motivated a series of DIP-based approaches, demonstrating remarkable success in BID. However, due to the high non-convexity of the inherent optimization process, these methods are notorious for their sensitivity to the initialized kernel. To alleviat…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. Code: https://github.com/jtaoz/GKPILE-Deconvolution

    ACM Class: I.4.4

  30. arXiv:2407.14769  [pdf, other]

    cs.HC

    A Two-Phase Visualization System for Continuous Human-AI Collaboration in Sequelae Analysis and Modeling

    Authors: Yang Ouyang, Chenyang Zhang, He Wang, Tianle Ma, Chang Jiang, Yuheng Yan, Zuoqin Yan, Xiaojuan Ma, Chuhan Shi, Quan Li

    Abstract: In healthcare, AI techniques are widely used for tasks like risk assessment and anomaly detection. Despite AI's potential as a valuable assistant, its role in complex medical data analysis often oversimplifies human-AI collaboration dynamics. To address this, we collaborated with a local hospital, engaging six physicians and one data scientist in a formative study. From this collaboration, we prop…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: To appear at the IEEE VIS Conference 2024

  31. arXiv:2407.14570  [pdf, other]

    cs.CV

    Are handcrafted filters helpful for attributing AI-generated images?

    Authors: Jialiang Li, Haoyue Wang, Sheng Li, Zhenxing Qian, Xinpeng Zhang, Athanasios V. Vasilakos

    Abstract: Recently, a vast number of image generation models have been proposed, which raises concerns regarding the misuse of these artificial intelligence (AI) techniques for generating fake images. To attribute the AI-generated images, existing schemes usually design and train deep neural networks (DNNs) to learn the model fingerprints, which usually requires a large amount of data for effective learning…

    Submitted 23 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

    Comments: 9 pages, 5 figures

  32. arXiv:2407.14564  [pdf, ps, other]

    eess.IV cs.AI cs.CV cs.LG

    APS-USCT: Ultrasound Computed Tomography on Sparse Data via AI-Physic Synergy

    Authors: Yi Sheng, Hanchen Wang, Yipei Liu, Junhuan Yang, Weiwen Jiang, Youzuo Lin, Lei Yang

    Abstract: Ultrasound computed tomography (USCT) is a promising technique that achieves superior medical imaging reconstruction resolution by fully leveraging waveform information, outperforming conventional ultrasound methods. Despite its advantages, high-quality USCT reconstruction relies on extensive data acquisition by a large number of transducers, leading to increased costs, computational demands, exte…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: MICCAI

  33. arXiv:2407.14507  [pdf, other]

    cs.CL

    Internal Consistency and Self-Feedback in Large Language Models: A Survey

    Authors: Xun Liang, Shichao Song, Zifan Zheng, Hanyu Wang, Qingchen Yu, Xunkai Li, Rong-Hua Li, Feiyu Xiong, Zhiyu Li

    Abstract: Large language models (LLMs) are expected to respond accurately but often exhibit deficient reasoning or generate hallucinatory content. To address these, studies prefixed with "Self-" such as Self-Consistency, Self-Improve, and Self-Refine have been initiated. They share a commonality: involving LLMs evaluating and updating themselves to mitigate the issues. Nonetheless, these efforts lack a unifie…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 27 pages, 9 figures, 10 tables, 14 equations

  34. arXiv:2407.14478  [pdf, other]

    cs.CV

    A review on vision-based motion estimation

    Authors: Hongyi Liu, Haifeng Wang

    Abstract: Compared to contact-sensor-based motion measurement, vision-based motion measurement has the advantages of low cost and high efficiency and has been under active development in the past decades. This paper provides a review of existing motion measurement methods. In addition to the development of each branch of vision-based motion measurement methods, this paper also discusses the advantages and dis…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 2 figures

  35. arXiv:2407.14289  [pdf, other]

    cs.RO

    Neuromuscular Modeling for Locomotion with Wearable Assistive Robots -- A primer

    Authors: Mohamed Irfan Refai, Huawei Wang, Antonio Gogeascoechea, Rafael Ornelas Kobayashi, Lucas A. Gaudio, Federica Damonte, Guillaume Durandau, Herman van der Kooij, Utku S. Yavuz, Massimo Sartori

    Abstract: Wearable assistive robots (WR) for the lower extremity are extensively documented in literature. Various interfaces have been designed to control these devices during gait and balance activities. However, achieving seamless and intuitive control requires accurate modeling of the human neuromusculoskeletal (NMSK) system. Such modeling enables WR to anticipate user intentions and determine the neces…

    Submitted 19 July, 2024; originally announced July 2024.

  36. arXiv:2407.14114  [pdf]

    cs.SE cs.AI

    A3Rank: Augmentation Alignment Analysis for Prioritizing Overconfident Failing Samples for Deep Learning Models

    Authors: Zhengyuan Wei, Haipeng Wang, Qilin Zhou, W. K. Chan

    Abstract: Sharpening deep learning models by training them with examples close to the decision boundary is a well-known best practice. Nonetheless, these models are still error-prone in producing predictions. In practice, the inference of the deep learning models in many application systems is guarded by a rejector, such as a confidence-based rejector, to filter out samples with insufficient prediction conf…

    Submitted 19 July, 2024; originally announced July 2024.

  37. arXiv:2407.14007  [pdf, other]

    cs.CV cs.AI

    Multi-modal Relation Distillation for Unified 3D Representation Learning

    Authors: Huiqun Wang, Yiping Bao, Panwang Pan, Zeming Li, Xiao Liu, Ruijie Yang, Di Huang

    Abstract: Recent advancements in multi-modal pre-training for 3D point clouds have demonstrated promising results by aligning heterogeneous features across 3D shapes and their corresponding 2D images and language descriptions. However, current straightforward solutions often overlook intricate structural relations among samples, potentially limiting the full capabilities of multi-modal learning. To address…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  38. arXiv:2407.13566  [pdf]

    cs.CY cs.SI eess.SY

    Decentralised Governance for Autonomous Cyber-Physical Systems

    Authors: Kelsie Nabben, Hongyang Wang, Michael Zargham

    Abstract: This paper examines the potential for Cyber-Physical Systems (CPS) to be governed in a decentralised manner, whereby blockchain-based infrastructure facilitates the communication between digital and physical domains through self-governing and self-organising principles. Decentralised governance paradigms that integrate computation in physical domains (such as 'Decentralised Autonomous Organisation…

    Submitted 18 July, 2024; originally announced July 2024.

    Report number: Dawo/2024/20

  39. arXiv:2407.13120  [pdf, other]

    cs.CV math.OC

    HPPP: Halpern-type Preconditioned Proximal Point Algorithms and Applications to Image Restoration

    Authors: Shuchang Zhang, Hui Zhang, Hongxia Wang

    Abstract: Preconditioned Proximal Point (PPP) algorithms provide a unified framework for splitting methods in image restoration. Recent advancements with RED (Regularization by Denoising) and PnP (Plug-and-Play) priors have achieved state-of-the-art performance in this domain, emphasizing the need for a meaningful particular solution. However, degenerate PPP algorithms typically exhibit weak convergence in…

    Submitted 21 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

  40. arXiv:2407.13094  [pdf, other]

    cs.CV

    Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data

    Authors: Wufei Ma, Kai Li, Zhongshi Jiang, Moustafa Meshry, Qihao Liu, Huiyu Wang, Christian Häne, Alan Yuille

    Abstract: Recent video-text foundation models have demonstrated strong performance on a wide variety of downstream video understanding tasks. Can these video-text models genuinely understand the contents of natural videos? Standard video-text evaluations could be misleading as many questions can be inferred merely from the objects and contexts in a single frame or biases inherent in the datasets. In this pa…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. Project page: https://feint6k.github.io

  41. arXiv:2407.12962  [pdf, other]

    cs.RO

    NAS: N-step computation of All Solutions to the footstep planning problem

    Authors: Jiayi Wang, Saeid Samadi, Hefan Wang, Pierre Fernbach, Olivier Stasse, Sethu Vijayakumar, Steve Tonneau

    Abstract: How many ways are there to climb a staircase in a given number of steps? Infinitely many, if we focus on the continuous aspect of the problem. A finite, possibly large number if we consider the discrete aspect, i.e. on which surface which effectors are going to step and in what order. We introduce NAS, an algorithm that considers both aspects simultaneously and computes all the possible solutions…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Submitted to Humanoids 2024

  42. arXiv:2407.12883  [pdf, other]

    cs.CL cs.AI cs.IR

    BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval

    Authors: Hongjin Su, Howard Yen, Mengzhou Xia, Weijia Shi, Niklas Muennighoff, Han-yu Wang, Haisu Liu, Quan Shi, Zachary S. Siegel, Michael Tang, Ruoxi Sun, Jinsung Yoon, Sercan O. Arik, Danqi Chen, Tao Yu

    Abstract: Existing retrieval benchmarks primarily consist of information-seeking queries (e.g., aggregated questions from search engines) where keyword or semantic-based retrieval is usually sufficient. However, many complex real-world queries require in-depth reasoning to identify relevant documents that go beyond surface form matching. For example, finding documentation for a coding question requires unde…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 50 pages

  43. arXiv:2407.12871  [pdf, other]

    cs.CL cs.AI cs.LG

    MetaTool: Facilitating Large Language Models to Master Tools with Meta-task Augmentation

    Authors: Xiaohan Wang, Dian Li, Yilin Zhao, Sinbadliu, Hui Wang

    Abstract: Utilizing complex tools with Large Language Models (LLMs) is a critical component for grounding AI agents in various real-world scenarios. The core challenge of manipulating tools lies in understanding their usage and functionality. The prevailing approach involves few-shot prompting with demonstrations or fine-tuning on expert trajectories. However, for complex tools and tasks, mere in-context de…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures

  44. arXiv:2407.12821  [pdf, other]

    cs.CL cs.AI cs.LG

    AutoFlow: Automated Workflow Generation for Large Language Model Agents

    Authors: Zelong Li, Shuyuan Xu, Kai Mei, Wenyue Hua, Balaji Rama, Om Raheja, Hao Wang, He Zhu, Yongfeng Zhang

    Abstract: Recent advancements in Large Language Models (LLMs) have shown significant progress in understanding complex natural language. One important application of LLMs is the LLM-based AI agent, which leverages the abilities of LLMs as well as external tools to solve complex tasks. To make sure LLM agents follow an effective and reliable procedure to solve the given task, manually designed workflows are usuall…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Open source code available at https://github.com/agiresearch/AutoFlow

  45. arXiv:2407.12815  [pdf, ps, other]

    cs.CL cs.LG

    SMLT-MUGC: Small, Medium, and Large Texts -- Machine versus User-Generated Content Detection and Comparison

    Authors: Anjali Rawal, Hui Wang, Youjia Zheng, Yu-Hsuan Lin, Shanu Sushmita

    Abstract: Large language models (LLMs) have gained significant attention due to their ability to mimic human language. Identifying texts generated by LLMs is crucial for understanding their capabilities and mitigating potential consequences. This paper analyzes datasets of varying text lengths: small, medium, and large. We compare the performance of machine learning algorithms on four datasets: (1) small (t…

    Submitted 28 June, 2024; originally announced July 2024.

  46. arXiv:2407.12727  [pdf, other]

    cs.CV

    NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model

    Authors: Zhongqun Zhang, Hengfei Wang, Ziwei Yu, Yihua Cheng, Angela Yao, Hyung Jin Chang

    Abstract: Modeling the physical contacts between the hand and object is standard for refining inaccurate hand poses and generating novel human grasps in 3D hand-object reconstruction. However, existing methods rely on geometric constraints that cannot be specified or controlled. This paper introduces a novel task of controllable 3D hand-object contact modeling with natural language descriptions. Challenges i…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  47. arXiv:2407.12532  [pdf, other]

    cs.CL cs.AI

    Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models

    Authors: Xihe Qiu, Haoyu Wang, Xiaoyu Tan, Chao Qu, Yujie Xiong, Yuan Cheng, Yinghui Xu, Wei Chu, Yuan Qi

    Abstract: Effective collaboration in multi-agent systems requires communicating goals and intentions between agents. Current agent frameworks often suffer from dependencies on single-agent execution and lack robust inter-module communication, frequently leading to suboptimal multi-agent reinforcement learning (MARL) policies and inadequate task coordination. To address these challenges, we present a framewo…

    Submitted 17 July, 2024; originally announced July 2024.

  48. arXiv:2407.12522  [pdf, other]

    cs.CL cs.AI

    Struct-X: Enhancing Large Language Models Reasoning with Structured Data

    Authors: Xiaoyu Tan, Haoyu Wang, Xihe Qiu, Yuan Cheng, Yinghui Xu, Wei Chu, Yuan Qi

    Abstract: Structured data, rich in logical and relational information, has the potential to enhance the reasoning abilities of large language models (LLMs). Still, its integration poses a challenge due to the risk of overwhelming LLMs with excessive tokens and irrelevant context information. To address this, we propose Struct-X, a novel framework that operates through five key phases: "read-model-fill-refl…

    Submitted 17 July, 2024; originally announced July 2024.

  49. arXiv:2407.12443  [pdf, other]

    cs.LG cs.CV

    Preventing Catastrophic Overfitting in Fast Adversarial Training: A Bi-level Optimization Perspective

    Authors: Zhaoxin Wang, Handing Wang, Cong Tian, Yaochu Jin

    Abstract: Adversarial training (AT) has become an effective defense method against adversarial examples (AEs), and it is typically framed as a bi-level optimization problem. Among various AT methods, fast AT (FAT), which employs a single-step attack strategy to guide the training process, can achieve good robustness against adversarial attacks at a low cost. However, FAT methods suffer from the catastrophic…

    Submitted 17 July, 2024; originally announced July 2024.

  50. arXiv:2407.12428  [pdf, other]

    cs.SE

    Context-Aware Fuzzing for Robustness Enhancement of Deep Learning Models

    Authors: Haipeng Wang, Zhengyuan Wei, Qilin Zhou, Wing-Kwong Chan

    Abstract: In the testing-retraining pipeline for enhancing the robustness property of deep learning (DL) models, many state-of-the-art robustness-oriented fuzzing techniques are metric-oriented. The pipeline generates adversarial examples as test cases via such a DL testing technique and retrains the DL model under test with test suites that contain these test cases. On the one hand, the strategies of these…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: The official version of this paper is to appear in ACM Transactions on Software Engineering and Methodology (accepted in July 2024)