Showing 1–50 of 1,836 results for author: Wang, Q

  1. arXiv:2407.20183  [pdf, other]

    cs.CL cs.AI

    MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

    Authors: Zehui Chen, Kuikun Liu, Qiuchen Wang, Jiangning Liu, Wenwei Zhang, Kai Chen, Feng Zhao

    Abstract: Information seeking and integration is a complex cognitive task that consumes enormous time and effort. Inspired by the remarkable progress of Large Language Models, recent works attempt to solve this task by combining LLMs and search engines. However, these methods still obtain unsatisfying performance due to three challenges: (1) complex requests often cannot be accurately and completely retriev…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Technical Report. Project Page: https://mindsearch.netlify.app Code: https://github.com/InternLM/MindSearch

  2. arXiv:2407.18879  [pdf, other]

    cs.SD cs.LG eess.AS

    Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model

    Authors: Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob Bartel, Kyle Kastner, Gary Wang, Andrew Rosenberg, Quan Wang

    Abstract: This paper explores the use of TTS synthesized training data for the KWS (keyword spotting) task while minimizing development cost and time. Keyword spotting models require a huge amount of training data to be accurate, and obtaining such training data can be costly. In the current state of the art, TTS models can generate large amounts of natural-sounding data, which can help reduce cost and time f…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: to be published in a Workshop at Interspeech 2024, Synthetic Data's Transformative Role in Foundational Speech Models

  3. arXiv:2407.18849  [pdf]

    cs.SI cs.CY

    MNTD: An Efficient Dynamic Community Detector Based on Nonnegative Tensor Decomposition

    Authors: Hao Fang, Qu Wang, Qicong Hu, Hao Wu

    Abstract: Dynamic community detection is crucial for elucidating the temporal evolution of social structures, information dissemination, and interactive behaviors within complex networks. Nonnegative matrix factorization provides an efficient framework for identifying communities in static networks but falls short in depicting temporal variations in community affiliations. To solve this problem, this paper p…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 10 pages, 5 figures. This paper will be published at the 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC)

  4. arXiv:2407.17442  [pdf, other]

    cs.CV

    AHMF: Adaptive Hybrid-Memory-Fusion Model for Driver Attention Prediction

    Authors: Dongyang Xu, Qingfan Wang, Ji Ma, Xiangyun Zeng, Lei Chen

    Abstract: Accurate driver attention prediction can serve as a critical reference for intelligent vehicles in understanding traffic scenes and making informed driving decisions. Though existing studies on driver attention prediction improved performance by incorporating advanced saliency detection techniques, they overlooked the opportunity to achieve human-inspired prediction by analyzing driving tasks from…

    Submitted 24 July, 2024; originally announced July 2024.

  5. arXiv:2407.17398  [pdf, other]

    cs.CV

    3D Question Answering for City Scene Understanding

    Authors: Penglei Sun, Yaoxian Song, Xiang Liu, Xiaofei Yang, Qiang Wang, Tiefeng Li, Yang Yang, Xiaowen Chu

    Abstract: 3D multimodal question answering (MQA) plays a crucial role in scene understanding by enabling intelligent agents to comprehend their surroundings in 3D environments. While existing research has primarily focused on indoor household tasks and outdoor roadside autonomous driving tasks, there has been limited exploration of city-level scene understanding tasks. Furthermore, existing research faces c…

    Submitted 24 July, 2024; originally announced July 2024.

  6. arXiv:2407.17029  [pdf, other]

    cs.LG

    Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance

    Authors: Ao Shen, Qiang Wang, Zhiquan Lai, Xionglve Li, Dongsheng Li

    Abstract: Large Language Models (LLMs) have demonstrated impressive performance across various domains. However, the enormous number of model parameters makes fine-tuning challenging, significantly limiting their application and deployment. Existing solutions combine parameter quantization with Low-Rank Adaptation (LoRA), greatly reducing memory usage but resulting in noticeable performance degradation. In…

    Submitted 24 July, 2024; originally announced July 2024.

  7. arXiv:2407.16840  [pdf, other]

    eess.AS cs.AI

    Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments

    Authors: Pai Zhu, Dhruuv Agarwal, Jacob W. Bartel, Kurt Partridge, Hyun Jin Park, Quan Wang

    Abstract: One of the challenges in developing a high quality custom keyword spotting (KWS) model is the lengthy and expensive process of collecting training data covering a wide range of languages, phrases and speaking styles. We introduce Synth4Kws - a framework to leverage Text to Speech (TTS) synthesized data for custom KWS in different resource settings. With no real data, we found increasing TTS phrase…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 5 pages, 5 figures, 2 tables. The paper is accepted at the Interspeech SynData4GenAI 2024 Workshop - https://syndata4genai.org/#call-for-papers

  8. arXiv:2407.16554  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization

    Authors: Junyan Wu, Wei Lu, Xiangyang Luo, Rui Yang, Qian Wang, Xiaochun Cao

    Abstract: Recently, a novel form of audio partial forgery has posed challenges to its forensics, requiring advanced countermeasures to detect subtle forgery manipulations within long-duration audio. However, existing countermeasures still serve a classification purpose and fail to perform meaningful analysis of the start and end timestamps of partial forgery segments. To address this challenge, we introduce…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 9 pages, 3 figures. This paper has been accepted for ACM MM 2024

    MSC Class: 68T07; 68T10 ACM Class: I.2; I.5

  9. arXiv:2407.16224  [pdf, other]

    cs.CV

    OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person

    Authors: Ke Sun, Jian Cao, Qi Wang, Linrui Tian, Xindi Zhang, Lian Zhuo, Bang Zhang, Liefeng Bo, Wenbo Zhou, Weiming Zhang, Daiheng Gao

    Abstract: Virtual Try-On (VTON) has become a transformative technology, empowering users to experiment with fashion without ever having to physically try on clothing. However, existing methods often struggle with generating high-fidelity and detail-consistent results. While diffusion models, such as Stable Diffusion series, have shown their capability in creating high-quality and photorealistic images, they…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 10 pages, 13 figures

  10. arXiv:2407.15435  [pdf, other]

    cs.CV

    Enhancement of 3D Gaussian Splatting using Raw Mesh for Photorealistic Recreation of Architectures

    Authors: Ruizhe Wang, Chunliang Hua, Tomakayev Shingys, Mengyuan Niu, Qingxin Yang, Lizhong Gao, Yi Zheng, Junyan Yang, Qiao Wang

    Abstract: The photorealistic reconstruction and rendering of architectural scenes have extensive applications in industries such as film, games, and transportation. It also plays an important role in urban planning, architectural design, and the city's promotion, especially in protecting historical and cultural relics. The 3D Gaussian Splatting, due to better performance over NeRF, has become a mainstream t…

    Submitted 22 July, 2024; originally announced July 2024.

  11. arXiv:2407.15346  [pdf, other]

    cs.CV cs.CL cs.MM

    Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models

    Authors: Wenbin An, Feng Tian, Jiahao Nie, Wenkai Shi, Haonan Lin, Yan Chen, QianYing Wang, Yaqiang Wu, Guang Dai, Ping Chen

    Abstract: Knowledge-based Visual Question Answering (KVQA) requires both image and world knowledge to answer questions. Current methods first retrieve knowledge from the image and external knowledge base with the original complex question, then generate answers with Large Language Models (LLMs). However, since the original question contains complex elements that require knowledge from different sources, acq…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Pre-print

  12. arXiv:2407.14882  [pdf, other]

    cs.LG cs.AI math.NA

    Reduced Effectiveness of Kolmogorov-Arnold Networks on Functions with Noise

    Authors: Haoran Shen, Chen Zeng, Jiahui Wang, Qiao Wang

    Abstract: It has been observed that even a small amount of noise introduced into the dataset can significantly degrade the performance of KAN. In this brief note, we aim to quantitatively evaluate the performance when noise is added to the dataset. We propose an oversampling technique combined with denoising to alleviate the impact of noise. Specifically, we employ kernel filtering based on diffusion maps f…

    Submitted 20 July, 2024; originally announced July 2024.

    MSC Class: 68T07

  13. arXiv:2407.13764  [pdf, other]

    cs.CV

    Shape of Motion: 4D Reconstruction from a Single Video

    Authors: Qianqian Wang, Vickie Ye, Hang Gao, Jake Austin, Zhengqi Li, Angjoo Kanazawa

    Abstract: Monocular dynamic reconstruction is a challenging and long-standing vision problem due to the highly ill-posed nature of the task. Existing approaches are limited in that they either depend on templates, are effective only in quasi-static scenes, or fail to model 3D motion explicitly. In this work, we introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-seq…

    Submitted 18 July, 2024; originally announced July 2024.

  14. arXiv:2407.13598  [pdf, other]

    cs.HC

    KNOWNET: Guided Health Information Seeking from LLMs via Knowledge Graph Integration

    Authors: Youfu Yan, Yu Hou, Yongkang Xiao, Rui Zhang, Qianwen Wang

    Abstract: The increasing reliance on Large Language Models (LLMs) for health information seeking can pose severe risks due to the potential for misinformation and the complexity of these topics. This paper introduces KNOWNET, a visualization system that integrates LLMs with Knowledge Graphs (KG) to provide enhanced accuracy and structured exploration. Specifically, for enhanced accuracy, KNOWNET extracts tri…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 9 pages, 9 figures, accepted by IEEE VIS 2024

  15. arXiv:2407.13520  [pdf, other]

    cs.CV

    EaDeblur-GS: Event assisted 3D Deblur Reconstruction with Gaussian Splatting

    Authors: Yuchen Weng, Zhengwen Shen, Ruofan Chen, Qi Wang, Jun Wang

    Abstract: 3D deblurring reconstruction techniques have recently seen significant advancements with the development of Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Although these techniques can recover relatively clear 3D reconstructions from blurry image inputs, they still face limitations in handling severe blurring and complex camera motion. To address these issues, we propose Event-ass…

    Submitted 18 July, 2024; originally announced July 2024.

  16. arXiv:2407.13480  [pdf]

    cs.RO cs.AI

    Risk-Aware Vehicle Trajectory Prediction Under Safety-Critical Scenarios

    Authors: Qingfan Wang, Dongyang Xu, Gaoyuan Kuang, Chen Lv, Shengbo Eben Li, Bingbing Nie

    Abstract: Trajectory prediction is significant for intelligent vehicles to achieve high-level autonomous driving, and a lot of relevant research achievements have been made recently. Despite the rapid development, most existing studies solely focused on normal safe scenarios while largely neglecting safety-critical scenarios, particularly those involving imminent collisions. This oversight may result in aut…

    Submitted 18 July, 2024; originally announced July 2024.

  17. arXiv:2407.13246  [pdf, other]

    cs.CV

    STS MICCAI 2023 Challenge: Grand challenge on 2D and 3D semi-supervised tooth segmentation

    Authors: Yaqi Wang, Yifan Zhang, Xiaodiao Chen, Shuai Wang, Dahong Qian, Fan Ye, Feng Xu, Hongyuan Zhang, Qianni Zhang, Chengyu Wu, Yunxiang Li, Weiwei Cui, Shan Luo, Chengkai Wang, Tianhao Li, Yi Liu, Xiang Feng, Huiyu Zhou, Dongyun Liu, Qixuan Wang, Zhouhao Lin, Wei Song, Yuanlin Li, Bing Wang, Chunshi Wang, et al. (2 additional authors not shown)

    Abstract: Computer-aided design (CAD) tools are increasingly popular in modern dental practice, particularly for treatment planning or comprehensive prognosis evaluation. In particular, the 2D panoramic X-ray image efficiently detects invisible caries, impacted teeth and supernumerary teeth in children, while the 3D dental cone beam computed tomography (CBCT) is widely used in orthodontics and endodontics d…

    Submitted 18 July, 2024; originally announced July 2024.

  18. arXiv:2407.13096  [pdf, other]

    cs.PF

    DSO: A GPU Energy Efficiency Optimizer by Fusing Dynamic and Static Information

    Authors: Qiang Wang, Laiyi Li, Weile Luo, Yijia Zhang, Bingqiang Wang

    Abstract: Increased reliance on graphics processing units (GPUs) for high-intensity computing tasks raises challenges regarding energy consumption. To address this issue, dynamic voltage and frequency scaling (DVFS) has emerged as a promising technique for conserving energy while maintaining the quality of service (QoS) of GPU applications. However, existing solutions using DVFS are hindered by inefficiency…

    Submitted 17 July, 2024; originally announced July 2024.

  19. arXiv:2407.13088  [pdf, other]

    cs.DC

    Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing

    Authors: Yizhou Luo, Qiang Wang, Shaohuai Shi, Jiaxin Lai, Shuhan Qi, Jiajia Zhang, Xuan Wang

    Abstract: Deep learning (DL) has demonstrated significant success across diverse fields, leading to the construction of dedicated GPU accelerators within GPU clusters for high-quality training services. Efficient scheduler designs for such clusters are vital to reduce operational costs and enhance resource utilization. While recent schedulers have shown impressive performance in optimizing DL job performanc…

    Submitted 17 July, 2024; originally announced July 2024.

  20. arXiv:2407.12899  [pdf, other]

    cs.CV cs.AI cs.MM

    DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion

    Authors: Huiguo He, Huan Yang, Zixi Tuo, Yuan Zhou, Qiuyue Wang, Yuhang Zhang, Zeyu Liu, Wenhao Huang, Hongyang Chao, Jian Yin

    Abstract: Story visualization aims to create visually compelling images or videos corresponding to textual narratives. Despite recent advances in diffusion models yielding promising results, existing methods still struggle to create a coherent sequence of subject-consistent frames based solely on a story. To this end, we propose DreamStory, an automatic open-domain story visualization framework by leveragin…

    Submitted 17 July, 2024; originally announced July 2024.

  21. arXiv:2407.11282  [pdf, other]

    cs.CL

    Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models

    Authors: Qingcheng Zeng, Mingyu Jin, Qinkai Yu, Zhenting Wang, Wenyue Hua, Zihao Zhou, Guangyan Sun, Yanda Meng, Shiqing Ma, Qifan Wang, Felix Juefei-Xu, Kaize Ding, Fan Yang, Ruixiang Tang, Yongfeng Zhang

    Abstract: Large Language Models (LLMs) are employed across various high-stakes domains, where the reliability of their outputs is crucial. One commonly used method to assess the reliability of LLMs' responses is uncertainty estimation, which gauges the likelihood of their answers being correct. While many studies focus on improving the accuracy of uncertainty estimations for LLMs, our research investigates…

    Submitted 19 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  22. arXiv:2407.11277  [pdf, other]

    cs.CL eess.AS

    Target conversation extraction: Source separation using turn-taking dynamics

    Authors: Tuochao Chen, Qirui Wang, Bohan Wu, Malek Itani, Emre Sefik Eskimez, Takuya Yoshioka, Shyamnath Gollakota

    Abstract: Extracting the speech of participants in a conversation amidst interfering speakers and noise presents a challenging problem. In this paper, we introduce the novel task of target conversation extraction, where the goal is to extract the audio of a target conversation based on the speaker embedding of one of its participants. To accomplish this, we propose leveraging temporal patterns inherent in h…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by Interspeech 2024

  23. arXiv:2407.11098  [pdf, other]

    cs.LG cs.AI

    Inertial Confinement Fusion Forecasting via LLMs

    Authors: Mingkai Chen, Taowen Wang, James Chenhao Liang, Chuan Liu, Chunshu Wu, Qifan Wang, Ying Nian Wu, Michael Huang, Chuang Ren, Ang Li, Tong Geng, Dongfang Liu

    Abstract: Controlled fusion energy is deemed pivotal for the advancement of human civilization. In this study, we introduce $\textbf{Fusion-LLM}$, a novel integration of Large Language Models (LLMs) with classical reservoir computing paradigms tailored to address challenges in Inertial Confinement Fusion ($\texttt{ICF}$). Our approach offers several key contributions: Firstly, we propose the…

    Submitted 15 July, 2024; originally announced July 2024.

  24. arXiv:2407.10897  [pdf, other]

    physics.optics cs.CV cs.LG

    Optical Diffusion Models for Image Generation

    Authors: Ilker Oguz, Niyazi Ulas Dinc, Mustafa Yildirim, Junjie Ke, Innfarn Yoo, Qifei Wang, Feng Yang, Christophe Moser, Demetri Psaltis

    Abstract: Diffusion models generate new samples by progressively decreasing the noise from the initially provided random distribution. This inference procedure generally utilizes a trained neural network numerous times to obtain the final output, creating significant latency and energy consumption on digital electronic hardware such as GPUs. In this study, we demonstrate that the propagation of a light beam…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 14 pages, 6 figures

  25. arXiv:2407.10575  [pdf, other]

    cs.CV

    A Survey of Defenses against AI-generated Visual Media: Detection, Disruption, and Authentication

    Authors: Jingyi Deng, Chenhao Lin, Zhengyu Zhao, Shuai Liu, Qian Wang, Chao Shen

    Abstract: Deep generative models have demonstrated impressive performance in various computer vision applications, including image synthesis, video generation, and medical analysis. Despite their significant advancements, these models may be used for malicious purposes, such as misinformation, deception, and copyright violation. In this paper, we provide a systematic and timely review of research efforts on…

    Submitted 15 July, 2024; originally announced July 2024.

  26. arXiv:2407.10281  [pdf, other]

    cs.CV

    Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning

    Authors: Xinyuan Gao, Songlin Dong, Yuhang He, Qiang Wang, Yihong Gong

    Abstract: The problem of Rehearsal-Free Continual Learning (RFCL) aims to continually learn new knowledge while preventing forgetting of the old knowledge, without storing any old samples and prototypes. The latest methods leverage large-scale pre-trained models as the backbone and use key-query matching to generate trainable prompts to learn new knowledge. However, the domain gap between the pre-training d…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  27. arXiv:2407.10048  [pdf, other]

    cs.SD eess.AS

    Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification

    Authors: Li Zhang, Ning Jiang, Qing Wang, Yue Li, Quan Lu, Lei Xie

    Abstract: Trained on 680,000 hours of massive speech data, Whisper is a multitasking, multilingual speech foundation model demonstrating superior performance in automatic speech recognition, translation, and language identification. However, its applicability in speaker verification (SV) tasks remains unexplored, particularly in low-data-resource scenarios where labeled speaker data in specific domains are…

    Submitted 13 July, 2024; originally announced July 2024.

  28. arXiv:2407.09790  [pdf, other]

    cs.LG

    Team up GBDTs and DNNs: Advancing Efficient and Effective Tabular Prediction with Tree-hybrid MLPs

    Authors: Jiahuan Yan, Jintai Chen, Qianxing Wang, Danny Z. Chen, Jian Wu

    Abstract: Tabular datasets play a crucial role in various applications. Thus, developing efficient, effective, and widely compatible prediction algorithms for tabular data is important. Currently, two prominent model types, Gradient Boosted Decision Trees (GBDTs) and Deep Neural Networks (DNNs), have demonstrated performance advantages on distinct tabular prediction tasks. However, selecting an effective mo…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted at KDD 2024 Research Track, codes will be available at https://github.com/jyansir/tmlp

  29. arXiv:2407.09580  [pdf, other]

    cs.CV cs.AI

    Don't Fear Peculiar Activation Functions: EUAF and Beyond

    Authors: Qianchao Wang, Shijun Zhang, Dong Zeng, Zhaoheng Xie, Hengtao Guo, Feng-Lei Fan, Tieyong Zeng

    Abstract: In this paper, we propose a new super-expressive activation function called the Parametric Elementary Universal Activation Function (PEUAF). We demonstrate the effectiveness of PEUAF through systematic and comprehensive experiments on various industrial and image datasets, including CIFAR10, Tiny-ImageNet, and ImageNet. Moreover, we significantly generalize the family of super-expressive activatio…

    Submitted 11 July, 2024; originally announced July 2024.

  30. arXiv:2407.09546  [pdf, other]

    q-fin.TR cs.SI

    A Reflective LLM-based Agent to Guide Zero-shot Cryptocurrency Trading

    Authors: Yuan Li, Bingqiao Luo, Qian Wang, Nuo Chen, Xu Liu, Bingsheng He

    Abstract: The utilization of Large Language Models (LLMs) in financial trading has primarily been concentrated within the stock market, aiding in economic and financial decisions. Yet, the unique opportunities presented by the cryptocurrency market, noted for its on-chain data's transparency and the critical influence of off-chain signals like news, remain largely untapped by LLMs. This work aims to bridge…

    Submitted 27 June, 2024; originally announced July 2024.

  31. arXiv:2407.08733  [pdf, other]

    cs.CL

    Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist

    Authors: Zihao Zhou, Shudong Liu, Maizhen Ning, Wei Liu, Jindong Wang, Derek F. Wong, Xiaowei Huang, Qiufeng Wang, Kaizhu Huang

    Abstract: Exceptional mathematical reasoning ability is one of the key features that demonstrate the power of large language models (LLMs). How to comprehensively define and evaluate the mathematical abilities of LLMs, and even reflect the user experience in real-world scenarios, has emerged as a critical issue. Current benchmarks predominantly concentrate on problem-solving capabilities, which presents a s…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 35 pages, 10 figures, preprint

  32. arXiv:2407.08665  [pdf]

    cond-mat.mtrl-sci cs.ET

    Superparamagnetic Tunnel Junctions for Reliable True Randomness

    Authors: Dooyong Koh, Qiuyuan Wang, Brooke C. McGoldrick, Luqiao Liu, Marc A. Baldo

    Abstract: Stochastic devices have the potential to disrupt computing, revolutionizing low-power machine learning acceleration, probabilistic computing, and hardware security. As implemented, however, superparamagnetic tunnel junctions (sMTJs) face significant challenges including the need for external magnetic fields, and poor reliability and scalability. Here, we present experimental demonstration of three…

    Submitted 11 July, 2024; originally announced July 2024.

  33. arXiv:2407.07339  [pdf, other]

    cs.CR

    TDML -- A Trustworthy Distributed Machine Learning Framework

    Authors: Zhen Wang, Qin Wang, Guangsheng Yu, Shiping Chen

    Abstract: Recent years have witnessed a surge in deep learning research, marked by the introduction of expansive generative models like OpenAI's SORA and GPT, Meta AI's LLAMA series, and Google's FLAN, BART, and Gemini models. However, the rapid advancement of large models (LM) has intensified the demand for computing resources, particularly GPUs, which are crucial for their parallel processing capabilities…

    Submitted 9 July, 2024; originally announced July 2024.

  34. Tight Quantum Depth Lower Bound for Solving Systems of Linear Equations

    Authors: Qisheng Wang, Zhicheng Zhang

    Abstract: Since Harrow, Hassidim, and Lloyd (2009) showed that a system of linear equations with $N$ variables and condition number $κ$ can be solved on a quantum computer in $\operatorname{poly}(\log(N), κ)$ time, exponentially faster than any classical algorithms, its improvements and applications have been extensively investigated. The state-of-the-art quantum algorithm for this problem is due to Costa,…

    Submitted 13 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Minor corrections to references in [v1]. 22 pages, 1 table. Close to the official version

    Journal ref: Physical Review A, 110(1): 012422, 2024

  35. arXiv:2407.05895  [pdf, other]

    cs.LG stat.ML

    Link Representation Learning for Probabilistic Travel Time Estimation

    Authors: Chen Xu, Qiang Wang, Lijun Sun

    Abstract: Travel time estimation is a crucial application in navigation apps and web mapping services. Current deterministic and probabilistic methods primarily focus on modeling individual trips, assuming independence among trips. However, in real-world scenarios, we often observe strong inter-trip correlations due to factors such as weather conditions, traffic management, and road works. In this paper, we…

    Submitted 8 July, 2024; originally announced July 2024.

  36. arXiv:2407.05300  [pdf, ps, other]

    cs.RO

    Space Adaptive Search for Nonholonomic Mobile Robots Path Planning

    Authors: Qi Wang

    Abstract: Path planning for a nonholonomic mobile robot is a challenging problem. This paper proposes a novel space adaptive search (SAS) approach that greatly reduces the computation cost of nonholonomic mobile robot path planning. The classic search-based path planning only updates the state on the current location in each step, which is very inefficient, and, therefore, can easily be trapped by local min…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 12 pages, 62 figures

    MSC Class: 68T20 ACM Class: I.2.8

  37. arXiv:2407.05106  [pdf, other]

    cs.CV

    DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition

    Authors: Qi Wang, Zhou Xu, Yuming Lin, Jingtao Ye, Hongsheng Li, Guangming Zhu, Syed Afaq Ali Shah, Mohammed Bennamoun, Liang Zhang

    Abstract: Neuromorphic sensors, specifically event cameras, revolutionize visual data acquisition by capturing pixel intensity changes with exceptional dynamic range, minimal latency, and energy efficiency, setting them apart from conventional frame-based cameras. The distinctive capabilities of event cameras have ignited significant interest in the domain of event-based action recognition, recognizing thei…

    Submitted 13 July, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  38. arXiv:2407.04208  [pdf, other]

    cs.CV

    AMD: Automatic Multi-step Distillation of Large-scale Vision Models

    Authors: Cheng Han, Qifan Wang, Sohail A. Dianat, Majid Rabbani, Raghuveer M. Rao, Yi Fang, Qiang Guan, Lifu Huang, Dongfang Liu

    Abstract: Transformer-based architectures have become the de-facto standard models for diverse vision tasks owing to their superior performance. As the size of the models continues to scale up, model distillation becomes extremely important in various real applications, particularly on devices limited by computational resources. However, prevailing knowledge distillation methods exhibit diminished efficacy…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 19 pages, 5 figures

  39. arXiv:2407.03604  [pdf, other]

    cs.CL cs.CV

    Lateralization LoRA: Interleaved Instruction Tuning with Modality-Specialized Adaptations

    Authors: Zhiyang Xu, Minqian Liu, Ying Shen, Joy Rimchala, Jiaxin Zhang, Qifan Wang, Yu Cheng, Lifu Huang

    Abstract: Recent advancements in Vision-Language Models (VLMs) have led to the development of Vision-Language Generalists (VLGs) capable of understanding and generating interleaved images and text. Despite these advances, VLGs still struggle to follow user instructions for interleaved text and image generation. To address this issue, we introduce LeafInstruct, the first open-sourced interleaved instruction…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 8 pages, visual instruction tuning, parameter-efficient tuning

  40. arXiv:2407.03037  [pdf, other]

    cs.SE

    Vision-driven Automated Mobile GUI Testing via Multimodal Large Language Model

    Authors: Zhe Liu, Cheng Li, Chunyang Chen, Junjie Wang, Boyu Wu, Yawen Wang, Jun Hu, Qing Wang

    Abstract: With the advancement of software rendering techniques, GUI pages in mobile apps now encompass a wealth of visual information, where the visual semantics of each page contribute to the overall app logic, presenting new challenges to software testing. Despite the progress in automated Graphical User Interface (GUI) testing, the absence of testing oracles has constrained its efficacy to identify only…

    Submitted 3 July, 2024; originally announced July 2024.

  41. arXiv:2407.02846  [pdf, other]

    cs.CV

    Multi-Task Domain Adaptation for Language Grounding with 3D Objects

    Authors: Penglei Sun, Yaoxian Song, Xinglin Pan, Peijie Dong, Xiaofei Yang, Qiang Wang, Zhixu Li, Tiefeng Li, Xiaowen Chu

    Abstract: The existing works on object-level language grounding with 3D objects mostly focus on improving performance by utilizing the off-the-shelf pre-trained models to capture features, such as viewpoint selection or geometric priors. However, they have failed to consider exploring the cross-modal representation of language-vision alignment in the cross-domain field. To answer this problem, we propose a…

    Submitted 5 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  42. arXiv:2407.02816  [pdf, other]

    cs.IT eess.SP math.ST

    Large and Small Deviations for Statistical Sequence Matching

    Authors: Lin Zhou, Qianyun Wang, Jingjing Wang, Lin Bai, Alfred O. Hero

    Abstract: We revisit the problem of statistical sequence matching between two databases of sequences initiated by Unnikrishnan (TIT 2015) and derive theoretical performance guarantees for the generalized likelihood ratio test (GLRT). We first consider the case where the number of matched pairs of sequences between the databases is known. In this case, the task is to accurately find the matched pairs of sequ…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Extended version of ISIT paper

  43. arXiv:2407.02530  [pdf, ps, other]

    quant-ph cs.DS

    Unifying quantum spatial search, state transfer and uniform sampling on graphs: simple and exact

    Authors: Qingwen Wang, Ying Jiang, Lvzhou Li

    Abstract: This article presents a novel and succinct algorithmic framework via alternating quantum walks, unifying quantum spatial search, state transfer and uniform sampling on a large class of graphs. Using the framework, we can achieve exact uniform sampling over all vertices and perfect state transfer between any two vertices, provided that the eigenvalues of the Laplacian matrix of the graph are all integers.…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: This manuscript has some overlap with arXiv:2307.16133. More precisely, it is an advanced version of arXiv:2307.16133, which not only modifies the paper structure and some results but also adds several new results

  44. arXiv:2407.02057  [pdf, other]

    cs.LG cs.SI

    HC-GLAD: Dual Hyperbolic Contrastive Learning for Unsupervised Graph-Level Anomaly Detection

    Authors: Yali Fu, Jindong Li, Jiahong Liu, Qianli Xing, Qi Wang, Irwin King

    Abstract: Unsupervised graph-level anomaly detection (UGAD) has garnered increasing attention in recent years due to its significance. However, most existing methods rely only on traditional graph neural networks to explore pairwise relationships, but such pairwise edges are not enough to describe the multifaceted relationships involved in anomalies. There is an urgent need to exploit node group informati…

    Submitted 2 July, 2024; originally announced July 2024.

  45. arXiv:2407.01007  [pdf, other]

    cs.CV

    GMT: A Robust Global Association Model for Multi-Target Multi-Camera Tracking

    Authors: Huijie Fan, Tinghui Zhao, Qiang Wang, Baojie Fan, Yandong Tang, LianQing Liu

    Abstract: In the task of multi-target multi-camera (MTMC) tracking of pedestrians, the data association problem is a key issue and main challenge, especially with complications arising from camera movements, lighting variations, and obstructions. However, most MTMC models adopt two-step approaches, thus heavily depending on the results of the first-step tracking in practical applications. Moreover, the same…

    Submitted 1 July, 2024; originally announced July 2024.

  46. arXiv:2407.00788  [pdf, other]

    cs.CV

    InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation

    Authors: Haofan Wang, Peng Xing, Renyuan Huang, Hao Ai, Qixun Wang, Xu Bai

    Abstract: Style transfer is an inventive process designed to create an image that maintains the essence of the original while embracing the visual style of another. Although diffusion models have demonstrated impressive generative power in personalized subject-driven or style-driven applications, existing state-of-the-art methods still encounter difficulties in achieving a seamless balance between content p…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Technical Report

  47. arXiv:2407.00499  [pdf, other]

    cs.CL cs.AI cs.LG

    ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees

    Authors: Zhiyuan Wang, Jinhao Duan, Lu Cheng, Yue Zhang, Qingni Wang, Hengtao Shen, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu

    Abstract: Uncertainty quantification (UQ) in natural language generation (NLG) tasks remains an open challenge, exacerbated by the intricate nature of the recent large language models (LLMs). This study investigates adapting conformal prediction (CP), which can convert any heuristic measure of uncertainty into rigorous theoretical guarantees by constructing prediction sets, for black-box LLMs in open-ended…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 13 pages, 9 figures, 6 tables

  48. arXiv:2407.00383  [pdf, other]

    cs.LG cs.AI

    FANFOLD: Graph Normalizing Flows-driven Asymmetric Network for Unsupervised Graph-Level Anomaly Detection

    Authors: Rui Cao, Shijie Xue, Jindong Li, Qi Wang, Yi Chang

    Abstract: Unsupervised graph-level anomaly detection (UGAD) has attracted increasing interest due to its widespread application. In recent studies, knowledge distillation-based methods have been widely used in unsupervised anomaly detection to improve model efficiency and generalization. However, the inherent symmetry between the source (teacher) and target (student) networks typically results in consistent…

    Submitted 29 June, 2024; originally announced July 2024.

  49. arXiv:2406.19812  [pdf, other]

    cs.SE cs.AI

    Fuzzy Logic Guided Reward Function Variation: An Oracle for Testing Reinforcement Learning Programs

    Authors: Shiyu Zhang, Haoyang Song, Qixin Wang, Yu Pei

    Abstract: Reinforcement Learning (RL) has gained significant attention across various domains. However, the increasing complexity of RL programs presents testing challenges, particularly the oracle problem: defining the correctness of the RL program. Conventional human oracles struggle to cope with the complexity, leading to inefficiencies and potential unreliability in RL testing. To alleviate this problem…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figures

    MSC Class: 68T05; 68T27; 93C42 ACM Class: D.2.5; I.2.3

  50. arXiv:2406.19311  [pdf, other]

    cs.CR cs.SD eess.AS

    Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems

    Authors: Zheng Fang, Tao Wang, Lingchen Zhao, Shenyi Zhang, Bowen Li, Yunjie Ge, Qi Li, Chao Shen, Qian Wang

    Abstract: In recent years, extensive research has been conducted on the vulnerability of ASR systems, revealing that black-box adversarial example attacks pose significant threats to real-world ASR systems. However, most existing black-box attacks rely on queries to the target ASRs, which is impractical when queries are not permitted. In this paper, we propose ZQ-Attack, a transfer-based adversarial attack…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: To appear in the Proceedings of The ACM Conference on Computer and Communications Security (CCS), 2024