
Showing 1–50 of 5,966 results for author: Wang, J

  1. arXiv:2407.20198

    eess.IV cs.CV

    SpaER: Learning Spatio-temporal Equivariant Representations for Fetal Brain Motion Tracking

    Authors: Jian Wang, Razieh Faghihpirayesh, Polina Golland, Ali Ghoulipour

    Abstract: In this paper, we introduce SpaER, a pioneering method for fetal motion tracking that leverages equivariant filters and self-attention mechanisms to effectively learn spatio-temporal representations. Different from conventional approaches that statically estimate fetal brain motions from pairs of images, our method dynamically tracks the rigid movement patterns of the fetal head across temporal an…

    Submitted 29 July, 2024; originally announced July 2024.

  2. arXiv:2407.20121

    cs.IR cs.AI

    EXIT: An EXplicit Interest Transfer Framework for Cross-Domain Recommendation

    Authors: Lei Huang, Weitao Li, Chenrui Zhang, Jinpeng Wang, Xianchun Yi, Sheng Chen

    Abstract: Cross-domain recommendation has attracted substantial interest in industrial apps such as Meituan, which serves multiple business domains via knowledge transfer and meets the diverse interests of users. However, existing methods typically follow an implicit modeling paradigm that blends the knowledge from both the source and target domains, and design intricate network structures to share learned…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted at CIKM 2024

  3. arXiv:2407.19711

    cs.SE

    TVDiag: A Task-oriented and View-invariant Failure Diagnosis Framework with Multimodal Data

    Authors: Shuaiyu Xie, Jian Wang, Hanbin He, Zhihao Wang, Yuqi Zhao, Neng Zhang, Bing Li

    Abstract: Microservice-based systems often suffer from reliability issues due to their intricate interactions and expanding scale. With the rapid growth of observability techniques, various methods have been proposed to achieve failure diagnosis, including root cause localization and failure type identification, by leveraging diverse monitoring data such as logs, metrics, or traces. However, traditional fai…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 30 pages

  4. arXiv:2407.19672

    cs.CL

    SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

    Authors: Wenxuan Zhang, Hou Pong Chan, Yiran Zhao, Mahani Aljunied, Jianyu Wang, Chaoqun Liu, Yue Deng, Zhiqiang Hu, Weiwen Xu, Yew Ken Chia, Xin Li, Lidong Bing

    Abstract: Large Language Models (LLMs) have shown remarkable abilities across various tasks, yet their development has predominantly centered on high-resource languages like English and Chinese, leaving low-resource languages underserved. To address this disparity, we present SeaLLMs 3, the latest iteration of the SeaLLMs model family, tailored for Southeast Asian languages. This region, characterized by it…

    Submitted 28 July, 2024; originally announced July 2024.

  5. arXiv:2407.19564

    cs.CV cs.AI cs.RO

    Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models

    Authors: Jifeng Wang, Kaouther Messaoud, Yuejiang Liu, Juergen Gall, Alexandre Alahi

    Abstract: Recent progress in motion forecasting has been substantially driven by self-supervised pre-training. However, adapting pre-trained models for specific downstream tasks, especially motion prediction, through extensive fine-tuning is often inefficient. This inefficiency arises because motion prediction closely aligns with the masked pre-training tasks, and traditional full fine-tuning methods fail t…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  6. arXiv:2407.19178

    cs.CV eess.SP

    Power-LLaVA: Large Language and Vision Assistant for Power Transmission Line Inspection

    Authors: Jiahao Wang, Mingxuan Li, Haichen Luo, Jinguo Zhu, Aijun Yang, Mingzhe Rong, Xiaohua Wang

    Abstract: Power transmission line inspection has made notable progress in the past few years, primarily due to the integration of deep learning technology. However, current inspection approaches continue to encounter difficulties in generalization and intelligence, which restricts their further applicability. In this paper, we introduce Power-LLaVA, the first large language and vision assista…

    Submitted 27 July, 2024; originally announced July 2024.

  7. arXiv:2407.18902

    cs.RO cs.AI cs.LG

    Lessons from Learning to Spin "Pens"

    Authors: Jun Wang, Ying Yuan, Haichuan Che, Haozhi Qi, Yi Ma, Jitendra Malik, Xiaolong Wang

    Abstract: In-hand manipulation of pen-like objects is an important skill in our daily lives, as many tools such as hammers and screwdrivers are similarly shaped. However, current learning-based methods struggle with this task due to a lack of high-quality demonstrations and the significant gap between simulation and the real world. In this work, we push the boundaries of learning-based in-hand manipulation…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Website: https://penspin.github.io/

  8. arXiv:2407.18743

    cs.CL

    Towards Effective and Efficient Continual Pre-training of Large Language Models

    Authors: Jie Chen, Zhipeng Chen, Jiapeng Wang, Kun Zhou, Yutao Zhu, Jinhao Jiang, Yingqian Min, Wayne Xin Zhao, Zhicheng Dou, Jiaxin Mao, Yankai Lin, Ruihua Song, Jun Xu, Xu Chen, Rui Yan, Zhewei Wei, Di Hu, Wenbing Huang, Ji-Rong Wen

    Abstract: Continual pre-training (CPT) has been an important approach for adapting language models to specific domains or tasks. To make the CPT approach more traceable, this paper presents a technical report for continually pre-training Llama-3 (8B), which significantly enhances the Chinese language ability and scientific reasoning ability of the backbone model. To enhance the new abilities while retaining…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 16 pages, 10 figures, 16 tables

    MSC Class: 68T50
    ACM Class: I.2.7

  9. arXiv:2407.18667

    cs.CV

    A Labeled Ophthalmic Ultrasound Dataset with Medical Report Generation Based on Cross-modal Deep Learning

    Authors: Jing Wang, Junyan Fan, Meng Zhou, Yanzhu Zhang, Mingyu Shi

    Abstract: Ultrasound imaging reveals eye morphology and aids in diagnosing and treating eye diseases. However, interpreting diagnostic reports requires specialized physicians. We present a labeled ophthalmic dataset for the precise analysis and the automated exploration of medical images along with their associated reports. It collects three modal data, including the ultrasound images, blood flow informatio…

    Submitted 26 July, 2024; originally announced July 2024.

  10. arXiv:2407.18656

    cs.CV

    Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner

    Authors: Pengxiang Cai, Zhiwei Liu, Guibo Zhu, Yunfang Niu, Jinqiao Wang

    Abstract: Pixel-level fine-grained image editing remains an open challenge. Previous works fail to achieve an ideal trade-off between control granularity and inference speed. They either fail to achieve pixel-level fine-grained control, or their inference speed requires optimization. To address this, this paper for the first time employs a regression-based network to learn the variation patterns of StyleGAN…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted as a poster paper for ACM Multimedia 2024

  11. arXiv:2407.18449

    eess.IV cs.CV cs.LG

    Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation

    Authors: Jiabo Ma, Zhengrui Guo, Fengtao Zhou, Yihui Wang, Yingxue Xu, Yu Cai, Zhengjie Zhu, Cheng Jin, Yi Lin, Xinrui Jiang, Anjia Han, Li Liang, Ronald Cheong Kin Chan, Jiguang Wang, Kwang-Ting Cheng, Hao Chen

    Abstract: Foundation models pretrained on large-scale datasets are revolutionizing the field of computational pathology (CPath). The generalization ability of foundation models is crucial for the success in various downstream clinical tasks. However, current foundation models have only been evaluated on a limited type and number of tasks, leaving their generalization ability and overall performance unclear.…

    Submitted 25 July, 2024; originally announced July 2024.

    Report number: I.2.10

  12. arXiv:2407.18362

    eess.IV cs.CV cs.LG

    Retinal IPA: Iterative KeyPoints Alignment for Multimodal Retinal Imaging

    Authors: Jiacheng Wang, Hao Li, Dewei Hu, Rui Xu, Xing Yao, Yuankai K. Tao, Ipek Oguz

    Abstract: We propose a novel framework for retinal feature point alignment, designed for learning cross-modality features to enhance matching and registration across multi-modality retinal images. Our model draws on the success of previous learning-based feature detection and description methods. To better leverage unlabeled data and constrain the model to reproduce relevant keypoints, we integrate a keypoi…

    Submitted 25 July, 2024; originally announced July 2024.

  13. arXiv:2407.18232

    cs.CV

    LION: Linear Group RNN for 3D Object Detection in Point Clouds

    Authors: Zhe Liu, Jinghua Hou, Xinyu Wang, Xiaoqing Ye, Jingdong Wang, Hengshuang Zhao, Xiang Bai

    Abstract: The benefit of transformers in large-scale 3D point cloud perception tasks, such as 3D object detection, is limited by their quadratic computation cost when modeling long-range relationships. In contrast, linear RNNs have low computational complexity and are suitable for long-range modeling. Toward this goal, we propose a simple and effective window-based framework built on LInear grOup RNN (i.e.,…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Project page: https://happinesslz.github.io/projects/LION/
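
    Illustrative sketch (not LION's architecture): the abstract contrasts the quadratic cost of self-attention with the low cost of linear RNNs. Assuming a hypothetical fixed per-channel `decay` gate and the simple elementwise recurrence h_t = decay * h_{t-1} + x_t, the scan below touches each token once (O(N)), whereas full self-attention scores every query-key pair (O(N^2)).

```python
from typing import List


def linear_rnn_scan(xs: List[List[float]], decay: List[float]) -> List[List[float]]:
    """Elementwise linear recurrence h_t = decay * h_{t-1} + x_t.

    Illustrative only: `decay` is a hypothetical fixed per-channel gate,
    not the parameterization used in LION.
    """
    dim = len(decay)
    h = [0.0] * dim
    outputs = []
    for x in xs:  # one update per token: O(N) in sequence length
        h = [decay[c] * h[c] + x[c] for c in range(dim)]
        outputs.append(h)
    return outputs


def attention_pair_count(n_tokens: int) -> int:
    """Number of query-key scores full self-attention computes: O(N^2)."""
    return n_tokens * n_tokens


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(linear_rnn_scan(tokens, decay=[0.5, 0.25]))  # [[1.0, 0.0], [0.5, 1.0], [1.25, 1.25]]
    print(attention_pair_count(10_000))  # 100000000 pairwise scores
```

    The fixed decay stands in for whatever gating a learned linear RNN uses; only the linear versus quadratic scaling is the point here.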

  14. arXiv:2407.18121

    cs.CV

    Efficient Inference of Vision Instruction-Following Models with Elastic Cache

    Authors: Zuyan Liu, Benlin Liu, Jiahui Wang, Yuhao Dong, Guangyi Chen, Yongming Rao, Ranjay Krishna, Jiwen Lu

    Abstract: In the field of instruction-following large vision-language models (LVLMs), the efficient deployment of these models faces challenges, notably due to the high memory demands of their key-value (KV) caches. Conventional cache management strategies for LLMs focus on cache eviction, which often fails to address the specific needs of multimodal instruction-following models. Recognizing this gap, in th…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  15. arXiv:2407.17910

    stat.ML cs.AI cs.LG

    Causal Deepsets for Off-policy Evaluation under Spatial or Spatio-temporal Interferences

    Authors: Runpeng Dai, Jianing Wang, Fan Zhou, Shikai Luo, Zhiwei Qin, Chengchun Shi, Hongtu Zhu

    Abstract: Off-policy evaluation (OPE) is widely applied in sectors such as pharmaceuticals and e-commerce to evaluate the efficacy of novel products or policies from offline datasets. This paper introduces a causal deepset framework that relaxes several key structural assumptions, primarily the mean-field assumption, prevalent in existing OPE methodologies that handle spatio-temporal interference. These tra…

    Submitted 25 July, 2024; originally announced July 2024.

  16. arXiv:2407.17838

    cs.CV cs.AI

    UMono: Physical Model Informed Hybrid CNN-Transformer Framework for Underwater Monocular Depth Estimation

    Authors: Jian Wang, Jing Wang, Shenghui Rong, Bo He

    Abstract: Underwater monocular depth estimation serves as the foundation for tasks such as 3D reconstruction of underwater scenes. However, due to the influence of light and medium, the underwater environment undergoes a distinctive imaging process, which presents challenges in accurately estimating depth from a single image. The existing methods fail to consider the unique characteristics of underwater env…

    Submitted 25 July, 2024; originally announced July 2024.

  17. arXiv:2407.17227

    cs.AI cs.CL

    LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover

    Authors: Zijian Wu, Jiayu Wang, Dahua Lin, Kai Chen

    Abstract: Recently, large language models have presented promising results in aiding formal mathematical reasoning. However, their performance is restricted due to the scarcity of formal theorem-proving data, which requires additional effort to be extracted from raw formal language corpora. Meanwhile, a significant amount of human-written formal language corpora remains underutilized. To address this issue,…

    Submitted 24 July, 2024; originally announced July 2024.

  18. arXiv:2407.16993

    cs.CV

    LoFormer: Local Frequency Transformer for Image Deblurring

    Authors: Xintian Mao, Jiansheng Wang, Xingran Xie, Qingli Li, Yan Wang

    Abstract: Due to the computational complexity of self-attention (SA), prevalent techniques for image deblurring often resort to either adopting localized SA or employing coarse-grained global SA methods, both of which exhibit drawbacks such as compromising global modeling or lacking fine-grained correlation. In order to address this issue by effectively modeling long-range dependencies without sacrificing f…

    Submitted 24 July, 2024; originally announced July 2024.

  19. arXiv:2407.16957

    cs.CV

    Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal

    Authors: Yeying Jin, Xin Li, Jiadong Wang, Yan Zhang, Malu Zhang

    Abstract: Existing raindrop removal datasets have two shortcomings. First, they consist of images captured by cameras with a focus on the background, leading to the presence of blurry raindrops. To our knowledge, none of these datasets include images where the focus is specifically on raindrops, which results in a blurry background. Second, these datasets predominantly consist of daytime images, thereby lac…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024; dataset and benchmark at: https://github.com/jinyeying/RaindropClarity

  20. arXiv:2407.16955

    cs.CV cs.RO

    DVPE: Divided View Position Embedding for Multi-View 3D Object Detection

    Authors: Jiasen Wang, Zhenglin Li, Ke Sun, Xianyuan Liu, Yang Zhou

    Abstract: Sparse query-based paradigms have achieved significant success in multi-view 3D detection for autonomous vehicles. Current research faces challenges in balancing between enlarging receptive fields and reducing interference when aggregating multi-view features. Moreover, different poses of cameras present challenges in training global attention models. To address these problems, this paper proposes…

    Submitted 23 July, 2024; originally announced July 2024.

  21. arXiv:2407.16928

    cs.CR

    From Sands to Mansions: Enabling Automatic Full-Life-Cycle Cyberattack Construction with LLM

    Authors: Lingzhi Wang, Jiahui Wang, Kyle Jung, Kedar Thiagarajan, Emily Wei, Xiangmin Shen, Yan Chen, Zhenyuan Li

    Abstract: The escalating battles between attackers and defenders in cybersecurity make it imperative to test and evaluate defense capabilities from the attackers' perspective. However, constructing full-life-cycle cyberattacks and performing red team emulations requires significant time and domain knowledge from security experts. Existing cyberattack simulation frameworks face challenges such as limited tec…

    Submitted 23 July, 2024; originally announced July 2024.

  22. arXiv:2407.16822

    cs.CV cs.AI

    AI-Enhanced 7-Point Checklist for Melanoma Detection Using Clinical Knowledge Graphs and Data-Driven Quantification

    Authors: Yuheng Wang, Tianze Yu, Jiayue Cai, Sunil Kalia, Harvey Lui, Z. Jane Wang, Tim K. Lee

    Abstract: The 7-point checklist (7PCL) is widely used in dermoscopy to identify malignant melanoma lesions needing urgent medical attention. It assigns point values to seven attributes: major attributes are worth two points each, and minor ones are worth one point each. A total score of three or higher prompts further evaluation, often including a biopsy. However, a significant limitation of current methods…

    Submitted 23 July, 2024; originally announced July 2024.
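
    The scoring rule stated in this abstract can be written out directly. A minimal sketch of the manual checklist only (not the paper's AI-enhanced version), assuming the attribute names of the standard dermoscopic 7-point checklist, since the truncated abstract gives only the weights (2 per major, 1 per minor) and the threshold of 3:

```python
from typing import Iterable

# Attribute names follow the standard dermoscopic 7-point checklist;
# the abstract above states only the weights and the threshold.
MAJOR = {"atypical pigment network", "blue-whitish veil", "atypical vascular pattern"}
MINOR = {
    "irregular streaks",
    "irregular dots/globules",
    "irregular blotches",
    "regression structures",
}


def seven_point_score(present: Iterable[str]) -> int:
    """Sum 2 points per major attribute and 1 point per minor attribute."""
    score = 0
    for attr in set(present):  # count each attribute at most once
        if attr in MAJOR:
            score += 2
        elif attr in MINOR:
            score += 1
        else:
            raise ValueError(f"unknown 7PCL attribute: {attr!r}")
    return score


def needs_further_evaluation(present: Iterable[str], threshold: int = 3) -> bool:
    """A total score of three or higher prompts further evaluation (e.g., biopsy)."""
    return seven_point_score(present) >= threshold
```

    For example, a lesion showing only a blue-whitish veil scores 2 and stays below the threshold, while adding irregular streaks brings it to 3 and triggers further evaluation.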

  23. arXiv:2407.16788

    cs.CV

    Occlusion-Aware 3D Motion Interpretation for Abnormal Behavior Detection

    Authors: Su Li, Wang Liang, Jianye Wang, Ziheng Zhang, Lei Zhang

    Abstract: Estimating abnormal posture based on 3D pose is vital in human pose analysis, yet it presents challenges, especially when reconstructing 3D human poses from monocular datasets with occlusions. Accurate reconstructions enable the restoration of 3D movements, which assist in the extraction of semantic details necessary for analyzing abnormal behaviors. However, most existing methods depend on predef…

    Submitted 23 July, 2024; originally announced July 2024.

  24. arXiv:2407.16697

    cs.CV

    AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking

    Authors: Wenxuan Li, Chongyu Qu, Xiaoxi Chen, Pedro R. A. S. Bassi, Yijia Shi, Yuxiang Lai, Qian Yu, Huimin Xue, Yixiong Chen, Xiaorui Lin, Yutong Tang, Yining Cao, Haoqi Han, Zheyuan Zhang, Jiawei Liu, Tiezheng Zhang, Yujiu Ma, Jincheng Wang, Guang Zhang, Alan Yuille, Zongwei Zhou

    Abstract: We introduce the largest abdominal CT dataset (termed AbdomenAtlas) of 20,460 three-dimensional CT volumes sourced from 112 hospitals across diverse populations, geographies, and facilities. AbdomenAtlas provides 673K high-quality masks of anatomical structures in the abdominal region annotated by a team of 10 radiologists with the help of AI algorithms. We start by having expert radiologists manu…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Published in Medical Image Analysis

  25. arXiv:2407.16210

    cs.GR cs.AI cs.LG

    Strategy and Skill Learning for Physics-based Table Tennis Animation

    Authors: Jiashun Wang, Jessica Hodgins, Jungdam Won

    Abstract: Recent advancements in physics-based character animation leverage deep learning to generate agile and natural motion, enabling characters to execute movements such as backflips, boxing, and tennis. However, reproducing the selection and use of diverse motor skills in dynamic environments to solve complex tasks, as humans do, still remains a challenge. We present a strategy and skill learning appro…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: SIGGRAPH 2024

  26. arXiv:2407.16207

    cs.CL

    Graph-Structured Speculative Decoding

    Authors: Zhuocheng Gong, Jiahao Liu, Ziyue Wang, Pengfei Wu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

    Abstract: Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models (LLMs) by employing a small language model to draft a hypothesis sequence, which is then validated by the LLM. The effectiveness of this approach heavily relies on the balance between performance and efficiency of the draft model. In our research, we focus on enhancing the proportion of d…

    Submitted 23 July, 2024; originally announced July 2024.
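
    A minimal sketch of the draft-then-verify loop described above, assuming greedy (argmax) decoding and treating both models as hypothetical functions that map a token prefix to the next token id. This is the textbook greedy variant, not the graph-structured method this paper proposes; sampling-based speculative decoding replaces the exact-match check with a rejection-sampling test.

```python
from typing import Callable, List

# Hypothetical greedy "model": maps a token prefix to the next token id.
NextToken = Callable[[List[int]], int]


def speculative_decode(prefix: List[int], draft: NextToken, target: NextToken,
                       max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes up to k tokens, the target verifies.

    A real system scores all drafted positions with one batched target
    forward pass; here each check is a separate call for clarity.
    """
    out = list(prefix)
    goal = len(prefix) + max_new
    while len(out) < goal:
        # 1) The small draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2) The large target model verifies the proposal left to right.
        for tok in proposal:
            expected = target(out)
            if tok != expected:
                out.append(expected)  # first mismatch: keep the target's token, discard the rest
                break
            out.append(tok)  # accepted draft token
        else:
            # Every drafted token was accepted: the target contributes one extra token.
            if len(out) < goal:
                out.append(target(out))
    return out
```

    Because every appended token equals what the target would emit greedily at that point, the output matches target-only greedy decoding; a better draft model only raises how many tokens are accepted per verification round.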

  27. arXiv:2407.16154

    cs.CL

    DDK: Distilling Domain Knowledge for Efficient Large Language Models

    Authors: Jiaheng Liu, Chenchen Zhang, Jinyang Guo, Yuanxing Zhang, Haoran Que, Ken Deng, Zhiqi Bai, Jie Liu, Ge Zhang, Jiakai Wang, Yanan Wu, Congnan Liu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

    Abstract: Despite the advanced intelligence abilities of large language models (LLMs) in various applications, they still face significant computational and storage demands. Knowledge Distillation (KD) has emerged as an effective strategy to improve the performance of a smaller LLM (i.e., the student model) by transferring knowledge from a high-performing LLM (i.e., the teacher model). Prevailing techniques…

    Submitted 22 July, 2024; originally announced July 2024.
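
    For reference, the generic teacher-student objective behind knowledge distillation can be sketched as a temperature-softened KL divergence from the teacher's output distribution to the student's (the classic Hinton-style loss). This shows the baseline setting only, not DDK's domain-specific mechanism, and the default temperature is an illustrative assumption.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: List[float], teacher_logits: List[float],
            temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl
```

    Multiplying by T^2 keeps gradient magnitudes roughly comparable across temperatures when this term is mixed with an ordinary hard-label loss.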

  28. arXiv:2407.15819

    cs.CV

    Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight

    Authors: Ziyuan Huang, Kaixiang Ji, Biao Gong, Zhiwu Qing, Qinglong Zhang, Kecheng Zheng, Jian Wang, Jingdong Chen, Ming Yang

    Abstract: This paper introduces Chain-of-Sight, a vision-language bridge module that accelerates the pre-training of Multimodal Large Language Models (MLLMs). Our approach employs a sequence of visual resamplers that capture visual details at various spatial scales. This architecture not only leverages global and local visual contexts effectively, but also facilitates the flexible extension of visual tokens…

    Submitted 22 July, 2024; originally announced July 2024.

  29. arXiv:2407.15567

    cs.LG cs.DC cs.IT math.OC

    A New Theoretical Perspective on Data Heterogeneity in Federated Optimization

    Authors: Jiayi Wang, Shiqiang Wang, Rong-Rong Chen, Mingyue Ji

    Abstract: In federated learning (FL), data heterogeneity is the main reason that existing theoretical analyses are pessimistic about the convergence rate. In particular, for many FL algorithms, the convergence rate grows dramatically when the number of local updates becomes large, especially when the product of the gradient divergence and local Lipschitz constant is large. However, empirical studies can sho…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: ICML 2024
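
    To make "local updates" and "data heterogeneity" concrete, here is a toy FedAvg sketch, assuming a single scalar parameter and hypothetical client losses 0.5 * (w - c_i)^2 whose minimizers c_i differ across clients (the heterogeneity). It illustrates the setting only, not the paper's theoretical analysis.

```python
from typing import List


def local_sgd(w: float, center: float, lr: float, steps: int) -> float:
    """Run `steps` gradient steps on a client's toy loss 0.5 * (w - center)**2."""
    for _ in range(steps):
        grad = w - center  # derivative of the toy quadratic loss
        w -= lr * grad
    return w


def fedavg(centers: List[float], rounds: int, local_steps: int,
           lr: float = 0.1, w0: float = 0.0) -> float:
    """FedAvg: each round, every client starts from the global model,
    runs local SGD, and the server averages the resulting models."""
    w = w0
    for _ in range(rounds):
        local_models = [local_sgd(w, c, lr, local_steps) for c in centers]
        w = sum(local_models) / len(local_models)
    return w
```

    In this toy every client has the same curvature, so averaging still converges to the global optimum mean(c_i); with unequal curvature and more than one local step, the fixed point generally drifts away from the global optimum.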

  30. arXiv:2407.15362

    cs.CV cs.AI

    A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model

    Authors: Yingxue Xu, Yihui Wang, Fengtao Zhou, Jiabo Ma, Shu Yang, Huangjing Lin, Xin Wang, Jiguang Wang, Li Liang, Anjia Han, Ronald Cheong Kin Chan, Hao Chen

    Abstract: Remarkable strides in computational pathology have been made in the task-agnostic foundation model that advances the performance of a wide array of downstream clinical tasks. Despite the promising performance, there are still several challenges. First, prior works have resorted to either vision-only or vision-captions data, disregarding invaluable pathology reports and gene expression profiles whi…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 44 pages, 9 figures

  31. arXiv:2407.15334

    cs.CV

    Explore the LiDAR-Camera Dynamic Adjustment Fusion for 3D Object Detection

    Authors: Yiran Yang, Xu Gao, Tong Wang, Xin Hao, Yifeng Shi, Xiao Tan, Xiaoqing Ye, Jingdong Wang

    Abstract: Camera and LiDAR serve as informative sensors for accurate and robust autonomous driving systems. However, these sensors often exhibit heterogeneous natures, resulting in distributional modality gaps that present significant challenges for fusion. To address this, a robust fusion technique is crucial, particularly for enhancing 3D object detection. In this paper, we introduce a dynamic adjustment…

    Submitted 21 July, 2024; originally announced July 2024.

  32. arXiv:2407.15212

    cs.CV cs.GR

    Surfel-based Gaussian Inverse Rendering for Fast and Relightable Dynamic Human Reconstruction from Monocular Video

    Authors: Yiqun Zhao, Chenming Wu, Binbin Huang, Yihao Zhi, Chen Zhao, Jingdong Wang, Shenghua Gao

    Abstract: Efficient and accurate reconstruction of a relightable, dynamic clothed human avatar from a monocular video is crucial for the entertainment industry. This paper introduces the Surfel-based Gaussian Inverse Avatar (SGIA) method, which introduces efficient training and rendering for relightable dynamic human reconstruction. SGIA advances previous Gaussian Avatar methods by comprehensively modeling…

    Submitted 23 July, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: Under Review; Project Page: https://GS-IA.github.io

  33. arXiv:2407.15026

    cs.AR cs.AI

    Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms

    Authors: Zhihai Wang, Zijie Geng, Zhaojie Tu, Jie Wang, Yuxi Qian, Zhexuan Xu, Ziyan Liu, Siyuan Xu, Zhentao Tang, Shixiong Kai, Mingxuan Yuan, Jianye Hao, Bin Li, Yongdong Zhang, Feng Wu

    Abstract: The increasing complexity of modern very-large-scale integration (VLSI) design highlights the significance of Electronic Design Automation (EDA) technologies. Chip placement is a critical step in the EDA workflow, which positions chip modules on the canvas with the goal of optimizing performance, power, and area (PPA) metrics of final chip designs. Recent advances have demonstrated the great poten…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: A comprehensive benchmark for AI-based chip placement algorithms using end-to-end performance metrics

  34. arXiv:2407.14882

    cs.LG cs.AI math.NA

    Reduced Effectiveness of Kolmogorov-Arnold Networks on Functions with Noise

    Authors: Haoran Shen, Chen Zeng, Jiahui Wang, Qiao Wang

    Abstract: It has been observed that even a small amount of noise introduced into the dataset can significantly degrade the performance of KAN. In this brief note, we aim to quantitatively evaluate the performance when noise is added to the dataset. We propose an oversampling technique combined with denoising to alleviate the impact of noise. Specifically, we employ kernel filtering based on diffusion maps f…

    Submitted 20 July, 2024; originally announced July 2024.

    MSC Class: 68T07

  35. arXiv:2407.14605

    cs.CV cs.AI

    ESCAPE: Energy-based Selective Adaptive Correction for Out-of-distribution 3D Human Pose Estimation

    Authors: Luke Bidulka, Mohsen Gholami, Jiannan Zheng, Martin J. McKeown, Z. Jane Wang

    Abstract: Despite recent advances in human pose estimation (HPE), poor generalization to out-of-distribution (OOD) data remains a difficult problem. While previous works have proposed Test-Time Adaptation (TTA) to bridge the train-test domain gap by refining network parameters at inference, the absence of ground-truth annotations makes it highly challenging and existing methods typically increase inference…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 32 pages, 8 figures

    ACM Class: I.2.6; I.2.10

  36. arXiv:2407.14568

    cs.CL cs.AI cs.DB

    SQLfuse: Enhancing Text-to-SQL Performance through Comprehensive LLM Synergy

    Authors: Tingkai Zhang, Chaoyu Chen, Cong Liao, Jun Wang, Xudong Zhao, Hang Yu, Jianchao Wang, Jianguo Li, Wenhui Shi

    Abstract: Text-to-SQL conversion is a critical innovation, simplifying the transition from complex SQL to intuitive natural language queries, especially significant given SQL's prevalence in the job market across various roles. The rise of Large Language Models (LLMs) like GPT-3.5 and GPT-4 has greatly advanced this field, offering improved natural language understanding and the ability to generate nuanced…

    Submitted 19 July, 2024; originally announced July 2024.

  37. arXiv:2407.14532

    cs.DC cs.LG

    A Scenario-Oriented Benchmark for Assessing AIOps Algorithms in Microservice Management

    Authors: Yongqian Sun, Jiaju Wang, Zhengdan Li, Xiaohui Nie, Minghua Ma, Shenglin Zhang, Yuhe Ji, Lu Zhang, Wen Long, Hengmao Chen, Yongnan Luo, Dan Pei

    Abstract: AIOps algorithms play a crucial role in the maintenance of microservice systems. The performance leaderboards of many previous benchmarks provide valuable guidance for selecting appropriate algorithms. However, existing AIOps benchmarks mainly utilize offline datasets to evaluate algorithms. They cannot consistently evaluate the performance of algorithms using real-time datasets, and the operation scena…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Codes are available at https://github.com/MicroServo/microservo, datasets are available at https://github.com/MicroServo/hot-plugging

  38. arXiv:2407.14100

    cs.GR cs.AI cs.LG

    ParamsDrag: Interactive Parameter Space Exploration via Image-Space Dragging

    Authors: Guan Li, Yang Liu, Guihua Shan, Shiyu Cheng, Weiqun Cao, Junpeng Wang, Ko-Chih Wang

    Abstract: Numerical simulation serves as a cornerstone in scientific modeling, yet the process of fine-tuning simulation parameters poses significant challenges. Conventionally, parameter adjustment relies on extensive numerical simulations, data analysis, and expert insights, resulting in substantial computational costs and low efficiency. The emergence of deep learning in recent years has provided promisi…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: To be published in Proc. IEEE VIS 2024

  39. arXiv:2407.14069

    cs.CV

    Self-Supervised Video Representation Learning in a Heuristic Decoupled Perspective

    Authors: Zeen Song, Jingyao Wang, Jianqi Zhang, Changwen Zheng, Wenwen Qiang

    Abstract: Video contrastive learning (v-CL) has gained prominence as a leading framework for unsupervised video representation learning, showcasing impressive performance across various tasks such as action classification and detection. In the field of video representation learning, a feature extractor should ideally capture both static and dynamic semantics. However, our series of experiments reveals that…

    Submitted 19 July, 2024; originally announced July 2024.

  40. arXiv:2407.14058

    cs.LG

    On the Causal Sufficiency and Necessity of Multi-Modal Representation Learning

    Authors: Jingyao Wang, Wenwen Qiang, Jiangmeng Li, Lingyu Si, Changwen Zheng, Bing Su

    Abstract: An effective paradigm of multi-modal learning (MML) is to learn unified representations among modalities. From a causal perspective, constraining the consistency between different modalities can mine causal representations that convey primary events. However, such simple consistency may face the risk of learning insufficient or unnecessary information: a necessary but insufficient cause is invaria…

    Submitted 19 July, 2024; originally announced July 2024.

  41. arXiv:2407.13998

    cs.CL cs.AI

    RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering

    Authors: Rujun Han, Yuhao Zhang, Peng Qi, Yumo Xu, Jenyuan Wang, Lan Liu, William Yang Wang, Bonan Min, Vittorio Castelli

    Abstract: Question answering based on retrieval augmented generation (RAG-QA) is an important research topic in NLP and has a wide range of real-world applications. However, most existing datasets for this task are either constructed using a single source corpus or consist of short extractive answers, which fall short of evaluating large language model (LLM) based RAG-QA systems on cross-domain generalizati…

    Submitted 18 July, 2024; originally announced July 2024.

  42. arXiv:2407.13520

    cs.CV

    EaDeblur-GS: Event assisted 3D Deblur Reconstruction with Gaussian Splatting

    Authors: Yuchen Weng, Zhengwen Shen, Ruofan Chen, Qi Wang, Jun Wang

    Abstract: 3D deblurring reconstruction techniques have recently seen significant advancements with the development of Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Although these techniques can recover relatively clear 3D reconstructions from blurry image inputs, they still face limitations in handling severe blurring and complex camera motion. To address these issues, we propose Event-ass…

    Submitted 18 July, 2024; originally announced July 2024.

  43. arXiv:2407.13284  [pdf, other]

    cs.IR

    Semantic-aware Representation Learning for Homography Estimation

    Authors: Yuhan Liu, Qianxin Huang, Siqi Hui, Jingwen Fu, Sanping Zhou, Kangyi Wu, Pengna Li, Jinjun Wang

    Abstract: Homography estimation is the task of determining the transformation from an image pair. Our approach focuses on employing detector-free feature matching methods to address this issue. Previous work has underscored the importance of incorporating semantic information; however, an efficient way to utilize semantic information is still lacking. Previous methods suffer from treating the semantics as a…

    Submitted 18 July, 2024; originally announced July 2024.

  44. arXiv:2407.13278  [pdf, other]

    cs.LG

    Deep Time Series Models: A Comprehensive Survey and Benchmark

    Authors: Yuxuan Wang, Haixu Wu, Jiaxiang Dong, Yong Liu, Mingsheng Long, Jianmin Wang

    Abstract: Time series, characterized by a sequence of data points arranged in a discrete-time order, are ubiquitous in real-world applications. Unlike other modalities, time series present unique challenges due to their complex and dynamic nature, including the entanglement of nonlinear patterns and time-variant trends. Analyzing time series data is of great significance in real-world scenarios and…

    Submitted 18 July, 2024; originally announced July 2024.

  45. arXiv:2407.13220  [pdf, other]

    eess.AS cs.SD

    MEDIC: Zero-shot Music Editing with Disentangled Inversion Control

    Authors: Huadai Liu, Jialei Wang, Rongjie Huang, Yang Liu, Jiayang Xu, Zhou Zhao

    Abstract: Text-guided diffusion models catalyze a paradigm shift in audio generation, facilitating the adaptability of source audio to conform to specific textual prompts. Recent advancements introduce inversion techniques, like DDIM inversion, to zero-shot editing, exploiting pre-trained diffusion models for audio modification. Nonetheless, our investigation exposes that DDIM inversion suffers from an accu…

    Submitted 18 July, 2024; originally announced July 2024.

  46. arXiv:2407.13201  [pdf, other]

    cs.SE

    μDrive: User-Controlled Autonomous Driving

    Authors: Kun Wang, Christopher M. Poskitt, Yang Sun, Jun Sun, Jingyi Wang, Peng Cheng, Jiming Chen

    Abstract: Autonomous Vehicles (AVs) rely on sophisticated Autonomous Driving Systems (ADSs) to provide passengers a satisfying and safe journey. The individual preferences of riders play a crucial role in shaping the perception of safety and comfort while they are in the car. Existing ADSs, however, lack mechanisms to systematically capture and integrate rider preferences into their planning modules. To br…

    Submitted 18 July, 2024; originally announced July 2024.

  47. arXiv:2407.13137  [pdf, other]

    cs.CV

    OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation

    Authors: Jian Sun, Yuqi Dai, Chi-Man Vong, Qing Xu, Shengbo Eben Li, Jianqiang Wang, Lei He, Keqiang Li

    Abstract: Bird's-eye-view (BEV) semantic segmentation is becoming crucial in autonomous driving systems. It realizes ego-vehicle surrounding environment perception by projecting 2D multi-view images into 3D world space. Recently, BEV segmentation has made notable progress, attributed to better view transformation modules, larger image encoders, or more temporal information. However, there are still two issu…

    Submitted 17 July, 2024; originally announced July 2024.

  48. arXiv:2407.12962  [pdf, other]

    cs.RO

    NAS: N-step computation of All Solutions to the footstep planning problem

    Authors: Jiayi Wang, Saeid Samadi, Hefan Wang, Pierre Fernbach, Olivier Stasse, Sethu Vijayakumar, Steve Tonneau

    Abstract: How many ways are there to climb a staircase in a given number of steps? Infinitely many, if we focus on the continuous aspect of the problem. A finite, possibly large number if we consider the discrete aspect, i.e., on which surface which effectors are going to step and in what order. We introduce NAS, an algorithm that considers both aspects simultaneously and computes all the possible solutions…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Submitted to Humanoids 2024

  49. arXiv:2407.12940  [pdf, other]

    cs.RO cs.CV

    KiGRAS: Kinematic-Driven Generative Model for Realistic Agent Simulation

    Authors: Jianbo Zhao, Jiaheng Zhuang, Qibin Zhou, Taiyu Ban, Ziyao Xu, Hangning Zhou, Junhe Wang, Guoan Wang, Zhiheng Li, Bin Li

    Abstract: Trajectory generation is a pivotal task in autonomous driving. Recent studies have introduced the autoregressive paradigm, leveraging the state transition model to approximate future trajectory distributions. This paradigm closely mirrors the real-world trajectory generation process and has achieved notable success. However, its potential is limited by the ineffective representation of realistic t…

    Submitted 17 July, 2024; originally announced July 2024.

  50. arXiv:2407.12797  [pdf, other]

    cs.PF cs.LG

    CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines

    Authors: Wenbo Sun, Jiaqi Wang, Qiming Guo, Ziyu Li, Wenlu Wang, Rihan Hai

    Abstract: Online Large Language Model (LLM) services such as ChatGPT and Claude 3 have transformed business operations and academic research by effortlessly enabling new opportunities. However, due to data-sharing restrictions, sectors such as healthcare and finance prefer to deploy local LLM applications using costly hardware resources. This scenario requires a balance between the effectiveness advantages…

    Submitted 20 June, 2024; originally announced July 2024.