Showing 1–50 of 4,411 results for author: Liu, J

  1. arXiv:2407.20183

    cs.CL cs.AI

    MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

    Authors: Zehui Chen, Kuikun Liu, Qiuchen Wang, Jiangning Liu, Wenwei Zhang, Kai Chen, Feng Zhao

    Abstract: Information seeking and integration is a complex cognitive task that consumes enormous time and effort. Inspired by the remarkable progress of Large Language Models, recent works attempt to solve this task by combining LLMs and search engines. However, these methods still obtain unsatisfying performance due to three challenges: (1) complex requests often cannot be accurately and completely retriev…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Technical Report. Project Page: https://mindsearch.netlify.app Code: https://github.com/InternLM/MindSearch

  2. arXiv:2407.20171

    cs.CV

    Diffusion Feedback Helps CLIP See Better

    Authors: Wenxuan Wang, Quan Sun, Fan Zhang, Yepeng Tang, Jing Liu, Xinlong Wang

    Abstract: Contrastive Language-Image Pre-training (CLIP), which excels at abstracting open-world representations across domains and modalities, has become a foundation for a variety of vision and multimodal tasks. However, recent studies reveal that CLIP has severe visual shortcomings; for example, it can hardly distinguish orientation, quantity, color, structure, etc. These visual shortcomings also limit the…

    Submitted 29 July, 2024; originally announced July 2024.

  3. arXiv:2407.19875

    cs.CV

    Exploring Robust Face-Voice Matching in Multilingual Environments

    Authors: Jiehui Tang, Xiaofei Wang, Zhen Xiao, Jiayi Liu, Xueliang Liu, Richang Hong

    Abstract: This paper presents Team Xaiofei's innovative approach to exploring Face-Voice Association in Multilingual Environments (FAME) at ACM Multimedia 2024. We focus on the impact of different languages in face-voice matching by building upon Fusion and Orthogonal Projection (FOP), introducing four key components: a dual-branch structure, dynamic sample pair weighting, robust data augmentation, and scor…

    Submitted 29 July, 2024; originally announced July 2024.

  4. arXiv:2407.19547

    cs.CV

    Temporal Feature Matters: A Framework for Diffusion Model Quantization

    Authors: Yushi Huang, Ruihao Gong, Xianglong Liu, Jing Liu, Yuhang Li, Jiwen Lu, Dacheng Tao

    Abstract: Diffusion models, widely used for image generation, face significant challenges to broad applicability due to prolonged inference times and high memory demands. Efficient Post-Training Quantization (PTQ) is crucial to address these issues in traditional models. Unlike those models, diffusion models critically rely on the time-step $t$ for effective multi-round denoising. Typicall…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2311.16503

  5. arXiv:2407.19542

    cs.CV

    UniVoxel: Fast Inverse Rendering by Unified Voxelization of Scene Representation

    Authors: Shuang Wu, Songlin Tang, Guangming Lu, Jianzhuang Liu, Wenjie Pei

    Abstract: Typical inverse rendering methods focus on learning implicit neural scene representations by modeling the geometry, materials and illumination separately, which entails significant computations for optimization. In this work we design a Unified Voxelization framework for explicit learning of scene representations, dubbed UniVoxel, which allows for efficient modeling of the geometry, materials and…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  6. arXiv:2407.19514

    cs.CV cs.MM

    Detached and Interactive Multimodal Learning

    Authors: Yunfeng Fan, Wenchao Xu, Haozhao Wang, Junhong Liu, Song Guo

    Abstract: Recently, Multimodal Learning (MML) has gained significant interest as it compensates for single-modality limitations through comprehensive complementary information within multimodal data. However, traditional MML methods generally use the joint learning framework with a uniform learning objective that can lead to the modality competition issue, where feedback predominantly comes from certain mod…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 24

  7. arXiv:2407.19224

    cs.SD cs.MM eess.AS

    RAVSS: Robust Audio-Visual Speech Separation in Multi-Speaker Scenarios with Missing Visual Cues

    Authors: Tianrui Pan, Jie Liu, Bohan Wang, Jie Tang, Gangshan Wu

    Abstract: While existing Audio-Visual Speech Separation (AVSS) methods primarily concentrate on the audio-visual fusion strategy for two-speaker separation, they demonstrate a severe performance drop in the multi-speaker separation scenarios. Typically, AVSS methods employ guiding videos to sequentially isolate individual speakers from the given audio mixture, resulting in notable missing and noisy parts ac…

    Submitted 27 July, 2024; originally announced July 2024.

  8. arXiv:2407.18962

    cs.RO cs.LG

    Autonomous Navigation of Unmanned Vehicle Through Deep Reinforcement Learning

    Authors: Letian Xu, Jiabei Liu, Haopeng Zhao, Tianyao Zheng, Tongzhou Jiang, Lipeng Liu

    Abstract: This paper explores the method of achieving autonomous navigation of unmanned vehicles through Deep Reinforcement Learning (DRL). The focus is on using the Deep Deterministic Policy Gradient (DDPG) algorithm to address issues in high-dimensional continuous action spaces. The paper details the model of an Ackermann robot and the structure and application of the DDPG algorithm. Experiments were condu…

    Submitted 18 July, 2024; originally announced July 2024.

  9. arXiv:2407.18854

    cs.CV cs.AI

    Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment

    Authors: Yuze Zheng, Zixuan Li, Xiangxian Li, Jinxing Liu, Yuqing Wang, Xiangxu Meng, Lei Meng

    Abstract: Image classification models often demonstrate unstable performance in real-world applications due to variations in image information, driven by differing visual perspectives of subject objects and lighting discrepancies. To mitigate these challenges, existing studies commonly incorporate additional modal information matching the visual data to regularize the model's learning process, enabling the…

    Submitted 26 July, 2024; originally announced July 2024.

  10. arXiv:2407.18626

    cs.CL cs.AI cs.CV cs.DL cs.MM

    Every Part Matters: Integrity Verification of Scientific Figures Based on Multimodal Large Language Models

    Authors: Xiang Shi, Jiawei Liu, Yinpeng Liu, Qikai Cheng, Wei Lu

    Abstract: This paper tackles a key issue in the interpretation of scientific figures: the fine-grained alignment of text and figures. It advances beyond prior research that primarily dealt with straightforward, data-driven visualizations such as bar and pie charts and only offered a basic understanding of diagrams through captioning and classification. We introduce a novel task, Figure Integrity Verificatio…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 28 pages, 11 figures, under review

  11. arXiv:2407.17825

    cs.SI cs.CR

    Blockchain Takeovers in Web 3.0: An Empirical Study on the TRON-Steem Incident

    Authors: Chao Li, Runhua Xu, Balaji Palanisamy, Li Duan, Meng Shen, Jiqiang Liu, Wei Wang

    Abstract: A fundamental goal of Web 3.0 is to establish a decentralized network and application ecosystem, thereby enabling users to retain control over their data while promoting value exchange. However, the recent Tron-Steem takeover incident poses a significant threat to this vision. In this paper, we present a thorough empirical analysis of the Tron-Steem takeover incident. By conducting a fine-grained…

    Submitted 25 July, 2024; originally announced July 2024.

  12. arXiv:2407.17816

    cs.LG cs.AI

    NC-NCD: Novel Class Discovery for Node Classification

    Authors: Yue Hou, Xueyuan Chen, He Zhu, Romei Liu, Bowen Shi, Jiaheng Liu, Junran Wu, Ke Xu

    Abstract: Novel Class Discovery (NCD) involves identifying new categories within unlabeled data by utilizing knowledge acquired from previously established categories. However, existing NCD methods often struggle to maintain a balance between the performance of old and new categories. Discovering unlabeled new categories in a class-incremental way is more practical but also more challenging, as it is freque…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by CIKM'24

  13. arXiv:2407.17416

    eess.AS cs.AI cs.CL

    Explaining Spectrograms in Machine Learning: A Study on Neural Networks for Speech Classification

    Authors: Jesin James, Balamurali B. T., Binu Abeysinghe, Junchen Liu

    Abstract: This study investigates discriminative patterns learned by neural networks for accurate speech classification, with a specific focus on vowel classification tasks. By examining the activations and features of neural networks for vowel classification, we gain insights into what the networks "see" in spectrograms. Through the use of class activation mapping, we identify the frequencies that contribu…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 5th International Conference on Artificial Intelligence and Speech Technology (AIST-2023), New Delhi, India

  14. arXiv:2407.17379

    cs.CV cs.CL

    MMRA: A Benchmark for Multi-granularity Multi-image Relational Association

    Authors: Siwei Wu, Kang Zhu, Yu Bai, Yiming Liang, Yizhi Li, Haoning Wu, Jiaheng Liu, Ruibo Liu, Xingwei Qu, Xuxin Cheng, Ge Zhang, Wenhao Huang, Chenghua Lin

    Abstract: Given the remarkable success that large visual language models (LVLMs) have achieved in image perception tasks, the endeavor to make LVLMs perceive the world like humans is drawing increasing attention. Current multi-modal benchmarks mainly focus on the objective facts or topic-related potential knowledge within an image, but overlook the associative relations between multiple images. Theref…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: VLMS, Multi-Image Association

  15. arXiv:2407.17333

    cs.LG

    Global and Local Confidence Based Fraud Detection Graph Neural Network

    Authors: Jiaxun Liu, Yue Tian, Guanjun Liu

    Abstract: This paper presents the Global and Local Confidence Graph Neural Network (GLC-GNN), an innovative approach to graph-based anomaly detection that addresses the challenges of heterophily and camouflage in fraudulent activities. By introducing a prototype to encapsulate the global features of a graph and calculating a Global Confidence (GC) value for each node, GLC-GNN effectively distinguishes betwe…

    Submitted 24 July, 2024; originally announced July 2024.

  16. arXiv:2407.17120

    cs.LG cs.AI

    Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective

    Authors: Jingren Liu, Zhong Ji, YunLong Yu, Jiale Cao, Yanwei Pang, Jungong Han, Xuelong Li

    Abstract: Parameter-efficient fine-tuning for continual learning (PEFT-CL) has shown promise in adapting pre-trained models to sequential tasks while mitigating the catastrophic forgetting problem. However, understanding the mechanisms that dictate continual performance in this paradigm remains elusive. To tackle this complexity, we undertake a rigorous analysis of PEFT-CL dynamics to derive relevant metrics fo…

    Submitted 24 July, 2024; originally announced July 2024.

  17. arXiv:2407.17007

    cs.CY cs.AI cs.HC

    Pensieve Discuss: Scalable Small-Group CS Tutoring System with AI

    Authors: Yoonseok Yang, Jack Liu, J. D. Zamfirescu-Pereira, John DeNero

    Abstract: Small-group tutoring in Computer Science (CS) is effective, but presents the challenge of providing a dedicated tutor for each group and encouraging collaboration among group members at scale. We present Pensieve Discuss, a software platform that integrates synchronous editing for scaffolded programming problems with online human and AI tutors, designed to improve student collaboration and experie…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 6 pages, 7 figures, 4 tables, 1 page of references

  18. arXiv:2407.16949

    cs.GT cs.CR

    Profitable Manipulations of Cryptographic Self-Selection are Statistically Detectable

    Authors: Linda Cai, Jingyi Liu, S. Matthew Weinberg, Chenghan Zhou

    Abstract: Cryptographic Self-Selection is a common primitive underlying leader-selection for Proof-of-Stake blockchain protocols. The concept was first popularized in Algorand [CM19], who also observed that the protocol might be manipulable. [FHWY22] provide a concrete manipulation that is strictly profitable for a staker of any size (and also prove upper bounds on the gains from manipulation). Separately…

    Submitted 23 July, 2024; originally announced July 2024.

  19. arXiv:2407.16697

    cs.CV

    AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking

    Authors: Wenxuan Li, Chongyu Qu, Xiaoxi Chen, Pedro R. A. S. Bassi, Yijia Shi, Yuxiang Lai, Qian Yu, Huimin Xue, Yixiong Chen, Xiaorui Lin, Yutong Tang, Yining Cao, Haoqi Han, Zheyuan Zhang, Jiawei Liu, Tiezheng Zhang, Yujiu Ma, Jincheng Wang, Guang Zhang, Alan Yuille, Zongwei Zhou

    Abstract: We introduce the largest abdominal CT dataset (termed AbdomenAtlas) of 20,460 three-dimensional CT volumes sourced from 112 hospitals across diverse populations, geographies, and facilities. AbdomenAtlas provides 673K high-quality masks of anatomical structures in the abdominal region annotated by a team of 10 radiologists with the help of AI algorithms. We start by having expert radiologists manu…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Published in Medical Image Analysis

  20. arXiv:2407.16207

    cs.CL

    Graph-Structured Speculative Decoding

    Authors: Zhuocheng Gong, Jiahao Liu, Ziyue Wang, Pengfei Wu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

    Abstract: Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models (LLMs) by employing a small language model to draft a hypothesis sequence, which is then validated by the LLM. The effectiveness of this approach heavily relies on the balance between performance and efficiency of the draft model. In our research, we focus on enhancing the proportion of d…

    Submitted 23 July, 2024; originally announced July 2024.

  21. arXiv:2407.16154

    cs.CL

    DDK: Distilling Domain Knowledge for Efficient Large Language Models

    Authors: Jiaheng Liu, Chenchen Zhang, Jinyang Guo, Yuanxing Zhang, Haoran Que, Ken Deng, Zhiqi Bai, Jie Liu, Ge Zhang, Jiakai Wang, Yanan Wu, Congnan Liu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

    Abstract: Despite the advanced intelligence abilities of large language models (LLMs) in various applications, they still face significant computational and storage demands. Knowledge Distillation (KD) has emerged as an effective strategy to improve the performance of a smaller LLM (i.e., the student model) by transferring knowledge from a high-performing LLM (i.e., the teacher model). Prevailing techniques…

    Submitted 22 July, 2024; originally announced July 2024.

  22. arXiv:2407.16124

    cs.CV

    Fréchet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos

    Authors: Jiahe Liu, Youran Qu, Qi Yan, Xiaohui Zeng, Lele Wang, Renjie Liao

    Abstract: Significant advancements have been made in video generative models recently. Unlike image generation, video generation presents greater challenges, requiring not only generating high-quality frames but also ensuring temporal consistency across these frames. Despite the impressive progress, research on metrics for evaluating the quality of generated videos, especially concerning temporal and motion…

    Submitted 22 July, 2024; originally announced July 2024.

  23. arXiv:2407.15975

    cs.CL

    Multilingual Fine-Grained News Headline Hallucination Detection

    Authors: Jiaming Shen, Tianqi Liu, Jialu Liu, Zhen Qin, Jay Pavagadhi, Simon Baumgartner, Michael Bendersky

    Abstract: The popularity of automated news headline generation has surged with advancements in pre-trained language models. However, these models often suffer from the "hallucination" problem, where the generated headline is not fully supported by its source article. Efforts to address this issue have predominantly focused on English, using over-simplistic classification schemes that overlook nuanced hall…

    Submitted 22 July, 2024; originally announced July 2024.

  24. arXiv:2407.15706

    cs.CV

    Multi-Modality Co-Learning for Efficient Skeleton-based Action Recognition

    Authors: Jinfu Liu, Chen Chen, Mengyuan Liu

    Abstract: Skeleton-based action recognition has garnered significant attention due to the utilization of concise and resilient skeletons. Nevertheless, the absence of detailed body information in skeletons restricts performance, while other multimodal methods require substantial inference resources and are inefficient when using multimodal data during both training and inference stages. To address this and…

    Submitted 24 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

  25. arXiv:2407.15351

    cs.LG cs.AI cs.CL

    LLMExplainer: Large Language Model based Bayesian Inference for Graph Explanation Generation

    Authors: Jiaxing Zhang, Jiayi Liu, Dongsheng Luo, Jennifer Neville, Hua Wei

    Abstract: Recent studies seek to provide Graph Neural Network (GNN) interpretability via multiple unsupervised learning models. Due to the scarcity of datasets, current methods easily suffer from learning bias. To solve this problem, we embed a Large Language Model (LLM) as knowledge into the GNN explanation network to avoid the learning bias problem. We inject LLM as a Bayesian Inference (BI) module to mit…

    Submitted 23 July, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: Preprint Paper with 13 pages

  26. arXiv:2407.14899

    cs.LG cs.CV

    Hyperspectral Unmixing Under Endmember Variability: A Variational Inference Framework

    Authors: Yuening Li, Xiao Fu, Junbin Liu, Wing-Kin Ma

    Abstract: This work proposes a variational inference (VI) framework for hyperspectral unmixing in the presence of endmember variability (HU-EV). An EV-accounted noisy linear mixture model (LMM) is considered, and the presence of outliers is also incorporated into the model. Following the marginalized maximum likelihood (MML) principle, a VI algorithmic structure is designed for probabilistic inference for H…

    Submitted 20 July, 2024; originally announced July 2024.

  27. arXiv:2407.14774

    cs.CV cs.AI cs.GR

    Intelligent Artistic Typography: A Comprehensive Review of Artistic Text Design and Generation

    Authors: Yuhang Bai, Zichuan Huang, Wenshuo Gao, Shuai Yang, Jiaying Liu

    Abstract: Artistic text generation aims to amplify the aesthetic qualities of text while maintaining readability. It can make the text more attractive and better convey its expression, thus enjoying a wide range of application scenarios such as social media display, consumer electronics, fashion, and graphic design. Artistic text generation includes artistic text stylization and semantic typography. Artisti…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: GitHub Page: https://github.com/williamyang1991/Awesome-Artistic-Typography/

  28. arXiv:2407.14138

    cs.CV

    Visual Text Generation in the Wild

    Authors: Yuanzhi Zhu, Jiawei Liu, Feiyu Gao, Wenyu Liu, Xinggang Wang, Peng Wang, Fei Huang, Cong Yao, Zhibo Yang

    Abstract: Recently, with the rapid advancements of generative models, the field of visual text generation has witnessed significant progress. However, it is still challenging to render high-quality text images in real-world scenarios, as three critical criteria should be satisfied: (1) Fidelity: the generated text images should be photo-realistic and the contents are expected to be the same as specified in…

    Submitted 19 July, 2024; originally announced July 2024.

  29. arXiv:2407.14126

    cs.CV

    Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation

    Authors: Jinfeng Liu, Lingtong Kong, Bo Li, Zerong Wang, Hong Gu, Jinwei Chen

    Abstract: Self-supervised monocular depth estimation has gathered notable interest since it can liberate training from dependency on depth annotations. In monocular video training case, recent methods only conduct view synthesis between existing camera views, leading to insufficient guidance. To tackle this, we try to synthesize more virtual camera views by flow-based video frame interpolation (VFI), termed…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 27 pages, accepted by ECCV 2024

  30. arXiv:2407.14078

    cs.CV

    Stable-Hair: Real-World Hair Transfer via Diffusion Model

    Authors: Yuxuan Zhang, Qing Zhang, Yiren Song, Jiaming Liu

    Abstract: Current hair transfer methods struggle to handle diverse and intricate hairstyles, thus limiting their applicability in real-world scenarios. In this paper, we propose a novel diffusion-based hair transfer framework, named Stable-Hair, which robustly transfers a wide range of real-world hairstyles onto user-provided faces for virtual hair try-on. To achieve this goal, our Stable-Hair fram…

    Submitted 19 July, 2024; originally announced July 2024.

  31. arXiv:2407.13757

    cs.CL cs.AI cs.CR

    Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models

    Authors: Zhuo Chen, Jiawei Liu, Haotan Liu, Qikai Cheng, Fan Zhang, Wei Lu, Xiaozhong Liu

    Abstract: Retrieval-Augmented Generation (RAG) is applied to solve hallucination problems and real-time constraints of large language models, but it also induces vulnerabilities against retrieval corruption attacks. Existing research mainly explores the unreliability of RAG in white-box and closed-domain QA tasks. In this paper, we aim to reveal the vulnerabilities of Retrieval-Augmented Generation (RAG) mod…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 10 pages, 3 figures, under review

  32. arXiv:2407.13609

    cs.CV cs.AI

    Training-free Composite Scene Generation for Layout-to-Image Synthesis

    Authors: Jiaqi Liu, Tao Huang, Chang Xu

    Abstract: Recent breakthroughs in text-to-image diffusion models have significantly advanced the generation of high-fidelity, photo-realistic images from textual descriptions. Yet, these models often struggle with interpreting spatial arrangements from text, hindering their ability to produce images with precise spatial configurations. To bridge this gap, layout-to-image generation has emerged as a promisin…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  33. arXiv:2407.13252

    cs.CV

    Unveiling Structural Memorization: Structural Membership Inference Attack for Text-to-Image Diffusion Models

    Authors: Qiao Li, Xiaomeng Fu, Xi Wang, Jin Liu, Xingyu Gao, Jiao Dai, Jizhong Han

    Abstract: With the rapid advancements of large-scale text-to-image diffusion models, various practical applications have emerged, bringing significant convenience to society. However, model developers may misuse unauthorized data to train diffusion models. These data are at risk of being memorized by the models, thus potentially violating citizens' privacy rights. Therefore, in order to judge whether a…

    Submitted 18 July, 2024; originally announced July 2024.

  34. arXiv:2407.13163

    cs.IR cs.AI

    ROLeR: Effective Reward Shaping in Offline Reinforcement Learning for Recommender Systems

    Authors: Yi Zhang, Ruihong Qiu, Jiajun Liu, Sen Wang

    Abstract: Offline reinforcement learning (RL) is an effective tool for real-world recommender systems with its capacity to model the dynamic interest of users and its interactive nature. Most existing offline RL recommender systems focus on model-based RL through learning a world model from offline data and building the recommendation policy by interacting with this model. Although these methods have made p…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: CIKM 2024

  35. arXiv:2407.12823

    cs.CL cs.AI

    WTU-EVAL: A Whether-or-Not Tool Usage Evaluation Benchmark for Large Language Models

    Authors: Kangyun Ning, Yisong Su, Xueqiang Lv, Yuanzhe Zhang, Jian Liu, Kang Liu, Jinan Xu

    Abstract: Although Large Language Models (LLMs) excel in NLP tasks, they still need external tools to extend their ability. Current research on tool learning with LLMs often assumes mandatory tool use, which does not always align with real-world situations, where the necessity for tools is uncertain, and incorrect or unnecessary use of tools can damage the general abilities of LLMs. Therefore, we propose to…

    Submitted 2 July, 2024; originally announced July 2024.

  36. arXiv:2407.12684

    cs.CV

    4Dynamic: Text-to-4D Generation with Hybrid Priors

    Authors: Yu-Jie Yuan, Leif Kobbelt, Jiwen Liu, Yuan Zhang, Pengfei Wan, Yu-Kun Lai, Lin Gao

    Abstract: Due to the fascinating generative performance of text-to-image diffusion models, a growing number of text-to-3D generation works explore distilling the 2D generative priors into 3D, using the score distillation sampling (SDS) loss, to bypass the data scarcity problem. The existing text-to-3D methods have achieved promising results in realism and 3D consistency, but text-to-4D generation still faces challenges…

    Submitted 17 July, 2024; originally announced July 2024.

  37. arXiv:2407.12568

    cs.CV

    LTRL: Boosting Long-tail Recognition via Reflective Learning

    Authors: Qihao Zhao, Yalun Dai, Shen Lin, Wei Hu, Fan Zhang, Jun Liu

    Abstract: In real-world scenarios, knowledge distributions exhibit a long tail. Humans manage to master knowledge uniformly across imbalanced distributions, a feat attributed to their diligent practices of reviewing, summarizing, and correcting errors. Motivated by this learning process, we propose a novel learning paradigm, called reflective learning, for handling long-tail recognition. Our method integ…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  38. arXiv:2407.12312

    cs.CV

    Shap-Mix: Shapley Value Guided Mixing for Long-Tailed Skeleton Based Action Recognition

    Authors: Jiahang Zhang, Lilang Lin, Jiaying Liu

    Abstract: In real-world scenarios, human actions often fall into a long-tailed distribution. This causes existing skeleton-based action recognition works, which are mostly designed for balanced datasets, to suffer a sharp performance degradation. Recently, many efforts have been made toward image/video long-tailed learning. However, directly applying them to skeleton data can be sub-optimal due to the la…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by IJCAI 2024. Project Website: https://jhang2020.github.io/Projects/Shap-Mix/Shap-Mix.html

  39. arXiv:2407.11853

    cs.ET

    A Case for Application-Aware Space Radiation Tolerance in Orbital Computing

    Authors: Meiqi Wang, Han Qiu, Longnv Xu, Di Wang, Yuanjie Li, Tianwei Zhang, Jun Liu, Hewu Li

    Abstract: We are witnessing a surge in the use of commercial off-the-shelf (COTS) hardware for cost-effective in-orbit computing, such as deep neural network (DNN) based on-satellite sensor data processing, Earth object detection, and task decision. However, once exposed to harsh space environments, COTS hardware is vulnerable to cosmic radiation and suffers from exhaustive single-event upsets (SEUs) and mul…

    Submitted 16 July, 2024; originally announced July 2024.

  40. arXiv:2407.11844

    cs.LG cs.AI cs.CR stat.ML

    Variational Randomized Smoothing for Sample-Wise Adversarial Robustness

    Authors: Ryo Hase, Ye Wang, Toshiaki Koike-Akino, Jing Liu, Kieran Parsons

    Abstract: Randomized smoothing is a defensive technique to achieve enhanced robustness against adversarial examples, which are small input perturbations that degrade the performance of neural network models. Conventional randomized smoothing adds random noise with a fixed noise level for every input sample to smooth out adversarial perturbations. This paper proposes a new variational framework that uses a pe…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 20 pages, under preparation

  41. arXiv:2407.11700

    cs.CV eess.IV

    Rate-Distortion-Cognition Controllable Versatile Neural Image Compression

    Authors: Jinming Liu, Ruoyu Feng, Yunpeng Qi, Qiuyu Chen, Zhibo Chen, Wenjun Zeng, Xin Jin

    Abstract: Recently, the field of Image Coding for Machines (ICM) has garnered heightened interest and significant advances thanks to the rapid progress of learning-based techniques for image compression and analysis. Previous studies often require training separate codecs to support various bitrate levels, machine tasks, and networks, thus lacking both flexibility and practicality. To address these challeng…

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  42. arXiv:2407.11637

    cs.CV

    REMM: Rotation-Equivariant Framework for End-to-End Multimodal Image Matching

    Authors: Han Nie, Bin Luo, Jun Liu, Zhitao Fu, Weixing Liu, Xin Su

    Abstract: We present REMM, a rotation-equivariant framework for end-to-end multimodal image matching, which fully encodes rotational differences of descriptors in the whole matching pipeline. Previous learning-based methods mainly focus on extracting modal-invariant descriptors, while consistently ignoring the rotational invariance. In this paper, we demonstrate that our REMM is very useful for multimodal i…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 13 pages, 13 figures

  43. arXiv:2407.11486

    cs.CV

    An efficient framework based on large foundation model for cervical cytopathology whole slide image screening

    Authors: Jialong Huang, Gaojie Li, Shichao Kan, Jianfeng Liu, Yixiong Liang

    Abstract: Current cervical cytopathology whole slide image (WSI) screening primarily relies on detection-based approaches, which are limited in performance due to the expense and time-consuming annotation process. Multiple Instance Learning (MIL), a weakly supervised approach that relies solely on bag-level labels, can effectively alleviate these challenges. Nonetheless, MIL commonly employs frozen pretrain…

    Submitted 16 July, 2024; originally announced July 2024.

  44. arXiv:2407.11382  [pdf, other]

    cs.CV cs.AI cs.RO

    Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts

    Authors: Jianhao Li, Tianyu Sun, Zhongdao Wang, Enze Xie, Bailan Feng, Hongbo Zhang, Ze Yuan, Ke Xu, Jiaheng Liu, Ping Luo

    Abstract: This paper proposes an algorithm for automatically labeling 3D objects from 2D point or box prompts, especially focusing on applications in autonomous driving. Unlike previous arts, our auto-labeler predicts 3D shapes instead of bounding boxes and does not require training on a specific dataset. We propose a Segment, Lift, and Fit (SLF) paradigm to achieve this goal. Firstly, we segment high-quali…

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  45. arXiv:2407.11299  [pdf, other]

    cs.RO cs.CV

    FR-SLAM: A SLAM Improvement Method Based on Floor Plan Registration

    Authors: Jiantao Feng, Xinde Li, HyunCheol Park, Juan Liu, Zhentong Zhang

    Abstract: Simultaneous Localization and Mapping (SLAM) technology enables the construction of environmental maps and localization, serving as a key technique for indoor autonomous navigation of mobile robots. Traditional SLAM methods typically require exhaustive traversal of all rooms during indoor navigation to obtain a complete map, resulting in lengthy path planning times and prolonged time to reach targ…

    Submitted 15 July, 2024; originally announced July 2024.

  46. Sampling and active learning methods for network reliability estimation using K-terminal spanning tree

    Authors: Chen Ding, Pengfei Wei, Yan Shi, Jinxing Liu, Matteo Broggi, Michael Beer

    Abstract: Network reliability analysis remains a challenge due to the increasing size and complexity of networks. This paper presents a novel sampling method and an active learning method for efficient and accurate network reliability estimation under node failure and edge failure scenarios. The proposed sampling method adopts Monte Carlo technique to sample component lifetimes and the K-terminal spanning t…

    Submitted 9 July, 2024; originally announced July 2024.

    Journal ref: Reliability Engineering & System Safety (2024) 110309

  47. arXiv:2407.10874  [pdf, other]

    cs.HC cs.CV cs.LG

    Random Channel Ablation for Robust Hand Gesture Classification with Multimodal Biosignals

    Authors: Keshav Bimbraw, Jing Liu, Ye Wang, Toshiaki Koike-Akino

    Abstract: Biosignal-based hand gesture classification is an important component of effective human-machine interaction. For multimodal biosignal sensing, the modalities often face data loss due to missing channels in the data which can adversely affect the gesture classification performance. To make the classifiers robust to missing channels in the data, this paper proposes using Random Channel Ablation (RC…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  48. arXiv:2407.10870  [pdf, other]

    cs.CV cs.AI cs.HC cs.LG

    GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM

    Authors: Keshav Bimbraw, Ye Wang, Jing Liu, Toshiaki Koike-Akino

    Abstract: Large vision-language models (LVLMs), such as the Generative Pre-trained Transformer 4-omni (GPT-4o), are emerging multi-modal foundation models which have great potential as powerful artificial-intelligence (AI) assistance tools for a myriad of applications, including healthcare, industrial, and academic sectors. Although such foundation models perform well in a wide range of general tasks, their…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 8 pages, 9 figures

  49. arXiv:2407.10756  [pdf, other]

    cs.CV

    GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation

    Authors: Haonan Wang, Jie Liu, Jie Tang, Gangshan Wu, Bo Xu, Yanbing Chou, Yong Wang

    Abstract: In recent years, 2D human pose estimation has made significant progress on public benchmarks. However, many of these approaches face challenges of less applicability in the industrial community due to the large number of parametric quantities and computational overhead. Efficient human pose estimation remains a hurdle, especially for whole-body pose estimation with numerous keypoints. While most c…

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 accepted

  50. arXiv:2407.10737  [pdf, other]

    cs.CV cs.AI

    Aligning Neuronal Coding of Dynamic Visual Scenes with Foundation Vision Models

    Authors: Rining Wu, Feixiang Zhou, Ziwei Yin, Jian K. Liu

    Abstract: Our brains represent the ever-changing environment with neurons in a highly dynamic fashion. The temporal features of visual pixels in dynamic natural scenes are entrapped in the neuronal responses of the retina. It is crucial to establish the intrinsic temporal relationship between visual pixels and neuronal responses. Recent foundation vision models have paved an advanced way of understanding im…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: This article has been accepted by ECCV 2024; its paper ID is 12149. The list of accepted papers' IDs can be found at: https://eccv2024.ecva.net/Conferences/2024/AcceptedPapers