Showing 1–50 of 214 results for author: Liao, J

  1. arXiv:2407.07871  [pdf, other]

    cs.IR

    Enhancing HNSW Index for Real-Time Updates: Addressing Unreachable Points and Performance Degradation

    Authors: Wentao Xiao, Yueyang Zhan, Rui Xi, Mengshu Hou, Jianming Liao

    Abstract: The approximate nearest neighbor search (ANNS) is a fundamental and essential component in data mining and information retrieval, with graph-based methodologies demonstrating superior performance compared to alternative approaches. Extensive research efforts have been dedicated to improving search efficiency by developing various graph-based indices, such as HNSW (Hierarchical Navigable Small Worl…

    Submitted 15 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  2. arXiv:2407.07410  [pdf]

    cs.CV cs.GR cs.LG

    Mutual Information calculation on different appearances

    Authors: Jiecheng Liao, Junhao Lu, Jeff Ji, Jiacheng He

    Abstract: Mutual information has many applications in image alignment and matching, mainly due to its ability to measure the statistical dependence between two images, even if the two images are from different modalities (e.g., CT and MRI). It considers not only the pixel intensities of the images but also the spatial relationships between the pixels. In this project, we apply the mutual information formula…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: demo for the work: elucidator.cn/demo-mi/

  3. arXiv:2407.07111  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Diffusion Model-Based Video Editing: A Survey

    Authors: Wenhao Sun, Rong-Cheng Tu, Jingyi Liao, Dacheng Tao

    Abstract: The rapid development of diffusion models (DMs) has significantly advanced image and video applications, making "what you want is what you see" a reality. Among these, video editing has gained substantial attention and seen a swift rise in research activity, necessitating a comprehensive and systematic review of the existing literature. This paper reviews diffusion model-based video editing techni…

    Submitted 26 June, 2024; originally announced July 2024.

    Comments: 23 pages, 12 figures, a project related to this paper can be found at https://github.com/wenhao728/awesome-diffusion-v2v

  4. arXiv:2407.04923  [pdf, other]

    cs.CV cs.CL

    OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding

    Authors: Tiancheng Zhao, Qianqian Zhang, Kyusong Lee, Peng Liu, Lu Zhang, Chunxin Fang, Jiajia Liao, Kelei Jiang, Yibo Ma, Ruochen Xu

    Abstract: We introduce OmChat, a model designed to excel in handling long contexts and video understanding tasks. OmChat's new architecture standardizes how different visual inputs are processed, making it more efficient and adaptable. It uses a dynamic vision encoding process to effectively handle images of various resolutions, capturing fine details across a range of image qualities. OmChat utilizes an ac…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 14 pages

  5. arXiv:2407.00280  [pdf, other]

    eess.IV cs.CV

    IVCA: Inter-Relation-Aware Video Complexity Analyzer

    Authors: Junqi Liao, Yao Li, Zhuoyuan Li, Li Li, Dong Liu

    Abstract: To meet the real-time analysis requirements of video streaming applications, we propose an inter-relation-aware video complexity analyzer (IVCA) as an extension to VCA. The IVCA addresses the limitation of VCA by considering inter-frame relations, namely motion and reference structure. First, we enhance the accuracy of temporal features by introducing feature-domain motion estimation into the IVCA…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: The report for the solution of second prize winner in ICIP 2024 Grand Challenge on Video Complexity (Team: USTC-iVC_Team1, USTC-iVC_Team2)

  6. arXiv:2406.18832  [pdf, other]

    cs.CL

    OutlierTune: Efficient Channel-Wise Quantization for Large Language Models

    Authors: Jinguang Wang, Yuexi Yin, Haifeng Sun, Qi Qi, Jingyu Wang, Zirui Zhuang, Tingting Yang, Jianxin Liao

    Abstract: Quantizing the activations of large language models (LLMs) has been a significant challenge due to the presence of structured outliers. Most existing methods focus on the per-token or per-tensor quantization of activations, making it difficult to achieve both accuracy and hardware efficiency. To address this problem, we propose OutlierTune, an efficient per-channel post-training quantization (PTQ)…

    Submitted 26 June, 2024; originally announced June 2024.

  7. arXiv:2406.06626  [pdf, other]

    cs.LG cs.AI cs.HC eess.SP

    Benchmarking Neural Decoding Backbones towards Enhanced On-edge iBCI Applications

    Authors: Zhou Zhou, Guohang He, Zheng Zhang, Luziwei Leng, Qinghai Guo, Jianxing Liao, Xuan Song, Ran Cheng

    Abstract: Traditional invasive Brain-Computer Interfaces (iBCIs) typically depend on neural decoding processes conducted on workstations within laboratory settings, which prevents their everyday usage. Implementing these decoding processes on edge devices, such as the wearables, introduces considerable challenges related to computational demands, processing speed, and maintaining accuracy. This study seeks…

    Submitted 7 June, 2024; originally announced June 2024.

  8. arXiv:2405.18132  [pdf, other]

    cs.CV

    EG4D: Explicit Generation of 4D Object without Score Distillation

    Authors: Qi Sun, Zhiyang Guo, Ziyu Wan, Jing Nathan Yan, Shengming Yin, Wengang Zhou, Jing Liao, Houqiang Li

    Abstract: In recent years, the increasing demand for dynamic 3D assets in design and gaming applications has given rise to powerful generative pipelines capable of synthesizing high-quality 4D objects. Previous methods generally rely on score distillation sampling (SDS) algorithm to infer the unseen views and motion of 4D objects, thus leading to unsatisfactory results with defects like over-saturation and…

    Submitted 28 May, 2024; originally announced May 2024.

  9. RealityEffects: Augmenting 3D Volumetric Videos with Object-Centric Annotation and Dynamic Visual Effects

    Authors: Jian Liao, Kevin Van, Zhijie Xia, Ryo Suzuki

    Abstract: This paper introduces RealityEffects, a desktop authoring interface designed for editing and augmenting 3D volumetric videos with object-centric annotations and visual effects. RealityEffects enhances volumetric capture by introducing a novel method for augmenting captured physical motion with embedded, responsive visual effects, referred to as object-centric augmentation. In RealityEffects, users…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: DIS 2024

  10. arXiv:2405.16414  [pdf, other]

    cs.CV

    PPRSteg: Printing and Photography Robust QR Code Steganography via Attention Flow-Based Model

    Authors: Huayuan Ye, Shenzhuo Zhang, Shiqi Jiang, Jing Liao, Shuhang Gu, Changbo Wang, Chenhui Li

    Abstract: Image steganography can hide information in a host image and obtain a stego image that is perceptually indistinguishable from the original one. This technique has tremendous potential in scenarios like copyright protection, information retrospection, etc. Some previous studies have proposed to enhance the robustness of the methods against image disturbances to increase their applicability. However…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 9 content pages

  11. arXiv:2405.10317  [pdf, other]

    cs.CV cs.GR

    Text-to-Vector Generation with Neural Path Representation

    Authors: Peiying Zhang, Nanxuan Zhao, Jing Liao

    Abstract: Vector graphics are widely used in digital art and highly favored by designers due to their scalability and layer-wise properties. However, the process of creating and editing vector graphics requires creativity and design expertise, making it a time-consuming task. Recent advancements in text-to-vector (T2V) generation have aimed to make this process more accessible. However, existing T2V methods…

    Submitted 20 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted by SIGGRAPH 2024. Project page: https://intchous.github.io/T2V-NPR

  12. arXiv:2405.10316  [pdf, other]

    cs.CV cs.GR

    Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model

    Authors: Zheng Gu, Shiyuan Yang, Jing Liao, Jing Huo, Yang Gao

    Abstract: Visual In-Context Learning (ICL) has emerged as a promising research area due to its capability to accomplish various tasks with limited example pairs through analogical reasoning. However, training-based visual ICL has limitations in its ability to generalize to unseen tasks and requires the collection of a diverse task dataset. On the other hand, existing methods in the inference-based visual IC…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Project page: https://analogist2d.github.io

  13. arXiv:2405.04503  [pdf, other]

    cs.RO

    Physics-data hybrid dynamic model of a multi-axis manipulator for sensorless dexterous manipulation and high-performance motion planning

    Authors: Wu-Te Yang, Jyun-Ming Liao, Pei-Chun Lin

    Abstract: We report on the development of an implementable physics-data hybrid dynamic model for an articulated manipulator to plan and operate in various scenarios. Meanwhile, the physics-based and data-driven dynamic models are studied in this research to select the best model for planning. The physics-based model is constructed using the Lagrangian method, and the loss terms include inertia loss, viscous…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 26 pages, 16 figures

  14. arXiv:2405.00515  [pdf, other]

    cs.RO cs.CV

    GAD-Generative Learning for HD Map-Free Autonomous Driving

    Authors: Weijian Sun, Yanbo Jia, Qi Zeng, Zihao Liu, Jiang Liao, Yue Li, Xianfeng Li

    Abstract: Deep-learning-based techniques have been widely adopted for autonomous driving software stacks for mass production in recent years, focusing primarily on perception modules, with some work extending this method to prediction modules. However, the downstream planning and control modules are still designed with hefty handcrafted rules, dominated by optimization-based methods such as quadratic progra…

    Submitted 31 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  15. arXiv:2405.00250  [pdf, other]

    cs.CV cs.RO

    SemVecNet: Generalizable Vector Map Generation for Arbitrary Sensor Configurations

    Authors: Narayanan Elavathur Ranganatha, Hengyuan Zhang, Shashank Venkatramani, Jing-Yan Liao, Henrik I. Christensen

    Abstract: Vector maps are essential in autonomous driving for tasks like localization and planning, yet their creation and maintenance are notably costly. While recent advances in online vector map generation for autonomous vehicles are promising, current models lack adaptability to different sensor configurations. They tend to overfit to specific sensor poses, leading to decreased performance and higher re…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: 8 pages, 6 figures, Accepted to IV 2024

  16. arXiv:2404.18112  [pdf, other]

    cs.CV cs.RO

    Garbage Segmentation and Attribute Analysis by Robotic Dogs

    Authors: Nuo Xu, Jianfeng Liao, Qiwei Meng, Wei Song

    Abstract: Efficient waste management and recycling heavily rely on garbage exploration and identification. In this study, we propose GSA2Seg (Garbage Segmentation and Attribute Analysis), a novel visual approach that utilizes quadruped robotic dogs as autonomous agents to address waste management and recycling challenges in diverse indoor and outdoor environments. Equipped with advanced visual perception sy…

    Submitted 28 April, 2024; originally announced April 2024.

  17. arXiv:2404.15341  [pdf, other]

    eess.SP cs.LG

    Classifier-guided neural blind deconvolution: a physics-informed denoising module for bearing fault diagnosis under heavy noise

    Authors: Jing-Xiao Liao, Chao He, Jipu Li, Jinwei Sun, Shiping Zhang, Xiaoge Zhang

    Abstract: Blind deconvolution (BD) has been demonstrated as an efficacious approach for extracting bearing fault-specific features from vibration signals under strong background noise. Despite BD's desirable feature in adaptability and mathematical interpretability, a significant challenge persists: How to effectively integrate BD with fault-diagnosing classifiers? This issue arises because the traditional…

    Submitted 10 April, 2024; originally announced April 2024.

  18. arXiv:2404.14356  [pdf, other]

    cs.SE

    Rethinking Legal Compliance Automation: Opportunities with Large Language Models

    Authors: Shabnam Hassani, Mehrdad Sabetzadeh, Daniel Amyot, Jain Liao

    Abstract: As software-intensive systems face growing pressure to comply with laws and regulations, providing automated support for compliance analysis has become paramount. Despite advances in the Requirements Engineering (RE) community on legal compliance analysis, important obstacles remain in developing accurate and generalizable compliance automation solutions. This paper highlights some observed limita…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted for publication at the RE@Next! track of RE 2024

  19. arXiv:2404.12888  [pdf, other]

    cs.CV cs.GR cs.LG

    Learn2Talk: 3D Talking Face Learns from 2D Talking Face

    Authors: Yixiang Zhuang, Baoping Cheng, Yao Cheng, Yuntao Jin, Renshuai Liu, Chengyang Li, Xuan Cheng, Jing Liao, Juncong Lin

    Abstract: Speech-driven facial animation methods usually contain two main classes, 3D and 2D talking face, both of which attract considerable research attention in recent years. However, to the best of our knowledge, the research on 3D talking face does not go deeper as 2D talking face, in the aspect of lip-synchronization (lip-sync) and speech perception. To mind the gap between the two sub-fields, we prop…

    Submitted 19 April, 2024; originally announced April 2024.

  20. arXiv:2404.12347  [pdf, other]

    cs.CV cs.GR

    AniClipart: Clipart Animation with Text-to-Video Priors

    Authors: Ronghuan Wu, Wanchao Su, Kede Ma, Jing Liao

    Abstract: Clipart, a pre-made graphic art form, offers a convenient and efficient way of illustrating visual content. Traditional workflows to convert static clipart images into motion sequences are laborious and time-consuming, involving numerous intricate steps like rigging, key animation and in-betweening. Recent advancements in text-to-video generation hold great potential in resolving this problem. Nev…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Project Page: https://aniclipart.github.io/

  21. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  22. arXiv:2404.04937  [pdf, other]

    cs.CR cs.GT

    Optimizing Information Propagation for Blockchain-empowered Mobile AIGC: A Graph Attention Network Approach

    Authors: Jiana Liao, Jinbo Wen, Jiawen Kang, Yang Zhang, Jianbo Du, Qihao Li, Weiting Zhang, Dong Yang

    Abstract: Artificial Intelligence-Generated Content (AIGC) is a rapidly evolving field that utilizes advanced AI algorithms to generate content. Through integration with mobile edge networks, mobile AIGC networks have gained significant attention, which can provide real-time customized and personalized AIGC services and products. Since blockchains can facilitate decentralized and transparent data management…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2403.13237

  23. arXiv:2404.03654  [pdf, other]

    cs.CV

    RaFE: Generative Radiance Fields Restoration

    Authors: Zhongkai Wu, Ziyu Wan, Jing Zhang, Jing Liao, Dong Xu

    Abstract: NeRF (Neural Radiance Fields) has demonstrated tremendous potential in novel view synthesis and 3D reconstruction, but its performance is sensitive to input image quality, which struggles to achieve high-fidelity rendering when provided with low-quality sparse input viewpoints. Previous methods for NeRF restoration are tailored for specific degradation type, ignoring the generality of restoration.…

    Submitted 7 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Project Page: https://zkaiwu.github.io/RaFE

  24. arXiv:2403.15878  [pdf, other]

    cs.CV

    Diffusion-based Aesthetic QR Code Generation via Scanning-Robust Perceptual Guidance

    Authors: Jia-Wei Liao, Winston Wang, Tzu-Sian Wang, Li-Xuan Peng, Cheng-Fu Chou, Jun-Cheng Chen

    Abstract: QR codes, prevalent in daily applications, lack visual appeal due to their conventional black-and-white design. Integrating aesthetics while maintaining scannability poses a challenge. In this paper, we introduce a novel diffusion-model-based aesthetic QR code generation pipeline, utilizing pre-trained ControlNet and guided iterative refinement via a novel classifier guidance (SRG) based on the pr…

    Submitted 23 March, 2024; originally announced March 2024.

  25. arXiv:2403.13237  [pdf, ps, other]

    cs.CR math.OC

    Graph Attention Network-based Block Propagation with Optimal AoI and Reputation in Web 3.0

    Authors: Jiana Liao, Jinbo Wen, Jiawen Kang, Changyan Yi, Yang Zhang, Yutao Jiao, Dusit Niyato, Dong In Kim, Shengli Xie

    Abstract: Web 3.0 is recognized as a pioneering paradigm that empowers users to securely oversee data without reliance on a centralized authority. Blockchains, as a core technology to realize Web 3.0, can facilitate decentralized and transparent data management. Nevertheless, the evolution of blockchain-enabled Web 3.0 is still in its nascent phase, grappling with challenges such as ensuring efficiency and…

    Submitted 8 May, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  26. arXiv:2402.18998  [pdf, other]

    cs.CV

    COFT-AD: COntrastive Fine-Tuning for Few-Shot Anomaly Detection

    Authors: Jingyi Liao, Xun Xu, Manh Cuong Nguyen, Adam Goodge, Chuan Sheng Foo

    Abstract: Existing approaches towards anomaly detection (AD) often rely on a substantial amount of anomaly-free data to train representation and density models. However, large anomaly-free datasets may not always be available before the inference stage; in which case an anomaly detection model must be trained with only a handful of normal samples, a.k.a. few-shot anomaly detection (FSAD). In this paper, we…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: IEEE Transactions on Image Processing

  27. arXiv:2402.17593  [pdf, other]

    cs.RO

    Autonomous Shuttle Operation for Vulnerable Populations: Lessons and Experiences

    Authors: Ren Zhong, Zhaofeng Tian, Jinghui Liao, Weisong Shi

    Abstract: The increasing shortage of drivers poses a significant threat to vulnerable populations, particularly seniors and disabled individuals who heavily depend on public transportation for accessing healthcare services and social events. Autonomous Vehicles (AVs) emerge as a promising alternative, offering potential improvements in accessibility and independence for these groups. However, current design…

    Submitted 28 February, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  28. Joint Resource Allocation and Trajectory Design for Resilient Multi-UAV Communication Networks

    Authors: Linghui Ge, Xiao Liang, Hua Zhang, Peihao Dong, Jianxin Liao, Jingyu Wang

    Abstract: In contrast to terrestrial wireless networks, dynamic Unmanned Aerial Vehicle (UAV) networks are susceptible to unexpected link failures arising from UAV breakdowns or the depletion of its batteries. Drastic user rate fluctuations and sum rate drops can occur due to the unexpected UAV link failures. Previous research has focused primarily on re-establishing these links to maintain service continui…

    Submitted 20 January, 2024; originally announced February 2024.

    Journal ref: IEEE Wireless Communications Letters, 2024

  29. arXiv:2402.16379  [pdf, other]

    cs.CL cs.AI

    TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement

    Authors: Zhaopeng Feng, Yan Zhang, Hao Li, Bei Wu, Jiayu Liao, Wenqiang Liu, Jun Lang, Yang Feng, Jian Wu, Zuozhu Liu

    Abstract: Large Language Models (LLMs) have achieved impressive results in Machine Translation (MT). However, careful evaluations by human reveal that the translations produced by LLMs still contain multiple errors. Importantly, feeding back such error information into the LLMs can lead to self-refinement and result in improved translation performance. Motivated by these insights, we introduce a systematic…

    Submitted 21 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Our code and data are available at https://github.com/fzp0424/self_correct_mt

  30. arXiv:2402.06700  [pdf, other]

    cs.LG cs.AI

    Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement

    Authors: Muning Wen, Junwei Liao, Cheng Deng, Jun Wang, Weinan Zhang, Ying Wen

    Abstract: Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks. Traditional approaches often depend on meticulously designed prompts, high-quality examples, or additional reward models for in-context learning, supervised fine-tuning, or RLHF. Reinforcement learning (RL) presents a dynamic alternative for LLMs to overcome these dependencies by engaging di…

    Submitted 6 June, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

  31. Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion

    Authors: Shiyuan Yang, Liang Hou, Haibin Huang, Chongyang Ma, Pengfei Wan, Di Zhang, Xiaodong Chen, Jing Liao

    Abstract: Recent text-to-video diffusion models have achieved impressive progress. In practice, users often desire the ability to control object motion and camera movement independently for customized video creation. However, current methods lack the focus on separately controlling object motion and camera movement in a decoupled manner, which limits the controllability and flexibility of text-to-video mode…

    Submitted 6 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  32. arXiv:2402.03025  [pdf, other]

    cs.IR cs.LG

    Understanding and Guiding Weakly Supervised Entity Alignment with Potential Isomorphism Propagation

    Authors: Yuanyi Wang, Wei Tang, Haifeng Sun, Zirui Zhuang, Xiaoyuan Fu, Jingyu Wang, Qi Qi, Jianxin Liao

    Abstract: Weakly Supervised Entity Alignment (EA) is the task of identifying equivalent entities across diverse knowledge graphs (KGs) using only a limited number of seed alignments. Despite substantial advances in aggregation-based weakly supervised EA, the underlying mechanisms in this setting remain unexplored. In this paper, we present a propagation perspective to analyze weakly supervised EA and explai…

    Submitted 5 February, 2024; originally announced February 2024.

  33. arXiv:2401.17859  [pdf, other]

    cs.IR

    Towards Semantic Consistency: Dirichlet Energy Driven Robust Multi-Modal Entity Alignment

    Authors: Yuanyi Wang, Haifeng Sun, Jiabo Wang, Jingyu Wang, Wei Tang, Qi Qi, Shaoling Sun, Jianxin Liao

    Abstract: In Multi-Modal Knowledge Graphs (MMKGs), Multi-Modal Entity Alignment (MMEA) is crucial for identifying identical entities across diverse modal attributes. However, semantic inconsistency, mainly due to missing modal attributes, poses a significant challenge. Traditional approaches rely on attribute interpolation, but this often introduces modality noise, distorting the original semantics. Moreove…

    Submitted 19 March, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: arXiv admin note: text overlap with arXiv:2307.16210 by other authors

  34. arXiv:2401.17807  [pdf, other]

    cs.CV cs.GR

    Advances in 3D Generation: A Survey

    Authors: Xiaoyu Li, Qi Zhang, Di Kang, Weihao Cheng, Yiming Gao, Jingbo Zhang, Zhihao Liang, Jing Liao, Yan-Pei Cao, Ying Shan

    Abstract: Generating 3D models lies at the core of computer graphics and has been the focus of decades of research. With the emergence of advanced neural representations and generative models, the field of 3D content generation is developing rapidly, enabling the creation of increasingly high-quality and diverse 3D models. The rapid growth of this field makes it difficult to stay abreast of all recent devel…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 33 pages, 12 figures

  35. arXiv:2401.12798  [pdf, other]

    cs.IR cs.CL

    Gradient Flow of Energy: A General and Efficient Approach for Entity Alignment Decoding

    Authors: Yuanyi Wang, Haifeng Sun, Jingyu Wang, Qi Qi, Shaoling Sun, Jianxin Liao

    Abstract: Entity alignment (EA), a pivotal process in integrating multi-source Knowledge Graphs (KGs), seeks to identify equivalent entity pairs across these graphs. Most existing approaches regard EA as a graph representation learning task, concentrating on enhancing graph encoders. However, the decoding process in EA - essential for effective operation and alignment accuracy - has received limited attenti…

    Submitted 17 April, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  36. arXiv:2401.02143  [pdf, other]

    cs.LG cs.AI cs.IR cs.SI

    Graph Neural Networks for Tabular Data Learning: A Survey with Taxonomy and Directions

    Authors: Cheng-Te Li, Yu-Che Tsai, Chih-Yao Chen, Jay Chiehen Liao

    Abstract: In this survey, we dive into Tabular Data Learning (TDL) using Graph Neural Networks (GNNs), a domain where deep learning-based approaches have increasingly shown superior performance in both classification and regression tasks compared to traditional methods. The survey highlights a critical gap in deep neural TDL methods: the underrepresentation of latent correlations among data instances and fe…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: Under review, ongoing work, Github page: https://github.com/Roytsai27/awesome-GNN4TDL

  37. arXiv:2401.01491

    cs.CE

    A Hybrid Neural Network Model For Predicting The Nitrate Concentration In The Recirculating Aquaculture System

    Authors: Xiangyu Fan, Jiaxin Lia, Yingzhe Wang, Yingsha Qu, Hao Li, Keming Qu, Zhengguo Cui

    Abstract: This study was groundbreaking in its application of neural network models for nitrate management in the Recirculating Aquaculture System (RAS). A hybrid neural network model was proposed, which accurately predicted daily nitrate concentration and its trends using six water quality parameters. We conducted a 105-day aquaculture experiment, during which we collected 450 samples from five sets of RAS…

    Submitted 15 January, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

    Comments: The content of this paper needs to be further filled and improved

  38. arXiv:2312.14389  [pdf, other]

    cs.CV

    StyleRetoucher: Generalized Portrait Image Retouching with GAN Priors

    Authors: Wanchao Su, Can Wang, Chen Liu, Hangzhou Han, Hongbo Fu, Jing Liao

    Abstract: Creating fine-retouched portrait images is tedious and time-consuming even for professional artists. There exist automatic retouching methods, but they either suffer from over-smoothing artifacts or lack generalization ability. To address such issues, we present StyleRetoucher, a novel automatic portrait image retouching framework, leveraging StyleGAN's generation and generalization ability to imp…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: 13 pages, 15 figures

  39. arXiv:2312.07539  [pdf, other]

    cs.CV

    HeadArtist: Text-conditioned 3D Head Generation with Self Score Distillation

    Authors: Hongyu Liu, Xuan Wang, Ziyu Wan, Yujun Shen, Yibing Song, Jing Liao, Qifeng Chen

    Abstract: This work presents HeadArtist for 3D head generation from text descriptions. With a landmark-guided ControlNet serving as the generative prior, we come up with an efficient pipeline that optimizes a parameterized 3D head model under the supervision of the prior distillation itself. We call such a process self score distillation (SSD). In detail, given a sampled camera pose, we first render an imag…

    Submitted 8 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Amazing results are shown in https://kumapowerliu.github.io/HeadArtist. Accepted by SIGGRAPH 2024

  40. arXiv:2312.06663  [pdf, other]

    cs.CV cs.GR

    CAD: Photorealistic 3D Generation via Adversarial Distillation

    Authors: Ziyu Wan, Despoina Paschalidou, Ian Huang, Hongyu Liu, Bokui Shen, Xiaoyu Xiang, Jing Liao, Leonidas Guibas

    Abstract: The increased demand for 3D data in AR/VR, robotics and gaming applications, gave rise to powerful generative pipelines capable of synthesizing high-quality 3D objects. Most of these models rely on the Score Distillation Sampling (SDS) algorithm to optimize a 3D representation such that the rendered image maintains a high likelihood as evaluated by a pre-trained diffusion model. However, finding a…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Project page: http://raywzy.com/CAD/

  41. arXiv:2312.02445  [pdf, other]

    cs.IR

    LLaRA: Large Language-Recommendation Assistant

    Authors: Jiayi Liao, Sihang Li, Zhengyi Yang, Jiancan Wu, Yancheng Yuan, Xiang Wang, Xiangnan He

    Abstract: Sequential recommendation aims to predict users' next interaction with items based on their past engagement sequence. Recently, the advent of Large Language Models (LLMs) has sparked interest in leveraging them for sequential recommendation, viewing it as language modeling. Previous studies represent items within LLMs' input prompts as either ID indices or textual metadata. However, these approach…

    Submitted 4 May, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: 11 pages, 5 figures

  42. arXiv:2312.02157  [pdf, other]

    cs.CV

    Mesh-Guided Neural Implicit Field Editing

    Authors: Can Wang, Mingming He, Menglei Chai, Dongdong Chen, Jing Liao

    Abstract: Neural implicit fields have emerged as a powerful 3D representation for reconstructing and rendering photo-realistic views, yet they possess limited editability. Conversely, explicit 3D representations, such as polygonal meshes, offer ease of editing but may not be as suitable for rendering high-quality novel views. To harness the strengths of both representations, we propose a new approach that e…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Project page: https://cassiepython.github.io/MNeuEdit/

  43. arXiv:2311.16961  [pdf, other]

    cs.CV

    HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion

    Authors: Jingbo Zhang, Xiaoyu Li, Qi Zhang, Yanpei Cao, Ying Shan, Jing Liao

    Abstract: Generating a 3D human model from a single reference image is challenging because it requires inferring textures and geometries in invisible views while maintaining consistency with the reference image. Previous methods utilizing 3D generative models are limited by the availability of 3D training data. Optimization-based methods that lift text-to-image diffusion models to 3D generation often fail t…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: Homepage: https://eckertzhang.github.io/HumanRef.github.io/

  44. arXiv:2311.02305  [pdf, other]

    cs.CV cs.AI cs.RO

    OSM vs HD Maps: Map Representations for Trajectory Prediction

    Authors: Jing-Yan Liao, Parth Doshi, Zihan Zhang, David Paz, Henrik Christensen

    Abstract: While High Definition (HD) Maps have long been favored for their precise depictions of static road elements, their accessibility constraints and susceptibility to rapid environmental changes impede the widespread deployment of autonomous driving, especially in the motion forecasting task. In this context, we propose to leverage OpenStreetMap (OSM) as a promising alternative to HD Maps for long-ter…

    Submitted 3 November, 2023; originally announced November 2023.

  45. arXiv:2311.01759  [pdf, other]

    cs.LG cs.AR

    TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices

    Authors: Jianlei Yang, Jiacheng Liao, Fanding Lei, Meichen Liu, Junyi Chen, Lingkun Long, Han Wan, Bei Yu, Weisheng Zhao

    Abstract: Developing deep learning models on tiny devices (e.g. Microcontroller units, MCUs) has attracted much attention in various embedded IoT applications. However, it is challenging to efficiently design and deploy recent advanced models (e.g. transformers) on tiny devices due to their severe hardware resource constraints. In this work, we propose TinyFormer, a framework specifically designed to develo…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  46. arXiv:2310.19331  [pdf, other]

    cs.NI

    AdapINT: A Flexible and Adaptive In-Band Network Telemetry System Based on Deep Reinforcement Learning

    Authors: Penghui Zhang, Hua Zhang, Yibo Pi, Zijian Cao, Jingyu Wang, Jianxin Liao

    Abstract: In-band Network Telemetry (INT) has emerged as a promising network measurement technology. However, existing network telemetry systems lack the flexibility to meet diverse telemetry requirements and are also difficult to adapt to dynamic network environments. In this paper, we propose AdapINT, a versatile and adaptive in-band network telemetry framework assisted by dual-timescale probes, including…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: 14 pages, 19 figures

  47. arXiv:2310.11864  [pdf, other]

    cs.CV cs.GR cs.LG

    VQ-NeRF: Neural Reflectance Decomposition and Editing with Vector Quantization

    Authors: Hongliang Zhong, Jingbo Zhang, Jing Liao

    Abstract: We propose VQ-NeRF, a two-branch neural network model that incorporates Vector Quantization (VQ) to decompose and edit reflectance fields in 3D scenes. Conventional neural reflectance fields use only continuous representations to model 3D scenes, despite the fact that objects are typically composed of discrete materials in reality. This lack of discretization can result in noisy material decomposi…

    Submitted 10 November, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: Accepted by TVCG. Project Page: https://jtbzhl.github.io/VQ-NeRF.github.io/

  48. arXiv:2310.10651  [pdf, other]

    cs.CV cs.GR

    HairCLIPv2: Unifying Hair Editing via Proxy Feature Blending

    Authors: Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Weiming Zhang, Gang Hua, Nenghai Yu

    Abstract: Hair editing has made tremendous progress in recent years. Early hair editing methods use well-drawn sketches or masks to specify the editing conditions. Even though they can enable very fine-grained local control, such interaction modes are inefficient for the editing conditions that can be easily specified by language descriptions or reference images. Thanks to the recent breakthrough of cross-m…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: ICCV 2023, code is available at https://github.com/wty-ustc/HairCLIPv2

  49. Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model

    Authors: Shiyuan Yang, Xiaodong Chen, Jing Liao

    Abstract: Recently, text-to-image denoising diffusion probabilistic models (DDPMs) have demonstrated impressive image generation capabilities and have also been successfully applied to image inpainting. However, in practice, users often require more control over the inpainting process beyond textual guidance, especially when they want to composite objects with customized appearance, color, shape, and layout…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted by ACMMM'23

  50. Enhancing Cross-Dataset Performance of Distracted Driving Detection With Score-Softmax Classifier

    Authors: Cong Duan, Zixuan Liu, Jiahao Xia, Minghai Zhang, Jiacai Liao, Libo Cao

    Abstract: Deep neural networks enable real-time monitoring of in-vehicle drivers, facilitating the timely prediction of distractions, fatigue, and potential hazards. This technology is now integral to intelligent transportation systems. Recent research has exposed unreliable cross-dataset end-to-end driver behavior recognition due to overfitting, often referred to as "shortcut learning", resulting from limi…

    Submitted 20 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible