Showing 1–50 of 570 results for author: Fang, Y

  1. arXiv:2407.20204  [pdf, ps, other]

    cs.CC

    Constant-Cost Communication is not Reducible to k-Hamming Distance

    Authors: Yuting Fang, Mika Göös, Nathaniel Harms, Pooya Hatami

    Abstract: Every known communication problem whose randomized communication cost is constant (independent of the input size) can be reduced to $k$-Hamming Distance, that is, solved with a constant number of deterministic queries to some $k$-Hamming Distance oracle. We exhibit the first examples of constant-cost problems which cannot be reduced to $k$-Hamming Distance. To prove this separation, we relate it…

    Submitted 29 July, 2024; originally announced July 2024.

  2. arXiv:2407.18595  [pdf, other]

    cs.CV

    LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement

    Authors: Rui Zhang, Yixiao Fang, Zhengnan Lu, Pei Cheng, Zebiao Huang, Bin Fu

    Abstract: This study delves into the intricacies of synchronizing facial dynamics with multilingual audio inputs, focusing on the creation of visually compelling, time-synchronized animations through diffusion-based techniques. Diverging from traditional parametric models for facial animation, our approach, termed LinguaLinker, adopts a holistic diffusion-based framework that integrates audio-driven visual…

    Submitted 26 July, 2024; originally announced July 2024.

  3. arXiv:2407.17453  [pdf, other]

    cs.CV

    $VILA^2$: VILA Augmented VILA

    Authors: Yunhao Fang, Ligeng Zhu, Yao Lu, Yan Wang, Pavlo Molchanov, Jang Hyun Cho, Marco Pavone, Song Han, Hongxu Yin

    Abstract: Visual language models (VLMs) have rapidly progressed, driven by the success of large language models (LLMs). While model architectures and training infrastructures advance rapidly, data curation remains under-explored. When data quantity and quality become a bottleneck, existing work either directly crawls more raw data from the Internet that does not have a guarantee of data quality or distills…

    Submitted 24 July, 2024; originally announced July 2024.

  4. arXiv:2407.17438  [pdf, other]

    cs.CV cs.AI cs.LG

    HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation

    Authors: Zhenzhi Wang, Yixuan Li, Yanhong Zeng, Youqing Fang, Yuwei Guo, Wenran Liu, Jing Tan, Kai Chen, Tianfan Xue, Bo Dai, Dahua Lin

    Abstract: Human image animation involves generating videos from a character photo, allowing user control and unlocking potential for video and movie production. While recent approaches yield impressive results using high-quality training data, the inaccessibility of these datasets hampers fair and transparent benchmarking. Moreover, these approaches prioritize 2D human motion and overlook the significance o…

    Submitted 28 July, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: camera controllable human image animation, a dataset and a baseline

  5. arXiv:2407.16394  [pdf, other]

    cs.CV

    SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval

    Authors: Longtao Jiang, Min Wang, Zecheng Li, Yao Fang, Wengang Zhou, Houqiang Li

    Abstract: Different from traditional video retrieval, sign language retrieval is more biased towards understanding the semantic information of human actions contained in video clips. Previous works typically only encode RGB videos to obtain high-level semantic features, resulting in local action details drowned in a large amount of visual information redundancy. Furthermore, existing RGB-based sign retrieva…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted to ACM International Conference on Multimedia (MM) 2024

  6. arXiv:2407.13491  [pdf, other]

    eess.SP cs.IT

    Performance Analysis and Low-Complexity Beamforming Design for Near-Field Physical Layer Security

    Authors: Yunpu Zhang, Yuan Fang, Xianghao Yu, Changsheng You, Ying-Jun Angela Zhang

    Abstract: Extremely large-scale arrays (XL-arrays) have emerged as a key enabler in achieving the unprecedented performance requirements of future wireless networks, leading to a significant increase in the range of the near-field region. This transition necessitates the spherical wavefront model for characterizing the wireless propagation rather than the far-field planar counterpart, thereby introducing ex…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 13 pages, 13 figures

  7. arXiv:2407.13335  [pdf, other]

    cs.CV

    OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction

    Authors: Yini Fang, Jingling Yu, Haozheng Zhang, Ralf van der Lans, Bertram Shi

    Abstract: Visual search is important in our daily life. The efficient allocation of visual attention is critical to effectively complete visual search tasks. Prior research has predominantly modelled the spatial allocation of visual attention in images at the pixel level, e.g. using a saliency map. However, emerging evidence shows that visual attention is guided by objects rather than pixel intensities. Thi…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted in ECCV 2024

  8. arXiv:2407.13268  [pdf, other]

    cs.AI cs.LG

    Mixture of Experts based Multi-task Supervise Learning from Crowds

    Authors: Tao Han, Huaixuan Shi, Xinyi Ding, Xiao Ma, Huamao Gu, Yili Fang

    Abstract: Existing truth inference methods in crowdsourcing aim to map redundant labels and items to the ground truth. They treat the ground truth as hidden variables and use statistical or deep learning-based worker behavior models to infer the ground truth. However, worker behavior models that rely on ground truth hidden variables overlook workers' behavior at the item feature level, leading to imprecise…

    Submitted 18 July, 2024; originally announced July 2024.

  9. arXiv:2407.10181  [pdf, other]

    cs.CV

    Multiscale Sliced Wasserstein Distances as Perceptual Color Difference Measures

    Authors: Jiaqi He, Zhihua Wang, Leon Wang, Tsein-I Liu, Yuming Fang, Qilin Sun, Kede Ma

    Abstract: Contemporary color difference (CD) measures for photographic images typically operate by comparing co-located pixels, patches in a "perceptually uniform" color space, or features in a learned latent space. Consequently, these measures inadequately capture the human color perception of misaligned image pairs, which are prevalent in digital photography (e.g., the same scene captured by different s…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  10. arXiv:2407.09919  [pdf, other]

    cs.CV

    Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors

    Authors: Wei Shang, Dongwei Ren, Wanying Zhang, Yuming Fang, Wangmeng Zuo, Kede Ma

    Abstract: Arbitrary-scale video super-resolution (AVSR) aims to enhance the resolution of video frames, potentially at various scaling factors, which presents several challenges regarding spatial detail reproduction, temporal consistency, and computational complexity. In this paper, we first describe a strong baseline for AVSR by putting together three variants of elementary building blocks: 1) a flow-guide…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024, the code is available at https://github.com/shangwei5/ST-AVSR

    ACM Class: I.4.3

  11. arXiv:2407.08813  [pdf, other]

    eess.IV cs.AI cs.CV

    FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification

    Authors: Yu Tian, Congcong Wen, Min Shi, Muhammad Muneeb Afzal, Hao Huang, Muhammad Osama Khan, Yan Luo, Yi Fang, Mengyu Wang

    Abstract: Addressing fairness in artificial intelligence (AI), particularly in medical AI, is crucial for ensuring equitable healthcare outcomes. Recent efforts to enhance fairness have introduced new methodologies and datasets in medical AI. However, the fairness issue under the setting of domain transfer is almost unexplored, while it is common that clinics rely on different imaging technologies (e.g., di…

    Submitted 18 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; Codes and datasets are available at https://github.com/Harvard-Ophthalmology-AI-Lab/FairDomain

  12. arXiv:2407.06304  [pdf, other]

    cs.CV cs.AI cs.CL

    VIMI: Grounding Video Generation through Multi-modal Instruction

    Authors: Yuwei Fang, Willi Menapace, Aliaksandr Siarohin, Tsai-Shien Chen, Kuan-Chien Wang, Ivan Skorokhodov, Graham Neubig, Sergey Tulyakov

    Abstract: Existing text-to-video diffusion models rely solely on text-only encoders for their pretraining. This limitation stems from the absence of large-scale multimodal prompt video datasets, resulting in a lack of visual grounding and restricting their versatility and application in multimodal integration. To address this, we construct a large-scale multimodal prompt dataset by employing retrieval metho…

    Submitted 8 July, 2024; originally announced July 2024.

  13. arXiv:2407.04208  [pdf, other]

    cs.CV

    AMD: Automatic Multi-step Distillation of Large-scale Vision Models

    Authors: Cheng Han, Qifan Wang, Sohail A. Dianat, Majid Rabbani, Raghuveer M. Rao, Yi Fang, Qiang Guan, Lifu Huang, Dongfang Liu

    Abstract: Transformer-based architectures have become the de-facto standard models for diverse vision tasks owing to their superior performance. As the size of the models continues to scale up, model distillation becomes extremely important in various real applications, particularly on devices limited by computational resources. However, prevailing knowledge distillation methods exhibit diminished efficacy…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 19 pages, 5 figures

  14. arXiv:2407.03542  [pdf]

    eess.IV cs.CV cs.LG

    Probing Perfection: The Relentless Art of Meddling for Pulmonary Airway Segmentation from HRCT via a Human-AI Collaboration Based Active Learning Method

    Authors: Shiyi Wang, Yang Nan, Sheng Zhang, Federico Felder, Xiaodan Xing, Yingying Fang, Javier Del Ser, Simon L F Walsh, Guang Yang

    Abstract: In pulmonary tracheal segmentation, the scarcity of annotated data is a prevalent issue in medical segmentation. Additionally, Deep Learning (DL) methods face challenges: the opacity of 'black box' models and the need for performance enhancement. Our Human-Computer Interaction (HCI) based models (RS_UNet, LC_UNet, UUNet, and WD_UNet) address these challenges by combining diverse query strategies w…

    Submitted 23 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  15. A Learned Generalized Geodesic Distance Function-Based Approach for Node Feature Augmentation on Graphs

    Authors: Amitoz Azad, Yuan Fang

    Abstract: Geodesic distances on manifolds have numerous applications in image processing, computer graphics and computer vision. In this work, we introduce an approach called 'LGGD' (Learned Generalized Geodesic Distances). This method involves generating node features by learning a generalized geodesic distance function through a training pipeline that incorporates training data, graph topology and the nod…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted at KDD 2024 Research Track

  16. arXiv:2407.00474  [pdf, other]

    cs.LG cs.AI

    MH-pFLGB: Model Heterogeneous personalized Federated Learning via Global Bypass for Medical Image Analysis

    Authors: Luyuan Xie, Manqing Lin, ChenMing Xu, Tianyu Luan, Zhipeng Zeng, Wenjun Qian, Cong Li, Yuejian Fang, Qingni Shen, Zhonghai Wu

    Abstract: In the evolving application of medical artificial intelligence, federated learning is notable for its ability to protect training data privacy. Federated learning facilitates collaborative model development without the need to share local data from healthcare institutions. Yet, the statistical and system heterogeneity among these institutions poses substantial challenges, which affects the effecti…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2405.06822

  17. arXiv:2407.00462  [pdf, other]

    cs.CV cs.AI

    pFLFE: Cross-silo Personalized Federated Learning via Feature Enhancement on Medical Image Segmentation

    Authors: Luyuan Xie, Manqing Lin, Siyuan Liu, ChenMing Xu, Tianyu Luan, Cong Li, Yuejian Fang, Qingni Shen, Zhonghai Wu

    Abstract: In medical image segmentation, personalized cross-silo federated learning (FL) is becoming popular for utilizing varied data across healthcare settings to overcome data scarcity and privacy concerns. However, existing methods often suffer from client drift, leading to inconsistent performance and delayed training. We propose a new framework, Personalized Federated Learning via Feature Enhancement…

    Submitted 29 June, 2024; originally announced July 2024.

  18. arXiv:2406.19959  [pdf, other]

    cs.SD eess.AS

    RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization

    Authors: Bing Yang, Changsheng Quan, Yabo Wang, Pengyu Wang, Yujie Yang, Ying Fang, Nian Shao, Hui Bu, Xin Xu, Xiaofei Li

    Abstract: The training of deep learning-based multichannel speech enhancement and source localization systems relies heavily on the simulation of room impulse response and multichannel diffuse noise, due to the lack of large-scale real-recorded datasets. However, the acoustic mismatch between simulated and real-world data could degrade the model performance when applied in real-world scenarios. To bridge t…

    Submitted 28 June, 2024; originally announced June 2024.

  19. arXiv:2406.18552  [pdf, other]

    cs.CV cs.AI

    Decoding Decision Reasoning: A Counterfactual-Powered Model for Knowledge Discovery

    Authors: Yingying Fang, Zihao Jin, Xiaodan Xing, Simon Walsh, Guang Yang

    Abstract: In medical imaging, particularly in early disease detection and prognosis tasks, discerning the rationale behind an AI model's predictions is crucial for evaluating the reliability of its decisions. Conventional explanation methods face challenges in identifying discernible decisive features in medical image classifications, where discriminative features are subtle or not immediately apparent. To…

    Submitted 23 May, 2024; originally announced June 2024.

  20. arXiv:2406.18535  [pdf, other]

    q-bio.BM cs.AI cs.IR

    DRAK: Unlocking Molecular Insights with Domain-Specific Retrieval-Augmented Knowledge in LLMs

    Authors: Jinzhe Liu, Xiangsheng Huang, Zhuo Chen, Yin Fang

    Abstract: Large Language Models (LLMs) encounter challenges with the unique syntax of specific domains, such as biomolecules. Existing fine-tuning or modality alignment techniques struggle to bridge the domain knowledge gap and understand complex molecular data, limiting LLMs' progress in specialized fields. To overcome these limitations, we propose an expandable and adaptable non-parametric knowledge injec…

    Submitted 4 March, 2024; originally announced June 2024.

    Comments: Ongoing work; 11 pages, 6 Figures, 2 Tables

  21. arXiv:2406.18045  [pdf, other]

    cs.CL cs.AI

    PharmaGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry

    Authors: Linqing Chen, Weilei Wang, Zilong Bai, Peng Xu, Yan Fang, Jie Fang, Wentao Wu, Lizhi Zhou, Ruiji Zhang, Yubin Xia, Chaobo Xu, Ran Hu, Licong Xu, Qijun Cai, Haoran Hua, Jing Sun, Jin Liu, Tian Qiu, Haowen Liu, Meng Hu, Xiuwen Li, Fei Gao, Yufu Wang, Lin Tie, Chaochao Wang, et al. (11 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized Natural Language Processing (NLP) by minimizing the need for complex feature engineering. However, the application of LLMs in specialized domains like biopharmaceuticals and chemistry remains largely unexplored. These fields are characterized by intricate terminologies, specialized knowledge, and a high demand for precision, areas where general purpo…

    Submitted 9 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  22. arXiv:2406.17974  [pdf, other]

    cs.CL cs.CV

    Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts

    Authors: Xuyang Wu, Yuan Wang, Hsin-Tai Wu, Zhiqiang Tao, Yi Fang

    Abstract: Large vision-language models (LVLMs) have recently achieved significant progress, demonstrating strong capabilities in open-world visual understanding. However, it is not yet clear how LVLMs address demographic biases in real life, especially the disparities across attributes such as gender, skin tone, and age. In this paper, we empirically investigate visual fairness in several mainstream…

    Submitted 25 June, 2024; originally announced June 2024.

  23. arXiv:2406.17199  [pdf, other]

    cs.LG

    Contrastive General Graph Matching with Adaptive Augmentation Sampling

    Authors: Jianyuan Bo, Yuan Fang

    Abstract: Graph matching has important applications in pattern recognition and beyond. Current approaches predominantly adopt supervised learning, demanding extensive labeled data which can be limited or costly. Meanwhile, self-supervised learning methods for graph matching often require additional side information such as extra categorical information and input features, limiting their application to the g…

    Submitted 24 June, 2024; originally announced June 2024.

  24. arXiv:2406.17173  [pdf, other]

    eess.IV cs.CV cs.LG

    Diff3Dformer: Leveraging Slice Sequence Diffusion for Enhanced 3D CT Classification with Transformer Networks

    Authors: Zihao Jin, Yingying Fang, Jiahao Huang, Caiwen Xu, Simon Walsh, Guang Yang

    Abstract: The manifestation of symptoms associated with lung diseases can vary in different depths for individual patients, highlighting the significance of 3D information in CT scans for medical image classification. While Vision Transformer has shown superior performance over convolutional neural networks in image classification tasks, their effectiveness is often demonstrated on sufficiently large 2D dat…

    Submitted 26 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: conference

  25. arXiv:2406.16189  [pdf, other]

    eess.IV cs.CV

    Fuzzy Attention-based Border Rendering Network for Lung Organ Segmentation

    Authors: Sheng Zhang, Yang Nan, Yingying Fang, Shiyi Wang, Xiaodan Xing, Zhifan Gao, Guang Yang

    Abstract: Automatic lung organ segmentation on CT images is crucial for lung disease diagnosis. However, the unlimited voxel values and class imbalance of lung organs can lead to false-negative/positive and leakage issues in advanced methods. Additionally, some slender lung organs are easily lost during the recycled down/up-sample procedure, e.g., bronchioles & arterioles, causing severe discontinuity issue…

    Submitted 1 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024

  26. arXiv:2406.15182  [pdf, other]

    cs.CV

    DiffExplainer: Unveiling Black Box Models Via Counterfactual Generation

    Authors: Yingying Fang, Shuang Wu, Zihao Jin, Caiwen Xu, Shiyi Wang, Simon Walsh, Guang Yang

    Abstract: In the field of medical imaging, particularly in tasks related to early disease detection and prognosis, understanding the reasoning behind AI model predictions is imperative for assessing their reliability. Conventional explanation methods encounter challenges in identifying decisive features in medical image classifications, especially when discriminative features are subtle or not immediately e…

    Submitted 26 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024

  27. arXiv:2406.14966  [pdf, other]

    cs.CY cs.CR

    AIGC-Chain: A Blockchain-Enabled Full Lifecycle Recording System for AIGC Product Copyright Management

    Authors: Jiajia Jiang, Moting Su, Xiangli Xiao, Yushu Zhang, Yuming Fang

    Abstract: As artificial intelligence technology becomes increasingly prevalent, Artificial Intelligence Generated Content (AIGC) is being adopted across various sectors. Although AIGC is playing an increasingly significant role in business and culture, questions surrounding its copyright have sparked widespread debate. The current legal framework for copyright and intellectual property is grounded in the co…

    Submitted 21 June, 2024; originally announced June 2024.

  28. arXiv:2406.12831  [pdf, other]

    cs.CV cs.AI cs.MM

    VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing

    Authors: Jing Gu, Yuwei Fang, Ivan Skorokhodov, Peter Wonka, Xinya Du, Sergey Tulyakov, Xin Eric Wang

    Abstract: Video editing stands as a cornerstone of digital media, from entertainment and education to professional communication. However, previous methods often overlook the necessity of comprehensively understanding both global and local contexts, leading to inaccurate and inconsistent edits in the spatiotemporal dimension, especially for long videos. In this paper, we introduce VIA, a unified spatiotemp…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 13 pages, 11 figures

  29. arXiv:2406.12426  [pdf, other]

    cs.IT eess.SP

    Multi-Active-IRS-Assisted Cooperative Sensing: Cramér-Rao Bound and Joint Beamforming Design

    Authors: Yuan Fang, Xianghao Yu, Jie Xu, Ying-Jun Angela Zhang

    Abstract: This paper studies the multi-intelligent reflecting surface (IRS)-assisted cooperative sensing, in which multiple active IRSs are deployed in a distributed manner to facilitate multi-view target sensing at the non-line-of-sight (NLoS) area of the base station (BS). Different from prior works employing passive IRSs, we leverage active IRSs with the capability of amplifying the reflected signals to…

    Submitted 18 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2404.13536

  30. arXiv:2406.12262  [pdf, other]

    cs.LG stat.ML

    Investigating Data Usage for Inductive Conformal Predictors

    Authors: Yizirui Fang, Anthony Bellotti

    Abstract: Inductive conformal predictors (ICPs) are algorithms that are able to generate prediction sets, instead of point predictions, which are valid at a user-defined confidence level, only assuming exchangeability. These algorithms are useful for reliable machine learning and are increasing in popularity. The ICP development process involves dividing development data into three parts: training, calibrat…

    Submitted 18 June, 2024; originally announced June 2024.

  31. arXiv:2406.12052  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    UniGLM: Training One Unified Language Model for Text-Attributed Graphs

    Authors: Yi Fang, Dongzhe Fan, Sirui Ding, Ninghao Liu, Qiaoyu Tan

    Abstract: Representation learning on text-attributed graphs (TAGs), where nodes are represented by textual descriptions, is crucial for textual and relational knowledge systems and recommendation systems. Currently, state-of-the-art embedding methods for TAGs primarily focus on fine-tuning language models (e.g., BERT) using structure-aware training signals. While effective, these methods are tailored for in…

    Submitted 17 June, 2024; originally announced June 2024.

  32. arXiv:2406.11945  [pdf, other]

    cs.LG cs.AI cs.IR

    GAugLLM: Improving Graph Contrastive Learning for Text-Attributed Graphs with Large Language Models

    Authors: Yi Fang, Dongzhe Fan, Daochen Zha, Qiaoyu Tan

    Abstract: This work studies self-supervised graph learning for text-attributed graphs (TAGs) where nodes are represented by textual attributes. Unlike traditional graph contrastive methods that perturb the numerical feature space and alter the graph's topological structure, we aim to improve view generation through language supervision. This is driven by the prevalence of textual attributes in real applicat…

    Submitted 17 June, 2024; originally announced June 2024.

  33. arXiv:2406.11687  [pdf, other]

    cs.CL

    Tokenization Falling Short: The Curse of Tokenization

    Authors: Yekun Chai, Yewei Fang, Qiwei Peng, Xuhong Li

    Abstract: Language models typically tokenize raw text into sequences of subword identifiers from a predefined vocabulary, a process inherently sensitive to typographical errors, length variations, and largely oblivious to the internal structure of tokens, issues we term the curse of tokenization. In this study, we delve into these drawbacks and demonstrate that large language models (LLMs) remain susceptible…

    Submitted 17 June, 2024; originally announced June 2024.

  34. arXiv:2406.11514  [pdf, other]

    cs.CL

    Counterfactual Debating with Preset Stances for Hallucination Elimination of LLMs

    Authors: Yi Fang, Moxin Li, Wenjie Wang, Hui Lin, Fuli Feng

    Abstract: Large Language Models (LLMs) excel in various natural language processing tasks but struggle with hallucination issues. Existing solutions have considered utilizing LLMs' inherent reasoning abilities to alleviate hallucination, such as self-correction and diverse sampling methods. However, these methods often overtrust LLMs' initial answers due to inherent biases. The key to alleviating this issue…

    Submitted 17 June, 2024; originally announced June 2024.

  35. arXiv:2406.10175  [pdf, other]

    cs.CV

    Enhancing Incomplete Multi-modal Brain Tumor Segmentation with Intra-modal Asymmetry and Inter-modal Dependency

    Authors: Weide Liu, Jingwen Hou, Xiaoyang Zhong, Huijing Zhan, Jun Cheng, Yuming Fang, Guanghui Yue

    Abstract: Deep learning-based brain tumor segmentation (BTS) models for multi-modal MRI images have seen significant advancements in recent years. However, a common problem in practice is the unavailability of some modalities due to varying scanning protocols and patient conditions, making segmentation from incomplete MRI modalities a challenging issue. Previous methods have attempted to address this by fus…

    Submitted 14 June, 2024; originally announced June 2024.

  36. arXiv:2406.10098  [pdf, other]

    cs.LG cs.AI

    ECGMamba: Towards Efficient ECG Classification with BiSSM

    Authors: Yupeng Qiang, Xunde Dong, Xiuling Liu, Yang Yang, Yihai Fang, Jianhong Dou

    Abstract: Electrocardiogram (ECG) signal analysis represents a pivotal technique in the diagnosis of cardiovascular diseases. Although transformer-based models have made significant progress in ECG classification, they exhibit inefficiencies in the inference phase. The issue is primarily attributable to the quadratic computational complexity of Transformer's self-attention mechanism, particularly when proce…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 6 pages, 2 figures. arXiv admin note: text overlap with arXiv:2404.17858 by other authors

  37. arXiv:2406.10036  [pdf, other]

    cs.IT

    Information Compression in the AI Era: Recent Advances and Future Challenges

    Authors: Jun Chen, Yong Fang, Ashish Khisti, Ayfer Ozgur, Nir Shlezinger, Chao Tian

    Abstract: This survey article focuses on emerging connections between the fields of machine learning and data compression. While fundamental limits of classical (lossy) data compression are established using rate-distortion theory, the connections to machine learning have resulted in new theoretical analysis and application areas. We survey recent works on task-based and goal-oriented compression, the rate…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2002.04290

  38. arXiv:2406.09389  [pdf, other]

    eess.IV cs.CV

    Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior

    Authors: Baiang Li, Sizhuo Ma, Yanhong Zeng, Xiaogang Xu, Youqing Fang, Zhao Zhang, Jian Wang, Kai Chen

    Abstract: Capturing High Dynamic Range (HDR) scenery using 8-bit cameras often suffers from over-/underexposure, loss of fine details due to low bit-depth compression, skewed color distributions, and strong noise in dark areas. Traditional LDR image enhancement methods primarily focus on color mapping, which enhances the visual representation by expanding the image's color range and adjusting the brightness…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: https://sagiri0208.github.io

  39. arXiv:2406.08310  [pdf, other]

    cs.LG

    GraphFM: A Comprehensive Benchmark for Graph Foundation Model

    Authors: Yuhao Xu, Xinqi Liu, Keyu Duan, Yi Fang, Yu-Neng Chuang, Daochen Zha, Qiaoyu Tan

    Abstract: Foundation Models (FMs) serve as a general class for the development of artificial intelligence systems, offering broad potential for generalization across a spectrum of downstream tasks. Despite extensive research into self-supervised learning as the cornerstone of FMs, several outstanding issues persist in Graph Foundation Models that rely on graph self-supervised learning, namely: 1) Homogeniza…

    Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  40. arXiv:2406.06839  [pdf, other]

    cs.CL

    EAVE: Efficient Product Attribute Value Extraction via Lightweight Sparse-layer Interaction

    Authors: Li Yang, Qifan Wang, Jianfeng Chi, Jiahao Liu, Jingang Wang, Fuli Feng, Zenglin Xu, Yi Fang, Lifu Huang, Dongfang Liu

    Abstract: Product attribute value extraction involves identifying the specific values associated with various attributes from a product profile. While existing methods often prioritize the development of effective models to improve extraction performance, there has been limited emphasis on extraction efficiency. However, in real-world scenarios, products are typically associated with multiple attributes, ne…

    Submitted 10 June, 2024; originally announced June 2024.

  41. Accelerating evolutionary exploration through language model-based transfer learning

    Authors: Maximilian Reissmann, Yuan Fang, Andrew S. H. Ooi, Richard D. Sandberg

    Abstract: Gene expression programming is an evolutionary optimization algorithm with the potential to generate interpretable and easily implementable equations for regression problems. Despite knowledge gained from previous optimizations being potentially available, the initial candidate solutions are typically generated randomly at the beginning and often only include features or terms based on preliminary…

    Submitted 7 June, 2024; originally announced June 2024.

  42. arXiv:2406.04738  [pdf, other]

    cs.DB cs.DS

    In-depth Analysis of Densest Subgraph Discovery in a Unified Framework

    Authors: Yingli Zhou, Qingshuo Guo, Yi Yang, Yixiang Fang, Chenhao Ma, Laks Lakshmanan

    Abstract: As a fundamental topic in graph mining, Densest Subgraph Discovery (DSD) has found a wide spectrum of real applications. Several DSD algorithms, including exact and approximation algorithms, have been proposed in the literature. However, these algorithms have not been systematically and comprehensively compared under the same experimental settings. In this paper, we first propose a unified framewo…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 19 pages, 27 figures

  43. arXiv:2406.03824  [pdf, other]

    cs.LG cs.IT

    Predictability Analysis of Regression Problems via Conditional Entropy Estimations

    Authors: Yu-Hsueh Fang, Chia-Yen Lee

    Abstract: In the field of machine learning, regression problems are pivotal due to their ability to predict continuous outcomes. Traditional error metrics like mean squared error, mean absolute error, and coefficient of determination measure model accuracy. The model accuracy is the consequence of the selected model and the features, which blurs the analysis of contribution. Predictability, on the other han…

    Submitted 6 June, 2024; originally announced June 2024.

  44. arXiv:2406.00344  [pdf, other]

    cs.SI cs.DB

    Efficient Historical Butterfly Counting in Large Temporal Bipartite Networks via Graph Structure-aware Index

    Authors: Qiuyang Mang, Jingbang Chen, Hangrui Zhou, Yu Gao, Yingli Zhou, Richard Peng, Yixiang Fang, Chenhao Ma

    Abstract: Bipartite graphs are ubiquitous in many domains, e.g., e-commerce platforms, social networks, and academia, by modeling interactions between distinct entity sets. Within these graphs, the butterfly motif, a complete $2 \times 2$ biclique, represents the simplest yet significant subgraph structure, crucial for analyzing complex network patterns. Counting the butterflies offers significant benefits across va…

    Submitted 1 June, 2024; originally announced June 2024.

  45. arXiv:2406.00333  [pdf, other]

    cs.IR

    A Practice-Friendly Two-Stage LLM-Enhanced Paradigm in Sequential Recommendation

    Authors: Dugang Liu, Shenxian Xian, Xiaolin Lin, Xiaolian Zhang, Hong Zhu, Yuan Fang, Zhen Chen, Zhong Ming

    Abstract: The training paradigm integrating large language models (LLM) is gradually reshaping sequential recommender systems (SRS) and has shown promising results. However, most existing LLM-enhanced methods rely on rich textual information on the item side and instance-level supervised fine-tuning (SFT) to inject collaborative information into LLM, which is inefficient and limited in many applications. To…

    Submitted 1 June, 2024; originally announced June 2024.

  46. arXiv:2405.20654  [pdf, other]

    cs.CL cs.IR

    Passage-specific Prompt Tuning for Passage Reranking in Question Answering with Large Language Models

    Authors: Xuyang Wu, Zhiyuan Peng, Krishna Sravanthi Rajanala Sai, Hsin-Tai Wu, Yi Fang

    Abstract: Effective passage retrieval and reranking methods have been widely utilized to identify suitable candidates in open-domain question answering tasks. Recent studies have resorted to LLMs for reranking the retrieved passages by the log-likelihood of the question conditioned on each passage. Although these methods have demonstrated promising results, the performance is notably sensitive to the human-…

    Submitted 20 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted at Gen-IR@SIGIR24

  47. arXiv:2405.19298  [pdf, other]

    cs.CV eess.IV

    Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare

    Authors: Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, Shiqi Wang

    Abstract: While recent advancements in large multimodal models (LMMs) have significantly improved their abilities in image quality assessment (IQA) relying on absolute quality rating, how to transfer reliable relative quality comparison outputs to continuous perceptual quality scores remains largely unexplored. To address this gap, we introduce Compare2Score, an all-around LMM-based no-reference IQA (NR-IQA)…

    Submitted 29 May, 2024; originally announced May 2024.

  48. arXiv:2405.13937  [pdf, other]

    cs.LG

    DyGPrompt: Learning Feature and Time Prompts on Dynamic Graphs

    Authors: Xingtong Yu, Zhenghao Liu, Yuan Fang, Xinming Zhang

    Abstract: Dynamic graphs are pervasive in the real world, modeling dynamic relations between objects across various fields. For dynamic graph modeling, dynamic graph neural networks (DGNNs) have emerged as a mainstream technique, which are generally pre-trained on the link prediction task, leaving a significant gap from the objectives of downstream tasks such as node classification. To bridge the gap, promp…

    Submitted 2 July, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Under review

  49. arXiv:2405.13934  [pdf, other]

    cs.LG

    Text-Free Multi-domain Graph Pre-training: Toward Graph Foundation Models

    Authors: Xingtong Yu, Chang Zhou, Yuan Fang, Xinming Zhang

    Abstract: Given the ubiquity of graph data, it is intriguing to ask: Is it possible to train a graph foundation model on a broad range of graph data across diverse domains? A major hurdle toward this goal lies in the fact that graphs from different domains often exhibit profoundly divergent characteristics. Although there have been some initial efforts in integrating multi-domain graphs for pre-training, th…

    Submitted 28 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Under review

  50. arXiv:2405.09863  [pdf, other]

    cs.CV cs.AI

    Box-Free Model Watermarks Are Prone to Black-Box Removal Attacks

    Authors: Haonan An, Guang Hua, Zhiping Lin, Yuguang Fang

    Abstract: Box-free model watermarking is an emerging technique to safeguard the intellectual property of deep learning models, particularly those for low-level image processing tasks. Existing works have verified and improved its effectiveness in several aspects. However, in this paper, we reveal that box-free model watermarking is prone to removal attacks, even under the real-world threat model such that t…

    Submitted 21 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.