subscribe to arXiv mailings

A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks

Authors: Yixiang Qiu, Hao Fang, Hongyao Yu, Bin Chen, MeiKang Qiu, Shu-Tao Xia

Abstract: Model Inversion (MI) attacks aim to reconstruct privacy-sensitive training data from released models by utilizing output information, raising extensive concerns about the security of Deep Neural Networks (DNNs). Recent advances in generative adversarial networks (GANs) have contributed significantly to the improved performance of MI attacks due to their powerful ability to generate realistic image… ▽ More Model Inversion (MI) attacks aim to reconstruct privacy-sensitive training data from released models by utilizing output information, raising extensive concerns about the security of Deep Neural Networks (DNNs). Recent advances in generative adversarial networks (GANs) have contributed significantly to the improved performance of MI attacks due to their powerful ability to generate realistic images with high fidelity and appropriate semantics. However, previous MI attacks have solely disclosed private information in the latent space of GAN priors, limiting their semantic extraction and transferability across multiple target models and datasets. To address this challenge, we propose a novel method, Intermediate Features enhanced Generative Model Inversion (IF-GMI), which disassembles the GAN structure and exploits features between intermediate blocks. This allows us to extend the optimization space from latent code to intermediate features with enhanced expressive capabilities. To prevent GAN priors from generating unrealistic images, we apply a L1 ball constraint to the optimization process. Experiments on multiple benchmarks demonstrate that our method significantly outperforms previous approaches and achieves state-of-the-art results under various settings, especially in the out-of-distribution (OOD) scenario. Our code is available at: https://github.com/final-solution/IF-GMI △ Less

Submitted 27 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2407.11356 [pdf, other]

The Devil is in the Statistics: Mitigating and Exploiting Statistics Difference for Generalizable Semi-supervised Medical Image Segmentation

Authors: Muyang Qiu, Jian Zhang, Lei Qi, Qian Yu, Yinghuan Shi, Yang Gao

Abstract: Despite the recent success of domain generalization in medical image segmentation, voxel-wise annotation for all source domains remains a huge burden. Semi-supervised domain generalization has been proposed very recently to combat this challenge by leveraging limited labeled data along with abundant unlabeled data collected from multiple medical institutions, depending on precisely harnessing unla… ▽ More Despite the recent success of domain generalization in medical image segmentation, voxel-wise annotation for all source domains remains a huge burden. Semi-supervised domain generalization has been proposed very recently to combat this challenge by leveraging limited labeled data along with abundant unlabeled data collected from multiple medical institutions, depending on precisely harnessing unlabeled data while improving generalization simultaneously. In this work, we observe that domain shifts between medical institutions cause disparate feature statistics, which significantly deteriorates pseudo-label quality due to an unexpected normalization process. Nevertheless, this phenomenon could be exploited to facilitate unseen domain generalization. Therefore, we propose 1) multiple statistics-individual branches to mitigate the impact of domain shifts for reliable pseudo-labels and 2) one statistics-aggregated branch for domain-invariant feature learning. Furthermore, to simulate unseen domains with statistics difference, we approach this from two aspects, i.e., a perturbation with histogram matching at image level and a random batch normalization selection strategy at feature level, producing diverse statistics to expand the training distribution. Evaluation results on three medical image datasets demonstrate the effectiveness of our method compared with recent SOTA methods. The code is available at https://github.com/qiumuyang/SIAB. △ Less

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.10563 [pdf, other]

Pathformer3D: A 3D Scanpath Transformer for 360° Images

Authors: Rong Quan, Yantao Lai, Mengyu Qiu, Dong Liang

Abstract: Scanpath prediction in 360° images can help realize rapid rendering and better user interaction in Virtual/Augmented Reality applications. However, existing scanpath prediction models for 360° images execute scanpath prediction on 2D equirectangular projection plane, which always result in big computation error owing to the 2D plane's distortion and coordinate discontinuity. In this work, we perfo… ▽ More Scanpath prediction in 360° images can help realize rapid rendering and better user interaction in Virtual/Augmented Reality applications. However, existing scanpath prediction models for 360° images execute scanpath prediction on 2D equirectangular projection plane, which always result in big computation error owing to the 2D plane's distortion and coordinate discontinuity. In this work, we perform scanpath prediction for 360° images in 3D spherical coordinate system and proposed a novel 3D scanpath Transformer named Pathformer3D. Specifically, a 3D Transformer encoder is first used to extract 3D contextual feature representation for the 360° image. Then, the contextual feature representation and historical fixation information are input into a Transformer decoder to output current time step's fixation embedding, where the self-attention module is used to imitate the visual working memory mechanism of human visual system and directly model the time dependencies among the fixations. Finally, a 3D Gaussian distribution is learned from each fixation embedding, from which the fixation position can be sampled. Evaluation on four panoramic eye-tracking datasets demonstrates that Pathformer3D outperforms the current state-of-the-art methods. Code is available at https://github.com/lsztzp/Pathformer3D . △ Less

Submitted 15 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2407.09966 [pdf, other]

Optimizing ROI Benefits Vehicle ReID in ITS

Authors: Mei Qiu, Lauren Ann Christopher, Lingxi Li, Stanley Chien, Yaobin Chen

Abstract: Vehicle re-identification (ReID) is a computer vision task that matches the same vehicle across different cameras or viewpoints in a surveillance system. This is crucial for Intelligent Transportation Systems (ITS), where the effectiveness is influenced by the regions from which vehicle images are cropped. This study explores whether optimal vehicle detection regions, guided by detection confidenc… ▽ More Vehicle re-identification (ReID) is a computer vision task that matches the same vehicle across different cameras or viewpoints in a surveillance system. This is crucial for Intelligent Transportation Systems (ITS), where the effectiveness is influenced by the regions from which vehicle images are cropped. This study explores whether optimal vehicle detection regions, guided by detection confidence scores, can enhance feature matching and ReID tasks. Using our framework with multiple Regions of Interest (ROIs) and lane-wise vehicle counts, we employed YOLOv8 for detection and DeepSORT for tracking across twelve Indiana Highway videos, including two pairs of videos from non-overlapping cameras. Tracked vehicle images were cropped from inside and outside the ROIs at five-frame intervals. Features were extracted using pre-trained models: ResNet50, ResNeXt50, Vision Transformer, and Swin-Transformer. Feature consistency was assessed through cosine similarity, information entropy, and clustering variance. Results showed that features from images cropped inside ROIs had higher mean cosine similarity values compared to those involving one image inside and one outside the ROIs. The most significant difference was observed during night conditions (0.7842 inside vs. 0.5 outside the ROI with Swin-Transformer) and in cross-camera scenarios (0.75 inside-inside vs. 0.52 inside-outside the ROI with Vision Transformer). Information entropy and clustering variance further supported that features in ROIs are more consistent. These findings suggest that strategically selected ROIs can enhance tracking performance and ReID accuracy in ITS. △ Less

Submitted 13 July, 2024; originally announced July 2024.

arXiv:2407.07842 [pdf, other]

Study on Aspect Ratio Variability toward Robustness of Vision Transformer-based Vehicle Re-identification

Authors: Mei Qiu, Lauren Christopher, Lingxi Li

Abstract: Vision Transformers (ViTs) have excelled in vehicle re-identification (ReID) tasks. However, non-square aspect ratios of image or video input might significantly affect the re-identification performance. To address this issue, we propose a novel ViT-based ReID framework in this paper, which fuses models trained on a variety of aspect ratios. Our main contributions are threefold: (i) We analyze asp… ▽ More Vision Transformers (ViTs) have excelled in vehicle re-identification (ReID) tasks. However, non-square aspect ratios of image or video input might significantly affect the re-identification performance. To address this issue, we propose a novel ViT-based ReID framework in this paper, which fuses models trained on a variety of aspect ratios. Our main contributions are threefold: (i) We analyze aspect ratio performance on VeRi-776 and VehicleID datasets, guiding input settings based on aspect ratios of original images. (ii) We introduce patch-wise mixup intra-image during ViT patchification (guided by spatial attention scores) and implement uneven stride for better object aspect ratio matching. (iii) We propose a dynamic feature fusing ReID network, enhancing model robustness. Our ReID method achieves a significantly improved mean Average Precision (mAP) of 91.0\% compared to the the closest state-of-the-art (CAL) result of 80.9\% on VehicleID dataset. △ Less

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.05610 [pdf, other]

Described Spatial-Temporal Video Detection

Authors: Wei Ji, Xiangyan Liu, Yingfei Sun, Jiajun Deng, You Qin, Ammar Nuwanna, Mengyao Qiu, Lina Wei, Roger Zimmermann

Abstract: Detecting visual content on language expression has become an emerging topic in the community. However, in the video domain, the existing setting, i.e., spatial-temporal video grounding (STVG), is formulated to only detect one pre-existing object in each frame, ignoring the fact that language descriptions can involve none or multiple entities within a video. In this work, we advance the STVG to a… ▽ More Detecting visual content on language expression has become an emerging topic in the community. However, in the video domain, the existing setting, i.e., spatial-temporal video grounding (STVG), is formulated to only detect one pre-existing object in each frame, ignoring the fact that language descriptions can involve none or multiple entities within a video. In this work, we advance the STVG to a more practical setting called described spatial-temporal video detection (DSTVD) by overcoming the above limitation. To facilitate the exploration of DSTVD, we first introduce a new benchmark, namely DVD-ST. Notably, DVD-ST supports grounding from none to many objects onto the video in response to queries and encompasses a diverse range of over 150 entities, including appearance, actions, locations, and interactions. The extensive breadth and diversity of the DVD-ST dataset make it an exemplary testbed for the investigation of DSTVD. In addition to the new benchmark, we further present two baseline methods for our proposed DSTVD task by extending two representative STVG models, i.e., TubeDETR, and STCAT. These extended models capitalize on tubelet queries to localize and track referred objects across the video sequence. Besides, we adjust the training objectives of these models to optimize spatial and temporal localization accuracy and multi-class classification capabilities. Furthermore, we benchmark the baselines on the introduced DVD-ST dataset and conduct extensive experimental analysis to guide future investigation. Our code and benchmark will be publicly available. △ Less

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.04688 [pdf, other]

Enhancing Vehicle Re-identification and Matching for Weaving Analysis

Authors: Mei Qiu, Wei Lin, Stanley Chien, Lauren Christopher, Yaobin Chen, Shu Hu

Abstract: Vehicle weaving on highways contributes to traffic congestion, raises safety issues, and underscores the need for sophisticated traffic management systems. Current tools are inadequate in offering precise and comprehensive data on lane-specific weaving patterns. This paper introduces an innovative method for collecting non-overlapping video data in weaving zones, enabling the generation of quantit… ▽ More Vehicle weaving on highways contributes to traffic congestion, raises safety issues, and underscores the need for sophisticated traffic management systems. Current tools are inadequate in offering precise and comprehensive data on lane-specific weaving patterns. This paper introduces an innovative method for collecting non-overlapping video data in weaving zones, enabling the generation of quantitative insights into lane-specific weaving behaviors. Our experimental results confirm the efficacy of this approach, delivering critical data that can assist transportation authorities in enhancing traffic control and roadway infrastructure. △ Less

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.00748 [pdf, other]

doi 10.1145/3637528.3671737

Self-consistent Deep Geometric Learning for Heterogeneous Multi-source Spatial Point Data Prediction

Authors: Dazhou Yu, Xiaoyun Gong, Yun Li, Meikang Qiu, Liang Zhao

Abstract: Multi-source spatial point data prediction is crucial in fields like environmental monitoring and natural resource management, where integrating data from various sensors is the key to achieving a holistic environmental understanding. Existing models in this area often fall short due to their domain-specific nature and lack a strategy for integrating information from various sources in the absence… ▽ More Multi-source spatial point data prediction is crucial in fields like environmental monitoring and natural resource management, where integrating data from various sensors is the key to achieving a holistic environmental understanding. Existing models in this area often fall short due to their domain-specific nature and lack a strategy for integrating information from various sources in the absence of ground truth labels. Key challenges include evaluating the quality of different data sources and modeling spatial relationships among them effectively. Addressing these issues, we introduce an innovative multi-source spatial point data prediction framework that adeptly aligns information from varied sources without relying on ground truth labels. A unique aspect of our method is the 'fidelity score,' a quantitative measure for evaluating the reliability of each data source. Furthermore, we develop a geo-location-aware graph neural network tailored to accurately depict spatial relationships between data points. Our framework has been rigorously tested on two real-world datasets and one synthetic dataset. The results consistently demonstrate its superior performance over existing state-of-the-art methods. △ Less

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.09317 [pdf, other]

Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham , et al. (24 additional authors not shown)

Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources… ▽ More Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits superior performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top5 accuracy scores of 0.8430 for 15 fundus diseases and 0.7561 for 52 fundus diseases. For image retrieval, it achieves Top5 scores of 0.9500 and 0.8860 for the same disease sets, respectively. Clinical evaluations show that RetiZero's Top3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China and the United States. Furthermore, RetiZero significantly enhances clinicians' accuracy in diagnosing fundus disease. These findings underscore the value of integrating the RetiZero foundation model into clinical settings, where a variety of fundus diseases are encountered. △ Less

Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.05704 [pdf, other]

Hierarchical Features Matter: A Deep Exploration of GAN Priors for Improved Dataset Distillation

Authors: Xinhao Zhong, Hao Fang, Bin Chen, Xulin Gu, Tao Dai, Meikang Qiu, Shu-Tao Xia

Abstract: Dataset distillation is an emerging dataset reduction method, which condenses large-scale datasets while maintaining task accuracy. Current methods have integrated parameterization techniques to boost synthetic dataset performance by shifting the optimization space from pixel to another informative feature domain. However, they limit themselves to a fixed optimization space for distillation, negle… ▽ More Dataset distillation is an emerging dataset reduction method, which condenses large-scale datasets while maintaining task accuracy. Current methods have integrated parameterization techniques to boost synthetic dataset performance by shifting the optimization space from pixel to another informative feature domain. However, they limit themselves to a fixed optimization space for distillation, neglecting the diverse guidance across different informative latent spaces. To overcome this limitation, we propose a novel parameterization method dubbed Hierarchical Generative Latent Distillation (H-GLaD), to systematically explore hierarchical layers within the generative adversarial networks (GANs). This allows us to progressively span from the initial latent space to the final pixel space. In addition, we introduce a novel class-relevant feature distance metric to alleviate the computational burden associated with synthetic dataset evaluation, bridging the gap between synthetic and original datasets. Experimental results demonstrate that the proposed H-GLaD achieves a significant improvement in both same-architecture and cross-architecture performance with equivalent time consumption. △ Less

Submitted 12 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

arXiv:2405.16919 [pdf, other]

VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models

Authors: Zejun Li, Ruipu Luo, Jiwen Zhang, Minghui Qiu, Zhongyu Wei

Abstract: While large multi-modal models (LMMs) have exhibited impressive capabilities across diverse tasks, their effectiveness in handling complex tasks has been limited by the prevailing single-step reasoning paradigm. To this end, this paper proposes VoCoT, a multi-step Visually grounded object-centric Chain-of-Thought reasoning framework tailored for inference with LMMs. VoCoT is characterized by two k… ▽ More While large multi-modal models (LMMs) have exhibited impressive capabilities across diverse tasks, their effectiveness in handling complex tasks has been limited by the prevailing single-step reasoning paradigm. To this end, this paper proposes VoCoT, a multi-step Visually grounded object-centric Chain-of-Thought reasoning framework tailored for inference with LMMs. VoCoT is characterized by two key features: (1) object-centric reasoning paths that revolve around cross-modal shared object-level information, and (2) visually grounded representation of object concepts in a multi-modal interleaved and aligned manner, which effectively bridges the modality gap within LMMs during long-term generation. Additionally, we construct an instruction dataset to facilitate LMMs in adapting to reasoning with VoCoT. By introducing VoCoT into the prevalent open-source LMM architecture, we introduce VolCano. With only 7B parameters and limited input resolution, VolCano demonstrates excellent performance across various scenarios, surpassing SOTA models, including GPT-4V, in tasks requiring complex reasoning. Our code, data and model will be available at https://github.com/RupertLuo/VoCoT. △ Less

Submitted 28 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.07547 [pdf, other]

Channel Coding Toward 6G: Technical Overview and Outlook

Authors: Mohammad Rowshan, Min Qiu, Yixuan Xie, Xinyi Gu, Jinhong Yuan

Abstract: Channel coding plays a pivotal role in ensuring reliable communication over wireless channels. With the growing need for ultra-reliable communication in emerging wireless use cases, the significance of channel coding has amplified. Furthermore, minimizing decoding latency is crucial for critical-mission applications, while optimizing energy efficiency is paramount for mobile and the Internet of Th… ▽ More Channel coding plays a pivotal role in ensuring reliable communication over wireless channels. With the growing need for ultra-reliable communication in emerging wireless use cases, the significance of channel coding has amplified. Furthermore, minimizing decoding latency is crucial for critical-mission applications, while optimizing energy efficiency is paramount for mobile and the Internet of Things (IoT) communications. As the fifth generation (5G) of mobile communications is currently in operation and 5G-advanced is on the horizon, the objective of this paper is to assess prominent channel coding schemes in the context of recent advancements and the anticipated requirements for the sixth generation (6G). In this paper, after considering the potential impact of channel coding on key performance indicators (KPIs) of wireless networks, we review the evolution of mobile communication standards and the organizations involved in the standardization, from the first generation (1G) to the current 5G, highlighting the technologies integral to achieving targeted KPIs such as reliability, data rate, latency, energy efficiency, spectral efficiency, connection density, and traffic capacity. Following this, we delve into the anticipated requirements for potential use cases in 6G. The subsequent sections of the paper focus on a comprehensive review of three primary coding schemes utilized in past generations and their recent advancements: low-density parity-check (LDPC) codes, turbo codes (including convolutional codes), polar codes (alongside Reed-Muller codes). Additionally, we examine alternative coding schemes like Fountain codes and sparse regression codes. Our evaluation includes a comparative analysis of error correction performance and the performance of hardware implementation for these coding schemes, providing insights into their potential and suitability for the upcoming 6G era. △ Less

Submitted 13 May, 2024; originally announced May 2024.

Comments: 102 pages, 87 figures, IEEE Open Journal of the Communications Society (invited paper)

arXiv:2404.15212 [pdf, other]

Real-time Lane-wise Traffic Monitoring in Optimal ROIs

Authors: Mei Qiu, Wei Lin, Lauren Ann Christopher, Stanley Chien, Yaobin Chen, Shu Hu

Abstract: In the US, thousands of Pan, Tilt, and Zoom (PTZ) traffic cameras monitor highway conditions. There is a great interest in using these highway cameras to gather valuable road traffic data to support traffic analysis and decision-making for highway safety and efficient traffic management. However, there are too many cameras for a few human traffic operators to effectively monitor, so a fully automa… ▽ More In the US, thousands of Pan, Tilt, and Zoom (PTZ) traffic cameras monitor highway conditions. There is a great interest in using these highway cameras to gather valuable road traffic data to support traffic analysis and decision-making for highway safety and efficient traffic management. However, there are too many cameras for a few human traffic operators to effectively monitor, so a fully automated solution is desired. This paper introduces a novel system that learns the locations of highway lanes and traffic directions from these camera feeds automatically. It collects real-time, lane-specific traffic data continuously, even adjusting for changes in camera angle or zoom. This facilitates efficient traffic analysis, decision-making, and improved highway safety. △ Less

Submitted 28 March, 2024; originally announced April 2024.

arXiv:2404.04861 [pdf, other]

Privacy-Preserving Traceable Functional Encryption for Inner Product

Authors: Muyao Qiu, Jinguang Han

Abstract: Functional encryption introduces a new paradigm of public key encryption that decryption only reveals the function value of encrypted data. To curb key leakage issues and trace users in FE-IP, a new primitive called traceable functional encryption for inner product (TFE-IP) has been proposed. However, the privacy protection of user's identities has not been considered in the existing TFE-IP scheme… ▽ More Functional encryption introduces a new paradigm of public key encryption that decryption only reveals the function value of encrypted data. To curb key leakage issues and trace users in FE-IP, a new primitive called traceable functional encryption for inner product (TFE-IP) has been proposed. However, the privacy protection of user's identities has not been considered in the existing TFE-IP schemes. In order to balance privacy and accountability, we propose the concept of privacy-preserving traceable functional encryption for inner product (PPTFE-IP) and give a concrete construction. Our scheme provides the following features: (1) To prevent key sharing, a user's key is bound with both his/her identity and a vector; (2) The key generation center (KGC) and a user execute a two-party secure computing protocol to generate a key without the former knowing anything about the latter's identity; (3) Each user can verify the correctness of his/her key; (4) A user can calculate the inner product of the two vectors embedded in his/her key and in a ciphertext; (5) Only the tracer can trace the identity embedded in a key. The security of our scheme is formally reduced to well-known complexity assumptions, and the implementation is conducted to evaluate its efficiency. The novelty of our scheme is to protect users' privacy and provide traceability if required. △ Less

Submitted 14 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

arXiv:2403.14922 [pdf, other]

CODA: A COst-efficient Test-time Domain Adaptation Mechanism for HAR

Authors: Minghui Qiu, Yandao Huang, Lin Chen, Lu Wang, Kaishun Wu

Abstract: In recent years, emerging research on mobile sensing has led to novel scenarios that enhance daily life for humans, but dynamic usage conditions often result in performance degradation when systems are deployed in real-world settings. Existing solutions typically employ one-off adaptation schemes based on neural networks, which struggle to ensure robustness against uncertain drifting conditions in… ▽ More In recent years, emerging research on mobile sensing has led to novel scenarios that enhance daily life for humans, but dynamic usage conditions often result in performance degradation when systems are deployed in real-world settings. Existing solutions typically employ one-off adaptation schemes based on neural networks, which struggle to ensure robustness against uncertain drifting conditions in human-centric sensing scenarios. In this paper, we propose CODA, a COst-efficient Domain Adaptation mechanism for mobile sensing that addresses real-time drifts from the data distribution perspective with active learning theory, ensuring cost-efficient adaptation directly on the device. By incorporating a clustering loss and importance-weighted active learning algorithm, CODA retains the relationship between different clusters during cost-effective instance-level updates, preserving meaningful structure within the data distribution. We also showcase its generalization by seamlessly integrating it with Neural Network-based solutions for Human Activity Recognition tasks. Through meticulous evaluations across diverse datasets, including phone-based, watch-based, and integrated sensor-based sensing tasks, we demonstrate the feasibility and potential of online adaptation with CODA. The promising results achieved by CODA, even without learnable parameters, also suggest the possibility of realizing unobtrusive adaptation through specific application designs with sufficient feedback. △ Less

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2402.17613 [pdf, other]

Neural Automated Writing Evaluation with Corrective Feedback

Authors: Izia Xiaoxiao Wang, Xihan Wu, Edith Coates, Min Zeng, Jiexin Kuang, Siliang Liu, Mengyang Qiu, Jungyeul Park

Abstract: The utilization of technology in second language learning and teaching has become ubiquitous. For the assessment of writing specifically, automated writing evaluation (AWE) and grammatical error correction (GEC) have become immensely popular and effective methods for enhancing writing proficiency and delivering instant and individualized feedback to learners. By leveraging the power of natural lan… ▽ More The utilization of technology in second language learning and teaching has become ubiquitous. For the assessment of writing specifically, automated writing evaluation (AWE) and grammatical error correction (GEC) have become immensely popular and effective methods for enhancing writing proficiency and delivering instant and individualized feedback to learners. By leveraging the power of natural language processing (NLP) and machine learning algorithms, AWE and GEC systems have been developed separately to provide language learners with automated corrective feedback and more accurate and unbiased scoring that would otherwise be subject to examiners. In this paper, we propose an integrated system for automated writing evaluation with corrective feedback as a means of bridging the gap between AWE and GEC results for second language learners. This system enables language learners to simulate the essay writing tests: a student writes and submits an essay, and the system returns the assessment of the writing along with suggested grammatical error corrections. Given that automated scoring and grammatical correction are more efficient and cost-effective than human grading, this integrated system would also alleviate the burden of manually correcting innumerable essays. △ Less

Submitted 6 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Supported by the SoTL Seed Program at UBC

arXiv:2402.15931 [pdf, ps, other]

Frustratingly Simple Prompting-based Text Denoising

Authors: Jungyeul Park, Mengyang Qiu

Abstract: This paper introduces a novel perspective on the automated essay scoring (AES) task, challenging the conventional view of the ASAP dataset as a static entity. Employing simple text denoising techniques using prompting, we explore the dynamic potential within the dataset. While acknowledging the previous emphasis on building regression systems, our paper underscores how making minor changes to a da… ▽ More This paper introduces a novel perspective on the automated essay scoring (AES) task, challenging the conventional view of the ASAP dataset as a static entity. Employing simple text denoising techniques using prompting, we explore the dynamic potential within the dataset. While acknowledging the previous emphasis on building regression systems, our paper underscores how making minor changes to a dataset through text denoising can enhance the final results. △ Less

Submitted 24 February, 2024; originally announced February 2024.

Comments: Published as a Tiny Paper at ICLR 2024

arXiv:2402.15930 [pdf, ps, other]

Evaluating Prompting Strategies for Grammatical Error Correction Based on Language Proficiency

Authors: Min Zeng, Jiexin Kuang, Mengyang Qiu, Jayoung Song, Jungyeul Park

Abstract: The writing examples of English language learners may be different from those of native speakers. Given that there is a significant differences in second language (L2) learners' error types by their proficiency levels, this paper attempts to reduce overcorrection by examining the interaction between LLM's performance and L2 language proficiency. Our method focuses on zero-shot and few-shot prompti… ▽ More The writing examples of English language learners may be different from those of native speakers. Given that there is a significant differences in second language (L2) learners' error types by their proficiency levels, this paper attempts to reduce overcorrection by examining the interaction between LLM's performance and L2 language proficiency. Our method focuses on zero-shot and few-shot prompting and fine-tuning models for GEC for learners of English as a foreign language based on the different proficiency. We investigate GEC results and find that overcorrection happens primarily in advanced language learners' writing (proficiency C) rather than proficiency A (a beginner level) and proficiency B (an intermediate level). Fine-tuned LLMs, and even few-shot prompting with writing examples of English learners, actually tend to exhibit decreased recall measures. To make our claim concrete, we conduct a comprehensive examination of GEC outcomes and their evaluation results based on language proficiency. △ Less

Submitted 24 February, 2024; originally announced February 2024.

Comments: To appear in LREC-COLING 2024, short paper (preprint)

arXiv:2402.15521 [pdf, other]

HKD-SHO: A hybrid smart home system based on knowledge-based and data-driven services

Authors: Mingming Qiu, Elie Najm, Rémi Sharrock, Bruno Traverson

Abstract: A smart home is realized by setting up various services. Several methods have been proposed to create smart home services, which can be divided into knowledge-based and data-driven approaches. However, knowledge-based approaches usually require manual input from the inhabitant, which can be complicated if the physical phenomena of the concerned environment states are complex, and the inhabitant do… ▽ More A smart home is realized by setting up various services. Several methods have been proposed to create smart home services, which can be divided into knowledge-based and data-driven approaches. However, knowledge-based approaches usually require manual input from the inhabitant, which can be complicated if the physical phenomena of the concerned environment states are complex, and the inhabitant does not know how to adjust related actuators to achieve the target values of the states monitored by services. Moreover, machine learning-based data-driven approaches that we are interested in are like black boxes and cannot show the inhabitant in which situations certain services proposed certain actuators' states. To solve these problems, we propose a hybrid system called HKD-SHO (Hybrid Knowledge-based and Data-driven services based Smart HOme system), where knowledge-based and machine learning-based data-driven services are profitably integrated. The principal advantage is that it inherits the explicability of knowledge-based services and the dynamism of data-driven services. We compare HKD-SHO with several systems for creating dynamic smart home services, and the results show the better performance of HKD-SHO. △ Less

Submitted 15 February, 2024; originally announced February 2024.

Comments: keywords: Hybrid System, Knowledge Representation, Reinforcement Learning, Services, Smart Home

arXiv:2401.11058 [pdf, ps, other]

Low Complexity Turbo SIC-MMSE Detection for Orthogonal Time Frequency Space Modulation

Authors: Qi Li, Jinhong Yuan, Min Qiu, Shuangyang Li, Yixuan Xie

Abstract: Recently, orthogonal time frequency space (OTFS) modulation has garnered considerable attention due to its robustness against doubly-selective wireless channels. In this paper, we propose a low-complexity iterative successive interference cancellation based minimum mean squared error (SIC-MMSE) detection algorithm for zero-padded OTFS (ZP-OTFS) modulation. In the proposed algorithm, signals are de… ▽ More Recently, orthogonal time frequency space (OTFS) modulation has garnered considerable attention due to its robustness against doubly-selective wireless channels. In this paper, we propose a low-complexity iterative successive interference cancellation based minimum mean squared error (SIC-MMSE) detection algorithm for zero-padded OTFS (ZP-OTFS) modulation. In the proposed algorithm, signals are detected based on layers processed by multiple SIC-MMSE linear filters for each sub-channel, with interference on the targeted signal layer being successively canceled either by hard or soft information. To reduce the complexity of computing individual layer filter coefficients, we also propose a novel filter coefficients recycling approach in place of generating the exact form of MMSE filter weights. Moreover, we design a joint detection and decoding algorithm for ZP-OTFS to enhance error performance. Compared to the conventional SIC-MMSE detection, our proposed algorithms outperform other linear detectors, e.g., maximal ratio combining (MRC), for ZP-OTFS with up to 3 dB gain while maintaining comparable computation complexity. △ Less

Submitted 19 January, 2024; originally announced January 2024.

Comments: 15 pages, 12 figures, accepted by IEEE Transactions on Communications

arXiv:2401.01433 [pdf, other]

Multiple Access Techniques for Intelligent and Multi-Functional 6G: Tutorial, Survey, and Outlook

Authors: Bruno Clerckx, Yijie Mao, Zhaohui Yang, Mingzhe Chen, Ahmed Alkhateeb, Liang Liu, Min Qiu, Jinhong Yuan, Vincent W. S. Wong, Juan Montojo

Abstract: Multiple access (MA) is a crucial part of any wireless system and refers to techniques that make use of the resource dimensions to serve multiple users/devices/machines/services, ideally in the most efficient way. Given the needs of multi-functional wireless networks for integrated communications, sensing, localization, computing, coupled with the surge of machine learning / artificial intelligenc… ▽ More Multiple access (MA) is a crucial part of any wireless system and refers to techniques that make use of the resource dimensions to serve multiple users/devices/machines/services, ideally in the most efficient way. Given the needs of multi-functional wireless networks for integrated communications, sensing, localization, computing, coupled with the surge of machine learning / artificial intelligence (AI) in wireless networks, MA techniques are expected to experience a paradigm shift in 6G and beyond. In this paper, we provide a tutorial, survey and outlook of past, emerging and future MA techniques and pay a particular attention to how wireless network intelligence and multi-functionality will lead to a re-thinking of those techniques. The paper starts with an overview of orthogonal, physical layer multicasting, space domain, power domain, ratesplitting, code domain MAs, and other domains, and highlight the importance of researching universal multiple access to shrink instead of grow the knowledge tree of MA schemes by providing a unified understanding of MA schemes across all resource dimensions. It then jumps into rethinking MA schemes in the era of wireless network intelligence, covering AI for MA such as AI-empowered resource allocation, optimization, channel estimation, receiver designs, user behavior predictions, and MA for AI such as federated learning/edge intelligence and over the air computation. We then discuss MA for network multi-functionality and the interplay between MA and integrated sensing, localization, and communications. We finish with studying MA for emerging intelligent applications before presenting a roadmap toward 6G standardization. We also point out numerous directions that are promising for future research. △ Less

Submitted 2 January, 2024; originally announced January 2024.

Comments: submitted for publication in Proceedings of the IEEE

arXiv:2312.17493 [pdf, other]

Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning

Authors: Xiao-Yang Liu, Rongyi Zhu, Daochen Zha, Jiechao Gao, Shan Zhong, Matt White, Meikang Qiu

Abstract: The surge in interest and application of large language models (LLMs) has sparked a drive to fine-tune these models to suit specific applications, such as finance and medical science. However, concerns regarding data privacy have emerged, especially when multiple stakeholders aim to collaboratively enhance LLMs using sensitive data. In this scenario, federated learning becomes a natural choice, al… ▽ More The surge in interest and application of large language models (LLMs) has sparked a drive to fine-tune these models to suit specific applications, such as finance and medical science. However, concerns regarding data privacy have emerged, especially when multiple stakeholders aim to collaboratively enhance LLMs using sensitive data. In this scenario, federated learning becomes a natural choice, allowing decentralized fine-tuning without exposing raw data to central servers. Motivated by this, we investigate how data privacy can be ensured in LLM fine-tuning through practical federated learning approaches, enabling secure contributions from multiple parties to enhance LLMs. Yet, challenges arise: 1) despite avoiding raw data exposure, there is a risk of inferring sensitive information from model outputs, and 2) federated learning for LLMs incurs notable communication overhead. To address these challenges, this article introduces DP-LoRA, a novel federated learning algorithm tailored for LLMs. DP-LoRA preserves data privacy by employing a Gaussian mechanism that adds noise in weight updates, maintaining individual data privacy while facilitating collaborative model training. Moreover, DP-LoRA optimizes communication efficiency via low-rank adaptation, minimizing the transmission of updated weights during distributed training. The experimental results across medical, financial, and general datasets using various LLMs demonstrate that DP-LoRA effectively ensures strict privacy constraints while minimizing communication overhead. △ Less

Submitted 2 June, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

Comments: 21 pages, 1 figure, 19 tables

arXiv:2311.06761 [pdf, other]

Learning Knowledge-Enhanced Contextual Language Representations for Domain Natural Language Understanding

Authors: Ruyao Xu, Taolin Zhang, Chengyu Wang, Zhongjie Duan, Cen Chen, Minghui Qiu, Dawei Cheng, Xiaofeng He, Weining Qian

Abstract: Knowledge-Enhanced Pre-trained Language Models (KEPLMs) improve the performance of various downstream NLP tasks by injecting knowledge facts from large-scale Knowledge Graphs (KGs). However, existing methods for pre-training KEPLMs with relational triples are difficult to be adapted to close domains due to the lack of sufficient domain graph semantics. In this paper, we propose a Knowledge-enhance… ▽ More Knowledge-Enhanced Pre-trained Language Models (KEPLMs) improve the performance of various downstream NLP tasks by injecting knowledge facts from large-scale Knowledge Graphs (KGs). However, existing methods for pre-training KEPLMs with relational triples are difficult to be adapted to close domains due to the lack of sufficient domain graph semantics. In this paper, we propose a Knowledge-enhanced lANGuAge Representation learning framework for various clOsed dOmains (KANGAROO) via capturing the implicit graph structure among the entities. Specifically, since the entity coverage rates of closed-domain KGs can be relatively low and may exhibit the global sparsity phenomenon for knowledge injection, we consider not only the shallow relational representations of triples but also the hyperbolic embeddings of deep hierarchical entity-class structures for effective knowledge fusion.Moreover, as two closed-domain entities under the same entity-class often have locally dense neighbor subgraphs counted by max point biconnected component, we further propose a data augmentation strategy based on contrastive learning over subgraphs to construct hard negative samples of higher quality. It makes the underlying KELPMs better distinguish the semantics of these neighboring entities to further complement the global semantic sparsity. In the experiments, we evaluate KANGAROO over various knowledge-aware and general NLP tasks in both full and few-shot learning settings, outperforming various KEPLM training paradigms performance in closed-domains significantly. △ Less

Submitted 12 November, 2023; originally announced November 2023.

Comments: emnlp 2023

arXiv:2310.08420 [pdf, other]

Visual Attention Prompted Prediction and Learning

Authors: Yifei Zhang, Siyi Gu, Bo Pan, Guangji Bai, Meikang Qiu, Xiaofeng Yang, Liang Zhao

Abstract: Visual explanation (attention)-guided learning uses not only labels but also explanations to guide model reasoning process. While visual attention-guided learning has shown promising results, it requires a large number of explanation annotations that are time-consuming to prepare. However, in many real-world situations, it is usually desired to prompt the model with visual attention without model… ▽ More Visual explanation (attention)-guided learning uses not only labels but also explanations to guide model reasoning process. While visual attention-guided learning has shown promising results, it requires a large number of explanation annotations that are time-consuming to prepare. However, in many real-world situations, it is usually desired to prompt the model with visual attention without model retraining. For example, when doing AI-assisted cancer classification on a medical image, users (e.g., clinicians) can provide the AI model with visual attention prompt on which areas are indispensable and which are precluded. Despite its promising objectives, achieving visual attention-prompted prediction presents several major challenges: 1) How can the visual prompt be effectively integrated into the model's reasoning process? 2) How should the model handle samples that lack visual prompts? 3) What is the impact on the model's performance when a visual prompt is imperfect? This paper introduces a novel framework for attention-prompted prediction and learning, utilizing visual prompts to steer the model's reasoning process. To improve performance in non-prompted situations and align it with prompted scenarios, we propose a co-training approach for both non-prompted and prompted models, ensuring they share similar parameters and activations. Additionally, for instances where the visual prompt does not encompass the entire input image, we have developed innovative attention prompt refinement methods. These methods interpolate the incomplete prompts while maintaining alignment with the model's explanations. Extensive experiments on four datasets demonstrate the effectiveness of our proposed framework in enhancing predictions for samples both with and without prompt. △ Less

Submitted 23 April, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2308.15840 [pdf, other]

doi 10.1007/s10618-024-01035-w

MSGNN: Multi-scale Spatio-temporal Graph Neural Network for Epidemic Forecasting

Authors: Mingjie Qiu, Zhiyi Tan, Bing-kun Bao

Abstract: Infectious disease forecasting has been a key focus and proved to be crucial in controlling epidemic. A recent trend is to develop forecast-ing models based on graph neural networks (GNNs). However, existing GNN-based methods suffer from two key limitations: (1) Current models broaden receptive fields by scaling the depth of GNNs, which is insuffi-cient to preserve the semantics of long-range conn… ▽ More Infectious disease forecasting has been a key focus and proved to be crucial in controlling epidemic. A recent trend is to develop forecast-ing models based on graph neural networks (GNNs). However, existing GNN-based methods suffer from two key limitations: (1) Current models broaden receptive fields by scaling the depth of GNNs, which is insuffi-cient to preserve the semantics of long-range connectivity between distant but epidemic related areas. (2) Previous approaches model epidemics within single spatial scale, while ignoring the multi-scale epidemic pat-terns derived from different scales. To address these deficiencies, we devise the Multi-scale Spatio-temporal Graph Neural Network (MSGNN) based on an innovative multi-scale view. To be specific, in the proposed MSGNN model, we first devise a novel graph learning module, which directly captures long-range connectivity from trans-regional epidemic signals and integrates them into a multi-scale graph. Based on the learned multi-scale graph, we utilize a newly designed graph convolution module to exploit multi-scale epidemic patterns. This module allows us to facilitate multi-scale epidemic modeling by mining both scale-shared and scale-specific pat-terns. Experimental results on forecasting new cases of COVID-19 in United State demonstrate the superiority of our method over state-of-arts. Further analyses and visualization also show that MSGNN offers not only accurate, but also robust and interpretable forecasting result. △ Less

Submitted 30 August, 2023; originally announced August 2023.

Comments: 29 pages

Report number: DAMI-D-23-00319R2

Journal ref: Data Min Knowl Disc (2024)

arXiv:2308.13229 [pdf, other]

ReST: A Reconfigurable Spatial-Temporal Graph Model for Multi-Camera Multi-Object Tracking

Authors: Cheng-Che Cheng, Min-Xuan Qiu, Chen-Kuo Chiang, Shang-Hong Lai

Abstract: Multi-Camera Multi-Object Tracking (MC-MOT) utilizes information from multiple views to better handle problems with occlusion and crowded scenes. Recently, the use of graph-based approaches to solve tracking problems has become very popular. However, many current graph-based methods do not effectively utilize information regarding spatial and temporal consistency. Instead, they rely on single-came… ▽ More Multi-Camera Multi-Object Tracking (MC-MOT) utilizes information from multiple views to better handle problems with occlusion and crowded scenes. Recently, the use of graph-based approaches to solve tracking problems has become very popular. However, many current graph-based methods do not effectively utilize information regarding spatial and temporal consistency. Instead, they rely on single-camera trackers as input, which are prone to fragmentation and ID switch errors. In this paper, we propose a novel reconfigurable graph model that first associates all detected objects across cameras spatially before reconfiguring it into a temporal graph for Temporal Association. This two-stage association approach enables us to extract robust spatial and temporal-aware features and address the problem with fragmented tracklets. Furthermore, our model is designed for online tracking, making it suitable for real-world applications. Experimental results show that the proposed graph model is able to extract more discriminating features for object tracking, and our model achieves state-of-the-art performance on several public datasets. △ Less

Submitted 25 August, 2023; originally announced August 2023.

Comments: Accepted by ICCV2023

arXiv:2308.09012 [pdf, other]

FashionLOGO: Prompting Multimodal Large Language Models for Fashion Logo Embeddings

Authors: Yulin Su, Min Yang, Minghui Qiu, Jing Wang, Tao Wang

Abstract: Logo embedding plays a crucial role in various e-commerce applications by facilitating image retrieval or recognition, such as intellectual property protection and product search. However, current methods treat logo embedding as a purely visual problem, which may limit their performance in real-world scenarios. A notable issue is that the textual knowledge embedded in logo images has not been adeq… ▽ More Logo embedding plays a crucial role in various e-commerce applications by facilitating image retrieval or recognition, such as intellectual property protection and product search. However, current methods treat logo embedding as a purely visual problem, which may limit their performance in real-world scenarios. A notable issue is that the textual knowledge embedded in logo images has not been adequately explored. Therefore, we propose a novel approach that leverages textual knowledge as an auxiliary to improve the robustness of logo embedding. The emerging Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in both visual and textual understanding and could become valuable visual assistants in understanding logo images. Inspired by this observation, our proposed method, FashionLOGO, aims to utilize MLLMs to enhance fashion logo embedding. We explore how MLLMs can improve logo embedding by prompting them to generate explicit textual knowledge through three types of prompts, including image OCR, brief captions, and detailed descriptions prompts, in a zero-shot setting. We adopt a cross-attention transformer to enable image embedding queries to learn supplementary knowledge from textual embeddings automatically. To reduce computational costs, we only use the image embedding model in the inference stage, similar to traditional inference pipelines. Our extensive experiments on three real-world datasets demonstrate that FashionLOGO learns generalized and robust logo embeddings, achieving state-of-the-art performance in all benchmark datasets. Furthermore, we conduct comprehensive ablation studies to demonstrate the performance improvements resulting from the introduction of MLLMs. △ Less

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2308.08883 [pdf, other]

Coexistence of Heterogeneous Services in the Uplink with Discrete Signaling and Treating Interference as Noise

Authors: Min Qiu, Yu-Chih Huang, Jinhong Yuan

Abstract: The problem of enabling the coexistence of heterogeneous services, e.g., different ultra-reliable low-latency communications (URLLC) services and/or enhanced mobile broadband (eMBB) services, in the uplink is studied. Each service has its own error probability and blocklength constraints and the longer transmission block suffers from heterogeneous interference. Due to the latency concern, the deco… ▽ More The problem of enabling the coexistence of heterogeneous services, e.g., different ultra-reliable low-latency communications (URLLC) services and/or enhanced mobile broadband (eMBB) services, in the uplink is studied. Each service has its own error probability and blocklength constraints and the longer transmission block suffers from heterogeneous interference. Due to the latency concern, the decoding of URLLC messages cannot leverage successive interference cancellation (SIC) and should always be performed before the decoding of eMBB messages. This can significantly degrade the achievable rates of URLLC users when the interference from other users is strong. To overcome this issue, we propose a new transmission scheme based on discrete signaling and treating interference as noise decoding, i.e., without SIC. Guided by the deterministic model, we provide a systematic way to construct discrete signaling for handling heterogeneous interference effectively. We demonstrate theoretically and numerically that the proposed scheme can perform close to the benchmark scheme based on capacity-achieving Gaussian signaling with the assumption of perfect SIC. △ Less

Submitted 17 August, 2023; originally announced August 2023.

Comments: 7 pages, accepted for presentation at IEEE Global Communications Conference (GLOBECOM) 2023

arXiv:2308.04278 [pdf, other]

Achieving Covert Communication With A Probabilistic Jamming Strategy

Authors: Xun Chen, Fujun Gao, Min Qiu, Jia Zhang, Feng Shu, Shihao Yan

Abstract: In this work, we consider a covert communication scenario, where a transmitter Alice communicates to a receiver Bob with the aid of a probabilistic and uninformed jammer against an adversary warden's detection. The transmission status and power of the jammer are random and follow some priori probabilities. We first analyze the warden's detection performance as a function of the jammer's transmissi… ▽ More In this work, we consider a covert communication scenario, where a transmitter Alice communicates to a receiver Bob with the aid of a probabilistic and uninformed jammer against an adversary warden's detection. The transmission status and power of the jammer are random and follow some priori probabilities. We first analyze the warden's detection performance as a function of the jammer's transmission probability, transmit power distribution, and Alice's transmit power. We then maximize the covert throughput from Alice to Bob subject to a covertness constraint, by designing the covert communication strategies from three different perspectives: Alice's perspective, the jammer's perspective, and the global perspective. Our analysis reveals that the minimum jamming power should not always be zero in the probabilistic jamming strategy, which is different from that in the continuous jamming strategy presented in the literature. In addition, we prove that the minimum jamming power should be the same as Alice's covert transmit power, depending on the covertness and average jamming power constraints. Furthermore, our results show that the probabilistic jamming can outperform the continuous jamming in terms of achieving a higher covert throughput under the same covertness and average jamming power constraints. △ Less

Submitted 29 August, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

arXiv:2308.02457 [pdf, other]

A Survey on Temporal Knowledge Graph Completion: Taxonomy, Progress, and Prospects

Authors: Jiapu Wang, Boyue Wang, Meikang Qiu, Shirui Pan, Bo Xiong, Heng Liu, Linhao Luo, Tengfei Liu, Yongli Hu, Baocai Yin, Wen Gao

Abstract: Temporal characteristics are prominently evident in a substantial volume of knowledge, which underscores the pivotal role of Temporal Knowledge Graphs (TKGs) in both academia and industry. However, TKGs often suffer from incompleteness for three main reasons: the continuous emergence of new knowledge, the weakness of the algorithm for extracting structured information from unstructured data, and t… ▽ More Temporal characteristics are prominently evident in a substantial volume of knowledge, which underscores the pivotal role of Temporal Knowledge Graphs (TKGs) in both academia and industry. However, TKGs often suffer from incompleteness for three main reasons: the continuous emergence of new knowledge, the weakness of the algorithm for extracting structured information from unstructured data, and the lack of information in the source dataset. Thus, the task of Temporal Knowledge Graph Completion (TKGC) has attracted increasing attention, aiming to predict missing items based on the available information. In this paper, we provide a comprehensive review of TKGC methods and their details. Specifically, this paper mainly consists of three components, namely, 1)Background, which covers the preliminaries of TKGC methods, loss functions required for training, as well as the dataset and evaluation protocol; 2)Interpolation, that estimates and predicts the missing elements or set of elements through the relevant available information. It further categorizes related TKGC methods based on how to process temporal information; 3)Extrapolation, which typically focuses on continuous TKGs and predicts future events, and then classifies all extrapolation methods based on the algorithms they utilize. We further pinpoint the challenges and discuss future research directions of TKGC. △ Less

Submitted 4 August, 2023; originally announced August 2023.

arXiv:2307.04525 [pdf, other]

Cluster-Induced Mask Transformers for Effective Opportunistic Gastric Cancer Screening on Non-contrast CT Scans

Authors: Mingze Yuan, Yingda Xia, Xin Chen, Jiawen Yao, Junli Wang, Mingyan Qiu, Hexin Dong, Jingren Zhou, Bin Dong, Le Lu, Li Zhang, Zaiyi Liu, Ling Zhang

Abstract: Gastric cancer is the third leading cause of cancer-related mortality worldwide, but no guideline-recommended screening test exists. Existing methods can be invasive, expensive, and lack sensitivity to identify early-stage gastric cancer. In this study, we explore the feasibility of using a deep learning approach on non-contrast CT scans for gastric cancer detection. We propose a novel cluster-ind… ▽ More Gastric cancer is the third leading cause of cancer-related mortality worldwide, but no guideline-recommended screening test exists. Existing methods can be invasive, expensive, and lack sensitivity to identify early-stage gastric cancer. In this study, we explore the feasibility of using a deep learning approach on non-contrast CT scans for gastric cancer detection. We propose a novel cluster-induced Mask Transformer that jointly segments the tumor and classifies abnormality in a multi-task manner. Our model incorporates learnable clusters that encode the texture and shape prototypes of gastric cancer, utilizing self- and cross-attention to interact with convolutional features. In our experiments, the proposed method achieves a sensitivity of 85.0% and specificity of 92.6% for detecting gastric tumors on a hold-out test set consisting of 100 patients with cancer and 148 normal. In comparison, two radiologists have an average sensitivity of 73.5% and specificity of 84.3%. We also obtain a specificity of 97.7% on an external test set with 903 normal cases. Our approach performs comparably to established state-of-the-art gastric cancer screening tools like blood testing and endoscopy, while also being more sensitive in detecting early-stage cancer. This demonstrates the potential of our approach as a novel, non-invasive, low-cost, and accurate method for opportunistic gastric cancer screening. △ Less

Submitted 15 July, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

Comments: MICCAI 2023

arXiv:2306.17451 [pdf, ps, other]

Self-Connected Spatially Coupled LDPC Codes with Improved Termination

Authors: Yihuan Liao, Min Qiu, Jinhong Yuan

Abstract: This paper investigates the design of self-connected spatially coupled low-density parity-check (SC-LDPC) codes. First, a termination method is proposed to reduce rate loss. Particularly, a single-side open SC-LDPC ensemble is introduced, which halves the rate loss of a conventional terminated SC-LDPC by reducing the number of check nodes. We further propose a self-connection method that allows re… ▽ More This paper investigates the design of self-connected spatially coupled low-density parity-check (SC-LDPC) codes. First, a termination method is proposed to reduce rate loss. Particularly, a single-side open SC-LDPC ensemble is introduced, which halves the rate loss of a conventional terminated SC-LDPC by reducing the number of check nodes. We further propose a self-connection method that allows reliable information to propagate from several directions to improve the decoding threshold. We demonstrate that the proposed ensembles not only achieve a better trade-off between rate loss and gap to capacity than several existing protograph SC-LDPC codes with short chain lengths but also exhibit threshold saturation behavior. Finite blocklength error performance is provided to exemplify the superiority of the proposed codes over conventional protograph SC-LDPC codes. △ Less

Submitted 30 June, 2023; originally announced June 2023.

Comments: 6 pages, 8 figures, accepted for publication in IEEE Communications Letters

arXiv:2306.07207 [pdf, other]

Valley: Video Assistant with Large Language model Enhanced abilitY

Authors: Ruipu Luo, Ziwang Zhao, Min Yang, Junwei Dong, Da Li, Pengcheng Lu, Tao Wang, Linmei Hu, Minghui Qiu, Zhongyu Wei

Abstract: Large language models (LLMs), with their remarkable conversational capabilities, have demonstrated impressive performance across various applications and have emerged as formidable AI assistants. In view of this, it raises an intuitive question: Can we harness the power of LLMs to build multimodal AI assistants for visual applications? Recently, several multi-modal models have been developed for t… ▽ More Large language models (LLMs), with their remarkable conversational capabilities, have demonstrated impressive performance across various applications and have emerged as formidable AI assistants. In view of this, it raises an intuitive question: Can we harness the power of LLMs to build multimodal AI assistants for visual applications? Recently, several multi-modal models have been developed for this purpose. They typically pre-train an adaptation module to align the semantics of the vision encoder and language model, followed by fine-tuning on instruction-following data. However, despite the success of this pipeline in image and language understanding, its effectiveness in joint video and language understanding has not been widely explored. In this paper, we aim to develop a novel multi-modal foundation model capable of comprehending video, image, and language within a general framework. To achieve this goal, we introduce Valley, a Video Assistant with Large Language model Enhanced abilitY. The Valley consists of a LLM, a temporal modeling module, a visual encoder, and a simple projection module designed to bridge visual and textual modes. To empower Valley with video comprehension and instruction-following capabilities, we construct a video instruction dataset and adopt a two-stage tuning procedure to train it. Specifically, we employ ChatGPT to facilitate the construction of task-oriented conversation data encompassing various tasks, including multi-shot captions, long video descriptions, action recognition, causal relationship inference, etc. Subsequently, we adopt a pre-training-then-instructions-tuned pipeline to align visual and textual modalities and improve the instruction-following capability of Valley. Qualitative experiments demonstrate that Valley has the potential to function as a highly effective video assistant that can make complex video understanding scenarios easy. △ Less

Submitted 8 October, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

arXiv:2305.02200 [pdf, other]

Deep Graph Representation Learning and Optimization for Influence Maximization

Authors: Chen Ling, Junji Jiang, Junxiang Wang, My Thai, Lukas Xue, James Song, Meikang Qiu, Liang Zhao

Abstract: Influence maximization (IM) is formulated as selecting a set of initial users from a social network to maximize the expected number of influenced users. Researchers have made great progress in designing various traditional methods, and their theoretical design and performance gain are close to a limit. In the past few years, learning-based IM methods have emerged to achieve stronger generalization… ▽ More Influence maximization (IM) is formulated as selecting a set of initial users from a social network to maximize the expected number of influenced users. Researchers have made great progress in designing various traditional methods, and their theoretical design and performance gain are close to a limit. In the past few years, learning-based IM methods have emerged to achieve stronger generalization ability to unknown graphs than traditional ones. However, the development of learning-based IM methods is still limited by fundamental obstacles, including 1) the difficulty of effectively solving the objective function; 2) the difficulty of characterizing the diversified underlying diffusion patterns; and 3) the difficulty of adapting the solution under various node-centrality-constrained IM variants. To cope with the above challenges, we design a novel framework DeepIM to generatively characterize the latent representation of seed sets, and we propose to learn the diversified information diffusion pattern in a data-driven and end-to-end manner. Finally, we design a novel objective function to infer optimal seed sets under flexible node-centrality-based budget constraints. Extensive analyses are conducted over both synthetic and real-world datasets to demonstrate the overall performance of DeepIM. The code and data are available at: https://github.com/triplej0079/DeepIM. △ Less

Submitted 6 May, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

Comments: In Proceedings of the 40th International Conference on Machine Learning (ICML 2023), Honolulu, Hawaii, USA. PMLR 202, 2023

arXiv:2304.00212 [pdf, other]

Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image Segmentation and Out-of-Distribution Localization

Authors: Mingze Yuan, Yingda Xia, Hexin Dong, Zifan Chen, Jiawen Yao, Mingyan Qiu, Ke Yan, Xiaoli Yin, Yu Shi, Xin Chen, Zaiyi Liu, Bin Dong, Jingren Zhou, Le Lu, Ling Zhang, Li Zhang

Abstract: Real-world medical image segmentation has tremendous long-tailed complexity of objects, among which tail conditions correlate with relatively rare diseases and are clinically significant. A trustworthy medical AI algorithm should demonstrate its effectiveness on tail conditions to avoid clinically dangerous damage in these out-of-distribution (OOD) cases. In this paper, we adopt the concept of obj… ▽ More Real-world medical image segmentation has tremendous long-tailed complexity of objects, among which tail conditions correlate with relatively rare diseases and are clinically significant. A trustworthy medical AI algorithm should demonstrate its effectiveness on tail conditions to avoid clinically dangerous damage in these out-of-distribution (OOD) cases. In this paper, we adopt the concept of object queries in Mask Transformers to formulate semantic segmentation as a soft cluster assignment. The queries fit the feature-level cluster centers of inliers during training. Therefore, when performing inference on a medical image in real-world scenarios, the similarity between pixels and the queries detects and localizes OOD regions. We term this OOD localization as MaxQuery. Furthermore, the foregrounds of real-world medical images, whether OOD objects or inliers, are lesions. The difference between them is less than that between the foreground and background, possibly misleading the object queries to focus redundantly on the background. Thus, we propose a query-distribution (QD) loss to enforce clear boundaries between segmentation targets and other regions at the query level, improving the inlier segmentation and OOD indication. Our proposed framework is tested on two real-world segmentation tasks, i.e., segmentation of pancreatic and liver tumors, outperforming previous state-of-the-art algorithms by an average of 7.39% on AUROC, 14.69% on AUPR, and 13.79% on FPR95 for OOD localization. On the other hand, our framework improves the performance of inlier segmentation by an average of 5.27% DSC when compared with the leading baseline nnUNet. △ Less

Submitted 31 March, 2023; originally announced April 2023.

Comments: CVPR 2023 Highlight

arXiv:2302.08018 [pdf, other]

Towards Fair Machine Learning Software: Understanding and Addressing Model Bias Through Counterfactual Thinking

Authors: Zichong Wang, Yang Zhou, Meikang Qiu, Israat Haque, Laura Brown, Yi He, Jianwu Wang, David Lo, Wenbin Zhang

Abstract: The increasing use of Machine Learning (ML) software can lead to unfair and unethical decisions, thus fairness bugs in software are becoming a growing concern. Addressing these fairness bugs often involves sacrificing ML performance, such as accuracy. To address this issue, we present a novel counterfactual approach that uses counterfactual thinking to tackle the root causes of bias in ML software… ▽ More The increasing use of Machine Learning (ML) software can lead to unfair and unethical decisions, thus fairness bugs in software are becoming a growing concern. Addressing these fairness bugs often involves sacrificing ML performance, such as accuracy. To address this issue, we present a novel counterfactual approach that uses counterfactual thinking to tackle the root causes of bias in ML software. In addition, our approach combines models optimized for both performance and fairness, resulting in an optimal solution in both aspects. We conducted a thorough evaluation of our approach on 10 benchmark tasks using a combination of 5 performance metrics, 3 fairness metrics, and 15 measurement scenarios, all applied to 8 real-world datasets. The conducted extensive evaluations show that the proposed method significantly improves the fairness of ML software while maintaining competitive performance, outperforming state-of-the-art solutions in 84.6% of overall cases based on a recent benchmarking tool. △ Less

Submitted 15 February, 2023; originally announced February 2023.

arXiv:2302.03507 [pdf, other]

Meta-Learning Siamese Network for Few-Shot Text Classification

Authors: Chengcheng Han, Yuhe Wang, Yingnan Fu, Xiang Li, Minghui Qiu, Ming Gao, Aoying Zhou

Abstract: Few-shot learning has been used to tackle the problem of label scarcity in text classification, of which meta-learning based methods have shown to be effective, such as the prototypical networks (PROTO). Despite the success of PROTO, there still exist three main problems: (1) ignore the randomness of the sampled support sets when computing prototype vectors; (2) disregard the importance of labeled… ▽ More Few-shot learning has been used to tackle the problem of label scarcity in text classification, of which meta-learning based methods have shown to be effective, such as the prototypical networks (PROTO). Despite the success of PROTO, there still exist three main problems: (1) ignore the randomness of the sampled support sets when computing prototype vectors; (2) disregard the importance of labeled samples; (3) construct meta-tasks in a purely random manner. In this paper, we propose a Meta-Learning Siamese Network, namely, Meta-SN, to address these issues. Specifically, instead of computing prototype vectors from the sampled support sets, Meta-SN utilizes external knowledge (e.g. class names and descriptive texts) for class labels, which is encoded as the low-dimensional embeddings of prototype vectors. In addition, Meta-SN presents a novel sampling strategy for constructing meta-tasks, which gives higher sampling probabilities to hard-to-classify samples. Extensive experiments are conducted on six benchmark datasets to show the clear superiority of Meta-SN over other state-of-the-art models. For reproducibility, all the datasets and codes are provided at https://github.com/hccngu/Meta-SN. △ Less

Submitted 16 March, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

arXiv:2301.12291 [pdf, other]

CancerUniT: Towards a Single Unified Model for Effective Detection, Segmentation, and Diagnosis of Eight Major Cancers Using a Large Collection of CT Scans

Authors: Jieneng Chen, Yingda Xia, Jiawen Yao, Ke Yan, Jianpeng Zhang, Le Lu, Fakai Wang, Bo Zhou, Mingyan Qiu, Qihang Yu, Mingze Yuan, Wei Fang, Yuxing Tang, Minfeng Xu, Jian Zhou, Yuqian Zhao, Qifeng Wang, Xianghua Ye, Xiaoli Yin, Yu Shi, Xin Chen, Jingren Zhou, Alan Yuille, Zaiyi Liu, Ling Zhang

Abstract: Human readers or radiologists routinely perform full-body multi-organ multi-disease detection and diagnosis in clinical practice, while most medical AI systems are built to focus on single organs with a narrow list of a few diseases. This might severely limit AI's clinical adoption. A certain number of AI models need to be assembled non-trivially to match the diagnostic process of a human reading… ▽ More Human readers or radiologists routinely perform full-body multi-organ multi-disease detection and diagnosis in clinical practice, while most medical AI systems are built to focus on single organs with a narrow list of a few diseases. This might severely limit AI's clinical adoption. A certain number of AI models need to be assembled non-trivially to match the diagnostic process of a human reading a CT scan. In this paper, we construct a Unified Tumor Transformer (CancerUniT) model to jointly detect tumor existence & location and diagnose tumor characteristics for eight major cancers in CT scans. CancerUniT is a query-based Mask Transformer model with the output of multi-tumor prediction. We decouple the object queries into organ queries, tumor detection queries and tumor diagnosis queries, and further establish hierarchical relationships among the three groups. This clinically-inspired architecture effectively assists inter- and intra-organ representation learning of tumors and facilitates the resolution of these complex, anatomically related multi-organ cancer image reading tasks. CancerUniT is trained end-to-end using a curated large-scale CT images of 10,042 patients including eight major types of cancers and occurring non-cancer tumors (all are pathology-confirmed with 3D tumor masks annotated by radiologists). On the test set of 631 patients, CancerUniT has demonstrated strong performance under a set of clinically relevant evaluation metrics, substantially outperforming both multi-disease methods and an assembly of eight single-organ expert models in tumor detection, segmentation, and diagnosis. This moves one step closer towards a universal high performance cancer screening tool. △ Less

Submitted 6 October, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

Comments: ICCV 2023 Camera Ready Version

arXiv:2301.09303 [pdf, other]

Downlink Transmission under Heterogeneous Blocklength Constraints: Discrete Signaling with Single-User Decoding

Authors: Min Qiu, Yu-Chih Huang, Jinhong Yuan

Abstract: In this paper, we consider the downlink broadcast channel under heterogenous blocklength constraints, where each user experiences different interference statistics across its received symbols. Different from the homogeneous blocklength case, the strong users with short blocklength transmitted symbol blocks usually cannot wait to receive the entire transmission frame and perform successive interfer… ▽ More In this paper, we consider the downlink broadcast channel under heterogenous blocklength constraints, where each user experiences different interference statistics across its received symbols. Different from the homogeneous blocklength case, the strong users with short blocklength transmitted symbol blocks usually cannot wait to receive the entire transmission frame and perform successive interference cancellation (SIC) owing to their stringent latency requirements. Even if SIC is feasible, it may not be perfect under finite blocklength constraints. To cope with the heterogeneity in latency and reliability requirements, we propose a practical downlink transmission scheme with discrete signaling and single-user decoding, i.e., without SIC. In addition, we derive the finite blocklength achievable rate and use it for guiding the design of channel coding and modulations. Both achievable rate and error probability simulation show that the proposed scheme can operate close to the benchmark scheme which assumes capacity-achieving signaling and perfect SIC. △ Less

Submitted 23 January, 2023; originally announced January 2023.

Comments: 7 pages, 1 figure, accepted for presentation at IEEE ICC 2023. arXiv admin note: substantial text overlap with arXiv:2212.01736

arXiv:2212.10013 [pdf, other]

DocAsRef: An Empirical Study on Repurposing Reference-Based Summary Quality Metrics Reference-Freely

Authors: Forrest Sheng Bao, Ruixuan Tu, Ge Luo, Yinfei Yang, Hebi Li, Minghui Qiu, Youbiao He, Cen Chen

Abstract: Automated summary quality assessment falls into two categories: reference-based and reference-free. Reference-based metrics, historically deemed more accurate due to the additional information provided by human-written references, are limited by their reliance on human input. In this paper, we hypothesize that the comparison methodologies used by some reference-based metrics to evaluate a system s… ▽ More Automated summary quality assessment falls into two categories: reference-based and reference-free. Reference-based metrics, historically deemed more accurate due to the additional information provided by human-written references, are limited by their reliance on human input. In this paper, we hypothesize that the comparison methodologies used by some reference-based metrics to evaluate a system summary against its corresponding reference can be effectively adapted to assess it against its source document, thereby transforming these metrics into reference-free ones. Experimental results support this hypothesis. After being repurposed reference-freely, the zero-shot BERTScore using the pretrained DeBERTa-large-MNLI model of <0.5B parameters consistently outperforms its original reference-based version across various aspects on the SummEval and Newsroom datasets. It also excels in comparison to most existing reference-free metrics and closely competes with zero-shot summary evaluators based on GPT-3.5. △ Less

Submitted 26 November, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: Accepted into Findings of EMNLP 2023

arXiv:2212.01736 [pdf, other]

Downlink Transmission with Heterogeneous URLLC Services: Discrete Signaling With Single-User Decoding

Authors: Min Qiu, Yu-Chih Huang, Jinhong Yuan

Abstract: The problem of designing downlink transmission schemes for supporting heterogeneous ultra-reliable low-latency communications (URLLC) and/or with other types of services is investigated. We consider the broadcast channel, where the base station sends superimposed signals to multiple users. Under heterogeneous blocklength constraints, strong users who are URLLC users cannot wait to receive the enti… ▽ More The problem of designing downlink transmission schemes for supporting heterogeneous ultra-reliable low-latency communications (URLLC) and/or with other types of services is investigated. We consider the broadcast channel, where the base station sends superimposed signals to multiple users. Under heterogeneous blocklength constraints, strong users who are URLLC users cannot wait to receive the entire transmission frame and perform successive interference cancellation (SIC) due to stringent latency requirements, in contrast to the conventional infinite blocklength cases. Even if SIC is feasible, SIC may be imperfect under finite blocklength constraints. To cope with the heterogeneity in latency and reliability requirements, we propose a practical downlink transmission scheme with discrete signaling and single-user decoding (SUD), i.e., without SIC. We carefully design the discrete input distributions to enable efficient SUD by exploiting the structural interference. Furthermore, we derive the second-order achievable rate under heterogenous blocklength and error probability constraints and use it to guide the design of channel coding and modulations. It is shown that in terms of achievable rate under short blocklength, the proposed scheme with regular quadrature amplitude modulations and SUD can operate extremely close to the benchmark schemes that assume perfect SIC with Gaussian signaling. △ Less

Submitted 2 May, 2023; v1 submitted 3 December, 2022; originally announced December 2022.

Comments: 16 pages, 7 figures, accepted by IEEE Journal on Selected Areas in Communications

arXiv:2211.05100 [pdf, other]

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Authors: BigScience Workshop, :, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major , et al. (369 additional authors not shown)

Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access… ▽ More Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License. △ Less

Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

arXiv:2210.11674 [pdf, other]

WristSketcher: Creating Dynamic Sketches in AR with a Sensing Wristband

Authors: Enting Ying, Tianyang Xiong, Shihui Guo, Ming Qiu, Yipeng Qin, Hongbo Fu

Abstract: Restricted by the limited interaction area of native AR glasses (e.g., touch bars), it is challenging to create sketches in AR glasses. Recent works have attempted to use mobile devices (e.g., tablets) or mid-air bare-hand gestures to expand the interactive spaces and can work as the 2D/3D sketching input interfaces for AR glasses. Between them, mobile devices allow for accurate sketching but are… ▽ More Restricted by the limited interaction area of native AR glasses (e.g., touch bars), it is challenging to create sketches in AR glasses. Recent works have attempted to use mobile devices (e.g., tablets) or mid-air bare-hand gestures to expand the interactive spaces and can work as the 2D/3D sketching input interfaces for AR glasses. Between them, mobile devices allow for accurate sketching but are often heavy to carry, while sketching with bare hands is zero-burden but can be inaccurate due to arm instability. In addition, mid-air bare-hand sketching can easily lead to social misunderstandings and its prolonged use can cause arm fatigue. As a new attempt, in this work, we present WristSketcher, a new AR system based on a flexible sensing wristband for creating 2D dynamic sketches, featuring an almost zero-burden authoring model for accurate and comfortable sketch creation in real-world scenarios. Specifically, we have streamlined the interaction space from the mid-air to the surface of a lightweight sensing wristband, and implemented AR sketching and associated interaction commands by developing a gesture recognition method based on the sensing pressure points on the wristband. The set of interactive gestures used by our WristSketcher is determined by a heuristic study on user preferences. Moreover, we endow our WristSketcher with the ability of animation creation, allowing it to create dynamic and expressive sketches. Experimental results demonstrate that our WristSketcher i) faithfully recognizes users' gesture interactions with a high accuracy of 96.0%; ii) achieves higher sketching accuracy than Freehand sketching; iii) achieves high user satisfaction in ease of use, usability and functionality; and iv) shows innovation potentials in art creation, memory aids, and entertainment applications. △ Less

Submitted 26 October, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

arXiv:2210.09049 [pdf, other]

SpanProto: A Two-stage Span-based Prototypical Network for Few-shot Named Entity Recognition

Authors: Jianing Wang, Chengcheng Han, Chengyu Wang, Chuanqi Tan, Minghui Qiu, Songfang Huang, Jun Huang, Ming Gao

Abstract: Few-shot Named Entity Recognition (NER) aims to identify named entities with very little annotated data. Previous methods solve this problem based on token-wise classification, which ignores the information of entity boundaries, and inevitably the performance is affected by the massive non-entity tokens. To this end, we propose a seminal span-based prototypical network (SpanProto) that tackles few… ▽ More Few-shot Named Entity Recognition (NER) aims to identify named entities with very little annotated data. Previous methods solve this problem based on token-wise classification, which ignores the information of entity boundaries, and inevitably the performance is affected by the massive non-entity tokens. To this end, we propose a seminal span-based prototypical network (SpanProto) that tackles few-shot NER via a two-stage approach, including span extraction and mention classification. In the span extraction stage, we transform the sequential tags into a global boundary matrix, enabling the model to focus on the explicit boundary information. For mention classification, we leverage prototypical learning to capture the semantic representations for each labeled span and make the model better adapt to novel-class entities. To further improve the model performance, we split out the false positives generated by the span extractor but not labeled in the current episode set, and then present a margin-based loss to separate them from each prototype region. Experiments over multiple benchmarks demonstrate that our model outperforms strong baselines by a large margin. △ Less

Submitted 21 November, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

arXiv:2210.08536 [pdf, other]

Knowledge Prompting in Pre-trained Language Model for Natural Language Understanding

Authors: Jianing Wang, Wenkang Huang, Qiuhui Shi, Hongbin Wang, Minghui Qiu, Xiang Li, Ming Gao

Abstract: Knowledge-enhanced Pre-trained Language Model (PLM) has recently received significant attention, which aims to incorporate factual knowledge into PLMs. However, most existing methods modify the internal structures of fixed types of PLMs by stacking complicated modules, and introduce redundant and irrelevant factual knowledge from knowledge bases (KBs). In this paper, to address these problems, we… ▽ More Knowledge-enhanced Pre-trained Language Model (PLM) has recently received significant attention, which aims to incorporate factual knowledge into PLMs. However, most existing methods modify the internal structures of fixed types of PLMs by stacking complicated modules, and introduce redundant and irrelevant factual knowledge from knowledge bases (KBs). In this paper, to address these problems, we introduce a seminal knowledge prompting paradigm and further propose a knowledge-prompting-based PLM framework KP-PLM. This framework can be flexibly combined with existing mainstream PLMs. Specifically, we first construct a knowledge sub-graph from KBs for each context. Then we design multiple continuous prompts rules and transform the knowledge sub-graph into natural language prompts. To further leverage the factual knowledge from these prompts, we propose two novel knowledge-aware self-supervised tasks including prompt relevance inspection and masked prompt modeling. Extensive experiments on multiple natural language understanding (NLU) tasks show the superiority of KP-PLM over other state-of-the-art methods in both full-resource and low-resource settings. △ Less

Submitted 16 October, 2022; originally announced October 2022.

Comments: 14 pages, 5 figures. This paper has been accepted for the main conference of EMNLP2022 (long paper)

arXiv:2207.08814 [pdf, other]

PBRE: A Rule Extraction Method from Trained Neural Networks Designed for Smart Home Services

Authors: Mingming Qiu, Elie Najm, Remi Sharrock, Bruno Traverson

Abstract: Designing smart home services is a complex task when multiple services with a large number of sensors and actuators are deployed simultaneously. It may rely on knowledge-based or data-driven approaches. The former can use rule-based methods to design services statically, and the latter can use learning methods to discover inhabitants' preferences dynamically. However, neither of these approaches i… ▽ More Designing smart home services is a complex task when multiple services with a large number of sensors and actuators are deployed simultaneously. It may rely on knowledge-based or data-driven approaches. The former can use rule-based methods to design services statically, and the latter can use learning methods to discover inhabitants' preferences dynamically. However, neither of these approaches is entirely satisfactory because rules cannot cover all possible situations that may change, and learning methods may make decisions that are sometimes incomprehensible to the inhabitant. In this paper, PBRE (Pedagogic Based Rule Extractor) is proposed to extract rules from learning methods to realize dynamic rule generation for smart home systems. The expected advantage is that both the explainability of rule-based methods and the dynamicity of learning methods are adopted. We compare PBRE with an existing rule extraction method, and the results show better performance of PBRE. We also apply PBRE to extract rules from a smart home service represented by an NRL (Neural Network-based Reinforcement Learning). The results show that PBRE can help the NRL-simulated service to make understandable suggestions to the inhabitant. △ Less

Submitted 18 July, 2022; originally announced July 2022.

arXiv:2206.13752 [pdf, other]

Sub-Block Rearranged Staircase Codes for Optical Transport Networks

Authors: Min Qiu, Jinhong Yuan

Abstract: We propose a new family of spatially coupled product codes, called sub-block rearranged staircase (SR-staircase) codes. Each SR-staircase code block is constructed by encoding rearranged preceding code blocks and new information blocks, where the rearrangement involves sub-blocks decomposition and transposition. The proposed codes can be constructed to have each code block size of $1/q$ to that of… ▽ More We propose a new family of spatially coupled product codes, called sub-block rearranged staircase (SR-staircase) codes. Each SR-staircase code block is constructed by encoding rearranged preceding code blocks and new information blocks, where the rearrangement involves sub-blocks decomposition and transposition. The proposed codes can be constructed to have each code block size of $1/q$ to that of the conventional staircase codes while having the same rate and component codes, for any positive integer $q$. In this regard, we can use strong algebraic component codes to construct SR-staircase codes with a similar or the same code block size and rate as staircase codes with weak component codes. Moreover, both waterfall and error floor performance can be further improved by using a large coupling width. The superior performance of the proposed codes is demonstrated through density evolution and error floor analysis as well as simulation. △ Less

Submitted 28 June, 2022; originally announced June 2022.

Comments: 6 pages, 3 figures, 1 table, accepted by the 2022 IEEE International Symposium on Information Theory (ISIT). arXiv admin note: substantial text overlap with arXiv:2201.09415

arXiv:2205.05313 [pdf, other]

Towards Unified Prompt Tuning for Few-shot Text Classification

Authors: Jianing Wang, Chengyu Wang, Fuli Luo, Chuanqi Tan, Minghui Qiu, Fei Yang, Qiuhui Shi, Songfang Huang, Ming Gao

Abstract: Prompt-based fine-tuning has boosted the performance of Pre-trained Language Models (PLMs) on few-shot text classification by employing task-specific prompts. Yet, PLMs are unfamiliar with prompt-style expressions during pre-training, which limits the few-shot learning performance on downstream tasks. It would be desirable if the models can acquire some prompting knowledge before adaptation to spe… ▽ More Prompt-based fine-tuning has boosted the performance of Pre-trained Language Models (PLMs) on few-shot text classification by employing task-specific prompts. Yet, PLMs are unfamiliar with prompt-style expressions during pre-training, which limits the few-shot learning performance on downstream tasks. It would be desirable if the models can acquire some prompting knowledge before adaptation to specific NLP tasks. We present the Unified Prompt Tuning (UPT) framework, leading to better few-shot text classification for BERT-style models by explicitly capturing prompting semantics from non-target NLP datasets. In UPT, a novel paradigm Prompt-Options-Verbalizer is proposed for joint prompt learning across different NLP tasks, forcing PLMs to capture task-invariant prompting knowledge. We further design a self-supervised task named Knowledge-enhanced Selective Masked Language Modeling to improve the PLM's generalization abilities for accurate adaptation to previously unseen tasks. After multi-task learning across multiple tasks, the PLM can be better prompt-tuned towards any dissimilar target tasks in low-resourced settings. Experiments over a variety of NLP tasks show that UPT consistently outperforms state-of-the-arts for prompt-based fine-tuning. △ Less

Submitted 11 May, 2022; originally announced May 2022.

arXiv:2205.03071 [pdf, other]

KECP: Knowledge Enhanced Contrastive Prompting for Few-shot Extractive Question Answering

Authors: Jianing Wang, Chengyu Wang, Minghui Qiu, Qiuhui Shi, Hongbin Wang, Jun Huang, Ming Gao

Abstract: Extractive Question Answering (EQA) is one of the most important tasks in Machine Reading Comprehension (MRC), which can be solved by fine-tuning the span selecting heads of Pre-trained Language Models (PLMs). However, most existing approaches for MRC may perform poorly in the few-shot learning scenario. To solve this issue, we propose a novel framework named Knowledge Enhanced Contrastive Prompt-… ▽ More Extractive Question Answering (EQA) is one of the most important tasks in Machine Reading Comprehension (MRC), which can be solved by fine-tuning the span selecting heads of Pre-trained Language Models (PLMs). However, most existing approaches for MRC may perform poorly in the few-shot learning scenario. To solve this issue, we propose a novel framework named Knowledge Enhanced Contrastive Prompt-tuning (KECP). Instead of adding pointer heads to PLMs, we introduce a seminal paradigm for EQA that transform the task into a non-autoregressive Masked Language Modeling (MLM) generation problem. Simultaneously, rich semantics from the external knowledge base (KB) and the passage context are support for enhancing the representations of the query. In addition, to boost the performance of PLMs, we jointly train the model by the MLM and contrastive learning objectives. Experiments on multiple benchmarks demonstrate that our method consistently outperforms state-of-the-art approaches in few-shot settings by a large margin. △ Less

Submitted 6 May, 2022; originally announced May 2022.

arXiv:2205.00258 [pdf, other]

EasyNLP: A Comprehensive and Easy-to-use Toolkit for Natural Language Processing

Authors: Chengyu Wang, Minghui Qiu, Chen Shi, Taolin Zhang, Tingting Liu, Lei Li, Jianing Wang, Ming Wang, Jun Huang, Wei Lin

Abstract: The success of Pre-Trained Models (PTMs) has reshaped the development of Natural Language Processing (NLP). Yet, it is not easy to obtain high-performing models and deploy them online for industrial practitioners. To bridge this gap, EasyNLP is designed to make it easy to build NLP applications, which supports a comprehensive suite of NLP algorithms. It further features knowledge-enhanced pre-trai… ▽ More The success of Pre-Trained Models (PTMs) has reshaped the development of Natural Language Processing (NLP). Yet, it is not easy to obtain high-performing models and deploy them online for industrial practitioners. To bridge this gap, EasyNLP is designed to make it easy to build NLP applications, which supports a comprehensive suite of NLP algorithms. It further features knowledge-enhanced pre-training, knowledge distillation and few-shot learning functionalities for large-scale PTMs, and provides a unified framework of model training, inference and deployment for real-world applications. Currently, EasyNLP has powered over ten business units within Alibaba Group and is seamlessly integrated to the Platform of AI (PAI) products on Alibaba Cloud. The source code of our EasyNLP toolkit is released at GitHub (https://github.com/alibaba/EasyNLP). △ Less

Submitted 13 March, 2023; v1 submitted 30 April, 2022; originally announced May 2022.

Comments: 8 pages

Showing 1–50 of 115 results for author: Qiu, M