
Cheems: Wonderful Matrices More Efficient and More Effective Architecture

Authors: Jingze Shi, Lu He, Yuhan Wang, Tianyu He, Bingheng Wu, Mingkun Hou

Abstract: Recent studies have shown that relative position encoding performs well in selective state space model scanning algorithms, that an architecture balancing SSM and Attention enhances the efficiency and effectiveness of the algorithm, and that the sparse activation of the mixture of experts reduces the training cost. I studied the effectiveness of different position encodings in structured state space dual (SSD) algorithms and of a more effective SSD-Attn internal and external function mixing method, and designed a more efficient cross-domain mixture of experts. I found that the same matrix is very wonderful in different algorithms, which allows us to establish a new hybrid sparse architecture: Cheems. Compared with other hybrid architectures, it is more efficient and more effective in language modeling tasks.

Submitted 24 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.
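
The abstract credits sparse activation in a mixture of experts with lowering training cost. As a generic illustration only (the abstract does not describe Cheems' cross-domain MoE, so this is not the authors' design), a top-k gate evaluates just k experts per input. The function names and toy experts are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_moe(x: float, gate_logits: Sequence[float],
              experts: Sequence[Callable[[float], float]], k: int = 2) -> float:
    """Route x to the k experts with the largest gate probability and mix their outputs."""
    probs = softmax(gate_logits)
    # Only the top-k experts are evaluated; the rest stay inactive (sparse activation).
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the selected experts
    return sum(probs[i] / norm * experts[i](x) for i in top)


experts = [lambda v: v + 1, lambda v: 2 * v, lambda v: 1 / 0]  # third expert would fail if called
print(top_k_moe(3.0, [0.0, 0.0, -10.0], experts, k=2))  # 5.0
```

Because the third expert is never selected, its (deliberately broken) function is never run, which is the cost saving sparse routing aims for.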

arXiv:2407.07871 [pdf, other]

Enhancing HNSW Index for Real-Time Updates: Addressing Unreachable Points and Performance Degradation

Authors: Wentao Xiao, Yueyang Zhan, Rui Xi, Mengshu Hou, Jianming Liao

Abstract: Approximate nearest neighbor search (ANNS) is a fundamental and essential component in data mining and information retrieval, with graph-based methodologies demonstrating superior performance compared to alternative approaches. Extensive research efforts have been dedicated to improving search efficiency by developing various graph-based indices, such as HNSW (Hierarchical Navigable Small World). However, the performance of HNSW and most graph-based indices becomes unacceptable when faced with a large number of real-time deletions, insertions, and updates. Furthermore, during update operations, HNSW can result in some data points becoming unreachable, a situation we refer to as the "unreachable points phenomenon". This phenomenon could significantly affect the search accuracy of the graph in certain situations. To address these issues, we present efficient measures to overcome the shortcomings of HNSW, specifically addressing poor performance over long periods of delete and update operations and resolving the issues caused by the unreachable points phenomenon. Our proposed MN-RU algorithm effectively improves update efficiency and suppresses the growth rate of unreachable points, ensuring better overall performance and maintaining the integrity of the graph. Our results demonstrate that our methods outperform existing approaches. Furthermore, since our methods are based on HNSW, they can be easily integrated with existing indices widely used in the industrial field, making them practical for future real-world applications.
Code is available at https://github.com/xwt1/MN-RU.git

Submitted 15 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.
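
A point is unreachable when no directed path leads to it from the search entry point, so no search can ever return it. A minimal sketch of detecting this on one graph layer, assuming the layer is stored as adjacency lists (this is a plain reachability check, not MN-RU itself; the names are hypothetical):

```python
from collections import deque
from typing import Dict, List, Set


def unreachable_points(adj: Dict[int, List[int]], entry: int) -> Set[int]:
    """Nodes of one graph layer that no directed path from `entry` reaches.

    `adj` maps every live node to its out-neighbor list.
    """
    seen = {entry}
    queue = deque([entry])
    while queue:  # breadth-first traversal along out-edges
        node = queue.popleft()
        for nb in adj.get(node, []):
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return set(adj) - seen


before = {0: [1, 3], 1: [2], 2: [0], 3: [2]}
after = {0: [1], 1: [2], 2: [0], 3: [2]}  # an update rewired node 0 and dropped its edge to 3
print(unreachable_points(before, 0))  # set()
print(unreachable_points(after, 0))   # {3}
```

Node 3 still has an out-edge, but once its only in-edge disappears it silently drops out of every search result.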

arXiv:2406.03069 [pdf, ps, other]

"Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations

Authors: Muhan Hou, Koen Hindriks, A. E. Eiben, Kim Baraka

Abstract: Reinforcement Learning (RL) has achieved great success in sequential decision-making problems, but often at the cost of a large number of agent-environment interactions. To improve sample efficiency, methods like Reinforcement Learning from Expert Demonstrations (RLED) introduce external expert demonstrations to facilitate agent exploration during the learning process. In practice, these demonstrations, which are often collected from human users, are costly and hence often constrained to a limited amount. How to select the best set of human demonstrations that is most beneficial for learning therefore becomes a major concern. This paper presents EARLY (Episodic Active Learning from demonstration querY), an algorithm that enables a learning agent to generate optimized queries of expert demonstrations in a trajectory-based feature space. Based on a trajectory-level estimate of uncertainty in the agent's current policy, EARLY determines the optimized timing and content for feature-based queries. By querying episodic demonstrations as opposed to isolated state-action pairs, EARLY improves the human teaching experience and achieves better learning performance. We validate the effectiveness of our method in three simulated navigation tasks of increasing difficulty. The results show that our method is able to achieve expert-level performance for all three tasks with convergence over 30% faster than other baseline methods when demonstrations are generated by simulated oracle policies.
The results of a follow-up pilot user study (N=18) further validate that our method can still maintain a significantly better convergence in the case of human expert demonstrators while achieving a better user experience in perceived task load and consuming significantly less human time.

Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.
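
The abstract says EARLY decides when to query from a trajectory-level estimate of policy uncertainty. One simple way to form such an estimate, assuming (our assumption, not the paper's stated method) an ensemble of value estimators and a hypothetical fixed threshold, is to average the ensemble's disagreement over an episode:

```python
import statistics
from typing import Sequence


def trajectory_uncertainty(ensemble_values: Sequence[Sequence[float]]) -> float:
    """Average, over the steps of one episode, of the ensemble's standard deviation.

    ensemble_values[m][t] is ensemble member m's value estimate at step t.
    """
    steps = len(ensemble_values[0])
    per_step = [statistics.pstdev([member[t] for member in ensemble_values])
                for t in range(steps)]
    return sum(per_step) / steps


def should_query(ensemble_values: Sequence[Sequence[float]], threshold: float) -> bool:
    # Ask for a whole demonstration episode only when the trajectory as a whole is uncertain.
    return trajectory_uncertainty(ensemble_values) > threshold


print(should_query([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]], threshold=0.5))  # False
print(should_query([[0.0, 0.0], [2.0, 2.0]], threshold=0.5))            # True
```

Scoring whole trajectories rather than single states matches the abstract's point that episodic queries replace isolated state-action queries.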

arXiv:2406.00037 [pdf, other]

Aligning LLMs through Multi-perspective User Preference Ranking-based Feedback for Programming Question Answering

Authors: Hongyu Yang, Liyang He, Min Hou, Shuanghong Shen, Rui Li, Jiahui Hou, Jianhui Ma, Junda Zhao

Abstract: Code Community Question Answering (CCQA) seeks to tackle programming-related issues, thereby boosting productivity in both software engineering and academic research. Recent advancements in Reinforcement Learning from Human Feedback (RLHF) have transformed the fine-tuning process of Large Language Models (LLMs) to produce responses that closely mimic human behavior. Leveraging LLMs with RLHF for practical CCQA applications has thus emerged as a promising area of study. Unlike standard code question-answering tasks, CCQA involves multiple possible answers, with varying user preferences for each response. Additionally, code communities often show a preference for new APIs. These challenges prevent LLMs from generating responses that cater to the diverse preferences of users in CCQA tasks. To address these issues, we propose a novel framework called Aligning LLMs through Multi-perspective User Preference Ranking-based Feedback for Programming Question Answering (ALMupQA) to create user-focused responses. Our approach starts with Multi-perspective Preference Ranking Alignment (MPRA), which synthesizes varied user preferences based on the characteristics of answers from code communities. We then introduce a Retrieval-augmented In-context Learning (RIL) module to mitigate the problem of outdated answers by retrieving responses to similar questions from a question bank. Due to the limited availability of high-quality, multi-answer CCQA datasets, we also developed a dataset named StaCCQA from real code communities.
Extensive experiments demonstrated the effectiveness of the ALMupQA framework in terms of accuracy and user preference. Compared to the base model, ALMupQA showed nearly an 11% improvement in BLEU, with increases of 20% and 17.5% in BERTScore and CodeBERTScore, respectively.

Submitted 27 May, 2024; originally announced June 2024.
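
The RIL module retrieves responses to similar questions from a question bank and uses them in context. The abstract does not say how similarity is measured; the sketch below assumes plain Jaccard token overlap as a hypothetical stand-in, and the bank entries are placeholders:

```python
import re
from typing import List, Set, Tuple


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_examples(question: str, bank: List[Tuple[str, str]], k: int = 1) -> List[Tuple[str, str]]:
    """Return the k (question, answer) pairs whose questions overlap most with `question`."""
    q = tokens(question)
    ranked = sorted(bank, key=lambda qa: jaccard(q, tokens(qa[0])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, bank: List[Tuple[str, str]], k: int = 1) -> str:
    # Retrieved pairs become in-context examples placed before the new question.
    shots = [f"Q: {q}\nA: {a}" for q, a in retrieve_examples(question, bank, k)]
    return "\n\n".join(shots + [f"Q: {question}\nA:"])


bank = [("How do I read a CSV file in pandas?", "Use pandas.read_csv()."),
        ("How to reverse a list in Python?", "Use reversed() or slicing with [::-1].")]
print(build_prompt("How can I read a CSV file?", bank))
```

In practice a dense retriever would likely replace Jaccard overlap, but the prompt-assembly step stays the same.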

arXiv:2405.15783 [pdf, other]

doi 10.1145/3626772.3658596

Multimodality Invariant Learning for Multimedia-Based New Item Recommendation

Authors: Haoyue Bai, Le Wu, Min Hou, Miaomiao Cai, Zhuangzhuang He, Yuyang Zhou, Richang Hong, Meng Wang

Abstract: Multimedia-based recommendation provides personalized item suggestions by learning the content preferences of users. With the proliferation of digital devices and apps, a huge number of new items are created rapidly over time. How to quickly provide recommendations for new items at inference time is challenging. What's worse, real-world items exhibit varying degrees of modality missing (e.g., many short videos are uploaded without text descriptions). Though many efforts have been devoted to multimedia-based recommendations, they either could not deal with new multimedia items or assumed modality completeness in the modeling process. In this paper, we highlight the necessity of tackling the modality missing issue for new item recommendation. We argue that users' inherent content preference is stable and better kept invariant to arbitrary modality missing environments. Therefore, we approach this problem from a novel perspective of invariant learning. However, how to construct environments from finite user behavior training data to generalize to any modality missing is challenging. To tackle this issue, we propose a novel Multimodality Invariant Learning reCommendation (a.k.a. MILK) framework. Specifically, MILK first designs a cross-modality alignment module to keep semantic consistency from pretrained multimedia item features. After that, MILK designs multi-modal heterogeneous environments with cyclic mixup to augment training data, in order to mimic any modality missing for invariant user preference learning.
Extensive experiments on three real datasets verify the superiority of our proposed framework. The code is available at https://github.com/HaoyueBai98/MILK.

Submitted 28 April, 2024; originally announced May 2024.
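
MILK augments training data with mixup to mimic missing modalities. The abstract does not define its "cyclic" variant, so the sketch below shows only the standard ingredients under that caveat: zeroing one modality to simulate its absence, and ordinary mixup between a complete and a degraded view. Feature values and names are hypothetical.

```python
import random
from typing import Dict, List


def drop_modality(feats: Dict[str, List[float]], missing: str) -> Dict[str, List[float]]:
    """Simulate a missing modality by zeroing its feature vector (other modalities are copied)."""
    return {m: ([0.0] * len(v) if m == missing else list(v)) for m, v in feats.items()}


def mixup(a: List[float], b: List[float], lam: float) -> List[float]:
    """Standard mixup: the convex combination lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def sample_lam(alpha: float, rng: random.Random) -> float:
    # Mixup conventionally draws lam from Beta(alpha, alpha).
    return rng.betavariate(alpha, alpha)


item = {"visual": [1.0, 2.0], "text": [3.0, 4.0]}
no_text = drop_modality(item, "text")
print(no_text["text"])                                 # [0.0, 0.0]
print(mixup(item["text"], no_text["text"], lam=0.25))  # [0.75, 1.0]
```

Interpolating between complete and degraded views yields intermediate "partially missing" environments rather than only the two extremes.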

arXiv:2403.15369 [pdf, other]

OceanPlan: Hierarchical Planning and Replanning for Natural Language AUV Piloting in Large-scale Unexplored Ocean Environments

Authors: Ruochu Yang, Fumin Zhang, Mengxue Hou

Abstract: We develop a hierarchical LLM-task-motion planning and replanning framework to efficiently ground an abstracted human command into tangible Autonomous Underwater Vehicle (AUV) control through enhanced representations of the world. We also incorporate a holistic replanner to provide real-world feedback with all planners for robust AUV operation. While there has been extensive research in bridging the gap between LLMs and robotic missions, existing approaches are unable to guarantee the success of AUV applications in the vast and unknown ocean environment. To tackle specific challenges in marine robotics, we design a hierarchical planner to compose executable motion plans, which achieves planning efficiency and solution quality by decomposing long-horizon missions into sub-tasks. At the same time, a replanner obtains a real-time data stream to address environmental uncertainties during plan execution. Experiments validate that our proposed framework delivers successful AUV performance of long-duration missions through natural language piloting.

Submitted 22 March, 2024; originally announced March 2024.

Comments: submitted to IROS 2024
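
The planner gains efficiency by decomposing long-horizon missions into sub-tasks. The simplest geometric instance of that idea, assuming a straight-line transit and a hypothetical maximum leg length (the paper's planner is LLM-driven and far richer), splits one long goal into evenly spaced waypoints:

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def decompose_mission(start: Point, goal: Point, max_leg: float) -> List[Point]:
    """Split a straight transit into evenly spaced waypoints at most max_leg apart."""
    n_legs = max(1, math.ceil(math.dist(start, goal) / max_leg))
    return [(start[0] + (goal[0] - start[0]) * i / n_legs,
             start[1] + (goal[1] - start[1]) * i / n_legs)
            for i in range(1, n_legs + 1)]


print(decompose_mission((0.0, 0.0), (10.0, 0.0), max_leg=3.0))
# [(2.5, 0.0), (5.0, 0.0), (7.5, 0.0), (10.0, 0.0)]
```

Each waypoint becomes a short sub-task that a motion planner can solve and a replanner can revise independently.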

arXiv:2402.03750 [pdf, other]

doi 10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00182

Digital Twin Mobility Profiling: A Spatio-Temporal Graph Learning Approach

Authors: Xin Chen, Mingliang Hou, Tao Tang, Achhardeep Kaur, Feng Xia

Abstract: With the arrival of the big data era, mobility profiling has become a viable method of utilizing enormous amounts of mobility data to create an intelligent transportation system. Mobility profiling can extract potential patterns in urban traffic from mobility data and is critical for a variety of traffic-related applications. However, due to the high level of complexity and the huge amount of data, mobility profiling faces huge challenges. Digital Twin (DT) technology paves the way for cost-effective and performance-optimised management by digitally creating a virtual representation of the network to simulate its behaviour. In order to capture the complex spatio-temporal features in traffic scenarios, we construct alignment diagrams to assist in completing the spatio-temporal correlation representation and design a dilated alignment convolution network (DACN) to learn the fine-grained correlations, i.e., spatio-temporal interactions. We propose a digital twin mobility profiling (DTMP) framework to learn node profiles on a mobility network DT model. Extensive experiments have been conducted on three real-world datasets. Experimental results demonstrate the effectiveness of DTMP.

Submitted 6 February, 2024; originally announced February 2024.

Comments: 10 pages, 7 figures

MSC Class: 68T09; 68T30; 68U35 ACM Class: I.2.6; I.2.4; H.1.2

Journal ref: The 7th IEEE International Conference on Data Science and Systems (DSS), Dec 20 - 22, 2021, Haikou, China
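
DACN's internals are not given in the abstract, but its name points to dilated convolution. As a generic sketch (not DACN itself), a dilated causal 1D convolution over a node's traffic series skips `dilation - 1` steps between kernel taps, so stacking layers with dilations 1, 2, 4, ... widens the temporal receptive field quickly:

```python
from typing import List, Sequence


def dilated_causal_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """y[t] = sum_k kernel[k] * x[t - k * dilation], treating x at negative indices as 0."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # causal: only current and past time steps contribute
                acc += w * x[idx]
        out.append(acc)
    return out


print(dilated_causal_conv1d([1, 2, 3, 4, 5], kernel=[1, 1], dilation=2))
# [1.0, 2.0, 4.0, 6.0, 8.0]
```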

arXiv:2401.04145 [pdf, other]

Learn Once Plan Arbitrarily (LOPA): Attention-Enhanced Deep Reinforcement Learning Method for Global Path Planning

Authors: Guoming Huang, Mingxin Hou, Xiaofang Yuan, Shuqiao Huang, Yaonan Wang

Abstract: Deep reinforcement learning (DRL) methods have recently shown promise in path planning tasks. However, when dealing with global planning tasks, these methods face serious challenges such as poor convergence and generalization. To this end, we propose an attention-enhanced DRL method called LOPA (Learn Once Plan Arbitrarily) in this paper. Firstly, we analyze the reasons for these problems from the perspective of DRL's observation, revealing that the traditional design causes DRL to be interfered with by irrelevant map information. Secondly, we develop LOPA, which utilizes a novel attention-enhanced mechanism to attain an improved attention capability towards the key information of the observation. Such a mechanism is realized in two steps: (1) an attention model is built to transform the DRL's observation into two dynamic views, local and global, significantly guiding LOPA to focus on the key information on the given maps; (2) a dual-channel network is constructed to process these two views and integrate them to attain an improved reasoning capability. LOPA is validated via multi-objective global path planning experiments. The results suggest that LOPA has improved convergence and generalization performance as well as great path planning efficiency.

Submitted 7 January, 2024; originally announced January 2024.
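
LOPA splits the observation into a local and a global view. The paper builds these with a learned attention model; the sketch below replaces that with fixed cropping and max-pooling on an occupancy grid (0 = free, 1 = obstacle), purely as a hypothetical illustration of what the two views contain:

```python
from typing import List

Grid = List[List[int]]


def local_view(grid: Grid, r: int, c: int, radius: int, pad: int = 1) -> Grid:
    """Crop a (2*radius+1)^2 window centred on the agent; off-map cells count as obstacles."""
    rows, cols = len(grid), len(grid[0])
    return [[grid[i][j] if 0 <= i < rows and 0 <= j < cols else pad
             for j in range(c - radius, c + radius + 1)]
            for i in range(r - radius, r + radius + 1)]


def global_view(grid: Grid, factor: int) -> Grid:
    """Coarsen the whole map: a block is occupied (1) if any cell inside it is."""
    rows, cols = len(grid), len(grid[0])
    return [[max(grid[i][j]
                 for i in range(bi, min(bi + factor, rows))
                 for j in range(bj, min(bj + factor, cols)))
             for bj in range(0, cols, factor)]
            for bi in range(0, rows, factor)]


grid = [[0, 0, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
print(local_view(grid, 0, 0, radius=1))  # [[1, 1, 1], [1, 0, 0], [1, 0, 0]]
print(global_view(grid, factor=2))       # [[0, 1], [1, 0]]
```

A dual-channel network would then consume the two grids side by side, one for fine detail near the agent and one for the coarse layout of the whole map.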

arXiv:2401.02914 [pdf, other]

doi 10.1109/ICASSP49357.2023.10095594

A unified uncertainty-aware exploration: Combining epistemic and aleatory uncertainty

Authors: Parvin Malekzadeh, Ming Hou, Konstantinos N. Plataniotis

Abstract: Exploration is a significant challenge in practical reinforcement learning (RL), and uncertainty-aware exploration that incorporates the quantification of epistemic and aleatory uncertainty has been recognized as an effective exploration strategy. However, capturing the combined effect of aleatory and epistemic uncertainty for decision-making is difficult. Existing works estimate aleatory and epistemic uncertainty separately and consider the composite uncertainty as an additive combination of the two. Nevertheless, the additive formulation leads to excessive risk-taking behavior, causing instability. In this paper, we propose an algorithm that clarifies the theoretical connection between aleatory and epistemic uncertainty, unifies aleatory and epistemic uncertainty estimation, and quantifies the combined effect of both uncertainties for a risk-sensitive exploration. Our method builds on a novel extension of distributional RL that estimates a parameterized return distribution whose parameters are random variables encoding epistemic uncertainty. Experimental results on tasks with exploration and risk challenges show that our method outperforms alternative approaches.

Submitted 5 January, 2024; originally announced January 2024.

Comments: Accepted by ICASSP2023
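
The key idea is a return distribution whose parameters are themselves random variables. A Monte Carlo sketch of that two-level structure, assuming Gaussian forms for both levels and CVaR as the risk measure (both are our assumptions; the abstract says only "risk-sensitive"), shows how epistemic spread widens the combined distribution and lowers its tail value:

```python
import random
from typing import List


def sample_returns(mean_mu: float, epistemic_sd: float, aleatory_sd: float,
                   n: int, seed: int = 0) -> List[float]:
    """Two-level sampling: draw a distribution parameter, then a return from that distribution."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        mu = rng.gauss(mean_mu, epistemic_sd)   # epistemic: the parameter itself is uncertain
        out.append(rng.gauss(mu, aleatory_sd))  # aleatory: intrinsic noise of the return
    return out


def cvar(samples: List[float], alpha: float) -> float:
    """Conditional value at risk: the mean of the worst alpha-fraction of returns."""
    worst = sorted(samples)[:max(1, int(alpha * len(samples)))]
    return sum(worst) / len(worst)


confident = sample_returns(1.0, epistemic_sd=0.0, aleatory_sd=1.0, n=10000)
unsure = sample_returns(1.0, epistemic_sd=1.0, aleatory_sd=1.0, n=10000)
print(cvar(confident, 0.1), cvar(unsure, 0.1))
```

A risk-sensitive agent ranking actions by this CVaR would treat both sources of uncertainty jointly rather than summing two separately estimated terms.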

arXiv:2311.15609 [pdf, other]

A manometric feature descriptor with linear-SVM to distinguish esophageal contraction vigor

Authors: Jialin Liu, Lu Yan, Xiaowei Liu, Yuzhuo Dai, Fanggen Lu, Yuanting Ma, Muzhou Hou, Zheng Wang

Abstract: In clinical practice, if a patient presents with nonmechanical obstructive dysphagia, esophageal chest pain, and gastroesophageal reflux symptoms, the physician will usually assess the esophageal dynamic function. High-resolution manometry (HRM) is a commonly used clinical technique for detecting esophageal dynamic function comprehensively and objectively. However, after the results of HRM are obtained, doctors still need to evaluate a variety of parameters. This work is burdensome, and the process is complex. We applied image processing to HRM to predict the esophageal contraction vigor, assisting the evaluation of esophageal dynamic function. Firstly, we used Feature-Extraction and Histogram of Gradients (FE-HOG) to analyze features of the proposal of swallow (PoS) and further extract higher-order features. Then we classified esophageal contraction vigor as normal, weak, or failed using a linear SVM on these features. Our data set includes 3000 training, 500 validation, and 411 test samples. After verification, our accuracy reaches 86.83%, which is higher than other common machine learning methods.

Submitted 27 November, 2023; originally announced November 2023.
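
The classifier stage is a linear SVM on extracted features. A minimal hinge-loss SVM trained by subgradient descent is sketched below; the 2-D toy vectors are hypothetical stand-ins for FE-HOG features, and the three vigor classes (normal, weak, failed) would need three such binary models in a one-vs-rest arrangement:

```python
from typing import List, Sequence, Tuple


def train_linear_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     lr: float = 0.1, lam: float = 0.01, epochs: int = 200) -> Tuple[List[float], float]:
    """Hinge-loss linear SVM trained by subgradient descent; labels must be +1 or -1."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin: hinge subgradient is -y * x
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:           # only the L2 regularizer contributes
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


xs = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -3.0)]
ys = [1, 1, -1, -1]
w, b = train_linear_svm(xs, ys)
print([predict(w, b, x) for x in xs])  # [1, 1, -1, -1]
```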

arXiv:2311.00230 [pdf, other]

DINO-Mix: Enhancing Visual Place Recognition with Foundational Vision Model and Feature Mixing

Authors: Gaoshuang Huang, Yang Zhou, Xiaofei Hu, Chenglong Zhang, Luying Zhao, Wenjian Gan, Mingbo Hou

Abstract: Utilizing visual place recognition (VPR) technology to ascertain the geographical location of publicly available images is a pressing issue for real-world VPR applications. Although most current VPR methods achieve favorable results under ideal conditions, their performance in complex environments, characterized by lighting variations, seasonal changes, and occlusions caused by moving objects, is generally unsatisfactory. In this study, we trim and fine-tune the DINOv2 model as the backbone network to extract robust image features. We propose a novel VPR architecture called DINO-Mix, which combines a foundational vision model with feature aggregation. This architecture relies on the powerful image feature extraction capabilities of foundational vision models. We employ an MLP-Mixer-based mix module to aggregate image features, resulting in globally robust and generalizable descriptors that enable high-precision VPR. We experimentally demonstrate that the proposed DINO-Mix architecture significantly outperforms current state-of-the-art (SOTA) methods. On test sets with lighting variations, seasonal changes, and occlusions (Tokyo24/7, Nordland, SF-XL-Testv1), our proposed DINO-Mix architecture achieved Top-1 accuracy rates of 91.75%, 80.18%, and 82%, respectively. Compared with SOTA methods, our architecture exhibited an average accuracy improvement of 5.14%.

Submitted 5 December, 2023; v1 submitted 31 October, 2023; originally announced November 2023.

Comments: Under review / Open source code
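
The reported Top-1 accuracy measures how often a query's nearest database descriptor depicts the correct place. A simplified sketch of that evaluation, assuming L2-normalized descriptors compared by cosine similarity and exact place IDs as ground truth (real benchmarks usually count a hit within a distance threshold instead), with toy 2-D descriptors:

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def top1_accuracy(queries: Sequence[Sequence[float]], query_places: Sequence[str],
                  database: Sequence[Sequence[float]], db_places: Sequence[str]) -> float:
    """Fraction of queries whose most similar database descriptor shows the same place."""
    db = [l2_normalize(d) for d in database]
    hits = 0
    for q, place in zip(queries, query_places):
        qn = l2_normalize(q)
        best = max(range(len(db)), key=lambda i: sum(a * b for a, b in zip(qn, db[i])))
        hits += db_places[best] == place
    return hits / len(queries)


database, db_places = [[1.0, 0.0], [0.0, 1.0]], ["A", "B"]
queries, query_places = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]], ["A", "B", "B"]
print(top1_accuracy(queries, query_places, database, db_places))  # 0.6666666666666666
```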

arXiv:2311.00177 [pdf, other]

Students' Perspective on AI Code Completion: Benefits and Challenges

Authors: Wannita Takerngsaksiri, Cleshan Warusavitarne, Christian Yaacoub, Matthew Hee Keng Hou, Chakkrit Tantithamthavorn

Abstract: AI Code Completion (e.g., GitHub's Copilot) has revolutionized how computer science students interact with programming languages. However, AI code completion has been studied from the perspective of developers rather than that of students, who represent the future generation of our digital world. In this paper, we investigated the benefits, challenges, and expectations of AI code completion from students' perspectives. To facilitate the study, we first developed an open-source Visual Studio Code Extension tool AutoAurora, powered by a state-of-the-art large language model StarCoder, as an AI code completion research instrument. Next, we conducted an interview study with ten student participants and applied grounded theory to help analyze insightful findings regarding the benefits, challenges, and expectations of students on AI code completion. Our findings show that AI code completion enhanced students' productivity and efficiency by providing correct syntax suggestions, offering alternative solutions, and functioning as a coding tutor. However, over-reliance on AI code completion may lead to a surface-level understanding of programming concepts, diminishing problem-solving skills and restricting creativity. In the future, AI code completion should be explainable and provide best coding practices to enhance the education process.

Submitted 31 May, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

Comments: Accepted at COMPSAC 2024 Workshop (The 7th IEEE International Workshop on Advances in Artificial Intelligence and Machine Learning: AI & ML for a Sustainable and Better Future)

arXiv:2310.10818 [pdf, other]

doi 10.1016/j.neucom.2023.01.076

Uncertainty-aware transfer across tasks using hybrid model-based successor feature reinforcement learning

Authors: Parvin Malekzadeh, Ming Hou, Konstantinos N. Plataniotis

Abstract: Sample efficiency is central to developing practical reinforcement learning (RL) for complex and large-scale decision-making problems. The ability to transfer and generalize knowledge gained from previous experiences to downstream tasks can significantly improve sample efficiency. Recent research indicates that successor feature (SF) RL algorithms enable knowledge generalization between tasks with different rewards but identical transition dynamics. It has recently been hypothesized that combining model-based (MB) methods with SF algorithms can alleviate the limitation of fixed transition dynamics. Furthermore, uncertainty-aware exploration is widely recognized as another appealing approach for improving sample efficiency. Combining the two ideas of hybrid model-based successor features (MB-SF) and uncertainty leads to an approach to the problem of sample-efficient uncertainty-aware knowledge transfer across tasks with different transition dynamics and/or reward functions. In this paper, the uncertainty of the value of each action is approximated by a Kalman filter (KF)-based multiple-model adaptive estimation. This KF-based framework treats the parameters of a model as random variables. To the best of our knowledge, this is the first attempt at formulating a hybrid MB-SF algorithm capable of generalizing knowledge across large or continuous state space tasks with various transition dynamics while requiring less computation at decision time than MB methods. The number of samples required to learn the tasks was compared to recent SF and MB baselines.
The results show that our algorithm generalizes its knowledge across different transition dynamics, learns downstream tasks with significantly fewer samples than starting from scratch, and outperforms existing approaches.

Submitted 22 July, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: 40 pages

Journal ref: Neurocomputing 530 (2023): 165-187
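
The abstract approximates action-value uncertainty with Kalman-filter-based multiple-model adaptive estimation. The sketch shows only the two textbook building blocks, under simplifying assumptions of our own (scalar state, no process noise, Gaussian likelihoods): a measurement update for one model, and the re-weighting of a bank of models by how well each predicted a new observation.

```python
import math
from typing import List, Sequence, Tuple


def kalman_update(mean: float, var: float, obs: float, obs_var: float) -> Tuple[float, float]:
    """Scalar Kalman measurement update for a parameter treated as a Gaussian random variable."""
    gain = var / (var + obs_var)
    return mean + gain * (obs - mean), (1.0 - gain) * var


def mmae_weights(weights: Sequence[float], means: Sequence[float], vars_: Sequence[float],
                 obs: float, obs_var: float) -> List[float]:
    """Re-weight a bank of models by the Gaussian predictive likelihood of a new observation."""
    lik = [math.exp(-0.5 * (obs - m) ** 2 / (v + obs_var)) / math.sqrt(2 * math.pi * (v + obs_var))
           for m, v in zip(means, vars_)]
    unnorm = [w * l for w, l in zip(weights, lik)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


m, v = 0.0, 1.0
for _ in range(2):  # two observations of 1.0 shrink the variance from 1 to 1/3
    m, v = kalman_update(m, v, obs=1.0, obs_var=1.0)
print(round(m, 4), round(v, 4))  # 0.6667 0.3333
```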

arXiv:2310.09706 [pdf, other]

AdaptSSR: Pre-training User Model with Augmentation-Adaptive Self-Supervised Ranking

Authors: Yang Yu, Qi Liu, Kai Zhang, Yuren Zhang, Chao Song, Min Hou, Yuqing Yuan, Zhihao Ye, Zaixi Zhang, Sanshi Lei Yu

Abstract: User modeling, which aims to capture users' characteristics or interests, heavily relies on task-specific labeled data and suffers from the data sparsity issue. Several recent studies tackled this problem by pre-training the user model on massive user behavior sequences with a contrastive learning task. Generally, these methods assume different views of the same behavior sequence constructed via data augmentation are semantically consistent, i.e., reflecting similar characteristics or interests of the user, and thus maximize their agreement in the feature space. However, due to the diverse interests and heavy noise in user behaviors, existing augmentation methods tend to lose certain characteristics of the user or introduce noisy behaviors. Thus, forcing the user model to directly maximize the similarity between the augmented views may result in a negative transfer. To this end, we propose to replace the contrastive learning task with a new pretext task: Augmentation-Adaptive Self-Supervised Ranking (AdaptSSR), which alleviates the requirement of semantic consistency between the augmented views while pre-training a discriminative user model. Specifically, we adopt a multiple pairwise ranking loss which trains the user model to capture the similarity orders between the implicitly augmented view, the explicitly augmented view, and views from other users. We further employ an in-batch hard negative sampling strategy to facilitate model training.
Moreover, considering the distinct impacts of data augmentation on different behavior sequences, we design an augmentation-adaptive fusion mechanism to automatically adjust the similarity order constraint applied to each sample based on the estimated similarity between the augmented views. Extensive experiments on both public and industrial datasets with six downstream tasks verify the effectiveness of AdaptSSR.

Submitted 24 October, 2023; v1 submitted 14 October, 2023; originally announced October 2023.

Comments: Accepted by NeurIPS 2023
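
The pretext task asks the model to respect a similarity order: implicitly augmented view above explicitly augmented view, which in turn is above views from other users. The exact loss is not given in the abstract; the sketch assumes a BPR-style sum of -log sigmoid terms over adjacent pairs and omits the augmentation-adaptive fusion and hard negative sampling.

```python
import math
from typing import Sequence


def log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)).
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def multiple_pairwise_ranking_loss(s_implicit: float, s_explicit: float,
                                   s_others: Sequence[float]) -> float:
    """Penalize violations of s_implicit > s_explicit > every s_other (BPR-style terms)."""
    loss = -log_sigmoid(s_implicit - s_explicit)
    loss += sum(-log_sigmoid(s_explicit - s) for s in s_others) / len(s_others)
    return loss


good = multiple_pairwise_ranking_loss(0.9, 0.6, [0.1, 0.2])
bad = multiple_pairwise_ranking_loss(0.1, 0.6, [0.9, 0.8])
print(good < bad)  # True
```

Ranking the two augmented views instead of forcing them to be identical is what relaxes the semantic-consistency assumption the abstract criticizes.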

arXiv:2309.16052 [pdf, other]

OceanChat: Piloting Autonomous Underwater Vehicles in Natural Language

Authors: Ruochu Yang, Mengxue Hou, Junkai Wang, Fumin Zhang

Abstract: Amid growing research on fusing Large Language Models (LLMs) and robotics, we aim to pave the way for innovative development of AI systems that can enable Autonomous Underwater Vehicles (AUVs) to seamlessly interact with humans in an intuitive manner. We propose OceanChat, a system that leverages a closed-loop LLM-guided task and motion planning framework to tackle AUV missions in the wild. LLMs translate an abstract human command into a high-level goal, while a task planner further grounds the goal into a task sequence with logical constraints. To assist the AUV with understanding the task sequence, we utilize a motion planner to incorporate real-time Lagrangian data streams received by the AUV, thus mapping the task sequence into an executable motion plan. Considering the highly dynamic and partially known nature of the underwater environment, an event-triggered replanning scheme is developed to enhance the system's robustness towards uncertainty. We also build a simulation platform HoloEco that generates photo-realistic simulation for a wide range of AUV applications. Experimental evaluation verifies that the proposed system can achieve improved performance in terms of both success rate and computation time. Project website: https://sites.google.com/view/oceanchat

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2308.05219 [pdf, other]

Decoding Layer Saliency in Language Transformers

Authors: Elizabeth M. Hou, Gregory Castanon

Abstract: In this paper, we introduce a strategy for identifying textual saliency in large-scale language models applied to classification tasks. In visual networks, where saliency is better studied, saliency is naturally localized through the convolutional layers of the network; however, the same is not true in the modern transformer-stack networks used to process natural language. We adapt gradient-based saliency methods for these networks, propose a method for evaluating the degree of semantic coherence of each layer, and demonstrate consistent improvement over numerous other methods for textual saliency on multiple benchmark classification datasets. Our approach requires no additional training or access to labelled data, and is computationally efficient by comparison.

Submitted 9 August, 2023; originally announced August 2023.

arXiv:2307.12524 [pdf, other]

Landslide Surface Displacement Prediction Based on VSXC-LSTM Algorithm

Authors: Menglin Kong, Ruichen Li, Fan Liu, Xingquan Li, Juan Cheng, Muzhou Hou, Cong Cao

Abstract: A landslide is a natural disaster that can easily threaten the local ecology as well as people's lives and property. In this paper, we model real unidirectional surface displacement data from recent landslides in the study area and propose a time series prediction framework named VMD-SegSigmoid-XGBoost-ClusterLSTM (VSXC-LSTM), based on variational mode decomposition, which predicts landslide surface displacement more accurately. The model performs well on the test set. Except for the random-term subsequence, which is hard to fit, the root mean square error (RMSE) and the mean absolute percentage error (MAPE) of the trend-term and periodic-term subsequences are both less than 0.1, and the RMSE is as low as 0.006 for the periodic-term prediction module based on XGBoost. (Accepted in ICANN2023.)
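The two error metrics quoted above have standard definitions, sketched below in plain Python. This is an illustrative helper, not the authors' evaluation code; the abstract does not say whether its MAPE is reported as a fraction or a percentage, and this sketch assumes a fraction.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a fraction
    (multiply by 100 for percent). Undefined if any true value is zero."""
    n = len(y_true)
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n

# Hypothetical displacement values, for illustration only.
y_true = [1.0, 2.0, 4.0]
y_pred = [1.1, 1.9, 4.0]
print(rmse(y_true, y_pred), mape(y_true, y_pred))
```

Note that MAPE divides by the true value, so it is sensitive to near-zero targets, while RMSE stays in the units of the displacement itself.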

Submitted 24 July, 2023; originally announced July 2023.

arXiv:2307.12519 [pdf, other]

DEPHN: Different Expression Parallel Heterogeneous Network using virtual gradient optimization for Multi-task Learning

Authors: Menglin Kong, Ri Su, Shaojie Zhao, Muzhou Hou

Abstract: Recommendation system algorithms based on multi-task learning (MTL) are the major method Internet operators use to understand users and predict their behaviors in multi-behavior platform scenarios. Task correlation is an important consideration in MTL; traditional models use shared-bottom models and gating experts to realize shared representation learning and information differentiation. However, the relationships between real-world tasks are often more complex than existing methods can handle, so information is not shared properly. In this paper, we propose a Different Expression Parallel Heterogeneous Network (DEPHN) to model multiple tasks simultaneously. DEPHN constructs the experts at the bottom of the model using different feature interaction methods to improve the generalization ability of the shared information flow. To improve the model's ability to differentiate the information flows of different tasks, DEPHN uses feature explicit mapping and a virtual gradient coefficient for expert gating during training, and adaptively adjusts the learning intensity of the gated unit by considering the difference in gating values and the task correlation. Extensive experiments on artificial and real-world datasets demonstrate that our proposed method can capture task correlation in complex situations and achieves better performance than baseline models. (Accepted in IJCNN2023.)
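The "gating experts" baseline mentioned above can be illustrated with a generic softmax gate that mixes shared expert outputs, as in MMoE-style multi-task models where each task owns its own gate. This is a minimal sketch under that assumption; the expert outputs and gate logits are hypothetical inputs, and DEPHN's feature explicit mapping and virtual gradient coefficient are not reproduced here.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]

def gated_mixture(expert_outputs, gate_logits):
    """Combine expert output vectors with softmax gate weights.
    expert_outputs: one vector per expert; gate_logits: one logit per expert."""
    w = softmax(gate_logits)
    dim = len(expert_outputs[0])
    return [sum(w[e] * expert_outputs[e][j] for e in range(len(expert_outputs)))
            for j in range(dim)]

# Two hypothetical shared experts; each task uses its own gate logits.
experts = [[1.0, 0.0], [3.0, 2.0]]
task_a = gated_mixture(experts, [2.0, -1.0])  # leans on expert 0
task_b = gated_mixture(experts, [-1.0, 2.0])  # leans on expert 1
```

Because the gate weights sum to one, each task's representation is a convex combination of the shared experts; different gates are what let tasks draw on the shared pool differently.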

Submitted 24 July, 2023; originally announced July 2023.

arXiv:2307.12518 [pdf, other]

FaFCNN: A General Disease Classification Framework Based on Feature Fusion Neural Networks

Authors: Menglin Kong, Shaojie Zhao, Juan Cheng, Xingquan Li, Ri Su, Muzhou Hou, Cong Cao

Abstract: There are two fundamental problems in applying deep learning/machine learning methods to disease classification tasks: one is the insufficient number and poor quality of training samples; the other is how to effectively fuse features from multiple sources and thus train robust classification models. To address these problems, inspired by the way humans acquire knowledge, we propose the Feature-aware Fusion Correlation Neural Network (FaFCNN), which introduces a feature-aware interaction module and a feature alignment module based on domain adversarial learning. This is a general framework for disease classification, and FaFCNN improves the way existing methods obtain sample correlation features. The experimental results show that training with augmented features obtained from a pre-trained gradient boosting decision tree yields larger performance gains than random-forest-based methods. On the low-quality dataset with a large amount of missing data in our setup, FaFCNN obtains consistently optimal performance compared to competitive baselines. In addition, extensive experiments demonstrate the robustness of the proposed method and the effectiveness of each component of the model. (Accepted in IEEE SMC2023.)

Submitted 24 July, 2023; originally announced July 2023.

arXiv:2306.07114 [pdf, ps, other]

Coupled Attention Networks for Multivariate Time Series Anomaly Detection

Authors: Feng Xia, Xin Chen, Shuo Yu, Mingliang Hou, Mujie Liu, Linlin You

Abstract: Multivariate time series anomaly detection (MTAD) plays a vital role in a wide variety of real-world application domains. Over the past few years, MTAD has attracted rapidly increasing attention from both academia and industry. Many deep learning and graph learning models have been developed for effective anomaly detection in multivariate time series data, enabling advanced applications such as smart surveillance and risk management with unprecedented capabilities. Nevertheless, MTAD faces critical challenges arising from the dependencies among sensors and variables, which often change over time. To address this issue, we propose a coupled attention-based neural network framework (CAN) for anomaly detection in multivariate time series data featuring dynamic variable relationships. We combine adaptive graph learning methods with graph attention to generate a global-local graph that can represent both global correlations and dynamic local correlations among sensors. To capture inter-sensor relationships and temporal dependencies, a convolutional neural network based on the global-local graph is integrated with a temporal self-attention module to construct a coupled attention module. In addition, we develop a multilevel encoder-decoder architecture that accommodates reconstruction and prediction tasks to better characterize multivariate time series data.
Extensive experiments on real-world datasets have been conducted to evaluate the performance of the proposed CAN approach, and the results show that CAN significantly outperforms state-of-the-art baselines.
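CAN's coupled attention module includes a temporal self-attention component. The sketch below shows generic scaled dot-product self-attention over time steps. To keep it self-contained, it assumes identity query/key/value projections (no learned weights), so it illustrates the attention mechanism itself, not CAN's actual module.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention over T time steps.
    x: list of T feature vectors of length d. Queries, keys and values are
    the inputs themselves (identity projections, an assumption)."""
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this time step to every time step, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w[t] * x[t][j] for t in range(len(x))) for j in range(d)])
    return out

series = [[0.1, 0.2], [0.0, 0.3], [0.9, 0.8]]  # hypothetical 3-step, 2-sensor window
attended = self_attention(series)
```

Each output time step is a convex combination of all time steps, which is how self-attention lets a step draw on temporally distant readings.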

Submitted 12 June, 2023; originally announced June 2023.

arXiv:2305.12058 [pdf, other]

DADIN: Domain Adversarial Deep Interest Network for Cross Domain Recommender Systems

Authors: Menglin Kong, Muzhou Hou, Shaojie Zhao, Feng Liu, Ri Su, Yinghao Chen

Abstract: Click-Through Rate (CTR) prediction is one of the main tasks of a recommendation system; it is performed for a user over different items to produce the recommendation results. Cross-domain CTR prediction models have been proposed to overcome the problems of data sparsity, the long-tail distribution of user-item interactions, and the cold start of items or users. To make knowledge transfer from the source domain to the target domain smoother, an innovative deep learning cross-domain CTR prediction model, the Domain Adversarial Deep Interest Network (DADIN), is proposed to convert the cross-domain recommendation task into a domain adaptation problem. The joint distribution alignment of the two domains is realized by introducing domain-agnostic layers and a specially designed loss, which are optimized together with the CTR prediction loss through adversarial training. Evaluated on two real datasets, DADIN achieves state-of-the-art results: its Area Under Curve (AUC) is 0.08% higher than the most competitive baseline on the Huawei dataset and 0.71% higher than its competitors on the Amazon dataset. The ablation study shows that introducing the adversarial method leads to AUC improvements of 2.34% on the Huawei dataset and 16.67% on the Amazon dataset, respectively.
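The results above are reported in AUC. For readers unfamiliar with the metric, the following minimal sketch computes ROC AUC as the probability that a randomly chosen positive (clicked) sample is scored above a randomly chosen negative one, with ties counting as one half. This is the normalized Mann-Whitney U statistic. It is an O(P·N) illustration, not the DADIN evaluation code, and the abstract does not state whether its percentage gains are absolute or relative.

```python
def auc(labels, scores):
    """ROC AUC via pairwise ranking: P(score of positive > score of negative),
    counting ties as 1/2. labels are 0/1; scores are predicted CTRs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical clicks and predicted CTRs.
print(auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
```

Because AUC depends only on the ranking of scores, it is insensitive to calibration, which is one reason it is the usual headline metric in CTR work.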

Submitted 19 May, 2023; originally announced May 2023.

arXiv:2304.13096 [pdf, other]

Real-time Autonomous Glider Navigation Software

Authors: Ruochu Yang, Mengxue Hou, Chad Lembke, Catherine Edwards, Fumin Zhang

Abstract: Underwater gliders are widely utilized for ocean sampling, surveillance, and various other oceanic applications. In complex ocean environments, gliders may yield poor navigation performance due to strong ocean currents, thus requiring substantial human effort during manual piloting. To enhance navigation accuracy, we developed real-time autonomous glider navigation software, named GENIoS Python, which generates waypoints based on flow predictions to assist human piloting. The software is designed to closely check glider status, provide customizable experiment settings, utilize lightweight computing resources, communicate stably with dockservers, run robustly for extended operation times, and quantitatively compare flow estimates, which adds to its value as an autonomous tool for underwater glider navigation.

Submitted 20 December, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

Comments: OCEANS 2023 Limerick

arXiv:2212.12963 [pdf, other]

Anomaly Detection of Underwater Gliders Verified by Deployment Data

Authors: Ruochu Yang, Mengxue Hou, Chad Lembke, Catherine Edwards, Fumin Zhang

Abstract: This paper utilizes an anomaly detection algorithm to check whether underwater gliders are operating normally in the unknown ocean environment. Glider pilots can be warned of a detected glider anomaly in real time, allowing them to take over the glider appropriately and avoid further damage to it. The adopted algorithm is validated on two valuable sets of data from real glider deployments: the University of South Florida (USF) glider Stella and the Skidaway Institute of Oceanography (SkIO) glider Angus.

Submitted 27 December, 2022; v1 submitted 25 December, 2022; originally announced December 2022.

Comments: 10 pages, 16 figures, accepted by the International Symposium on Underwater Technology (UT23)

arXiv:2210.15125 [pdf, other]

ViT-CAT: Parallel Vision Transformers with Cross Attention Fusion for Popularity Prediction in MEC Networks

Authors: Zohreh HajiAkhondi-Meybodi, Arash Mohammadi, Ming Hou, Jamshid Abouei, Konstantinos N. Plataniotis

Abstract: Mobile Edge Caching (MEC) is a revolutionary technology for the Sixth Generation (6G) of wireless networks, with the promise to significantly reduce users' latency by offering storage capacity at the edge of the network. The efficiency of the MEC network, however, critically depends on its ability to dynamically predict/update the storage of caching nodes with the top-K popular contents. Conventional statistical caching schemes are not robust to the time-variant nature of the underlying pattern of content requests, resulting in a surge of interest in using Deep Neural Networks (DNNs) for time-series popularity prediction in MEC networks. However, existing DNN models within the context of MEC fail to simultaneously capture both the temporal correlations of historical request patterns and the dependencies between multiple contents. This calls for a new popularity prediction architecture to tackle this critical challenge. The paper addresses this gap by proposing a novel hybrid caching framework based on the attention mechanism. Referred to as parallel Vision Transformers with Cross Attention (ViT-CAT) Fusion, the proposed architecture consists of two parallel ViT networks, one for capturing temporal correlation and the other for capturing dependencies between different contents.
Followed by a Cross Attention (CA) module as the Fusion Center (FC), the proposed ViT-CAT is also capable of learning the mutual information between temporal and spatial correlations, improving classification accuracy and reducing the model's complexity by a factor of about 8. Based on the simulation results, the proposed ViT-CAT architecture outperforms its counterparts in classification accuracy, complexity, and cache-hit ratio.
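Both MEC abstracts in this listing evaluate caching by cache-hit ratio after filling each node with the predicted top-K contents. The sketch below shows that evaluation loop. As a stand-in predictor, it uses a hypothetical naive rule that caches the K most frequently requested items in a history window. That rule is exactly the kind of statistical scheme the abstract argues is not robust; it is not ViT-CAT or MTEC.

```python
from collections import Counter

def top_k_cache(history, k):
    """Hypothetical baseline: cache the k contents requested most often in
    `history` (a list of content IDs). Counter.most_common orders equal
    counts by first occurrence."""
    return {cid for cid, _ in Counter(history).most_common(k)}

def cache_hit_ratio(cache, requests):
    """Fraction of future requests served directly from the cache."""
    if not requests:
        return 0.0
    return sum(1 for r in requests if r in cache) / len(requests)

past = ["a", "b", "a", "c", "a", "b"]   # hypothetical request log
future = ["a", "c", "b", "d"]            # requests arriving after caching
cache = top_k_cache(past, k=2)
print(cache_hit_ratio(cache, future))
```

A learned predictor would replace `top_k_cache` while the hit-ratio computation stays the same, which is what makes the metric comparable across methods.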

Submitted 26 October, 2022; originally announced October 2022.

arXiv:2210.05874 [pdf, other]

Multi-Content Time-Series Popularity Prediction with Multiple-Model Transformers in MEC Networks

Authors: Zohreh HajiAkhondi-Meybodi, Arash Mohammadi, Ming Hou, Elahe Rahimian, Shahin Heidarian, Jamshid Abouei, Konstantinos N. Plataniotis

Abstract: Coded/uncoded content placement in Mobile Edge Caching (MEC) has evolved as an efficient solution to meet the significant growth of global mobile data traffic by boosting the content diversity in the storage of caching nodes. To meet the dynamic nature of the historical request pattern of multimedia contents, the main focus of recent research has shifted to developing data-driven and real-time caching schemes. In this regard, and with the assumption that users' preferences remain unchanged over a short horizon, the Top-K popular contents are identified as the output of the learning model. Most existing data-driven popularity prediction models, however, are not suitable for coded/uncoded content placement frameworks. On the one hand, in coded/uncoded content placement, in addition to classifying contents into two groups, i.e., popular and non-popular, the probability of each content request is required to identify which content should be stored partially/completely, and this information is not provided by existing data-driven popularity prediction models. On the other hand, the assumption that users' preferences remain unchanged over a short horizon only holds for content with a smooth request pattern. To tackle these challenges, we develop a Multiple-model (hybrid) Transformer-based Edge Caching (MTEC) framework with higher generalization ability, suitable for various types of content with different time-varying behavior, that can be adapted to coded/uncoded content placement frameworks.
Simulation results corroborate the effectiveness of the proposed MTEC caching framework in comparison to its counterparts in terms of the cache-hit ratio, classification accuracy, and the transferred byte volume.

Submitted 11 October, 2022; originally announced October 2022.

arXiv:2206.09184 [pdf, other]

PHN: Parallel heterogeneous network with soft gating for CTR prediction

Authors: Ri Su, Alphonse Houssou Hounye, Cong Cao, Muzhou Hou

Abstract: Click-Through Rate (CTR) prediction is a basic task in recommendation systems. Most previous CTR models were built on the Wide & Deep structure and gradually evolved into parallel structures with different modules. However, simply accumulating parallel structures can lead to higher structural complexity and longer training time. Because the output layer uses a Sigmoid activation, linearly adding the activation values of the parallel structures during training easily pushes samples into the weak gradient interval, resulting in a weak gradient phenomenon that reduces the effectiveness of training. To this end, this paper proposes a Parallel Heterogeneous Network (PHN) model, which constructs a network with a parallel structure through three different interaction analysis methods and uses Soft Selection Gating (SSG) to feature heterogeneous data with different structures. Finally, residual links with trainable parameters are used in the network to mitigate the influence of the weak gradient phenomenon. Furthermore, we demonstrate the effectiveness of PHN in a large number of comparative experiments, and visualize the performance of the model during training and its structure.
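The weak gradient argument above rests on a property of the sigmoid: its derivative σ(z)(1 − σ(z)) peaks at 0.25 when z = 0 and decays quickly as |z| grows. The numeric illustration below assumes, hypothetically, that every parallel branch contributes the same logit of 2.0 and that the branch outputs are summed before the sigmoid. It shows the mechanism only. It is not PHN's Soft Selection Gating or its trainable residual links.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)); maximum 0.25 at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

branch_logit = 2.0  # hypothetical per-branch contribution to the output logit
for n_branches in (1, 2, 4, 8):
    z = n_branches * branch_logit  # linear addition of branch outputs
    print(n_branches, z, sigmoid_grad(z))
```

As more branches are added, the summed logit moves deeper into the flat tail of the sigmoid, and the gradient reaching every branch shrinks accordingly.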

Submitted 18 June, 2022; originally announced June 2022.

arXiv:2205.08422 [pdf, other]

JUNO: Jump-Start Reinforcement Learning-based Node Selection for UWB Indoor Localization

Authors: Zohreh Hajiakhondi-Meybodi, Ming Hou, Arash Mohammadi

Abstract: Ultra-Wideband (UWB) is one of the key technologies empowering the Internet of Things (IoT) concept to perform reliable, energy-efficient, and highly accurate monitoring, screening, and localization in indoor environments. The performance of UWB-based localization systems, however, can degrade significantly because of Non-Line-of-Sight (NLoS) connections between a mobile user and UWB beacons. To mitigate the destructive effects of NLoS connections, we target the development of a Reinforcement Learning (RL) anchor selection framework that can efficiently cope with the dynamic nature of indoor environments. Existing RL models in this context, however, do not generalize well to new settings. Moreover, conventional RL models take a long time to reach the optimal policy. To tackle these challenges, we propose the Jump-start RL-based Uwb NOde selection (JUNO) framework, which performs real-time location predictions without relying on complex NLoS identification/mitigation methods. The effectiveness of the proposed JUNO framework is evaluated in terms of the location error, where the mobile user moves randomly through an ultra-dense indoor environment with a high chance of establishing NLoS connections. Simulation results corroborate the effectiveness of the proposed framework in comparison to its state-of-the-art counterparts.

Submitted 6 May, 2022; originally announced May 2022.

arXiv:2204.00049 [pdf, other]

doi 10.1016/j.neucom.2021.10.008

AKF-SR: Adaptive Kalman Filtering-based Successor Representation

Authors: Parvin Malekzadeh, Mohammad Salimibeni, Ming Hou, Arash Mohammadi, Konstantinos N. Plataniotis

Abstract: Recent studies in neuroscience suggest that Successor Representation (SR)-based models adapt to changes in goal locations or the reward function faster than model-free algorithms, at a lower computational cost than model-based algorithms. However, it is not known how such a representation might help animals manage uncertainty in their decision-making. Existing methods for SR learning do not capture uncertainty about the estimated SR. To address this issue, the paper presents a Kalman filter-based SR framework, referred to as Adaptive Kalman Filtering-based Successor Representation (AKF-SR). First, the Kalman temporal difference approach, which combines the Kalman filter and the temporal difference method, is used within the AKF-SR framework to cast the SR learning procedure as a filtering problem. This provides an uncertainty estimate of the SR and also reduces memory requirements and sensitivity to the model's parameters in comparison to deep neural network-based algorithms. An adaptive Kalman filtering approach is then applied within the proposed AKF-SR framework to tune the measurement noise covariance and measurement mapping function of the Kalman filter, which are the most important parameters affecting the filter's performance.
Moreover, an active learning method that exploits the estimated uncertainty of the SR to form a behaviour policy leading to more visits to less certain values is proposed to improve the overall performance of an agent in terms of received rewards while interacting with its environment.
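To show why the measurement noise covariance matters so much to a Kalman filter, the sketch below implements one predict/update cycle of a generic scalar linear Kalman filter. This is the textbook filter, not the Kalman temporal difference or AKF-SR algorithm, and the default values of F, Q, H and R are assumptions chosen for illustration. The measurement noise variance R enters through the gain K: a larger R makes the filter trust new measurements less.

```python
def kalman_step(x, P, z, F=1.0, Q=1e-3, H=1.0, R=1e-1):
    """One predict/update cycle of a scalar linear Kalman filter.
    x, P: prior state estimate and its variance; z: new measurement.
    F, Q: state transition and process-noise variance (assumed values).
    H, R: measurement mapping and measurement-noise variance (assumed values)."""
    # Predict
    x_pred = F * x
    P_pred = F * P * F + Q
    # Update
    S = H * P_pred * H + R            # innovation variance
    K = P_pred * H / S                # Kalman gain: shrinks as R grows
    x_new = x_pred + K * (z - H * x_pred)
    P_new = (1.0 - K * H) * P_pred    # posterior variance (uncertainty estimate)
    return x_new, P_new

x, P = 0.0, 1.0
for z in [1.0, 1.0, 1.0, 1.0, 1.0]:   # hypothetical constant measurements
    x, P = kalman_step(x, P, z)
```

The posterior variance P is the kind of uncertainty estimate the abstract says existing SR methods lack. An adaptive scheme would adjust R (and H) online instead of fixing them as this sketch does.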

Submitted 31 March, 2022; originally announced April 2022.

Journal ref: Neurocomputing 467 (2022), pp.476-490

arXiv:2202.10339 [pdf, other]

doi 10.1109/TITS.2022.3148116

Exploring Human Mobility for Multi-Pattern Passenger Prediction: A Graph Learning Framework

Authors: Xiangjie Kong, Kailai Wang, Mingliang Hou, Feng Xia, Gour Karmakar, Jianxin Li

Abstract: Traffic flow prediction is an integral part of an intelligent transportation system and thus fundamental for various traffic-related applications. Buses are an indispensable means of transport for urban residents, and their fixed routes and schedules lead to latent travel regularity. However, human mobility patterns, specifically the complex relationships between bus passengers, are deeply hidden in this fixed mobility mode. Although many models exist to predict traffic flow, human mobility patterns have not been well explored in this regard. To reduce this research gap and learn human mobility knowledge from these fixed travel behaviors, we propose a multi-pattern passenger flow prediction framework, MPGCN, based on the Graph Convolutional Network (GCN). First, we construct a novel sharing-stop network to model relationships between passengers based on bus record data. Then, we employ GCN to extract features from the graph by learning useful topology information and introduce a deep clustering method to recognize mobility patterns hidden in bus passengers. Furthermore, to fully utilize spatio-temporal information, we propose GCN2Flow to predict passenger flow based on various mobility patterns. To the best of our knowledge, this paper is the first work to adopt a multi-pattern approach to predict bus passenger flow from graph learning. We design a case study for optimizing routes. Extensive experiments on a real-world bus dataset demonstrate that MPGCN has potential efficacy in passenger flow prediction and route optimization.
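MPGCN applies a GCN to the sharing-stop network. The abstract does not specify the exact layer, so the sketch below assumes the widely used symmetric-normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 X W), written in plain Python. The adjacency matrix, features and weights are hypothetical inputs; in a sharing-stop network the nodes would be passengers linked by shared stops.

```python
import math

def gcn_layer(adj, feats, weight):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W).
    adj: n x n 0/1 adjacency, feats: n x f node features, weight: f x h."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features ...
    agg = [[sum(norm[i][k] * feats[k][c] for k in range(n)) for c in range(len(feats[0]))]
           for i in range(n)]
    # ... then apply the linear transform W and a ReLU.
    return [[max(0.0, sum(agg[i][c] * weight[c][h] for c in range(len(weight))))
             for h in range(len(weight[0]))] for i in range(n)]

# Hypothetical graph: passengers 0 and 1 share a stop; passenger 2 is isolated.
adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
feats = [[1.0], [3.0], [5.0]]
print(gcn_layer(adj, feats, [[1.0]]))
```

Connected passengers end up with smoothed, shared representations, which is what a downstream clustering step can then group into mobility patterns.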

Submitted 17 February, 2022; originally announced February 2022.

Journal ref: IEEE Transactions on Intelligent Transportation Systems, 2022

arXiv:2201.08883 [pdf, other]

The Rational Selection of Goal Operations and the Integration of Search Strategies with Goal-Driven Autonomy

Authors: Sravya Kondrakunta, Venkatsampath Raja Gogineni, Michael T. Cox, Demetris Coleman, Xiaobao Tan, Tony Lin, Mengxue Hou, Fumin Zhang, Frank McQuarrie, Catherine R. Edwards

Abstract: Intelligent physical systems as embodied cognitive systems must perform high-level reasoning while concurrently managing an underlying control architecture. The link between cognition and control must manage the problem of converting continuous values from the real world to symbolic representations (and back). To generate effective behaviors, reasoning must include a capacity to replan, acquire and update new information, detect and respond to anomalies, and perform various operations on system goals. However, these processes are not independent and need further exploration. This paper examines an agent's choices when multiple goal operations co-occur and interact, and it establishes a method of choosing between them. We demonstrate the benefits, discuss the trade-offs involved, and show positive results in a dynamic marine search task.

Submitted 21 January, 2022; originally announced January 2022.

Comments: Presented at The Ninth Advances in Cognitive Systems (ACS) Conference 2021 (arXiv:2201.06134)

Report number: ACS2021/08

arXiv:2201.02660 [pdf, ps, other]

A Multi-Behavior Planning Framework for Robot Guide

Authors: Muhan Hou, Zonghao Mu, Jing Li, Qizhi Yu, Jason Gu

Abstract: The guiding task of a mobile robot requires not only human-aware navigation, but also appropriate and timely interaction for active instruction. State-of-the-art tour-guide models limit their socially aware consideration to adapting to users' motion, ignoring interactive behavior planning to fulfill communicative demands. We propose a multi-behavior planning framework based on Monte Carlo Tree Search to better help users understand confusing scene contexts, select proper paths, and arrive at the destination in time. To provide proactive guidance, we construct a sampling-based probability model of human motion to consider the interrelated effects between robots and humans. We validate our method both in simulation and in real-world experiments, along with a performance comparison with state-of-the-art models.
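The framework above is built on Monte Carlo Tree Search. The sketch below shows the standard UCT (UCB1) rule that MCTS commonly uses to pick which child node to expand. The abstract does not say which selection rule or exploration constant the authors used, so both the rule and the constant c = √2 are assumptions here.

```python
import math

def uct_score(child_value_sum, child_visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score used in UCT-style MCTS selection: mean value + exploration bonus.
    Unvisited children score +inf so that each is tried at least once."""
    if child_visits == 0:
        return math.inf
    exploit = child_value_sum / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

def select_child(children, parent_visits):
    """children: list of (value_sum, visits); returns the index to descend into."""
    return max(range(len(children)),
               key=lambda i: uct_score(*children[i], parent_visits))

# Hypothetical candidate behaviors, e.g. (explain scene, point to path, move on).
children = [(6.0, 10), (2.0, 4), (0.0, 0)]
print(select_child(children, parent_visits=14))
```

The exploration term grows for rarely tried actions, which balances refining the best-known behavior against sampling alternatives.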

Submitted 7 January, 2022; originally announced January 2022.

arXiv:2111.12483 [pdf, other]

LDP-Net: An Unsupervised Pansharpening Network Based on Learnable Degradation Processes

Authors: Jiahui Ni, Zhimin Shao, Zhongzhou Zhang, Mingzheng Hou, Jiliu Zhou, Leyuan Fang, Yi Zhang

Abstract: Pansharpening of remote sensing images aims at acquiring a high-resolution multispectral (HRMS) image directly by fusing a low-resolution multispectral (LRMS) image with a panchromatic (PAN) image. The main concern is how to effectively combine the rich spectral information of the LRMS image with the abundant spatial information of the PAN image. Recently, many deep-learning-based methods have been proposed for the pansharpening task. However, these methods usually have two main drawbacks: 1) they require HRMS images for supervised learning; and 2) they ignore the latent relation between the MS and PAN images and fuse them directly. To solve these problems, we propose a novel unsupervised network based on learnable degradation processes, dubbed LDP-Net. A reblurring block and a graying block are designed to learn the corresponding degradation processes. In addition, a novel hybrid loss function is proposed to constrain both spatial and spectral consistency between the pansharpened image and the PAN and LRMS images at different resolutions. Experiments on Worldview2 and Worldview3 images demonstrate that our proposed LDP-Net can fuse PAN and LRMS images effectively without the help of HRMS samples, achieving promising performance in terms of both qualitative visual effects and quantitative metrics.

Submitted 24 November, 2021; originally announced November 2021.

arXiv:2110.15568 [pdf, other]

Unsupervised PET Reconstruction from a Bayesian Perspective

Authors: Chenyu Shen, Wenjun Xia, Hongwei Ye, Mingzheng Hou, Hu Chen, Yan Liu, Jiliu Zhou, Yi Zhang

Abstract: Positron emission tomography (PET) reconstruction becomes an ill-posed inverse problem with low-count projection data, and a robust algorithm is urgently required to improve imaging quality. Recently, the deep image prior (DIP) has drawn much attention and has been successfully applied to several image restoration tasks, such as denoising and inpainting, since it does not need any labels (reference images). However, overfitting is a critical weakness of this framework. Hence, many methods have been proposed to mitigate this problem, and DeepRED, which combines DIP with regularization by denoising (RED), is a representative example. In this article, we apply DeepRED from a Bayesian perspective to reconstruct PET images from a single corrupted sinogram without any supervised or auxiliary information. In contrast to the conventional denoisers customarily used in RED, we employ a DnCNN-like denoiser, which can add an adaptive constraint to DIP and facilitate the computation of derivatives. Moreover, to further enhance the regularization, Gaussian noise is injected into the gradient updates, yielding a Markov chain Monte Carlo (MCMC) sampler. Experimental studies on brain and whole-body datasets demonstrate that our proposed method achieves better qualitative and quantitative results than several classic and state-of-the-art methods.
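The abstract's final methodological step, injecting Gaussian noise into gradient updates to obtain an MCMC sampler, is the idea behind Langevin-dynamics samplers. The sketch below is only a toy illustration of that generic idea, assuming a 1-D standard-normal target instead of a PET posterior; `langevin_chain` is a hypothetical name, and nothing here reproduces the authors' DeepRED/DnCNN setup or their step-size or noise schedule.

```python
import math
import random


def langevin_chain(grad_u, theta0, step, n_steps, burn_in=0, seed=0):
    """Unadjusted Langevin dynamics: a gradient step on U plus Gaussian noise.

    grad_u -- gradient of the negative log-density U(theta)
    step   -- step size eta; the injected noise has variance 2 * eta
    Returns the iterates kept after the first `burn_in` steps.
    """
    rng = random.Random(seed)
    theta = theta0
    samples = []
    for t in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Plain gradient descent would settle at the mode; the noise term turns
        # the iteration into a Markov chain whose iterates spread over the target.
        theta = theta - step * grad_u(theta) + math.sqrt(2.0 * step) * noise
        if t >= burn_in:
            samples.append(theta)
    return samples


# Toy target: standard normal, U(theta) = theta**2 / 2, so grad U(theta) = theta.
samples = langevin_chain(lambda th: th, theta0=5.0, step=0.1,
                         n_steps=20000, burn_in=1000, seed=42)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean={mean:.3f} var={var:.3f}")
```

With a fixed step size this unadjusted chain is slightly biased: for this target its stationary variance is 2 / (2 - eta), about 1.05 at eta = 0.1. A Metropolis correction or a decreasing step size removes that bias.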

Submitted 29 October, 2021; originally announced October 2021.

arXiv:2110.05076 [pdf, other]

A Closer Look at Prototype Classifier for Few-shot Image Classification

Authors: Mingcheng Hou, Issei Sato

Abstract: The prototypical network is a prototype classifier based on meta-learning and is widely used for few-shot learning because it classifies unseen examples by constructing class-specific prototypes without adjusting hyper-parameters during meta-testing. Interestingly, recent research that has attracted considerable attention shows that training a new linear classifier, which does not use a meta-learning algorithm, performs comparably to the prototypical network. However, a new linear classifier must be retrained every time a new class appears. In this paper, we analyze how a prototype classifier can work equally well without training a new linear classifier or meta-learning. We experimentally find that a prototype classifier built in meta-testing directly from feature vectors extracted by standard pre-trained models does not perform as well as either the prototypical network or new linear classifiers trained on those feature vectors. We therefore derive a novel generalization bound for a prototypical classifier and show that transforming the feature vectors can improve the performance of prototype classifiers. We experimentally investigate several normalization methods for minimizing the derived bound and find that the same performance can be obtained by using L2 normalization and minimizing the ratio of the within-class variance to the between-class variance, without training a new classifier or meta-learning.
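The recipe the abstract converges on, class-mean prototypes over L2-normalized features together with a within-class to between-class variance ratio as the quantity to keep small, can be illustrated in a few lines. This is a minimal sketch under stated assumptions: plain Python lists stand in for pre-trained feature vectors, the 2-way 2-shot episode is made up, and all function names are hypothetical. The paper's generalization bound and its comparison of normalization methods are not reproduced.

```python
import math
from collections import defaultdict


def l2_normalize(v):
    """Scale a feature vector to unit Euclidean length (zero vectors unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def class_means(features, labels):
    """Group vectors by label and return {label: mean vector} (the prototypes)."""
    groups = defaultdict(list)
    for f, y in zip(features, labels):
        groups[y].append(f)
    dim = len(features[0])
    return {y: [sum(v[i] for v in vs) / len(vs) for i in range(dim)]
            for y, vs in groups.items()}


def classify(query, prototypes):
    """Return the label whose prototype is nearest in squared Euclidean distance."""
    def sq_dist(p):
        return sum((a - b) ** 2 for a, b in zip(query, p))
    return min(prototypes, key=lambda y: sq_dist(prototypes[y]))


def variance_ratio(features, labels):
    """Within-class scatter divided by between-class scatter (smaller = better separated)."""
    means = class_means(features, labels)
    dim = len(features[0])
    overall = [sum(f[i] for f in features) / len(features) for i in range(dim)]
    within = sum(sum((f[i] - means[y][i]) ** 2 for i in range(dim))
                 for f, y in zip(features, labels))
    # One term per sample, so each class is weighted by its size.
    between = sum(sum((means[y][i] - overall[i]) ** 2 for i in range(dim))
                  for y in labels)
    return within / between


# Hypothetical "pre-trained" features for a 2-way, 2-shot episode.
support = [[2.0, 0.0], [4.0, 0.2], [0.0, 3.0], [0.1, 5.0]]
labels = ["a", "a", "b", "b"]
normed = [l2_normalize(f) for f in support]
protos = class_means(normed, labels)
print(classify(l2_normalize([10.0, 1.0]), protos))
print(variance_ratio(support, labels), variance_ratio(normed, labels))
```

On this toy episode the normalized features yield a much smaller variance ratio than the raw ones, which is the direction the abstract associates with better-performing prototype classifiers.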

Submitted 15 September, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: 21 pages with 10 appendix sections. Accepted at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

arXiv:2108.13157 [pdf, other]

doi 10.1109/TSP.2022.3171678

DQLEL: Deep Q-Learning for Energy-Optimized LoS/NLoS UWB Node Selection

Authors: Zohreh Hajiakhondi-Meybodi, Arash Mohammadi, Ming Hou, Konstantinos N. Plataniotis

Abstract: Recent advancements in the Internet of Things (IoT) have brought about a surge of interest in indoor positioning for the purpose of providing reliable, accurate, and energy-efficient indoor navigation/localization systems. Ultra Wide Band (UWB) technology has emerged as a potential candidate to satisfy these requirements. Although UWB technology can enhance the accuracy of indoor positioning due to its use of a wide frequency spectrum, there are key challenges ahead for its efficient implementation. On the one hand, achieving high positioning precision relies on the identification/mitigation of Non-Line-of-Sight (NLoS) links, which significantly increases the complexity of the localization framework. On the other hand, UWB beacons have a limited battery life, which is especially problematic in practical circumstances where certain beacons are located in strategic positions. To address these challenges, we introduce an efficient node selection framework that enhances location accuracy without using complex NLoS mitigation methods, while maintaining a balance among the remaining battery lives of UWB beacons. In this framework, referred to as the Deep Q-Learning Energy-optimized LoS/NLoS (DQLEL) UWB node selection framework, the mobile user is autonomously trained to determine the optimal set of UWB beacons to use for localization based on the 2-D Time Difference of Arrival (TDoA) framework. The effectiveness of the proposed DQLEL framework is evaluated in terms of the link condition, the deviation of the remaining battery life of UWB beacons, location error, and cumulative rewards. Based on the simulation results, the proposed DQLEL framework significantly outperformed its counterparts across the aforementioned aspects.
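The abstract positions the user with a 2-D TDoA framework; the deep Q-learning beacon selection is beyond a short example, but the TDoA fix itself is easy to sketch. This minimal sketch assumes noise-free range differences measured relative to a reference beacon (a time difference dt converts to a range difference c * dt) and uses a brute-force grid search for clarity; practical systems typically use closed-form or iterative least-squares solvers. Beacon positions and function names are hypothetical.

```python
import math


def tdoa_measurements(pos, beacons):
    """Range differences d_i = |pos - b_i| - |pos - b_0| for i = 1..n-1.

    A time difference of arrival dt_i maps to d_i = c * dt_i, so working
    directly in distance units is equivalent.
    """
    r0 = math.dist(pos, beacons[0])
    return [math.dist(pos, b) - r0 for b in beacons[1:]]


def tdoa_grid_locate(measured, beacons, x_range, y_range, step):
    """Brute-force 2-D TDoA fix: the grid point whose predicted range
    differences best match the measurements in the least-squares sense."""
    best, best_err = None, float("inf")
    nx = int(round((x_range[1] - x_range[0]) / step))
    ny = int(round((y_range[1] - y_range[0]) / step))
    for i in range(nx + 1):
        for j in range(ny + 1):
            cand = (x_range[0] + i * step, y_range[0] + j * step)
            pred = tdoa_measurements(cand, beacons)
            err = sum((p - m) ** 2 for p, m in zip(pred, measured))
            if err < best_err:
                best, best_err = cand, err
    return best, best_err


# Hypothetical layout: four beacons at the corners of a 10 m x 10 m room.
beacons = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
true_pos = (3.0, 4.0)
meas = tdoa_measurements(true_pos, beacons)
est, err = tdoa_grid_locate(meas, beacons, (0.0, 10.0), (0.0, 10.0), 0.5)
print(est, err)
```

An NLoS link lengthens the measured path and so adds a positive bias to the affected range, which shifts the fix; this is one reason choosing which beacons to use, as DQLEL does, affects location error.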

Submitted 25 October, 2021; v1 submitted 24 August, 2021; originally announced August 2021.

arXiv:2103.03770 [pdf, other]

Matching Algorithms: Fundamentals, Applications and Challenges

Authors: Jing Ren, Feng Xia, Xiangtai Chen, Jiaying Liu, Mingliang Hou, Ahsan Shehzad, Nargiz Sultanova, Xiangjie Kong

Abstract: Matching plays a vital role in the rational allocation of resources in many areas, ranging from market operation to people's daily lives. In economics, the term matching theory refers to pairing two agents in a specific market so as to reach a stable or optimal state. In computer science, many kinds of matching problems have emerged, such as question-answer matching in information retrieval, user-item matching in recommender systems, and entity-relation matching in knowledge graphs. A preference list is the core element of a matching process; it can either be obtained directly from the agents or generated indirectly by prediction. Depending on how the preference list is obtained, matching problems are divided into two categories: explicit matching and implicit matching. In this paper, we first introduce the basic models and algorithms of matching theory in explicit matching. We then review existing methods for coping with various matching problems in implicit matching, such as retrieval matching, user-item matching, entity-relation matching, and image matching. Furthermore, we look into representative applications in these areas, including marriage and labor markets in explicit matching and several similarity-based matching problems in implicit matching. Finally, this survey concludes with a discussion of open issues and promising future directions in the field of matching.
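For explicit matching with a stable outcome, as in the marriage markets the abstract mentions, the classic algorithm is Gale and Shapley's deferred acceptance. The sketch below is a minimal illustration, assuming two equal-sized sides with complete, strict preference lists; the function names and toy preferences are hypothetical and not taken from the survey.

```python
def deferred_acceptance(proposer_prefs, reviewer_prefs):
    """Gale-Shapley stable matching for equal-sized sides with complete lists.

    Each dict maps an agent to a list of agents on the other side, most
    preferred first. Returns {proposer: reviewer}.
    """
    # rank[r][p] = position of p in r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in reviewer_prefs.items()}
    next_idx = {p: 0 for p in proposer_prefs}  # next reviewer each proposer will try
    held = {}                                   # reviewer -> tentatively held proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_idx[p]]
        next_idx[p] += 1
        current = held.get(r)
        if current is None:
            held[r] = p
        elif rank[r][p] < rank[r][current]:
            held[r] = p           # r trades up; the rejected proposer is free again
            free.append(current)
        else:
            free.append(p)        # r rejects p; p will try its next choice
    return {p: r for r, p in held.items()}


def is_stable(matching, proposer_prefs, reviewer_prefs):
    """True if no proposer/reviewer pair would both rather be with each other."""
    partner_of = {r: p for p, r in matching.items()}
    for p, prefs in proposer_prefs.items():
        for r in prefs[:prefs.index(matching[p])]:  # reviewers p ranks above its match
            rp = reviewer_prefs[r]
            if rp.index(p) < rp.index(partner_of[r]):
                return False
    return True


men = {"A": ["x", "y"], "B": ["x", "y"]}
women = {"x": ["B", "A"], "y": ["A", "B"]}
m = deferred_acceptance(men, women)
print(m, is_stable(m, men, women))
```

The proposing side receives its best partner among all stable matchings, so swapping which side proposes can change the outcome while keeping it stable.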

Submitted 16 March, 2021; v1 submitted 5 March, 2021; originally announced March 2021.

Comments: 20 pages, 5 figures

arXiv:2102.07617 [pdf]

doi 10.1098/rsta.2020.0362

On the Philosophical, Cognitive and Mathematical Foundations of Symbiotic Autonomous Systems (SAS)

Authors: Yingxu Wang, Fakhri Karray, Sam Kwong, Konstantinos N. Plataniotis, Henry Leung, Ming Hou, Edward Tunstel, Imre J. Rudas, Ljiljana Trajkovic, Okyay Kaynak, Janusz Kacprzyk, Mengchu Zhou, Michael H. Smith, Philip Chen, Shushma Patel

Abstract: Symbiotic Autonomous Systems (SAS) are advanced intelligent and cognitive systems exhibiting autonomous collective intelligence enabled by coherent symbiosis of human-machine interactions in hybrid societies. Basic research in the emerging field of SAS has triggered advanced general AI technologies that function without human intervention, as well as hybrid symbiotic systems that synergize humans and intelligent machines into coherent cognitive systems. This work presents a theoretical framework of SAS underpinned by the latest advances in intelligence, cognition, computer, and system sciences. SAS are characterized by the composition of autonomous and symbiotic systems that adopt bio-brain-social-inspired and heterogeneously synergized structures and autonomous behaviors. This paper explores their cognitive and mathematical foundations. The challenge of seamless human-machine interaction in a hybrid environment is addressed. SAS-based collective intelligence is explored in order to augment human capability with autonomous machine intelligence toward the next generation of general AI, autonomous computers, and trustworthy mission-critical intelligent systems. Emerging paradigms and engineering applications of SAS are elaborated via an autonomous knowledge learning system that works symbiotically between humans and cognitive robots.

Submitted 11 February, 2021; originally announced February 2021.

Comments: Accepted by Phil. Trans. Royal Society (A): Math, Phys & Engg Sci., 379(219x), 2021, Oxford, UK

Journal ref: Phil. Trans. Royal Society (A): Math, Phys & Engg Sci., 379(219x), 2021, Oxford, UK

arXiv:2101.11787 [pdf, other]

Joint Transmission Scheme and Coded Content Placement in Cluster-centric UAV-aided Cellular Networks

Authors: Zohreh HajiAkhondi-Meybodi, Arash Mohammadi, Jamshid Abouei, Ming Hou, Konstantinos N. Plataniotis

Abstract: Recently, as a consequence of the COVID-19 pandemic, dependence on telecommunication for remote working and telemedicine has significantly increased. In cellular networks, incorporating Unmanned Aerial Vehicles (UAVs) can enhance connectivity for outdoor users due to the high probability of establishing Line of Sight (LoS) links. However, the UAV's limited battery life and its signal attenuation in indoor areas make it inefficient for managing users' requests in indoor environments. The proposed framework, referred to as Cluster-centric and Coded UAV-aided Femtocaching (CCUF), increases the network's coverage in both indoor and outdoor environments via a two-phase clustering for Femto Access Point (FAP) formation and UAV deployment. The first objective is to increase content diversity. In this context, we propose a coded content placement in a cluster-centric cellular network, which is integrated with Coordinated Multi-Point (CoMP) to mitigate inter-cell interference in edge areas. We then determine experimentally the number of coded contents to be stored in each caching node to increase the cache-hit ratio, Signal-to-Interference-plus-Noise Ratio (SINR), and cache diversity, and to decrease the users' access delay and cache redundancy, for different content popularity profiles. Capitalizing on clustering, our second objective is to assign the best caching node to indoor/outdoor users for managing their requests. In this regard, we use the movement speed of ground users as the decision metric of the transmission scheme for serving outdoor users' requests, in order to avoid frequent handovers between FAPs and increase the battery life of UAVs. Simulation results illustrate that the proposed CCUF implementation increases the cache-hit ratio, SINR, and cache diversity and decreases the users' access delay, cache redundancy, and UAVs' energy consumption.
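The abstract evaluates the cache-hit ratio under different content popularity profiles. A common way to model such profiles (an assumption here; the abstract does not name one) is a Zipf distribution with skew s. The sketch computes the hit ratio of a single node that stores the most popular files uncoded; CCUF's coded placement across clustered nodes is not modeled, and the function names are hypothetical.

```python
def zipf_popularity(n_files, s):
    """Request probabilities p_k proportional to 1 / k**s for ranks k = 1..n_files."""
    weights = [1.0 / k ** s for k in range(1, n_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def hit_ratio_most_popular(popularity, cache_size):
    """Cache-hit ratio when a node stores the `cache_size` most popular files whole."""
    return sum(sorted(popularity, reverse=True)[:cache_size])


# Same cache size (10 of 100 files) under increasingly skewed popularity.
for s in (0.0, 0.8, 1.2):
    p = zipf_popularity(100, s)
    print(s, round(hit_ratio_most_popular(p, 10), 3))
```

More skewed popularity (larger s) concentrates requests on the head of the catalogue, so the same cache size yields a higher hit ratio. The abstract's coded placement targets content diversity across nodes, which a single most-popular cache like this one does not capture.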

Submitted 15 July, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

arXiv:2008.02359 [pdf, other]

doi 10.1109/ACCESS.2020.3015855

Risk, Trust, and Bias: Causal Regulators of Biometric-Enabled Decision Support

Authors: Kenneth Lai, Helder C. R. Oliveira, Ming Hou, Svetlana N. Yanushkevich, Vlad P. Shmerko

Abstract: Biometrics and biometric-enabled decision support systems (DSS) have become a mandatory part of complex dynamic systems such as security checkpoints, personal health monitoring systems, autonomous robots, and epidemiological surveillance. Risk, trust, and bias (R-T-B) are emerging measures of performance of such systems. Existing studies on the impact of R-T-B on system performance mostly ignore the complementary nature of R-T-B and their causal relationships, for instance, risk of trust, risk of bias, and risk of trust over biases. This paper offers a complete taxonomy of the R-T-B causal performance regulators for biometric-enabled DSS. The proposed taxonomy links the R-T-B assessment to the causal inference mechanism for reasoning in decision making. Practical details of the R-T-B assessment in the DSS are demonstrated through experiments assessing the trust in synthetic biometrics and the risk of bias in face biometrics. The paper also outlines emerging applications of the proposed approach beyond biometrics, including decision support for epidemiological surveillance such as for the COVID-19 pandemic.

Submitted 13 August, 2020; v1 submitted 5 August, 2020; originally announced August 2020.

Comments: Accepted to IEEE ACCESS

arXiv:2007.14361 [pdf, other]

doi 10.23919/Eusipco47968.2020.9287384

Assessing Risks of Biases in Cognitive Decision Support Systems

Authors: Kenneth Lai, Helder C. R. Oliveira, Ming Hou, Svetlana N. Yanushkevich, Vlad Shmerko

Abstract: Recognizing, assessing, countering, and mitigating biases of different natures from heterogeneous sources is a critical problem in designing a cognitive Decision Support System (DSS). An example of such a system is a cognitive biometric-enabled security checkpoint. Biased algorithms affect the decision-making process in unpredictable ways; e.g., biased face recognition across demographic groups may severely impact the risk assessment at a checkpoint. This paper addresses the challenging research question of how to manage an ensemble of biases. We provide performance projections of the DSS operational landscape in terms of biases. A probabilistic reasoning technique is used to assess the risk of such biases. We also provide a motivational experiment using the face biometric component of the checkpoint system, which highlights the discovery of an ensemble of biases and the techniques to assess their risks.

Submitted 28 July, 2020; originally announced July 2020.

Comments: submitted to 28th European Signal Processing Conference (EUSIPCO 2020)

Journal ref: 2020 28th European Signal Processing Conference (EUSIPCO), Amsterdam, 2021, pp. 840-844

arXiv:1910.12283 [pdf, other]

Long-term Joint Scheduling for Urban Traffic

Authors: Xianfeng Liang, Likang Wu, Joya Chen, Yang Liu, Runlong Yu, Min Hou, Han Wu, Yuyang Ye, Qi Liu, Enhong Chen

Abstract: Traffic congestion in modern cities has become a growing concern for residents. According to the Baidu traffic report, the commuting stress index in Beijing has reached a surprising 1.973 during rush hours, which results in longer trip times and increased vehicular queueing. Previous works have demonstrated that with reasonable scheduling, e.g., rebalancing bike-sharing systems and optimizing bus transportation, traffic efficiency can be significantly improved with little resource consumption. However, two disadvantages still restrict their performance: (1) they consider only a single short-term scheduling step and ignore the layout after the first repositioning, and (2) they focus on a single mode of transport, leaving the multi-modal characteristics of urban public transportation largely under-exploited. In this paper, we propose an efficient and economical multi-modal traffic scheduling scheme named JLRLS based on spatio-temporal prediction, which adopts reinforcement learning to obtain an optimal long-term and joint schedule. JLRLS combines multiple modes of transportation and schedules each according to its own characteristics, which potentially helps the system reach optimal performance. Our implementation of an example in PaddlePaddle is available at https://github.com/bigdata-ustc/Long-term-Joint-Scheduling, with an explanatory video at https://youtu.be/t5M2wVPhTyk.

Submitted 27 October, 2019; originally announced October 2019.

Comments: KDD Cup 2019 Special PaddlePaddle Award

arXiv:1905.12862 [pdf, other]

Explainable Fashion Recommendation: A Semantic Attribute Region Guided Approach

Authors: Min Hou, Le Wu, Enhong Chen, Zhi Li, Vincent W. Zheng, Qi Liu

Abstract: In fashion recommender systems, each product usually consists of multiple semantic attributes (e.g., sleeves, collar). When making clothing decisions, people usually show preferences for different semantic attributes (e.g., clothes with a v-neck collar). Nevertheless, most previous fashion recommendation models represent clothing images with a global content representation and lack a detailed understanding of users' semantic preferences, which usually leads to inferior recommendation performance. To bridge this gap, we propose a novel Semantic Attribute Explainable Recommender System (SAERS). Specifically, we first introduce a fine-grained interpretable semantic space. We then develop a Semantic Extraction Network (SEN) and a Fine-grained Preferences Attention (FPA) module to project users and items into this space. With SAERS, we can not only provide clothing recommendations to users but also explain why an item is recommended through intuitive visual attribute-semantic highlights in a personalized manner. Extensive experiments conducted on real-world datasets clearly demonstrate the effectiveness of our approach compared with state-of-the-art methods.

Submitted 27 June, 2019; v1 submitted 30 May, 2019; originally announced May 2019.

Comments: Accepted to IJCAI2019

arXiv:1901.05320 [pdf, other]

Real-world Underwater Enhancement: Challenges, Benchmarks, and Solutions

Authors: Risheng Liu, Xin Fan, Ming Zhu, Minjun Hou, Zhongxuan Luo

Abstract: Underwater image enhancement is such an important low-level vision task with many applications that numerous algorithms have been proposed in recent years. These algorithms, developed under various assumptions, demonstrate successes from various aspects using different data sets and different metrics. In this work, we set up an undersea image capturing system and construct a large-scale Real-world Underwater Image Enhancement (RUIE) data set divided into three subsets. The three subsets target three challenging aspects of enhancement, i.e., image visibility quality, color casts, and higher-level detection/classification, respectively. We conduct extensive and systematic experiments on RUIE to evaluate the effectiveness and limitations of various algorithms in enhancing visibility and correcting color casts on images with hierarchical categories of degradation. Moreover, underwater image enhancement in practice usually serves as a preprocessing step for mid-level and high-level vision tasks. We thus use object detection performance on enhanced images as a brand-new task-specific evaluation criterion. The findings from these evaluations not only confirm what is commonly believed, but also suggest promising solutions and new directions for visibility enhancement, color correction, and object detection on real-world underwater images.

Submitted 6 March, 2019; v1 submitted 15 January, 2019; originally announced January 2019.

Comments: arXiv admin note: text overlap with arXiv:1712.04143 by other authors

arXiv:1810.13098 [pdf, other]

Low-Rank Embedding of Kernels in Convolutional Neural Networks under Random Shuffling

Authors: Chao Li, Zhun Sun, Jinshi Yu, Ming Hou, Qibin Zhao

Abstract: Although convolutional neural networks (CNNs) have recently become popular for various image processing and computer vision tasks, reducing the storage cost of their parameters for resource-limited platforms remains a challenging problem. In previous studies, tensor decomposition (TD) has achieved promising compression performance by embedding the kernel of a convolutional layer into a low-rank subspace. However, TD has been applied only naively, to the kernel itself or to specific variants of it. Unlike these conventional approaches, this paper shows that the kernel can be embedded into more general or even random low-rank subspaces. We demonstrate this by compressing the convolutional layers via randomly-shuffled tensor decomposition (RsTD) for a standard classification task using CIFAR-10. In addition, we analyze how the spatial similarity of the training data influences the low-rank structure of the kernels. The experimental results show that the CNN can be significantly compressed even if the kernels are randomly shuffled. Furthermore, the RsTD-based method yields more stable classification accuracy than conventional TD-based methods over a wide range of compression ratios.

Submitted 31 October, 2018; originally announced October 2018.

arXiv:1806.03237 [pdf]

A Wireless Multimedia Sensor Network Platform for Environmental Event Detection Dedicated to Precision Agriculture

Authors: Hongling Shi, Kun Mean Hou, Xunxing Diao, Liu Xing, Jian-Jin Li, Christophe De Vaulx

Abstract: Precision agriculture has been considered a new technique to improve agricultural production and support sustainable development by preserving planetary resources and minimizing pollution. By monitoring different parameters of interest in a cultivated field, a wireless sensor network (WSN) enables real-time decision making with regard to issues such as management of water resources for irrigation, choosing the optimum point for harvesting, estimating fertilizer requirements, and predicting crop yield more accurately. Despite tremendous advances in scalar WSNs in recent years, scalar WSNs cannot meet all the requirements of ubiquitous intelligent environmental event detection, because scalar data such as temperature, soil humidity, air humidity, and light intensity are not rich enough to detect all environmental events, such as plant diseases and the presence of insects. Multimedia data are therefore needed to fulfill those requirements. In this paper, we present a robust, multi-support, and modular Wireless Multimedia Sensor Network (WMSN) platform, a type of wireless sensor network equipped with a low-cost CCD camera. This WMSN platform may be used for diverse environmental event detection tasks, such as detecting the presence of plant diseases and insects in precision agriculture applications.

Submitted 15 May, 2018; originally announced June 2018.

Journal ref: New and Smart Information Communication Science and Technology to Support Sustainable Development (NICST 2013), Sep 2013, Clermont-Ferrand, France

arXiv:1805.08493 [pdf, other]

Blind Predicting Similar Quality Map for Image Quality Assessment

Authors: Da Pan, Ping Shi, Ming Hou, Zefeng Ying, Sizhe Fu, Yuan Zhang

Abstract: A key problem in blind image quality assessment (BIQA) is how to effectively model the properties of the human visual system in a data-driven manner. In this paper, we propose a simple and efficient BIQA model based on a novel framework that consists of a fully convolutional neural network (FCNN) and a pooling network to solve this problem. In principle, the FCNN is capable of predicting a pixel-by-pixel similar quality map from a distorted image alone by using the intermediate similarity maps derived from conventional full-reference image quality assessment methods. The predicted pixel-by-pixel quality maps are highly consistent with the distortion correlations between the reference and distorted images. Finally, a deep pooling network regresses the quality map into a score. Experiments demonstrate that our predictions outperform many state-of-the-art BIQA methods.

Submitted 10 March, 2019; v1 submitted 22 May, 2018; originally announced May 2018.

arXiv:1712.00732 [pdf, other]

doi 10.1145/3159652.3159666

SHINE: Signed Heterogeneous Information Network Embedding for Sentiment Link Prediction

Authors: Hongwei Wang, Fuzheng Zhang, Min Hou, Xing Xie, Minyi Guo, Qi Liu

Abstract: In online social networks, people often express attitudes toward others, forming massive numbers of sentiment links among users. Predicting the sign of sentiment links is a fundamental task in many areas, such as personal advertising and public opinion analysis. Previous works mainly focus on textual sentiment classification; however, text information can only disclose the "tip of the iceberg" of users' true opinions, most of which are unobserved but implied by other sources of information, such as social relations and user profiles. To address this problem, in this paper we investigate how to predict possibly existing sentiment links in the presence of heterogeneous information. First, due to the lack of explicit sentiment links in mainstream social networks, we establish a labeled heterogeneous sentiment dataset, consisting of users' sentiment relations, social relations, and profile knowledge, using an entity-level sentiment extraction method. Then we propose a novel and flexible end-to-end Signed Heterogeneous Information Network Embedding (SHINE) framework to extract users' latent representations from heterogeneous networks and predict the sign of unobserved sentiment links. SHINE utilizes multiple deep autoencoders to map each user into a low-dimensional feature space while preserving the network structure. We demonstrate the superiority of SHINE over state-of-the-art baselines on link prediction and node recommendation on two real-world datasets. The experimental results also demonstrate the efficacy of SHINE in cold-start scenarios.

Submitted 3 December, 2017; originally announced December 2017.

Comments: The 11th ACM International Conference on Web Search and Data Mining (WSDM 2018)

arXiv:1711.08054 [pdf, other]

Generative Adversarial Positive-Unlabelled Learning

Authors: Ming Hou, Brahim Chaib-draa, Chao Li, Qibin Zhao

Abstract: In this work, we consider the task of classifying binary positive-unlabeled (PU) data. Existing discriminative-learning-based PU models attempt to find an optimal reweighting strategy for the U data so that a decent decision boundary can be found. However, given limited P data, conventional PU models tend to overfit when adapted to very flexible deep neural networks. In contrast, we are the first to propose a totally new paradigm for the binary PU task, approaching it from the perspective of generative learning by leveraging powerful generative adversarial networks (GANs). Our generative positive-unlabeled (GenPU) framework incorporates an array of discriminators and generators that are endowed with different roles in simultaneously producing realistic positive and negative samples. We provide theoretical analysis to justify that, at equilibrium, GenPU is capable of recovering both the positive and negative data distributions. Moreover, we show that GenPU is generalizable and closely related to semi-supervised classification. Given rather limited P data, experiments on both synthetic and real-world datasets demonstrate the effectiveness of our proposed framework. With infinite realistic and diverse sample streams generated from GenPU, a very flexible classifier can then be trained using deep neural networks.

Submitted 4 April, 2018; v1 submitted 21 November, 2017; originally announced November 2017.

Comments: 8 pages

arXiv:1711.06787 [pdf, other]

Learning Aggregated Transmission Propagation Networks for Haze Removal and Beyond

Authors: Risheng Liu, Xin Fan, Minjun Hou, Zhiying Jiang, Zhongxuan Luo, Lei Zhang

Abstract: Single image dehazing is an important low-level vision task with many applications. Early research investigated different kinds of visual priors to address this problem. However, these methods may fail when their assumptions are not valid for specific images. Recent deep networks also achieve relatively good performance on this task. Unfortunately, because they neglect the rich physical rules of haze, they require large amounts of training data. More importantly, they may still fail when testing images exhibit completely different haze distributions. By considering the collaboration of these two perspectives, this paper designs a novel residual architecture that aggregates both prior (i.e., domain knowledge) and data (i.e., haze distribution) information to propagate transmissions for scene radiance estimation. We further present a variational energy-based perspective to investigate the intrinsic propagation behavior of our aggregated deep model. In this way, we bridge the gap between prior-driven models and data-driven networks, leveraging the advantages while avoiding the limitations of previous dehazing approaches. A lightweight learning framework is proposed to train our propagation network. Finally, by introducing a task-aware image separation formulation with a flexible optimization scheme, we extend the proposed model to more challenging vision tasks, such as underwater image enhancement and single image rain removal. Experiments on both synthetic and real-world images demonstrate the effectiveness and efficiency of the proposed framework.

Submitted 31 July, 2018; v1 submitted 17 November, 2017; originally announced November 2017.

arXiv:1601.08027 [pdf]

TrAD: Traffic Adaptive Data Dissemination Protocol for Both Urban and Highway VANETs

Authors: Bin Tian, K. M. Hou, Jianjin Li

Abstract: Vehicular Ad hoc Networks (VANETs) aim to improve transportation activities, including traffic safety, transport efficiency and even in-vehicle infotainment, in which a large number of event-driven traffic messages must be disseminated within a region of interest in a timely manner. However, due to the nature of VANETs (highly dynamic mobility and frequent disconnection), data dissemination faces great challenges. Inter-Vehicle Communication (IVC) protocols are the key technology to mitigate this issue. Therefore, we propose an infrastructure-less Traffic Adaptive data Dissemination (TrAD) protocol that considers both road traffic and network traffic status in highway and urban scenarios. TrAD is flexible enough to fit irregular road topologies and employs two broadcast suppression techniques. Three state-of-the-art IVC protocols are compared with TrAD by means of realistic simulations, and the performance of all protocols is quantitatively evaluated with different real city maps and traffic routes. TrAD achieves outstanding overall performance across several metrics, even under the worse condition of GPS drift.

Submitted 29 January, 2016; originally announced January 2016.

Comments: Accepted by the 30th IEEE International Conference on Advanced Information Networking and Applications (AINA-2016)

Showing 1–50 of 50 results for author: Hou, M