arXiv:2407.19528 [pdf, other]

Motamot: A Dataset for Revealing the Supremacy of Large Language Models over Transformer Models in Bengali Political Sentiment Analysis

Authors: Fatema Tuj Johora Faria, Mukaffi Bin Moin, Rabeya Islam Mumu, Md Mahabubul Alam Abir, Abrar Nawar Alfy, Mohammad Shafiul Alam

Abstract: Sentiment analysis is the process of identifying and categorizing people's emotions or opinions regarding various topics. Analyzing political sentiment is critical for understanding the complexities of public opinion processes, especially during election seasons. It gives significant information on voter preferences, attitudes, and current trends. In this study, we investigate political sentiment analysis during Bangladeshi elections, specifically examining how effectively Pre-trained Language Models (PLMs) and Large Language Models (LLMs) capture complex sentiment characteristics. Our study centers on the creation of the "Motamot" dataset, comprising 7,058 instances annotated with positive and negative sentiments, sourced from diverse online newspaper portals, forming a comprehensive resource for political sentiment analysis. We meticulously evaluate the performance of various PLMs including BanglaBERT, Bangla BERT Base, XLM-RoBERTa, mBERT, and sahajBERT, alongside LLMs such as Gemini 1.5 Pro and GPT 3.5 Turbo. Moreover, we explore zero-shot and few-shot learning strategies to enhance our understanding of political sentiment analysis methodologies. Our findings underscore BanglaBERT's commendable accuracy of 88.10% among PLMs. However, the exploration into LLMs reveals even more promising results. Through the adept application of Few-Shot learning techniques, Gemini 1.5 Pro achieves an impressive accuracy of 96.33%, surpassing the remarkable performance of GPT 3.5 Turbo, which stands at 94%. This underscores Gemini 1.5 Pro's status as the superior performer in this comparison.

Submitted 28 July, 2024; originally announced July 2024.

Comments: Accepted for publication in "The IEEE Region 10 Symposium (TENSYMP 2024)"

arXiv:2407.18387 [pdf, other]

SCALE: Self-regulated Clustered federAted LEarning in a Homogeneous Environment

Authors: Sai Puppala, Ismail Hossain, Md Jahangir Alam, Sajedul Talukder, Zahidur Talukder, Syed Bahauddin

Abstract: Federated Learning (FL) has emerged as a transformative approach for enabling distributed machine learning while preserving user privacy, yet it faces challenges like communication inefficiencies and reliance on centralized infrastructures, leading to increased latency and costs. This paper presents a novel FL methodology that overcomes these limitations by eliminating the dependency on edge servers, employing a server-assisted Proximity Evaluation for dynamic cluster formation based on data similarity, performance indices, and geographical proximity. Our integrated approach enhances operational efficiency and scalability through a Hybrid Decentralized Aggregation Protocol, which merges local model training with peer-to-peer weight exchange and a centralized final aggregation managed by a dynamically elected driver node, significantly curtailing global communication overhead. Additionally, the methodology includes Decentralized Driver Selection, Check-pointing to reduce network traffic, and a Health Status Verification Mechanism for system robustness. Validated using the breast cancer dataset, our architecture not only demonstrates a nearly tenfold reduction in communication overhead but also shows remarkable improvements in reducing training latency and energy consumption while maintaining high learning performance, offering a scalable, efficient, and privacy-preserving solution for the future of federated learning ecosystems.

Submitted 25 July, 2024; originally announced July 2024.

Comments: This research article has been accepted at the COMPSAC conference and will be published by IEEE

arXiv:2407.18358 [pdf, other]

Generative AI like ChatGPT in Blockchain Federated Learning: use cases, opportunities and future

Authors: Sai Puppala, Ismail Hossain, Md Jahangir Alam, Sajedul Talukder, Jannatul Ferdaus, Mahedi Hasan, Sameera Pisupati, Shanmukh Mathukumilli

Abstract: Federated learning has become a significant approach for training machine learning models using decentralized data without necessitating the sharing of this data. Recently, the incorporation of generative artificial intelligence (AI) methods has provided new possibilities for improving privacy, augmenting data, and customizing models. This research explores potential integrations of generative AI in federated learning, revealing various opportunities to enhance privacy, data efficiency, and model performance. It particularly emphasizes the importance of generative models like generative adversarial networks (GANs) and variational autoencoders (VAEs) in creating synthetic data that replicates the distribution of real data. Generating synthetic data helps federated learning address challenges related to limited data availability and supports robust model development. Additionally, we examine various applications of generative AI in federated learning that enable more personalized solutions.

Submitted 25 July, 2024; originally announced July 2024.

Comments: We plan to submit this research article to a conference that best fits this topic

arXiv:2407.18284 [pdf]

Physics-guided machine learning predicts the planet-scale performance of solar farms with sparse, heterogeneous, public data

Authors: Jabir Bin Jahangir, Muhammad Ashraful Alam

Abstract: The photovoltaics (PV) technology landscape is evolving rapidly. To predict the potential and scalability of emerging PV technologies, a global understanding of these systems' performance is essential. Traditionally, experimental and computational studies at large national research facilities have focused on PV performance in specific regional climates. However, synthesizing these regional studies to understand the worldwide performance potential has proven difficult. Given the expense of obtaining experimental data, the challenge of coordinating experiments at national labs across a politically-divided world, and the data-privacy concerns of large commercial operators, a fundamentally different, data-efficient approach is desired. Here, we present a physics-guided machine learning (PGML) scheme to demonstrate that: (a) the world can be divided into a few PV-specific climate zones, called PVZones, illustrating that the relevant meteorological conditions are shared across continents; (b) by exploiting the climatic similarities, high-quality monthly energy yield data from as few as five locations can accurately predict yearly energy yield potential with high spatial resolution and a root mean square error of less than 8 kWh/m$^{2}$; and (c) even with noisy, heterogeneous public PV performance data, the global energy yield can be predicted with less than 6% relative error compared to physics-based simulations, provided that the dataset is representative. This PGML scheme is agnostic to PV technology and farm topology, making it adaptable to new PV technologies or farm configurations. The results encourage physics-guided, data-driven collaboration among national policymakers and research organizations to build efficient decision support systems for accelerated PV qualification and deployment across the world.

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.11997 [pdf, other]

HydroTrack: Spectroscopic Analysis Prototype Enabling Real-Time Hydration Monitoring in Wearables

Authors: Nazim A. Belabbaci, Mohammad Arif Ul Alam

Abstract: In the rapidly growing field of wearable technology, optical devices are emerging as a significant innovation, offering non-invasive methods for analyzing skin and underlying tissue properties. Despite their promise, progress has been slowed by a lack of specialized prototypes and advanced analysis techniques. Addressing this gap, our study introduces HydroTrack, an 18-channel spectroscopy sensor embedded in a smart-watch. Accompanying this hardware, we present signal processing and data analysis techniques implemented at the edge, designed to maximize the utility of our system in comprehensive health tracking. A pivotal application of our device is the real-time assessment of hydration levels in physically active individuals. We validated our prototype and analytical approach through experiments on six participants, focusing on hydration dynamics during physical exercises. Our findings reveal an average accuracy of 95% in determining hydration states.

Submitted 12 June, 2024; originally announced July 2024.

Journal ref: Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2024

arXiv:2407.09747 [pdf, other]

SocialRec: User Activity Based Post Weighted Dynamic Personalized Post Recommendation System in Social Media

Authors: Ismail Hossain, Sai Puppala, Md Jahangir Alam, Sajedul Talukder

Abstract: User activities can influence their subsequent interactions with a post, generating interest in the user. Typically, users interact with posts from friends by commenting and using reaction emojis, reflecting their level of interest on social media such as Facebook, Twitter, and Reddit. Our objective is to analyze user history over time, including their posts and engagement on various topics. Additionally, we take into account the user's profile, seeking connections between their activities and social media platforms. By integrating user history, engagement, and persona, we aim to assess recommendation scores based on relevant item sharing by Hit Rate (HR) and the quality of the ranking system by Normalized Discounted Cumulative Gain (NDCG), where NeuMF achieves the highest scores, 0.80 and 0.6 respectively. Our hybrid approach solves the cold-start problem for new users; for new items, the cold-start problem never occurs because we consider the post category values. To improve the performance of the model during cold-start, we introduce collaborative filtering by looking for similar users and ranking the users based on the highest similarity scores.

Submitted 12 July, 2024; originally announced July 2024.

Comments: This research paper has been accepted in the Social Media Sway: Unraveling the Impact of Social Media on Human Behavior - SMS workshop, to be held in conjunction with the International Conference on Social Networks Analysis and Mining (ASONAM 2024) and will be published in Springer
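
The two metrics named in the abstract above, Hit Rate (HR) and Normalized Discounted Cumulative Gain (NDCG), can be sketched as follows. This is a minimal illustration rather than the authors' evaluation code: it assumes a leave-one-out setup with a single held-out relevant item per user, and the cutoff `k`, the function names, and the dictionary layout are all hypothetical choices not given in the listing.

```python
import math

def hit_rate_at_k(ranked_items, target, k):
    """Return 1.0 if the held-out target appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    """NDCG with exactly one relevant item.

    The ideal DCG is 1 (target at rank 1), so NDCG reduces to
    1 / log2(rank + 1) with a 1-based rank, or 0 if the target is not in the top k.
    """
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1
    return 1.0 / math.log2(rank + 1)

def evaluate(recommendations, held_out, k=10):
    """Average HR@k and NDCG@k over users (assumed single held-out item per user)."""
    users = list(held_out)
    hr = sum(hit_rate_at_k(recommendations[u], held_out[u], k) for u in users) / len(users)
    ndcg = sum(ndcg_at_k(recommendations[u], held_out[u], k) for u in users) / len(users)
    return hr, ndcg
```

HR only records whether the held-out item was retrieved, while NDCG also rewards placing it near the top of the list, which is why the two scores can differ for the same model.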

arXiv:2407.09691 [pdf, other]

EVOLVE: Predicting User Evolution and Network Dynamics in Social Media Using Fine-Tuned GPT-like Model

Authors: Ismail Hossain, Md Jahangir Alam, Sai Puppala, Sajedul Talukder

Abstract: Social media platforms are extensively used for sharing personal emotions, daily activities, and various life events, keeping people updated with the latest happenings. From the moment a user creates an account, they continually expand their network of friends or followers, freely interacting with others by posting, commenting, and sharing content. Over time, user behavior evolves based on demographic attributes and the networks they establish. In this research, we propose a predictive method to understand how a user evolves on social media throughout their life and to forecast the next stage of their evolution. We fine-tune a GPT-like decoder-only model (we named it E-GPT: Evolution-GPT) to predict the future stages of a user's evolution in online social media. We evaluate the performance of these models and demonstrate how user attributes influence changes within their network by predicting future connections and shifts in user activities on social media, which also addresses other social media challenges such as recommendation systems.

Submitted 12 July, 2024; originally announced July 2024.

Comments: This article has been accepted as a long paper in the MSNDS 2024 workshop, to be held in conjunction with the International Conference on Social Networks Analysis and Mining (ASONAM 2024), September 2-5, 2024, and will be published by Springer

arXiv:2407.07315 [pdf, other]

CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging

Authors: Raza Imam, Mohammed Talha Alam, Umaima Rahman, Mohsen Guizani, Fakhri Karray

Abstract: Existing vision-text contrastive learning models enhance representation transferability and support zero-shot prediction by matching paired image and caption embeddings while pushing unrelated pairs apart. However, astronomical image-label datasets are significantly smaller compared to general image and label datasets available from the internet. We introduce CosmoCLIP, an astronomical image-text contrastive learning framework precisely fine-tuned on the pre-trained CLIP model using SpaceNet and BLIP-based captions. SpaceNet, attained via FLARE, constitutes ~13k optimally distributed images, while BLIP acts as a rich knowledge extractor. The rich semantics derived from these SpaceNet and BLIP descriptions, when learned contrastively, enable CosmoCLIP to achieve superior generalization across various in-domain and out-of-domain tasks. Our results demonstrate that CosmoCLIP is a straightforward yet powerful framework, significantly outperforming CLIP in zero-shot classification and image-text retrieval tasks.

Submitted 9 July, 2024; originally announced July 2024.

Comments: Accepted at SPAICE Conference, ECSAT, UK, 2024

arXiv:2407.06817 [pdf, other]

AstroSpy: On detecting Fake Images in Astronomy via Joint Image-Spectral Representations

Authors: Mohammed Talha Alam, Raza Imam, Mohsen Guizani, Fakhri Karray

Abstract: The prevalence of AI-generated imagery has raised concerns about the authenticity of astronomical images, especially with advanced text-to-image models like Stable Diffusion producing highly realistic synthetic samples. Existing detection methods, primarily based on convolutional neural networks (CNNs) or spectral analysis, have limitations when used independently. We present AstroSpy, a hybrid model that integrates both spectral and image features to distinguish real from synthetic astronomical images. Trained on a unique dataset of real NASA images and AI-generated fakes (approximately 18k samples), AstroSpy utilizes a dual-pathway architecture to fuse spatial and spectral information. This approach enables AstroSpy to achieve superior performance in identifying authentic astronomical images. Extensive evaluations demonstrate AstroSpy's effectiveness and robustness, significantly outperforming baseline models in both in-domain and cross-domain tasks, highlighting its potential to combat misinformation in astronomy.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.02528 [pdf, other]

Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models

Authors: Romy Fieblinger, Md Tanvirul Alam, Nidhi Rastogi

Abstract: Cyber threats are constantly evolving. Extracting actionable insights from unstructured Cyber Threat Intelligence (CTI) data is essential to guide cybersecurity decisions. Increasingly, organizations like Microsoft, Trend Micro, and CrowdStrike are using generative AI to facilitate CTI extraction. This paper addresses the challenge of automating the extraction of actionable CTI using advancements in Large Language Models (LLMs) and Knowledge Graphs (KGs). We explore the application of state-of-the-art open-source LLMs, including the Llama 2 series, Mistral 7B Instruct, and Zephyr for extracting meaningful triples from CTI texts. Our methodology evaluates techniques such as prompt engineering, the guidance framework, and fine-tuning to optimize information extraction and structuring. The extracted data is then utilized to construct a KG, offering a structured and queryable representation of threat intelligence. Experimental results demonstrate the effectiveness of our approach in extracting relevant information, with guidance and fine-tuning showing superior performance over prompt engineering. However, while our methods prove effective in small-scale tests, applying LLMs to large-scale data for KG construction and Link Prediction presents ongoing challenges.

Submitted 30 June, 2024; originally announced July 2024.

Comments: 6th Workshop on Attackers and Cyber-Crime Operations, 12 pages, 1 figure, 9 tables

arXiv:2406.16926 [pdf, other]

Enhancing Wearable based Real-Time Glucose Monitoring via Phasic Image Representation Learning based Deep Learning

Authors: Yidong Zhu, Nadia B Aimandi, Mohammad Arif Ul Alam

Abstract: In the U.S., over a third of adults are pre-diabetic, with 80% unaware of their status. This underlines the need for better glucose monitoring to prevent type 2 diabetes and related heart diseases. Existing wearable glucose monitors are limited by the lack of models trained on small datasets, as collecting extensive glucose data is often costly and impractical. Our study introduces a novel machine learning method using modified recurrence plots in the frequency domain to improve glucose level prediction accuracy from wearable device data, even with limited datasets. This technique combines advanced signal processing with machine learning to extract more meaningful features. We tested our method against existing models using historical data, showing that our approach surpasses the current 87% accuracy benchmark in predicting real-time interstitial glucose levels.

Submitted 12 June, 2024; originally announced June 2024.

Journal ref: 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2024

arXiv:2406.15527 [pdf, other]

Data Efficient Evaluation of Large Language Models and Text-to-Image Models via Adaptive Sampling

Authors: Cong Xu, Gayathri Saranathan, Mahammad Parwez Alam, Arpit Shah, James Lim, Soon Yee Wong, Foltin Martin, Suparna Bhattacharya

Abstract: Evaluating LLMs and text-to-image models is a computationally intensive task often overlooked. Efficient evaluation is crucial for understanding the diverse capabilities of these models and enabling comparisons across a growing number of new models and benchmarks. To address this, we introduce SubLIME, a data-efficient evaluation framework that employs adaptive sampling techniques, such as clustering and quality-based methods, to create representative subsets of benchmarks. Our approach ensures statistically aligned model rankings compared to full datasets, evidenced by high Pearson correlation coefficients. Empirical analysis across six NLP benchmarks reveals that: (1) quality-based sampling methods such as Quality SE and Quality CPD consistently achieve strong correlations (0.85 to 0.95) with full datasets at a 10% sampling rate; (2) clustering methods excel in specific benchmarks such as MMLU; and (3) no single method universally outperforms others across all metrics. Extending this framework, we leverage the HEIM leaderboard to cover 25 text-to-image models on 17 different benchmarks. SubLIME dynamically selects the optimal technique for each benchmark, significantly reducing evaluation costs while preserving ranking integrity and score distribution. Notably, a minimal sampling rate of 1% proves effective for benchmarks like MMLU. Additionally, we demonstrate that employing difficulty-based sampling to target more challenging benchmark segments enhances model differentiation with broader score distributions. We also combine semantic search, tool use, and GPT-4 review to identify redundancy across benchmarks within specific LLM categories, such as coding benchmarks. This allows us to further reduce the number of samples needed to maintain targeted rank preservation. Overall, SubLIME offers a versatile and cost-effective solution for the robust evaluation of LLMs and text-to-image models.

Submitted 21 June, 2024; originally announced June 2024.
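
The abstract above measures how well a benchmark subset preserves model scores via the Pearson correlation with full-dataset results. A minimal sketch of that check follows. It is not SubLIME itself: it substitutes plain uniform random sampling for the clustering, quality-based, and difficulty-based methods the abstract names, and the function names, the `per_item_scores` layout, and the `seed` parameter are hypothetical.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def subset_correlation(per_item_scores, rate, seed=0):
    """Correlate per-model mean scores on a sampled subset with full-benchmark means.

    per_item_scores maps a model name to its list of per-item scores,
    with every model scored on the same items in the same order.
    Sampling here is uniform random (an assumption, not the paper's method).
    """
    n_items = len(next(iter(per_item_scores.values())))
    k = max(1, round(rate * n_items))
    idx = random.Random(seed).sample(range(n_items), k)
    full = [mean(scores) for scores in per_item_scores.values()]
    sub = [mean(scores[i] for i in idx) for scores in per_item_scores.values()]
    return pearson(full, sub)
```

A correlation near 1 at a given sampling rate indicates that the subset's per-model scores track the full benchmark closely; swapping the sampling step for a smarter selector is where an adaptive method would plug in.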

arXiv:2406.13720 [pdf, other]

On the Utility of Domain-Adjacent Fine-Tuned Model Ensembles for Few-shot Problems

Authors: Md Ibrahim Ibne Alam, Parikshit Ram, Soham Dan, Horst Samulowitz, Koushik Kar

Abstract: Large Language Models (LLMs) have been observed to perform well on a wide range of downstream tasks when fine-tuned on domain-specific data. However, such data may not be readily available in many applications, motivating zero-shot or few-shot approaches using domain-adjacent models. While several fine-tuned models for various tasks are available, finding an appropriate domain-adjacent model for a given task is often not straightforward. In this paper, we study DAFT-E, a framework that utilizes an Ensemble of Domain-Adjacent Fine-Tuned Foundation Models for few-shot problems. We show that for zero-shot problems, this ensembling method provides an accuracy performance close to that of the single best model. With few-shot problems, this performance improves further, at which point DAFT-E can outperform any single domain-adjacent model while requiring much less data for domain-specific fine-tuning.

Submitted 19 June, 2024; originally announced June 2024.

Comments: Main paper is 8 pages, followed by limitations, references and appendix

arXiv:2406.07599 [pdf, other]

CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence

Authors: Md Tanvirul Alam, Dipkamal Bhusal, Le Nguyen, Nidhi Rastogi

Abstract: Cyber threat intelligence (CTI) is crucial in today's cybersecurity landscape, providing essential insights to understand and mitigate the ever-evolving cyber threats. The recent rise of Large Language Models (LLMs) has shown potential in this domain, but concerns about their reliability, accuracy, and hallucinations persist. While existing benchmarks provide general evaluations of LLMs, there are no benchmarks that address the practical and applied aspects of CTI-specific tasks. To bridge this gap, we introduce CTIBench, a benchmark designed to assess LLMs' performance in CTI applications. CTIBench includes multiple datasets focused on evaluating knowledge acquired by LLMs in the cyber-threat landscape. Our evaluation of several state-of-the-art models on these tasks provides insights into their strengths and weaknesses in CTI contexts, contributing to a better understanding of LLM capabilities in CTI.

Submitted 24 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.00367 [pdf, other]

RoBERTa-BiLSTM: A Context-Aware Hybrid Model for Sentiment Analysis

Authors: Md. Mostafizer Rahman, Ariful Islam Shiplu, Yutaka Watanobe, Md. Ashad Alam

Abstract: Effectively analyzing comments to uncover latent intentions holds immense value in making strategic decisions across various domains. However, several challenges hinder the process of sentiment analysis, including the lexical diversity exhibited in comments, the presence of long dependencies within the text, unknown symbols and words, and imbalanced datasets. Moreover, existing sentiment analysis approaches mostly rely on sequential models to encode long-dependent texts, which require longer execution time because they process the text sequentially. In contrast, the Transformer requires less execution time due to its parallel processing nature. In this work, we introduce a novel hybrid deep learning model, RoBERTa-BiLSTM, which combines the Robustly Optimized BERT Pretraining Approach (RoBERTa) with Bidirectional Long Short-Term Memory (BiLSTM) networks. RoBERTa is utilized to generate meaningful word embedding vectors, while BiLSTM effectively captures the contextual semantics of long-dependent texts. The RoBERTa-BiLSTM hybrid model leverages the strengths of both sequential and Transformer models to enhance performance in sentiment analysis. We conducted experiments using datasets from IMDb, Twitter US Airline, and Sentiment140 to evaluate the proposed model against existing state-of-the-art methods. Our experimental findings demonstrate that the RoBERTa-BiLSTM model surpasses baseline models (e.g., BERT, RoBERTa-base, RoBERTa-GRU, and RoBERTa-LSTM), achieving accuracies of 80.74%, 92.36%, and 82.25% on the Twitter US Airline, IMDb, and Sentiment140 datasets, respectively. Additionally, the model achieves F1-scores of 80.73%, 92.35%, and 82.25% on the same datasets, respectively.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.20441 [pdf, other]

SECURE: Benchmarking Generative Large Language Models for Cybersecurity Advisory

Authors: Dipkamal Bhusal, Md Tanvirul Alam, Le Nguyen, Ashim Mahara, Zachary Lightcap, Rodney Frazier, Romy Fieblinger, Grace Long Torales, Nidhi Rastogi

Abstract: Large Language Models (LLMs) have demonstrated potential in cybersecurity applications but have also caused lower confidence due to problems like hallucinations and a lack of truthfulness. Existing benchmarks provide general evaluations but do not sufficiently address the practical and applied aspects of LLM performance in cybersecurity-specific tasks. To address this gap, we introduce SECURE (Security Extraction, Understanding & Reasoning Evaluation), a benchmark designed to assess LLMs' performance in realistic cybersecurity scenarios. SECURE includes six datasets focussed on the Industrial Control System sector to evaluate knowledge extraction, understanding, and reasoning based on industry-standard sources. Our study evaluates seven state-of-the-art models on these tasks, providing insights into their strengths and weaknesses in cybersecurity contexts, and offers recommendations for improving LLMs' reliability as cyber advisory tools.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.16683 [pdf, other]

Toward Digitalization: A Secure Approach to Find a Missing Person Using Facial Recognition Technology

Authors: Abid Faisal Ayon, S M Maksudul Alam

Abstract: Facial recognition is a technique based on machine learning that can recognize a human being by analyzing their facial profile, and it is applied in solving various types of real-world problems nowadays. In this paper, a common real-world problem, finding a missing person, has been solved in a secure and effective way with the help of facial recognition technology. Although there exist a few works on solving the problem, the proposed work is unique with respect to its security, design, and feasibility. Impeding intruders from participating in the processes and giving importance to both finders and family members of a missing person are two of the major features of this work. Evidence that our system works in finding a missing person is described in the result section of the paper. The advantages that our system provides over other existing systems can be seen from the comparisons described in the result summary section of the paper. The work is capable of providing a worthy solution to find a missing person on the digital platform.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.13267 [pdf, other]

FLARE up your data: Diffusion-based Augmentation Method in Astronomical Imaging

Authors: Mohammed Talha Alam, Raza Imam, Mohsen Guizani, Fakhri Karray

Abstract: The intersection of Astronomy and AI encounters significant challenges related to issues such as noisy backgrounds, lower resolution (LR), and the intricate process of filtering and archiving images from advanced telescopes like the James Webb. Given the dispersion of raw images in feature space, we have proposed a two-stage augmentation framework entitled FLARE, based on feature learning and augmented resolution enhancement. We first apply lower (LR) to higher resolution (HR) conversion followed by standard augmentations. Secondly, we integrate a diffusion approach to synthetically generate samples using class-concatenated prompts. By merging these two stages using weighted percentiles, we realign the feature space distribution, enabling a classification model to establish a distinct decision boundary and achieve superior generalization on various in-domain and out-of-domain tasks. We conducted experiments on several downstream cosmos datasets and on our optimally distributed SpaceNet dataset across 8-class fine-grained and 4-class macro classification tasks. FLARE attains the highest performance gain of 20.78% for fine-grained tasks compared to similar baselines, while across different classification models, FLARE shows a consistent increment of an average of +15%. This outcome underscores the effectiveness of the FLARE method in enhancing the precision of image classification, ultimately bolstering the reliability of astronomical research outcomes. Our code and SpaceNet dataset are available at https://github.com/Razaimam45/PlanetX_Dxb.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 15 pages main paper (including references), 3 pages supplementary material. Our code and SpaceNet dataset is available at https://github.com/Razaimam45/PlanetX_Dxb

arXiv:2405.07332 [pdf, other]

PotatoGANs: Utilizing Generative Adversarial Networks, Instance Segmentation, and Explainable AI for Enhanced Potato Disease Identification and Classification

Authors: Mohammad Shafiul Alam, Fatema Tuj Johora Faria, Mukaffi Bin Moin, Ahmed Al Wase, Md. Rabius Sani, Khan Md Hasib

Abstract: Numerous applications have resulted from the automation of agricultural disease segmentation using deep learning techniques. However, when applied to new conditions, these applications frequently face the difficulty of overfitting, resulting in lower segmentation performance. In the context of potato farming, where diseases have a large influence on yields, it is critical for the agricultural economy to quickly and properly identify these diseases. Traditional data augmentation approaches, such as rotation, flip, and translation, have limitations and frequently fail to provide strong generalization results. To address these issues, our research employs a novel approach termed PotatoGANs. In this novel data augmentation approach, two types of Generative Adversarial Networks (GANs) are utilized to generate synthetic potato disease images from healthy potato images. This approach not only expands the dataset but also adds variety, which helps to enhance model generalization. Using the Inception score as a measure, our experiments show the better quality and realism of the images created by PotatoGANs, emphasizing their capacity to resemble real disease images closely. The CycleGAN model outperforms the Pix2Pix GAN model in terms of image quality, as evidenced by its higher Inception scores (IS) of 1.2001 and 1.0900 for black scurf and common scab, respectively. This synthetic data can significantly improve the training of large neural networks. It also reduces data collection costs while enhancing data diversity and generalization capabilities. Our work improves interpretability by combining three gradient-based Explainable AI algorithms (GradCAM, GradCAM++, and ScoreCAM) with three distinct CNN architectures (DenseNet169, Resnet152 V2, InceptionResNet V2) for potato disease classification.

Submitted 12 May, 2024; originally announced May 2024.

arXiv:2405.05999 [pdf, other]

LLMPot: Automated LLM-based Industrial Protocol and Physical Process Emulation for ICS Honeypots

Authors: Christoforos Vasilatos, Dunia J. Mahboobeh, Hithem Lamri, Manaar Alam, Michail Maniatakos

Abstract: Industrial Control Systems (ICS) are extensively used in critical infrastructures ensuring efficient, reliable, and continuous operations. However, their increasing connectivity and addition of advanced features make them vulnerable to cyber threats, potentially leading to severe disruptions in essential services. In this context, honeypots play a vital role by acting as decoy targets within ICS networks, or on the Internet, helping to detect, log, analyze, and develop mitigations for ICS-specific cyber threats. Deploying ICS honeypots, however, is challenging due to the necessity of accurately replicating industrial protocols and device characteristics, a crucial requirement for effectively mimicking the unique operational behavior of different industrial systems. Moreover, this challenge is compounded by the significant manual effort required in also mimicking the control logic the PLC would execute, in order to capture attacker traffic aiming to disrupt critical infrastructure operations. In this paper, we propose LLMPot, a novel approach for designing honeypots in ICS networks harnessing the potency of Large Language Models (LLMs). LLMPot aims to automate and optimize the creation of realistic honeypots with vendor-agnostic configurations, and for any control logic, aiming to eliminate the manual effort and specialized knowledge traditionally required in this domain. We conducted extensive experiments focusing on a wide array of parameters, demonstrating that our LLM-based approach can effectively create honeypot devices implementing different industrial protocols and diverse control logic.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.04610 [pdf, other]

Exploring Explainable AI Techniques for Improved Interpretability in Lung and Colon Cancer Classification

Authors: Mukaffi Bin Moin, Fatema Tuj Johora Faria, Swarnajit Saha, Busra Kamal Rafa, Mohammad Shafiul Alam

Abstract: Lung and colon cancer are serious worldwide health challenges that require early and precise identification to reduce mortality risks. However, diagnosis, which is mostly dependent on histopathologists' competence, presents difficulties and hazards when expertise is insufficient. While diagnostic methods like imaging and blood markers contribute to early detection, histopathology remains the gold standard, although time-consuming and vulnerable to inter-observer mistakes. Limited access to high-end technology further limits patients' ability to receive immediate medical care and diagnosis. Recent advances in deep learning have generated interest in its application to medical imaging analysis, specifically the use of histopathological images to diagnose lung and colon cancer. The goal of this investigation is to use and adapt existing pre-trained CNN-based models, such as Xception, DenseNet201, ResNet101, InceptionV3, DenseNet121, DenseNet169, ResNet152, and InceptionResNetV2, to enhance classification through better augmentation strategies. The results show tremendous progress, with all eight models reaching impressive accuracy ranging from 97% to 99%. Furthermore, attention visualization techniques such as GradCAM, GradCAM++, ScoreCAM, Faster Score-CAM, and LayerCAM, as well as Vanilla Saliency and SmoothGrad, are used to provide insights into the models' classification decisions, thereby improving interpretability and understanding of malignant and benign image classification.

Submitted 14 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

Comments: Accepted in 4th International Conference on Computing and Communication Networks (ICCCNet-2024)

arXiv:2405.02792 [pdf]

Jointly Learning Spatial, Angular, and Temporal Information for Enhanced Lane Detection

Authors: Muhammad Zeshan Alam

Abstract: This paper introduces a novel approach for enhanced lane detection by integrating spatial, angular, and temporal information through light field imaging and novel deep learning models. Utilizing lenslet-inspired 2D light field representations and LSTM networks, our method significantly improves lane detection in challenging conditions. We demonstrate the efficacy of this approach with modified CNN architectures, showing superior performance over traditional methods. Our findings suggest this integrated data approach could advance lane detection technologies and inspire new models that leverage these multidimensional insights for autonomous vehicle perception.

Submitted 4 May, 2024; originally announced May 2024.

Comments: 5 pages, 3 figures, accepted in IEEE Conference on Signal Processing and Communications Applications

arXiv:2405.02787 [pdf]

Light Field Spatial Resolution Enhancement Framework

Authors: Javeria Shabbir, Muhammad Zeshan. Alam, M. Umair Mukati

Abstract: Light field (LF) imaging captures both angular and spatial light distributions, enabling advanced photographic techniques. However, micro-lens array (MLA)-based cameras face a spatial-angular resolution tradeoff due to a single shared sensor. We propose a novel light field framework for resolution enhancement, employing a modular approach. The first module generates a high-resolution, all-in-focus image. The second module, a texture transformer network, enhances the resolution of each light field perspective independently using the output of the first module as a reference image. The final module leverages light field regularity to jointly improve resolution across all LF image perspectives. Our approach demonstrates superior performance to existing methods in both qualitative and quantitative evaluations.

Submitted 4 May, 2024; originally announced May 2024.

Comments: 5 pages, 6 figures, accepted in IEEE Conference on Signal Processing and Communications Applications

arXiv:2405.01130 [pdf, other]

Automated Virtual Product Placement and Assessment in Images using Diffusion Models

Authors: Mohammad Mahmudul Alam, Negin Sokhandan, Emmett Goodman

Abstract: In Virtual Product Placement (VPP) applications, the discrete integration of specific brand products into images or videos has emerged as a challenging yet important task. This paper introduces a novel three-stage fully automated VPP system. In the first stage, a language-guided image segmentation model identifies optimal regions within images for product inpainting. In the second stage, Stable Diffusion (SD), fine-tuned with a few example product images, is used to inpaint the product into the previously identified candidate regions. The final stage introduces an "Alignment Module", which is designed to effectively sieve out low-quality images. Comprehensive experiments demonstrate that the Alignment Module ensures the presence of the intended product in every generated image and enhances the average quality of images by 35%. The results presented in this paper demonstrate the effectiveness of the proposed VPP system, which holds significant potential for transforming the landscape of virtual advertising and marketing strategies.

Submitted 2 May, 2024; originally announced May 2024.

Comments: Accepted at the 6th AI for Content Creation (AI4CC) workshop at CVPR 2024

arXiv:2404.16133 [pdf]

Quantitative Characterization of Retinal Features in Translated OCTA

Authors: Rashadul Hasan Badhon, Atalie Carina Thompson, Jennifer I. Lim, Theodore Leng, Minhaj Nur Alam

Abstract: Purpose: This study explores the feasibility of using generative machine learning (ML) to translate Optical Coherence Tomography (OCT) images into Optical Coherence Tomography Angiography (OCTA) images, potentially bypassing the need for specialized OCTA hardware. Methods: The method involved implementing a generative adversarial network framework that includes a 2D vascular segmentation model and a 2D OCTA image translation model. The study utilizes a public dataset of 500 patients, divided into subsets based on resolution and disease status, to validate the quality of TR-OCTA images. The validation employs several quality and quantitative metrics to compare the translated images with ground truth OCTAs (GT-OCTA). We then quantitatively characterize vascular features generated in TR-OCTAs with GT-OCTAs to assess the feasibility of using TR-OCTA for objective disease diagnosis. Result: TR-OCTAs showed high image quality in both 3 and 6 mm datasets (high-resolution, moderate structural similarity and contrast quality compared to GT-OCTAs). There were slight discrepancies in vascular metrics, especially in diseased patients. Blood vessel features like tortuosity and vessel perimeter index showed a better trend compared to density features which are affected by local vascular distortions. Conclusion: This study presents a promising solution to the limitations of OCTA adoption in clinical practice by using vascular features from TR-OCTA for disease detection. Translation relevance: This study has the potential to significantly enhance the diagnostic process for retinal diseases by making detailed vascular imaging more widely available and reducing dependency on costly OCTA equipment.

Submitted 24 April, 2024; originally announced April 2024.

Comments: The article has been revised and edited

arXiv:2404.10992 [pdf, other]

How to deal with glare for improved perception of Autonomous Vehicles

Authors: Muhammad Z. Alam, Zeeshan Kaleem, Sousso Kelouwani

Abstract: Vision sensors are versatile and can capture a wide range of visual cues, such as color, texture, shape, and depth. This versatility, along with the relatively inexpensive availability of machine vision cameras, played an important role in adopting vision-based environment perception systems in autonomous vehicles (AVs). However, vision-based perception systems can be easily affected by glare in the presence of a bright source of light, such as the sun or the headlights of the oncoming vehicle at night or simply by light reflecting off snow or ice-covered surfaces; scenarios encountered frequently during driving. In this paper, we investigate various glare reduction techniques, including the proposed saturated pixel-aware glare reduction technique for improved performance of the computer vision (CV) tasks employed by the perception layer of AVs. We evaluate these glare reduction methods based on various performance metrics of the CV algorithms used by the perception layer. Specifically, we considered object detection, object recognition, object tracking, depth estimation, and lane detection which are crucial for autonomous driving. The experimental findings validate the efficacy of the proposed glare reduction approach, showcasing enhanced performance across diverse perception tasks and remarkable resilience against varying levels of glare.

Submitted 16 April, 2024; originally announced April 2024.

Comments: 14 pages, 9 figures, Accepted IEEE TIV

arXiv:2404.10789 [pdf, other]

PASA: Attack Agnostic Unsupervised Adversarial Detection using Prediction & Attribution Sensitivity Analysis

Authors: Dipkamal Bhusal, Md Tanvirul Alam, Monish K. Veerabhadran, Michael Clifford, Sara Rampazzi, Nidhi Rastogi

Abstract: Deep neural networks for classification are vulnerable to adversarial attacks, where small perturbations to input samples lead to incorrect predictions. This susceptibility, combined with the black-box nature of such networks, limits their adoption in critical applications like autonomous driving. Feature-attribution-based explanation methods provide relevance of input features for model predictions on input samples, thus explaining model decisions. However, we observe that both model predictions and feature attributions for input samples are sensitive to noise. We develop a practical method for this characteristic of model prediction and feature attribution to detect adversarial samples. Our method, PASA, requires the computation of two test statistics using model prediction and feature attribution and can reliably detect adversarial samples using thresholds learned from benign samples. We validate our lightweight approach by evaluating the performance of PASA on varying strengths of FGSM, PGD, BIM, and CW attacks on multiple image and non-image datasets. On average, we outperform state-of-the-art statistical unsupervised adversarial detectors on CIFAR-10 and ImageNet by 14% and 35% ROC-AUC scores, respectively. Moreover, our approach demonstrates competitive performance even when an adversary is aware of the defense mechanism.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 9th IEEE European Symposium on Security and Privacy

arXiv:2404.07917 [pdf, other]

DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation

Authors: Anna C. Doris, Daniele Grandi, Ryan Tomich, Md Ferdous Alam, Hyunmin Cheong, Faez Ahmed

Abstract: This research introduces DesignQA, a novel benchmark aimed at evaluating the proficiency of multimodal large language models (MLLMs) in comprehending and applying engineering requirements in technical documentation. Developed with a focus on real-world engineering challenges, DesignQA uniquely combines multimodal data (including textual design requirements, CAD images, and engineering drawings) derived from the Formula SAE student competition. Different from many existing MLLM benchmarks, DesignQA contains document-grounded visual questions where the input image and input document come from different sources. The benchmark features automatic evaluation metrics and is divided into segments (Rule Comprehension, Rule Compliance, and Rule Extraction) based on tasks that engineers perform when designing according to requirements. We evaluate state-of-the-art models like GPT4 and LLaVA against the benchmark, and our study uncovers the existing gaps in MLLMs' abilities to interpret complex engineering documentation. Key findings suggest that while MLLMs demonstrate potential in navigating technical documents, substantial limitations exist, particularly in accurately extracting and applying detailed requirements to engineering designs. This benchmark sets a foundation for future advancements in AI-supported engineering design processes. DesignQA is publicly available at: https://github.com/anniedoris/design_qa/.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.17978 [pdf, other]

Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection

Authors: Mohammad Mahmudul Alam, Edward Raff, Stella Biderman, Tim Oates, James Holt

Abstract: Malware detection is an interesting and valuable domain to work in because it has significant real-world impact and unique machine-learning challenges. We investigate existing long-range techniques and benchmarks and find that they're not very suitable in this problem area. In this paper, we introduce Holographic Global Convolutional Networks (HGConv) that utilize the properties of Holographic Reduced Representations (HRR) to encode and decode features from sequence elements. Unlike other global convolutional methods, our method does not require any intricate kernel computation or crafted kernel design. HGConv kernels are defined as simple parameters learned through backpropagation. The proposed method has achieved new SOTA results on Microsoft Malware Classification Challenge, Drebin, and EMBER malware benchmarks. With log-linear complexity in sequence length, the empirical results demonstrate substantially faster run-time by HGConv compared to other methods achieving far more efficient scaling even with sequence length $\geq 100,000$.

Submitted 23 March, 2024; originally announced March 2024.

Comments: To appear in Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024, Valencia, Spain

arXiv:2403.17218 [pdf, other]

A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection

Authors: Benjamin Steenhoek, Md Mahbubur Rahman, Monoshi Kumar Roy, Mirza Sanjida Alam, Earl T. Barr, Wei Le

Abstract: Large Language Models (LLMs) have demonstrated great potential for code generation and other software engineering tasks. Vulnerability detection is of crucial importance to maintaining the security, integrity, and trustworthiness of software systems. Precise vulnerability detection requires reasoning about the code, making it a good case study for exploring the limits of LLMs' reasoning capabilities. Although recent work has applied LLMs to vulnerability detection using generic prompting techniques, their full capabilities for this task and the types of errors they make when explaining identified vulnerabilities remain unclear. In this paper, we surveyed eleven LLMs that are state-of-the-art in code generation and commonly used as coding assistants, and evaluated their capabilities for vulnerability detection. We systematically searched for the best-performing prompts, incorporating techniques such as in-context learning and chain-of-thought, and proposed three of our own prompting methods. Our results show that while our prompting methods improved the models' performance, LLMs generally struggled with vulnerability detection. They reported 0.5-0.63 Balanced Accuracy and failed to distinguish between buggy and fixed versions of programs in 76% of cases on average. By comprehensively analyzing and categorizing 287 instances of model reasoning, we found that 57% of LLM responses contained errors, and the models frequently predicted incorrect locations of buggy code and misidentified bug types. LLMs only correctly localized 6 out of 27 bugs in DbgBench, and these 6 bugs were predicted correctly by 70-100% of human participants. These findings suggest that despite their potential for other tasks, LLMs may fail to properly comprehend critical code structures and security-related concepts. Our data and code are available at https://figshare.com/s/78fe02e56e09ec49300b.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.17093 [pdf, other]

Enhancing UAV Security Through Zero Trust Architecture: An Advanced Deep Learning and Explainable AI Analysis

Authors: Ekramul Haque, Kamrul Hasan, Imtiaz Ahmed, Md. Sahabul Alam, Tariqul Islam

Abstract: In the dynamic and ever-changing domain of Unmanned Aerial Vehicles (UAVs), the utmost importance lies in guaranteeing resilient and lucid security measures. This study highlights the necessity of implementing a Zero Trust Architecture (ZTA) to enhance the security of unmanned aerial vehicles (UAVs), hence departing from conventional perimeter defences that may expose vulnerabilities. The Zero Trust Architecture (ZTA) paradigm requires a rigorous and continuous process of authenticating all network entities and communications. The accuracy of our methodology in detecting and identifying unmanned aerial vehicles (UAVs) is 84.59%. This is achieved by utilizing Radio Frequency (RF) signals within a Deep Learning framework, a unique method. Precise identification is crucial in Zero Trust Architecture (ZTA), as it determines network access. In addition, the use of eXplainable Artificial Intelligence (XAI) tools such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) contributes to the improvement of the model's transparency and interpretability. Adherence to Zero Trust Architecture (ZTA) standards guarantees that the classifications of unmanned aerial vehicles (UAVs) are verifiable and comprehensible, enhancing security within the UAV field.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 6 pages, 5 figures

arXiv:2403.15937 [pdf, other]

Model, Analyze, and Comprehend User Interactions and Various Attributes within a Social Media Platform

Authors: Md Kaykobad Reza, S M Maksudul Alam, Yiran Luo, Youzhe Liu

Abstract: How can we effectively model, analyze, and comprehend user interactions and various attributes within a social media platform based on post-comment relationship? In this study, we propose a novel graph-based approach to model and analyze user interactions within a social media platform based on post-comment relationship. We construct a user interaction graph from social media data and analyze it to gain insights into community dynamics, user behavior, and content preferences. Our investigation reveals that while 56.05% of the active users are strongly connected within the community, only 0.8% of them significantly contribute to its dynamics. Moreover, we observe temporal variations in community activity, with certain periods experiencing heightened engagement. Additionally, our findings highlight a correlation between user activity and popularity showing that more active users are generally more popular. Alongside these, a preference for positive and informative content is also observed where 82.41% users preferred positive and informative content. Overall, our study provides a comprehensive framework for understanding and managing online communities, leveraging graph-based techniques to gain valuable insights into user behavior and community dynamics.

Submitted 23 March, 2024; originally announced March 2024.

Comments: 9 Pages, 8 Figures, 3 Tables

arXiv:2403.15143 [pdf, other]

Modular Deep Active Learning Framework for Image Annotation: A Technical Report for the Ophthalmo-AI Project

Authors: Md Abdul Kadir, Hasan Md Tusfiqur Alam, Pascale Maul, Hans-Jürgen Profitlich, Moritz Wolf, Daniel Sonntag

Abstract: Image annotation is one of the most essential tasks for guaranteeing proper treatment for patients and tracking progress over the course of therapy in the field of medical imaging and disease diagnosis. However, manually annotating a lot of 2D and 3D imaging data can be extremely tedious. Deep Learning (DL) based segmentation algorithms have completely transformed this process and made it possible to automate image segmentation. By accurately segmenting medical images, these algorithms can greatly minimize the time and effort necessary for manual annotation. Additionally, by incorporating Active Learning (AL) methods, these segmentation algorithms can perform far more effectively with a smaller amount of ground truth data. We introduce MedDeepCyleAL, an end-to-end framework implementing the complete AL cycle. It provides researchers with the flexibility to choose the type of deep learning model they wish to employ and includes an annotation tool that supports the classification and segmentation of medical images. The user-friendly interface allows for easy alteration of the AL and DL model settings through a configuration file, requiring no prior programming experience. While MedDeepCyleAL can be applied to any kind of image data, we have specifically applied it to ophthalmology data in this project.

Submitted 22 March, 2024; originally announced March 2024.

Comments: DFKI Technical Report

arXiv:2403.01983 [pdf]

Language and Speech Technology for Central Kurdish Varieties

Authors: Sina Ahmadi, Daban Q. Jaff, Md Mahfuz Ibn Alam, Antonios Anastasopoulos

Abstract: Kurdish, an Indo-European language spoken by over 30 million speakers, is considered a dialect continuum and known for its diversity in language varieties. Previous studies addressing language and speech technology for Kurdish handle it in a monolithic way as a macro-language, resulting in disparities for dialects and varieties for which there are few resources and tools available. In this paper, we take a step towards developing resources for language and speech technology for varieties of Central Kurdish, creating a corpus by transcribing movies and TV series as an alternative to fieldwork. Additionally, we report the performance of machine translation, automatic speech recognition, and language identification as downstream tasks evaluated on Central Kurdish varieties. Data and models are publicly available under an open license at https://github.com/sinaahmadi/CORDI.

Submitted 4 March, 2024; originally announced March 2024.

Comments: Accepted to LREC-COLING 2024

arXiv:2402.11953 [pdf, other]

Stealing the Invisible: Unveiling Pre-Trained CNN Models through Adversarial Examples and Timing Side-Channels

Authors: Shubhi Shukla, Manaar Alam, Pabitra Mitra, Debdeep Mukhopadhyay

Abstract: Machine learning, with its myriad applications, has become an integral component of numerous technological systems. A common practice in this domain is the use of transfer learning, where a pre-trained model's architecture, readily available to the public, is fine-tuned to suit specific tasks. As Machine Learning as a Service (MLaaS) platforms increasingly use pre-trained models in their backends, it's crucial to safeguard these architectures and understand their vulnerabilities. In this work, we present an approach based on the observation that the classification patterns of adversarial images can be used as a means to steal the models. Furthermore, the adversarial image classifications in conjunction with timing side channels can lead to a model stealing method. Our approach, designed for typical user-level access in remote MLaaS environments exploits varying misclassifications of adversarial images across different models to fingerprint several renowned Convolutional Neural Network (CNN) and Vision Transformer (ViT) architectures. We utilize the profiling of remote model inference times to reduce the necessary adversarial images, subsequently decreasing the number of queries required. We have presented our results over 27 pre-trained models of different CNN and ViT architectures using CIFAR-10 dataset and demonstrate a high accuracy of 88.8% while keeping the query budget under 20.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.09795 [pdf, other]

doi 10.1016/j.inffus.2023.102004

An advanced data fabric architecture leveraging homomorphic encryption and federated learning

Authors: Sakib Anwar Rieyan, Md. Raisul Kabir News, A. B. M. Muntasir Rahman, Sadia Afrin Khan, Sultan Tasneem Jawad Zaarif, Md. Golam Rabiul Alam, Mohammad Mehedi Hassan, Michele Ianni, Giancarlo Fortino

Abstract: Data fabric is an automated and AI-driven data fusion approach to accomplish data management unification without moving data to a centralized location for solving complex data problems. In a Federated learning architecture, the global model is trained based on the learned parameters of several local models that eliminate the necessity of moving data to a centralized repository for machine learning. This paper introduces a secure approach for medical image analysis using federated learning and partially homomorphic encryption within a distributed data fabric architecture. With this method, multiple parties can collaborate in training a machine-learning model without exchanging raw data but using the learned or fused features. The approach complies with laws and regulations such as HIPAA and GDPR, ensuring the privacy and security of the data. The study demonstrates the method's effectiveness through a case study on pituitary tumor classification, achieving a significant level of accuracy. However, the primary focus of the study is on the development and evaluation of federated learning and partially homomorphic encryption as tools for secure medical image analysis. The results highlight the potential of these techniques to be applied to other privacy-sensitive domains and contribute to the growing body of research on secure and privacy-preserving machine learning.

Submitted 15 February, 2024; originally announced February 2024.

Journal ref: Information Fusion, 102, 102004 (2024)

arXiv:2402.07263 [pdf]

Trade-off Between Spatial and Angular Resolution in Facial Recognition

Authors: Muhammad Zeshan Alam, Sousso kelowani, Mohamed Elsaeidy

Abstract: Ensuring robustness in face recognition systems across various challenging conditions is crucial for their versatility. State-of-the-art methods often incorporate additional information, such as depth, thermal, or angular data, to enhance performance. However, light field-based face recognition approaches that leverage angular information face computational limitations. This paper investigates the fundamental trade-off between spatio-angular resolution in light field representation to achieve improved face recognition performance. By utilizing macro-pixels with varying angular resolutions while maintaining the overall image size, we aim to quantify the impact of angular information at the expense of spatial resolution, while considering computational constraints. Our experimental results demonstrate a notable performance improvement in face recognition systems by increasing the angular resolution, up to a certain extent, at the cost of spatial resolution.

Submitted 11 February, 2024; originally announced February 2024.

Comments: 12 pages, 5 figures, International Conference on Emerging Trends and Applications in Artificial Intelligence (ICETAI) [Accepted]

arXiv:2402.05122 [pdf]

History of generative Artificial Intelligence (AI) chatbots: past, present, and future development

Authors: Md. Al-Amin, Mohammad Shazed Ali, Abdus Salam, Arif Khan, Ashraf Ali, Ahsan Ullah, Md Nur Alam, Shamsul Kabir Chowdhury

Abstract: This research provides an in-depth comprehensive review of the progress of chatbot technology over time, from the initial basic systems relying on rules to today's advanced conversational bots powered by artificial intelligence. Spanning many decades, the paper explores the major milestones, innovations, and paradigm shifts that have driven the evolution of chatbots. Looking back at the very basic statistical model in 1906 via the early chatbots, such as ELIZA and ALICE in the 1960s and 1970s, the study traces key innovations leading to today's advanced conversational agents, such as ChatGPT and Google Bard. The study synthesizes insights from academic literature and industry sources to highlight crucial milestones, including the introduction of Turing tests, influential projects such as CALO, and recent transformer-based models. Tracing the path forward, the paper highlights how natural language processing and machine learning have been integrated into modern chatbots for more sophisticated capabilities. This chronological survey of the chatbot landscape provides a holistic reference to understand the technological and historical factors propelling conversational AI. By synthesizing learnings from this historical analysis, the research offers important context about the developmental trajectory of chatbots and their immense future potential across various fields of application, which could serve as potential takeaways for the respective research community and stakeholders.

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2402.03417 [pdf, other]

A Computer Vision Based Approach for Stalking Detection Using a CNN-LSTM-MLP Hybrid Fusion Model

Authors: Murad Hasan, Shahriar Iqbal, Md. Billal Hossain Faisal, Md. Musnad Hossin Neloy, Md. Tonmoy Kabir, Md. Tanzim Reza, Md. Golam Rabiul Alam, Md Zia Uddin

Abstract: Criminal and suspicious activity detection has become a popular research topic in recent years. The rapid growth of computer vision technologies has had a crucial impact on solving this issue. However, physical stalking detection is still a less explored area despite the evolution of modern technology. Nowadays, stalking in public places has become a common occurrence with women being the most affected. Stalking is a visible action that usually occurs before any criminal activity begins as the stalker begins to follow, loiter, and stare at the victim before committing any criminal activity such as assault, kidnapping, rape, and so on. Therefore, it has become a necessity to detect stalking as all of these criminal activities can be stopped in the first place through stalking detection. In this research, we propose a novel deep learning-based hybrid fusion model to detect potential stalkers from a single video with a minimal number of frames. We extract multiple relevant features, such as facial landmarks, head pose estimation, and relative distance, as numerical values from video frames. This data is fed into a multilayer perceptron (MLP) to perform a classification task between a stalking and a non-stalking scenario. Simultaneously, the video frames are fed into a combination of convolutional and LSTM models to extract the spatio-temporal features. We use a fusion of these numerical and spatio-temporal features to build a classifier to detect stalking incidents.
Additionally, we introduce a dataset consisting of stalking and non-stalking videos gathered from various feature films and television series, which is also used to train the model. The experimental results show the efficiency and dynamism of our proposed stalker detection system, achieving 89.58% testing accuracy with a significant improvement as compared to the state-of-the-art approaches.

Submitted 5 February, 2024; originally announced February 2024.

Comments: Under review for publication in the PLOS ONE journal, 17 pages, 9 figures

arXiv:2402.01945 [pdf, other]

A Case Study on Filtering for End-to-End Speech Translation

Authors: Md Mahfuz Ibn Alam, Antonios Anastasopoulos

Abstract: It is relatively easy to mine a large parallel corpus for any machine learning task, such as speech-to-text or speech-to-speech translation. Although these mined corpora are large in volume, their quality is questionable. This work shows that the simplest filtering technique can trim down these big, noisy datasets to a more manageable, clean dataset. We also show that using this clean dataset can improve the model's performance, as in the case of the multilingual-to-English Speech Translation (ST) model, where, on average, we obtain a 4.65 BLEU score improvement.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2402.01939 [pdf, other]

A Morphologically-Aware Dictionary-based Data Augmentation Technique for Machine Translation of Under-Represented Languages

Authors: Md Mahfuz Ibn Alam, Sina Ahmadi, Antonios Anastasopoulos

Abstract: The availability of parallel texts is crucial to the performance of machine translation models. However, most of the world's languages face the predominant challenge of data scarcity. In this paper, we propose strategies to synthesize parallel data relying on morpho-syntactic information and using bilingual lexicons along with a small amount of seed parallel data. Our methodology adheres to a realistic scenario backed by the small parallel seed data. It is linguistically informed, as it aims to create augmented data that is more likely to be grammatically correct. We analyze how our synthetic data can be combined with raw parallel data and demonstrate a consistent improvement in performance in our experiments on 14 languages (28 English <-> X pairs) ranging from well- to very low-resource ones. Our method leads to improvements even when using only five seed sentences and a bilingual lexicon.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.15008 [pdf, other]

Reinforcement Learning-based Relay Selection for Cooperative WSNs in the Presence of Bursty Impulsive Noise

Authors: Hazem Barka, Md Sahabul Alam, Georges Kaddoum, Minh Au, Basile L. Agba

Abstract: The problem of relay selection is pivotal in the realm of cooperative communication. However, this issue has not been thoroughly examined, particularly when the background noise is assumed to possess an impulsive characteristic with consistent memory as observed in smart grid communications and some other wireless communication scenarios. In this paper, we investigate the impact of this specific type of noise on the performance of cooperative Wireless Sensor Networks (WSNs) with the Decode and Forward (DF) relaying scheme, considering Symbol-Error-Rate (SER) and battery power consumption fairness across all nodes as the performance metrics. We introduce two innovative relay selection methods that depend on noise state detection and the residual battery power of each relay. The first method encompasses the adaptation of the Max-Min criterion to this specific context, whereas the second employs Reinforcement Learning (RL) to surmount this challenge. Our empirical outcomes demonstrate that the impacts of bursty impulsive noise on the SER performance can be effectively mitigated and that a balance in battery power consumption among all nodes can be established using the proposed methods.

Submitted 26 January, 2024; originally announced January 2024.

Comments: Accepted in 2024 IEEE Wireless Communications and Networking Conference

arXiv:2401.13926 [pdf, other]

Iterative Methods in GPU-Resident Linear Solvers for Nonlinear Constrained Optimization

Authors: Kasia Świrydowicz, Nicholson Koukpaizan, Maksudul Alam, Shaked Regev, Michael Saunders, Slaven Peleš

Abstract: Linear solvers are major computational bottlenecks in a wide range of decision support and optimization computations. The challenges become even more pronounced on heterogeneous hardware, where traditional sparse numerical linear algebra methods are often inefficient. For example, methods for solving ill-conditioned linear systems have relied on conditional branching, which degrades performance on hardware accelerators such as graphical processing units (GPUs). To improve the efficiency of solving ill-conditioned systems, our computational strategy separates computations that are efficient on GPUs from those that need to run on traditional central processing units (CPUs). Our strategy maximizes the reuse of expensive CPU computations. Iterative methods, which thus far have not been broadly used for ill-conditioned linear systems, play an important role in our approach. In particular, we extend ideas from [1] to implement iterative refinement using inexact LU factors and flexible generalized minimal residual (FGMRES), with the aim of efficient performance on GPUs. We focus on solutions that are effective within broader application contexts, and discuss how early performance tests could be improved to be more predictive of the performance in a realistic environment.

Submitted 24 January, 2024; originally announced January 2024.

Comments: 15 pages, 8 figures, 5 tables

MSC Class: 65F05; 65F10; 65F50; 65K10; 65Y05; 65Y10; 90C51

arXiv:2401.12790 [pdf, other]

MORPH: Towards Automated Concept Drift Adaptation for Malware Detection

Authors: Md Tanvirul Alam, Romy Fieblinger, Ashim Mahara, Nidhi Rastogi

Abstract: Concept drift is a significant challenge for malware detection, as the performance of trained machine learning models degrades over time, rendering them impractical. While prior research in malware concept drift adaptation has primarily focused on active learning, which involves selecting representative samples to update the model, self-training has emerged as a promising approach to mitigate concept drift. Self-training involves retraining the model using pseudo labels to adapt to shifting data distributions. In this research, we propose MORPH -- an effective pseudo-label-based concept drift adaptation method specifically designed for neural networks. Through extensive experimental analysis of Android and Windows malware datasets, we demonstrate the efficacy of our approach in mitigating the impact of concept drift. Our method offers the advantage of reducing annotation efforts when combined with active learning. Furthermore, our method significantly improves over existing works in automated concept drift adaptation for malware detection.

Submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.12344 [pdf, other]

OCT-SelfNet: A Self-Supervised Framework with Multi-Modal Datasets for Generalized and Robust Retinal Disease Detection

Authors: Fatema-E Jannat, Sina Gholami, Minhaj Nur Alam, Hamed Tabkhi

Abstract: Despite the revolutionary impact of AI and the development of locally trained algorithms, achieving widespread generalized learning from multi-modal data in medical AI remains a significant challenge. This gap hinders the practical deployment of scalable medical AI solutions. Addressing this challenge, our research contributes a self-supervised robust machine learning framework, OCT-SelfNet, for detecting eye diseases using optical coherence tomography (OCT) images. In this work, various data sets from various institutions are combined enabling a more comprehensive range of representation. Our method addresses the issue using a two-phase training approach that combines self-supervised pretraining and supervised fine-tuning with a mask autoencoder based on the SwinV2 backbone by providing a solution for real-world clinical deployment. Extensive experiments on three datasets with different encoder backbones, low data settings, unseen data settings, and the effect of augmentation show that our method outperforms the baseline model, Resnet-50 by consistently attaining AUC-ROC performance surpassing 77% across all tests, whereas the baseline model exceeds 54%. Moreover, in terms of the AUC-PR metric, our proposed method exceeded 42%, showcasing a substantial increase of at least 10% in performance compared to the baseline, which exceeded only 33%. This contributes to our understanding of our approach's potential and emphasizes its usefulness in clinical settings.

Submitted 22 January, 2024; originally announced January 2024.

Comments: 12 pages, 7 figures, 6 tables

arXiv:2401.12210 [pdf, other]

Connecting the Dots: Leveraging Spatio-Temporal Graph Neural Networks for Accurate Bangla Sign Language Recognition

Authors: Haz Sameen Shahgir, Khondker Salman Sayeed, Md Toki Tahmid, Tanjeem Azwad Zaman, Md. Zarif Ul Alam

Abstract: Recent advances in Deep Learning and Computer Vision have been successfully leveraged to serve marginalized communities in various contexts. One such area is Sign Language - a primary means of communication for the deaf community. However, so far, the bulk of research efforts and investments have gone into American Sign Language, and research activity into low-resource sign languages - especially Bangla Sign Language - has lagged significantly. In this research paper, we present a new word-level Bangla Sign Language dataset - BdSL40 - consisting of 611 videos over 40 words, along with two different approaches: one with a 3D Convolutional Neural Network model and another with a novel Graph Neural Network approach for the classification of BdSL40 dataset. This is the first study on word-level BdSL recognition, and the dataset was transcribed from Indian Sign Language (ISL) using the Bangla Sign Language Dictionary (1997). The proposed GNN model achieved an F1 score of 89%. The study highlights the significant lexical and semantic similarity between BdSL, West Bengal Sign Language, and ISL, and the lack of word-level datasets for BdSL in the literature. We release the dataset and source code to stimulate further research.

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.08141 [pdf, other]

IoTWarden: A Deep Reinforcement Learning Based Real-time Defense System to Mitigate Trigger-action IoT Attacks

Authors: Md Morshed Alam, Israt Jahan, Weichao Wang

Abstract: In trigger-action IoT platforms, IoT devices report event conditions to IoT hubs notifying their cyber states and let the hubs invoke actions in other IoT devices based on functional dependencies defined as rules in a rule engine. These functional dependencies create a chain of interactions that help automate network tasks. Adversaries exploit this chain to report fake event conditions to IoT hubs and perform remote injection attacks upon a smart environment to indirectly control targeted IoT devices. Existing defense efforts usually depend on static analysis over IoT apps to develop rule-based anomaly detection mechanisms. We also see ML-based defense mechanisms in the literature that harness physical event fingerprints to determine anomalies in an IoT network. However, these methods often demonstrate long response time and lack of adaptability when facing complicated attacks. In this paper, we propose to build a deep reinforcement learning based real-time defense system for injection attacks. We define the reward functions for defenders and implement a deep Q-network based approach to identify the optimal defense policy. Our experiments show that the proposed mechanism can effectively and accurately identify and defend against injection attacks with reasonable computation overhead.

Submitted 16 January, 2024; originally announced January 2024.

Comments: 2024 IEEE Wireless Communications and Networking Conference (WCNC 2024)

arXiv:2401.07343 [pdf, other]

Privacy-Preserving Intrusion Detection in Software-defined VANET using Federated Learning with BERT

Authors: Shakil Ibne Ahsan, Phil Legg, S M Iftekharul Alam

Abstract: The absence of robust security protocols leaves Vehicular Ad Hoc Networks (VANETs) open to cyber threats that compromise passenger and road safety. Intrusion Detection Systems (IDS) are widely employed to detect network security threats. Because of vehicles' high mobility and diverse operating environments, VANETs have ever-changing network topologies, lack privacy and security, and have limited bandwidth efficiency. The absence of privacy precautions, end-to-end encryption methods, and local data processing systems in VANETs also presents many privacy and security difficulties. It is therefore crucial to assess whether a novel real-time IDS approach can be used for this emerging technology. The present study introduces a novel approach for intrusion detection that combines Federated Learning (FL) with the BERT model for sequence classification (FL-BERT). The significance of data privacy is duly recognized. Under the FL methodology, each client has its own local model and dataset. Clients train their models locally and then send the model weights to the server. The server aggregates the weights from all clients to update a global model, and the updated global weights are then shared with the clients. This practice keeps sensitive raw data on individual clients' devices, protecting privacy. After the federated learning procedure, we assessed our models' performance on a separate test dataset.
The FL-BERT technique yielded promising results, opening avenues for further investigation in this area. Comparing our approach with existing research, we found that FL-BERT is more effective at addressing privacy and security concerns. Our results suggest that FL-BERT is a promising technique for enhancing attack detection.
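
The server-side aggregation step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are combined, so the dataset-size-weighted average used here (in the style of FedAvg) is an assumption, and the names `Weights` and `aggregate` are hypothetical rather than taken from the paper. Model weights are represented as plain lists of floats in place of real BERT tensors.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def aggregate(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Combine client weights into global weights.

    Assumption: a FedAvg-style average weighted by each client's local
    dataset size; the abstract only says the server "aggregates" weights.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    global_weights: Weights = {}
    for name, first in client_weights[0].items():
        # Average each parameter position across clients.
        global_weights[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(first))
        ]
    return global_weights


# Two clients send locally trained weights; raw data never leaves the clients.
clients = [{"w": [1.0, 3.0]}, {"w": [3.0, 5.0]}]
sizes = [1, 3]
new_global = aggregate(clients, sizes)
print(new_global)  # the server would send these weights back to every client
```

Weighting by dataset size gives clients with more local samples more influence on the global model; an unweighted mean would be an equally valid reading of the abstract.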

Submitted 17 January, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

arXiv:2401.00289 [pdf]

ASL Champ!: A Virtual Reality Game with Deep-Learning Driven Sign Recognition

Authors: Md Shahinur Alam, Jason Lamberton, Jianye Wang, Carly Leannah, Sarah Miller, Joseph Palagano, Myles de Bastion, Heather L. Smith, Melissa Malzkuhn, Lorna C. Quandt

Abstract: We developed an American Sign Language (ASL) learning platform in a Virtual Reality (VR) environment to facilitate immersive interaction and real-time feedback for ASL learners. We describe the first game to use an interactive teaching style in which users learn from a fluent signing avatar and the first implementation of ASL sign recognition using deep learning within the VR environment. Advanced motion-capture technology powers an expressive ASL teaching avatar within an immersive three-dimensional environment. The teacher demonstrates an ASL sign for an object, prompting the user to copy the sign. Upon the user's signing, a third-party plugin executes the sign recognition process alongside a deep learning model. Depending on the accuracy of a user's sign production, the avatar repeats the sign or introduces a new one. We gathered a 3D VR ASL dataset from fifteen diverse participants to power the sign recognition model. The proposed deep learning model's training, validation, and test accuracy are 90.12%, 89.37%, and 86.66%, respectively. The functional prototype can teach sign language vocabulary and be successfully adapted as an interactive ASL learning platform in VR.

Submitted 30 December, 2023; originally announced January 2024.

Comments: 36 pages, 9 figures

arXiv:2312.15310 [pdf, other]

Towards Generalization in Subitizing with Neuro-Symbolic Loss using Holographic Reduced Representations

Authors: Mohammad Mahmudul Alam, Edward Raff, Tim Oates

Abstract: While deep learning has enjoyed significant success in computer vision tasks over the past decade, many shortcomings still exist from a Cognitive Science (CogSci) perspective. In particular, the ability to subitize, i.e., quickly and accurately identify the small (less than 6) count of items, is not well learned by current Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) when using a standard cross-entropy (CE) loss. In this paper, we demonstrate that adapting tools used in CogSci research can improve the subitizing generalization of CNNs and ViTs by developing an alternative loss function using Holographic Reduced Representations (HRRs). We investigate how this neuro-symbolic approach to learning affects the subitizing capability of CNNs and ViTs, and so we focus on specially crafted problems that isolate generalization to specific aspects of subitizing. Via saliency maps and out-of-distribution performance, we are able to empirically observe that the proposed HRR loss improves subitizing generalization though it does not completely solve the problem. In addition, we find that ViTs perform considerably worse compared to CNNs in most respects on subitizing, except on one axis where an HRR-based loss provides improvement.

Submitted 23 December, 2023; originally announced December 2023.

Comments: Accepted in 38th Annual AAAI Workshop on Neuro-Symbolic Learning and Reasoning in the Era of Large Language Models (NuCLeaR), 2024

Showing 1–50 of 332 results for author: Alam, M