Designing Secure AI-based Systems: a Multi-Vocal Literature Review

Authors: Simon Schneider, Ananya Saha, Emanuele Mezzi, Katja Tuma, Riccardo Scandariato

Abstract: AI-based systems leverage recent advances in the field of AI/ML by combining traditional software systems with AI components. Applications are increasingly being developed in this way. Software engineers can usually rely on a plethora of supporting information on how to use and implement any given technology. For AI-based systems, however, such information is scarce. Specifically, guidance on how to securely design the architecture is not available to the same extent as for other systems. We present 16 architectural security guidelines for the design of AI-based systems that were curated via a multi-vocal literature review. The guidelines could support practitioners with actionable advice on the secure development of AI-based systems. Further, we mapped the guidelines to typical components of AI-based systems and observed a high coverage where 6 out of 8 generic components have at least one guideline associated with them.

Submitted 26 July, 2024; originally announced July 2024.

Comments: IEEE Secure Development Conference (SecDev)

arXiv:2407.14540 [pdf]

Risks of uncertainty propagation in AI-augmented security pipelines

Authors: Emanuele Mezzi, Aurora Papotti, Fabio Massacci, Katja Tuma

Abstract: The use of AI technologies is percolating into the secure development of software-based systems, with an increasing trend of composing AI-based subsystems (with uncertain levels of performance) into automated pipelines. This presents a fundamental research challenge and poses a serious threat to safety-critical domains (e.g., aviation). Despite the existing knowledge about uncertainty in risk analysis, no previous work has estimated the uncertainty of AI-augmented systems given the propagation of errors in the pipeline. We provide the formal underpinnings for capturing uncertainty propagation, develop a simulator to quantify uncertainty, and evaluate the simulation of propagating errors with two case studies. We discuss the generalizability of our approach and present policy implications and recommendations for aviation. Future work includes extending the approach and investigating the required metrics for validation in the aviation domain.

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.02305 [pdf]

The Equality Maturity Model: an actionable tool to advance gender balance in leadership and participation roles

Authors: Paloma Díaz, Paula Alexandra Silva, Katja Tuma

Abstract: The underrepresentation of women in Computer Science and Engineering is a pervasive issue, impacting the enrolment and graduation rates of female students as well as the presence of women in leadership positions in academia and industry. The European Network For Gender Balance in Informatics (EUGAIN) COST action seeks to share data, experiences, best practices, and lessons from failures, and to provide actionable tools that may contribute to the advancement of gender balance in the field. This paper summarises results from the Ph.D./Postdoc to Professor workgroup that were gathered in two booklets of best practices. Specifically, we introduce the Equality Maturity Model (EMM), a conceptual tool inspired by both booklets that is aimed at supporting organisations in measuring how they are doing concerning equality and identifying potential areas of improvement.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 10 pages, 2 figures

MSC Class: H.m MISCELLANEOUS

ACM Class: H.m

arXiv:2403.09537 [pdf, other]

Analyzing and Mitigating (with LLMs) the Security Misconfigurations of Helm Charts from Artifact Hub

Authors: Francesco Minna, Fabio Massacci, Katja Tuma

Abstract: Background: Helm is a package manager that allows defining, installing, and upgrading applications with Kubernetes (K8s), a popular container orchestration platform. A Helm chart is a collection of files describing all dependencies, resources, and parameters required for deploying an application within a K8s cluster. Objective: The goal of this study is to mine and empirically evaluate the security of Helm charts, comparing the performance of existing tools in terms of misconfigurations reported by policies available by default, and measure to what extent LLMs could be used for removing misconfiguration. We also want to investigate whether there are false positives in both the LLM refactorings and the tool outputs. Method: We propose a pipeline to mine Helm charts from Artifact Hub, a popular centralized repository, and analyze them using state-of-the-art open-source tools, such as Checkov and KICS. First, such a pipeline will run several chart analyzers and identify the common and unique misconfigurations reported by each tool. Secondly, it will use LLMs to suggest mitigation for each misconfiguration. Finally, the chart refactoring previously generated will be analyzed again by the same tools to see whether it satisfies the tool's policies. At the same time, we will also perform a manual analysis on a subset of charts to evaluate whether there are false positive misconfigurations from the tool's reporting and in the LLM refactoring.

Submitted 14 March, 2024; originally announced March 2024.

Comments: MSR 2024 - Registered Reports

arXiv:2310.04097 [pdf, other]

Impact of Gender on the Evaluation of Security Decisions

Authors: Winnie Mbaka, Katja Tuma

Abstract: Security decisions are made by human analysts under uncertain conditions, which leaves room for biased judgement. However, little is known about how demographics like gender and education impact these judgements. We conducted an empirical study to investigate their influence on security decision evaluations, addressing this knowledge gap.

Submitted 6 October, 2023; originally announced October 2023.

arXiv:2208.01895 [pdf, other]

The Role of Diversity in Cybersecurity Risk Analysis: An Experimental Plan

Authors: Katja Tuma, Romy Van Der Lee

Abstract: Cybersecurity threat and risk analysis (RA) approaches are used to identify and mitigate security risks early on in the software development life-cycle. Existing approaches automate only parts of the analysis procedure, leaving key decisions in identification, feasibility and risk analysis, and quality assessment to be determined by expert judgement. Therefore, in practice, teams of experts manually analyze the system design by holding brainstorming workshops. Such decisions are made in the face of uncertainties, leaving room for biased judgement (e.g., preferential treatment of a category of experts). Biased decision making during the analysis may result in unequal contribution of expertise, particularly since some diversity dimensions (e.g., gender) are underrepresented in security teams. Beyond the work on risk perception of non-technical threats, no existing work has empirically studied the role of diversity in the risk analysis of technical artefacts. This paper proposes an experimental plan for identifying the key diversity factors in RA.

Submitted 3 August, 2022; originally announced August 2022.

arXiv:2208.01524 [pdf, other]

A replication of a controlled experiment with two STRIDE variants

Authors: Winnie Mbaka, Katja Tuma

Abstract: To avoid costly security patching after software deployment, security-by-design techniques (e.g., STRIDE threat analysis) are adopted in organizations to root out security issues before the system is ever implemented. Despite the global gap in cybersecurity workforce and the high manual effort required for performing threat analysis, organizations are ramping up threat analysis activities. However, past experimental results were inconclusive regarding some performance indicators of threat analysis techniques, so practitioners have little evidence for choosing the technique to adopt. To address this issue, we replicated a controlled experiment with STRIDE. Our study was aimed at measuring and comparing the performance indicators (productivity and precision) of two STRIDE variants (element and interaction). We conclude the paper by comparing our results to the original study.

Submitted 2 August, 2022; originally announced August 2022.

arXiv:2208.01512 [pdf, ps, other]

Human Aspect of Threat Analysis: A Replication

Authors: Katja Tuma, Winnie Mbaka

Abstract: Background: Organizations are experiencing an increasing demand for security-by-design activities (e.g., STRIDE analyses) which require a high manual effort. This situation is worsened by the current lack of diverse (and sufficient) security workforce and inconclusive results from past studies. To date, the deciding human factors (e.g., diversity dimensions) that play a role in threat analysis have not been sufficiently explored. Objective: To address this issue, we plan to conduct a series of exploratory controlled experiments. The main objective is to empirically measure the human aspects that play a role in threat analysis alongside the more well-known measures of analysis performance. Method: We design the experiments as a differentiated replication of past experiments with STRIDE. The replication design is aimed at capturing some similar measures (e.g., of outcome quality) and additional measures (e.g., diversity dimensions). We plan to conduct the experiments in an academic setting. Limitations: Obtaining a balanced population (e.g., with respect to gender) in advanced computer science courses is not realistic. The experiments we plan to conduct with MSc level students will certainly suffer from this limitation.

Submitted 2 August, 2022; originally announced August 2022.

arXiv:2205.14498 [pdf, other]

Towards a Security Stress-Test for Cloud Configurations

Authors: Francesco Minna, Fabio Massacci, Katja Tuma

Abstract: Securing cloud configurations is an elusive task, which is left up to system administrators who have to base their decisions on "trial and error" experimentations or by observing good practices (e.g., CIS Benchmarks). We propose a knowledge, AND/OR, graphs approach to model cloud deployment security objects and vulnerabilities. In this way, we can capture relationships between configurations, permissions (e.g., CAP_SYS_ADMIN), and security profiles (e.g., AppArmor and SecComp), as first-class citizens. Such an approach allows us to suggest alternative and safer configurations, support administrators in the study of what-if scenarios, and scale the analysis to large scale deployments. We present an initial validation and illustrate the approach with three real vulnerabilities from known sources.

Submitted 7 June, 2022; v1 submitted 28 May, 2022; originally announced May 2022.

Comments: Conference: The IEEE International Conference on Cloud Computing (CLOUD) 2022

arXiv:2108.08579 [pdf]

doi 10.1007/s10270-022-00991-5

Checking Security Compliance between Models and Code

Authors: Katja Tuma, Sven Peldszus, Daniel Strüber, Riccardo Scandariato, Jan Jürjens

Abstract: It is challenging to verify that the planned security mechanisms are actually implemented in the software. In the context of model-based development, the implemented security mechanisms must capture all intended security properties that were considered in the design models. Assuring this compliance manually is labor intensive and can be error-prone. This work introduces the first semi-automatic technique for secure data flow compliance checks between design models and code. We develop heuristic-based automated mappings between a design-level model (SecDFD, provided by humans) and a code-level representation (Program Model, automatically extracted from the implementation) in order to guide users in discovering compliance violations, and hence potential security flaws in the code. These mappings enable an automated, and project-specific static analysis of the implementation with respect to the desired security properties of the design model. We developed two types of security compliance checks and evaluated the entire approach on open source Java projects.

Submitted 18 March, 2022; v1 submitted 19 August, 2021; originally announced August 2021.

arXiv:1910.03422 [pdf, other]

Finding Security Threats That Matter: An Industrial Case Study

Authors: Katja Tuma, Christian Sandberg, Urban Thorsson, Mathias Widman, Riccardo Scandariato

Abstract: Recent trends in software engineering (e.g., Agile, DevOps) have shortened the development life-cycle, limiting resources spent on security analysis of software designs. In this context, architecture models are (often manually) analyzed for potential security threats. Risk-last threat analysis suggests identifying all security threats before prioritizing them. In contrast, risk-first threat analysis suggests identifying the risks before the threats, by-passing threat prioritization. This seems promising for organizations where development speed is of great importance. Yet, little empirical evidence exists about the effect of sacrificing systematicity for high-priority threats on the performance and execution of threat analysis. To this aim, we conduct a case study with industrial experts from the automotive domain, where we empirically compare a risk-first technique to a risk-last technique. In this study, we consciously trade the number of participants for a more realistic simulation of threat analysis sessions in practice. This allows us to closely observe industrial experts and gain deep insights into the industrial practice. This work contributes with: (i) a quantitative comparison of performance, (ii) a quantitative and qualitative comparison of execution, and (iii) a comparative discussion of the two techniques. We find no differences in the productivity and timeliness of discovering high-priority security threats. Yet, we find differences in analysis execution. In particular, participants using the risk-first technique found twice as many high-priority threats, developed detailed attack scenarios, and discussed threat feasibility in detail. On the other hand, participants using the risk-last technique found more medium and low-priority threats and finished early.

Submitted 8 October, 2019; originally announced October 2019.

arXiv:1906.01961 [pdf, other]

Inspection Guidelines to Identify Security Design Flaws

Authors: Katja Tuma, Danial Hosseini, Kyriakos Malamas, Riccardo Scandariato

Abstract: Recent trends in software development practices (Agile, DevOps, CI) have shortened the development life-cycle, creating a need for efficient security-by-design approaches. In this context, software architectures are analyzed for potential vulnerabilities and design flaws. Yet, design flaws are often documented with natural language and require a manual analysis, which is inefficient. Besides low-level vulnerability databases (e.g., CWE, CAPEC) there is little systematized knowledge on security design flaws. The purpose of this work is to provide a catalog of security design flaws and to empirically evaluate the inspection guidelines for detecting security design flaws. To this aim, we present a catalog of 19 security design flaws and conduct empirical studies with master and doctoral students. This paper contributes with: (i) a catalog of security design flaws, (ii) an empirical evaluation of the inspection guidelines with master students, and (iii) a replicated evaluation with doctoral students. We also account for the shortcomings of the inspection guidelines and make suggestions for their improvement with respect to the generalization of guidelines, catalog re-organization, and format of documentation. We record similar precision, recall, and productivity in both empirical studies and discuss the potential for automating the security design flaw detection.

Submitted 5 June, 2019; originally announced June 2019.

Showing 1–12 of 12 results for author: Tuma, K