Research Archive

Journal Article Open Access Cybersecurity

Security Scanning Integration in DevSecOps Pipelines: Comparative Effectiveness of SAST, DAST, SCA, and Container Image Scanning Across Vulnerability Classes

Automated security scanning has become an integral component of DevSecOps pipelines, yet practitioner selection of scanning tool categories is frequently driven by tool familiarity rather than empirical evidence of coverage effectiveness across vulnerability classes. This paper presents a controlled empirical evaluation of four scanning modalities — Static Application Security Testing (SAST), Dynamic Application Security Testing (DAST), Software Composition Analysis (SCA), and Container Image Scanning — across a benchmark corpus of 3,600 intentionally introduced vulnerabilities spanning 18 CWE categories in Python, Java, and Node.js applications. We find that no single scanning modality achieves greater than 58% coverage across all CWE categories, and that modality coverage profiles are largely complementary: SAST excels at injection and logic vulnerabilities, DAST at authentication and session management issues, SCA at known CVE-catalogued dependency vulnerabilities, and Container Scanning at OS-layer and configuration vulnerabilities. A pipeline implementing all four modalities achieves 84.3% aggregate vulnerability coverage. We introduce the Security Scanning Coverage Matrix (SSCM) as a decision tool for pipeline architects and evaluate 12 commercial and open-source tools — including Semgrep, SonarQube, OWASP ZAP, Snyk, and Trivy — against the matrix. We also analyze false positive rates and their impact on developer adoption, finding that false positive rates above 18% trigger systematic alert fatigue and scanning suppression.
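The complementarity claim in the abstract above comes down to a set union: a vulnerability counts as covered if at least one modality detects it. The sketch below illustrates that arithmetic only. The finding sets, the corpus size of 10, and the function names are hypothetical and are not taken from the paper's benchmark or from the SSCM itself.

```python
from typing import Dict, Set


def coverage(detected: Set[int], total: int) -> float:
    """Fraction of the seeded vulnerabilities that one modality detected."""
    return len(detected) / total


def aggregate_coverage(per_modality: Dict[str, Set[int]], total: int) -> float:
    """Fraction detected by at least one modality (union of all finding sets)."""
    union: Set[int] = set().union(*per_modality.values())
    return len(union) / total


# Hypothetical toy data: IDs 0..9 stand for ten seeded vulnerabilities.
TOTAL = 10
findings: Dict[str, Set[int]] = {
    "SAST": {0, 1, 2, 3},
    "DAST": {3, 4, 5},
    "SCA": {6, 7},
    "Container": {7, 8},
}

if __name__ == "__main__":
    for name, detected in findings.items():
        print(f"{name}: {coverage(detected, TOTAL):.0%}")
    print(f"All four combined: {aggregate_coverage(findings, TOTAL):.0%}")
```

Because overlapping detections are counted once, the combined figure is always at most the sum of the individual figures, which is why largely non-overlapping profiles matter more than any single high score.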

Funmilayo Oladapo, Erik Strand, Yuki Yoshida, Pedro Neves · Nov 2023 · 312 citations

Journal Article Open Access Human-Computer Interaction

Explainable AI Interfaces for Clinical Decision Support: Design Principles, Physician Trust Calibration, and Patient Safety Outcomes in Diagnostic Assistance Systems

Artificial intelligence diagnostic assistance systems are being deployed in clinical settings with the potential to improve diagnostic accuracy, yet poorly designed explanation interfaces risk creating overtrust, undertrust, and automation bias -- cognitive failure modes with direct patient safety consequences. This paper investigates explainable AI interface design for clinical decision support systems (CDSS) through a mixed-methods study combining a randomized controlled experiment (n=187 physicians across radiology, pathology, and emergency medicine specialties), eye-tracking analysis, and think-aloud protocol sessions. We evaluate four explanation modality conditions -- no explanation, confidence score only, feature attribution (SHAP values), and counterfactual explanation -- for their effects on diagnostic accuracy, trust calibration error, and decision time. Counterfactual explanations (presenting alternative diagnoses the AI would have made under modified input conditions) achieve the highest diagnostic accuracy improvement (9.4 percentage points above unaided baseline) and the lowest trust calibration error. Feature attribution (SHAP) explanations are most valued by physicians in think-aloud sessions but do not improve diagnostic accuracy for non-expert AI users due to feature space unfamiliarity. We develop the Clinical XAI Design Principles (CXDP) framework comprising 12 evidence-grounded interface design guidelines, and demonstrate their application in redesigning a commercial CDSS explanation interface with a 14-point improvement in physician trust calibration accuracy.

Adaeze Obi, Maja Bergstrom, Akiko Suzuki, Yasmin Khalil · Nov 2023 · 334 citations

Journal Article Open Access Privacy Engineering

Privacy-Preserving Federated Analytics: Secure Aggregation Protocols, Homomorphic Encryption Integration, and Scalability Analysis for Cross-Organizational Data Collaboration

Privacy-preserving analytics across organizational data silos -- enabling statistical insights from combined datasets without any party sharing raw data -- requires cryptographic protocols that provide formal privacy guarantees while remaining computationally feasible at the scale of real analytical workloads. This paper presents a systematic engineering evaluation of three privacy-preserving analytics approaches -- Secure Aggregation (SecAgg), Partial Homomorphic Encryption (PHE) using the Paillier cryptosystem, and Fully Homomorphic Encryption (FHE) using the CKKS scheme in the Microsoft SEAL library -- for four representative analytical query types: count and sum aggregation, histogram construction, linear regression, and gradient-boosted tree inference. Experiments are conducted with up to 100 participating organizations on a WAN-simulated testbed, measuring query latency, communication overhead, and accuracy loss from approximation or noise addition. SecAgg achieves the best latency for aggregation queries (mean 1.4 seconds for a 50-party sum) with no accuracy loss, but does not support non-linear computations. PHE supports linear regression at 50-party scale in 8.2 seconds with zero approximation error. FHE-CKKS enables approximate gradient-boosted tree inference at 50-party scale in 94 seconds, with 0.8 percent mean accuracy loss from CKKS approximation. We introduce the Privacy-Analytics Performance Index (PAPI), which aggregates latency, communication cost, accuracy retention, and implementation complexity into a single score, and provide a cryptographic protocol selection guide for 12 common multi-party analytics scenarios.
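To make the idea of a sum computed without any party revealing its raw value concrete, here is a minimal sketch of pairwise additive masking, the core trick behind secure aggregation schemes. It is a simplification under stated assumptions and not the protocol evaluated in the paper: masks are generated centrally for brevity (a real deployment would derive each pair's mask from a pairwise key agreement), dropout recovery is omitted, and `MODULUS` and all function names are hypothetical.

```python
import secrets
from typing import List

# Assumption: all arithmetic is done modulo a fixed value larger than any true sum.
MODULUS = 2**61 - 1


def pairwise_masks(n: int) -> List[List[int]]:
    """Build an antisymmetric mask matrix: masks[i][j] == -masks[j][i]."""
    masks = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m = secrets.randbelow(MODULUS)
            masks[i][j] = m    # party i adds the shared mask
            masks[j][i] = -m   # party j subtracts the same mask
    return masks


def mask_inputs(values: List[int]) -> List[int]:
    """Return what each party would upload: its value plus all of its masks."""
    masks = pairwise_masks(len(values))
    return [(v + sum(row)) % MODULUS for v, row in zip(values, masks)]


def aggregate(masked: List[int]) -> int:
    """Server-side sum; the pairwise masks cancel, leaving the true total."""
    return sum(masked) % MODULUS


if __name__ == "__main__":
    private_values = [12, 7, 30, 1]  # hypothetical per-organization counts
    uploads = mask_inputs(private_values)
    print("sum recovered by aggregator:", aggregate(uploads))
```

Each upload on its own looks uniformly random to the aggregator, yet the total is exact, which is consistent with the abstract's point that this style of protocol loses no accuracy on sums but does not extend to non-linear computations.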

Nneka Obi, Lars Magnusson, Takuya Yoshida, Amira Hassan · Aug 2023 · 267 citations

Journal Article Subscription Cloud Computing

FinOps and DevOps Convergence: Cost Observability, Cloud Waste Reduction, and Shared Financial Accountability in Engineering Organizations

As cloud infrastructure costs have become a dominant line item for technology organizations, the intersection of financial accountability and DevOps engineering practices, commonly termed FinOps, has emerged as a critical organizational capability. This paper examines how DevOps organizations can embed cost observability and financial accountability into their engineering workflows without impeding delivery velocity. We present findings from an embedded case study of three cloud-native organizations that implemented FinOps-DevOps integration programs over 18 months, supplemented by a survey of 284 engineering and finance professionals. Our analysis identifies three primary sources of cloud waste in DevOps environments: orphaned resources from automated provisioning with insufficient deprovisioning hooks, oversized baseline configurations inherited from legacy lift-and-shift migrations, and test environment sprawl from branching CI/CD strategies. We introduce the Cost-Aware Pipeline Model (CAPM), which embeds cost estimation gates, anomaly flagging, and tagging-compliance checks directly into CI/CD pipelines, and demonstrate its deployment using AWS Cost Explorer APIs integrated with GitHub Actions and Terraform. Organizations implementing CAPM reduced monthly cloud spend by 28–41% without measurable impact on deployment frequency. We argue that FinOps represents the next frontier of DevOps maturity and propose a unified DevOps-FinOps capability model.

Adebayo Oladele, Katharina Weiss, Sun Mingzhu, Isabel Ferreira · Aug 2023 · 289 citations

Journal Article Open Access Data Engineering

Large-Scale Graph Neural Networks for Fraud Detection in Financial Transaction Networks: Architecture Design, Sampling Strategies, and Real-Time Inference at Billion-Edge Scale

Financial transaction networks -- in which nodes represent accounts and edges represent transactions -- exhibit graph-structural fraud patterns (ring networks, layering chains, velocity clustering) that are invisible to transaction-level classifiers but detectable through graph neural network architectures that aggregate multi-hop neighborhood information. This paper presents FraudGNN, a production-oriented Graph Neural Network system for real-time fraud detection at billion-edge transaction graph scale, and reports its design, training, and deployment experience at a large payments processor. FraudGNN employs GraphSAGE with neighbor sampling for scalable inductive inference, incorporating temporal edge features, account behavioral embeddings, and network centrality features into a heterogeneous graph transformer architecture. Key engineering contributions include a mini-batch training pipeline supporting 8-billion-edge graphs on 64-GPU clusters using gradient checkpointing and heterogeneous graph partitioning, and a real-time inference serving architecture that delivers GNN predictions within 45ms P99 latency for payment authorization decisions. FraudGNN achieves 94.3% AUC on a held-out fraud detection benchmark, representing an 8.7 percentage point improvement over the XGBoost baseline. We characterize the graph data pipeline engineering challenges -- including temporal graph construction, feature freshness management, and graph store selection -- that represent the majority of production deployment effort. This paper provides the most complete engineering treatment of production-scale GNN fraud detection systems published to date.

Obiora Eze, Kristina Nilsson, Takashi Morita, Yasmin El-Masri · May 2023 · 389 citations

Journal Article Open Access Software Engineering

DevOps Team Topologies in Practice: Cross-Functional Team Design, Cognitive Load Management, and Interaction Mode Evolution in Scaling Engineering Organizations

Team Topologies, as proposed by Skelton and Pais (2019), has rapidly become an influential framework for organizing software engineering teams in DevOps contexts, yet empirical evidence of its application outcomes and adaptation challenges in real organizations remains sparse. This paper presents a longitudinal empirical study of Team Topologies adoption across 18 organizations ranging from 50 to 3,400 engineers, tracking team structure, interaction mode adherence, and delivery performance over 18 months through quarterly assessments. We operationalize the four team types — Stream-Aligned, Platform, Enabling, and Complicated Subsystem — and three interaction modes — Collaboration, X-as-a-Service, and Facilitating — as measurable constructs and instrument their presence and quality through a validated survey instrument we term the Team Topology Adherence Index (TTAI). Our analysis finds that Stream-Aligned teams with clear X-as-a-Service dependencies on Platform teams exhibit the highest delivery performance, but that this configuration requires Platform team maturity as a prerequisite — organizations that adopt the topology before Platform teams achieve self-service capability experience a net negative performance effect for 6–9 months. We identify the Cognitive Load Threshold as a predictive indicator of team restructuring need, and find that proactive team splitting triggered by cognitive load measures outperforms reactive splitting triggered by delivery slowdown by an average of 4.2 months.

Abiodun Ojo, Malin Persson, Takeshi Ikeda, Filipa Santos · May 2023 · 378 citations

Journal Article Open Access Robotics

Sim-to-Real Transfer in Robot Manipulation: Domain Randomization Strategies, Tactile Sensor Simulation, and Adaptive Policy Refinement for Dexterous Grasping of Novel Objects

Sim-to-real transfer -- the process of training robot control policies in simulation and deploying them on physical hardware -- offers the promise of unlimited safe training experience but is undermined by the reality gap: systematic discrepancies between simulated and physical dynamics that cause policies to fail on deployment. Dexterous manipulation is particularly sensitive to this gap due to its dependence on contact dynamics, friction, and object deformation that are notoriously difficult to simulate accurately. This paper presents a comprehensive study of sim-to-real transfer techniques for dexterous grasping, evaluating domain randomization strategies, tactile sensor simulation fidelity, and adaptive policy refinement methods on a 16-DOF robotic hand platform. We evaluate three domain randomization approaches -- uniform randomization, adaptive domain randomization (ADR), and automated domain randomization using Bayesian optimization -- across 48 novel object categories not present in training. ADR with tactile sensor simulation achieves 78.4 percent grasp success rate on novel objects, compared to 51.2 percent for uniform randomization without tactile sensing. A key finding is that tactile feedback simulation -- implemented through a GelSight sensor model integrated into the IsaacGym physics engine -- contributes more to sim-to-real transfer success than any single domain randomization parameter, improving novel object grasp success by 19.3 percentage points. Post-deployment adaptive policy refinement using 30 minutes of physical interaction data (DAgger-based) closes the remaining sim-to-real gap to within 4.1 percentage points of simulation performance.

Adunola Eze, Kristina Bergqvist, Hiroshi Suzuki, Nadia Mansour · Feb 2023 · 312 citations

Journal Article Open Access Artificial Intelligence

MLOps: Operationalizing Machine Learning Pipelines Through DevOps Principles — Lifecycle Management, Drift Detection, and Governance Frameworks

The deployment of machine learning models into production systems introduces operational challenges that extend well beyond those encountered in traditional software delivery. MLOps — the application of DevOps principles to machine learning systems — has emerged as a discipline addressing model versioning, reproducibility, continuous training, drift detection, and governance at scale. This paper presents a systematic mapping study of 148 MLOps publications combined with practitioner case studies from five organizations operating large-scale ML systems in production. We propose the MLOps Lifecycle Reference Model (MLRM), which delineates eight lifecycle stages from data ingestion through model retirement, and maps DevOps practices to each stage with explicit articulation of how software delivery and ML-specific concerns intersect and diverge. A central contribution is our empirical evaluation of model drift detection strategies — including statistical process control, population stability index, and concept drift detectors — under real deployment conditions across tabular, NLP, and computer vision models. We find that concept drift is systematically underdetected by metric-only monitoring approaches in 78% of evaluated deployments. We also introduce an ML Governance Maturity Index (MGMI) and discuss how regulatory frameworks such as the EU AI Act interact with MLOps pipeline design. This paper provides the most comprehensive unified treatment of MLOps to date from an engineering lifecycle perspective.
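Of the drift detection strategies named in the abstract above, the population stability index (PSI) is the simplest to show in code. The sketch below uses the standard formulation, summing (actual - expected) * ln(actual / expected) over histogram bins. The equal-width binning over the baseline range, the `eps` floor for empty bins, and the function name are illustrative assumptions and do not describe the paper's evaluation setup.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a baseline sample and a production sample of one feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # degenerate baseline: every value falls into bin 0

    def bin_fractions(sample: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for x in sample:
            # Clamp so values outside the baseline range land in the edge bins.
            idx = min(max(int((x - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(sample), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]   # hypothetical training-time feature
    shifted = [x + 0.3 for x in baseline]      # same feature after a shift
    print("PSI, no shift:", population_stability_index(baseline, baseline))
    print("PSI, shifted :", population_stability_index(baseline, shifted))
```

PSI compares only the input distribution, so it can stay near zero while the relationship between inputs and labels changes. That limitation is in keeping with the abstract's finding that concept drift is often missed by metric-only monitoring.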

Blessing Okwu, Annika Larsson, Hiroshi Yamamoto, Priya Chandrasekaran · Feb 2023 · 618 citations