Research Archive

Journal Article Subscription Network Engineering

Programmable Networks at Scale: P4-Based Data Plane Programming, In-Network Computation, and Offloading Strategies for Stateful Network Functions in 400G Data Center Fabrics

Programmable data planes, built on the P4 domain-specific language and supporting hardware including Barefoot Tofino and Intel IPU platforms, have enabled a paradigm shift in network function implementation -- from software running on general-purpose servers to compiled programs executing at line rate within network switch ASICs. This paper presents a systematic investigation of in-network computation design patterns and their performance benefits for a representative set of stateful network functions in data center fabric environments. We design and evaluate P4 implementations of five network functions -- consistent hashing load balancer, stateful DDoS rate limiter, network telemetry collector with sketch-based traffic analytics, key-value cache for hot-object acceleration, and congestion control assistance with ECN marking -- on Barefoot Tofino and a software P4 emulation environment using the P4PI framework. In-network load balancing achieves 97 percent of hardware line rate throughput at 400 Gbps per port with sub-microsecond latency, versus 12 Gbps maximum throughput for the equivalent Linux kernel IPVS implementation. The in-network DDoS rate limiter enforces per-flow rate limits across 100,000 simultaneous flows with a false positive rate below 0.3 percent. We introduce the In-Network Offload Suitability Framework (INOSF), which characterizes how four network function properties -- state size, update rate, operation complexity, and match-action fit -- determine the feasibility and performance benefit of P4 offloading, providing data center network architects with principled offloading decision guidance.

Tochukwu Okafor, Lars Svensson, Kenji Inoue, Nour El-Din Benali· Jul 2021· 298 citations
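
As an illustration of the consistent hashing load balancer named in the abstract above, the following is a minimal host-side sketch in Python. It is not the paper's P4 program, and a switch ASIC would realize the lookup with match-action tables rather than a sorted list. The class name `ConsistentHashRing`, the helper `flow_key`, the use of SHA-256, and the choice of 100 virtual nodes per backend are all assumptions made for this example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer using SHA-256 (stable across runs)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def flow_key(src_ip, src_port, dst_ip, dst_port, proto) -> str:
    """Build a string key from a 5-tuple (hypothetical format for this sketch)."""
    return f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{proto}"


class ConsistentHashRing:
    """Hash ring with virtual nodes; each flow maps to the next point clockwise."""

    def __init__(self, backends, vnodes=100):
        if not backends:
            raise ValueError("at least one backend is required")
        # Place `vnodes` points per backend on the ring, sorted by hash value.
        self._ring = sorted(
            (_hash(f"{backend}#{i}"), backend)
            for backend in backends
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key: str) -> str:
        """Return the backend owning the first ring point after hash(key)."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = ConsistentHashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    key = flow_key("192.168.1.7", 51514, "10.1.0.1", 443, "tcp")
    print(key, "->", ring.lookup(key))
```

The property that makes this scheme attractive for stateful load balancing is that removing a backend only remaps the flows that backend owned; every other flow keeps its assignment.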

Journal Article Open Access Software Engineering

Platform Engineering as an Organizational Capability: Internal Developer Platforms, Golden Paths, and Developer Experience Outcomes

As organizations scale their DevOps practices, the proliferation of tools, configurations, and operational responsibilities borne by individual development teams has created a phenomenon known as "cognitive overhead inflation", a condition where the non-functional burden on developers impedes rather than enables delivery speed. Platform engineering has emerged as an organizational response to this challenge, centralizing infrastructure tooling into curated Internal Developer Platforms (IDPs) that provide development teams with self-service access to standardized golden paths. This paper presents a mixed-methods study of platform engineering adoption across 11 organizations, combining practitioner interviews (n=78) with pre/post quantitative measurement of developer experience (DevEx) metrics. We find that organizations with mature IDPs report 49% higher developer satisfaction scores, a 41% reduction in environment provisioning time, and a 34% decrease in on-call escalations attributable to infrastructure misconfiguration. We introduce the Platform Engineering Capability Model (PECM), which characterizes IDP maturity across five dimensions: Self-Service Coverage, Abstraction Quality, Observability Integration, Security Posture Automation, and Developer Feedback Loops. This work establishes platform engineering as a first-class organizational capability distinct from traditional DevOps tooling and provides empirically grounded design principles for IDP construction.

Olawale Adesanya, Miriam Schultz, Tadashi Inoue, Rebecca Fernandez-Cruz· Jul 2021· 376 citations

Journal Article Open Access Data Engineering

Lakehouse Architecture: Unifying Data Lake Flexibility and Data Warehouse Reliability Through Delta Lake, Apache Iceberg, and Apache Hudi Transaction Layers

The traditional separation of enterprise data platforms into analytical data warehouses and raw data lakes -- each optimized for different workload types and managed by distinct teams -- has created organizational and technical friction that impedes time-to-insight for analytical consumers. The Lakehouse architecture, which adds ACID transaction semantics, schema enforcement, and time travel capabilities to data lake storage through open table format layers, promises to unify these paradigms. This paper presents the first systematic academic evaluation of Lakehouse architectures, comparing three leading open table format implementations -- Delta Lake, Apache Iceberg, and Apache Hudi -- across six operational dimensions: ACID transaction correctness, concurrent write throughput, schema evolution flexibility, time travel query performance, storage efficiency, and ecosystem compatibility. Evaluations are conducted on a 50-node Spark cluster processing a 20TB synthetic dataset with real-world distribution characteristics derived from a financial institution data platform. Delta Lake achieves the highest concurrent write throughput (340 transactions/second) and strongest ecosystem compatibility. Iceberg demonstrates superior schema evolution flexibility and cross-engine portability. Hudi delivers the lowest storage overhead for change-heavy workloads through its record-level upsert optimization. We introduce the Lakehouse Platform Fitness Score (LPFS) and provide a selection framework based on workload mix, team expertise, and ecosystem lock-in tolerance.

Adaeze Okonjo, Erik Carlsson, Masahiro Fujita, Ana Lopes· Apr 2021· 436 citations

Journal Article Open Access Software Engineering

Zero-Downtime Deployment Architectures: Blue-Green, Rolling, and Canary Strategies Under Stateful Service Constraints

Zero-downtime deployment is a foundational requirement for high-availability systems, yet achieving it under real-world conditions — involving stateful services, database schema changes, distributed transactions, and session state management — is considerably more complex than the simplified presentations common in practitioner tooling documentation. This paper presents an empirical evaluation of three primary zero-downtime deployment patterns — Blue-Green, Rolling Update, and Canary — across stateful and stateless service categories, using a controlled experimental environment replicating production conditions at a mid-scale e-commerce platform. We measure six deployment outcome dimensions: user-perceived error rate during deployment, rollback latency, resource overhead, data consistency incident rate, deployment duration, and blast radius containment. Blue-Green deployments achieve the fastest rollback (mean 47 seconds) but incur the highest resource overhead (2× baseline). Rolling updates minimize resource overhead but exhibit the highest data consistency incident rate under concurrent schema migration scenarios. Canary deployments offer the best blast radius containment with moderate rollback speed, but require sophisticated traffic routing and observability instrumentation. We introduce a Deployment Pattern Selection Matrix that maps service statefulness, data migration complexity, rollback tolerance, and resource budget to optimal pattern selection. Real-world case evidence from three production deployments is used to validate the matrix.

Seun Adeyemo, Frida Carlsson, Kenji Ishida, Mariana Ferreira· Apr 2021· 369 citations

Journal Article Open Access Blockchain

Decentralized Finance Protocol Security: Formal Verification of Automated Market Maker Invariants, Flash Loan Attack Surfaces, and Governance Mechanism Vulnerabilities

Decentralized Finance (DeFi) protocols collectively managing hundreds of billions of dollars in on-chain value have suffered over 3.8 billion USD in losses to exploits between 2020 and 2022, with a significant proportion attributable to formally verifiable protocol invariant violations. This paper presents a formal verification framework for DeFi protocol security, applied to three core protocol categories: Automated Market Makers (AMMs), lending protocols, and governance systems. Using the K Framework for reachability logic verification and the Certora Prover for Solidity specification checking, we formalize and verify 34 safety properties across Uniswap V3 AMM invariants, Compound lending protocol solvency conditions, and Governor Bravo governance mechanism integrity. Formal verification identifies 7 previously undisclosed vulnerability classes, including a novel AMM sandwich attack surface arising from tick-boundary liquidity discontinuities and a governance quorum bypass exploitable through flash loan-amplified voting. We introduce the DeFi Protocol Security Score (DPSS), a composite metric aggregating formal property coverage, attack surface exposure, and economic incentive alignment, and apply it to rate 18 production DeFi protocols. We release formal specifications and verification toolchains as open-source artifacts to lower the barrier for security-rigorous DeFi protocol development.

Tunde Adesanya, Frida Lindberg, Takashi Okamoto, Mariam Khalil· Jan 2021· 509 citations
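
To make the idea of an AMM invariant in the abstract above concrete, here is a small Python sketch of a safety property of the kind a formal tool would check. It is an assumption-laden simplification: it uses the plain constant-product rule (x * y = k, as in Uniswap V2-style pools) rather than the concentrated-liquidity model of Uniswap V3, and it is a runtime check, not a K Framework or Certora specification. The function names and the 30 basis point fee are hypothetical choices for this example.

```python
def swap_x_for_y(reserve_x: int, reserve_y: int, dx: int, fee_bps: int = 30):
    """Swap dx units of token X into a constant-product pool.

    Returns (new_reserve_x, new_reserve_y, dy), where dy is the amount of
    token Y paid out. Integer arithmetic with floor division mirrors how
    on-chain code rounds in favor of the pool.
    """
    if dx <= 0 or reserve_x <= 0 or reserve_y <= 0:
        raise ValueError("reserves and input amount must be positive")
    if not 0 <= fee_bps < 10_000:
        raise ValueError("fee must be between 0 and 9999 basis points")
    dx_after_fee = dx * (10_000 - fee_bps)
    dy = (dx_after_fee * reserve_y) // (reserve_x * 10_000 + dx_after_fee)
    return reserve_x + dx, reserve_y - dy, dy


def invariant_holds(old_x: int, old_y: int, new_x: int, new_y: int) -> bool:
    """Safety property: the reserve product must never decrease across a swap."""
    return new_x * new_y >= old_x * old_y


if __name__ == "__main__":
    x, y = 1_000_000, 2_000_000
    new_x, new_y, dy = swap_x_for_y(x, y, 10_000)
    print("paid out:", dy, "invariant holds:", invariant_holds(x, y, new_x, new_y))
```

A verifier proves such a property for all reachable states, whereas this sketch only checks it for the states it is run on.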

Journal Article Open Access Healthcare Informatics

DevOps in Regulated Industries: Reconciling Deployment Agility with Compliance Requirements in Healthcare IT Systems

Healthcare IT organizations face a unique tension: the operational benefits of DevOps demand deployment agility, while regulatory frameworks such as HIPAA, FDA 21 CFR Part 11, and SOX impose stringent change management, audit trail, and validation requirements that are difficult to reconcile with continuous delivery practices. This paper presents a systematic examination of this tension through a combination of regulatory analysis and a cross-sectional survey of 198 healthcare IT practitioners. We identify 23 specific regulatory requirements that conflict with or require adaptation of standard DevOps practices, and categorize them into four conflict types: Change Velocity Conflicts, Evidence Integrity Conflicts, Environment Separation Conflicts, and Accountability Attribution Conflicts. We then evaluate four regulatory-DevOps reconciliation strategies — Compliance-as-Code, Immutable Audit Pipelines, Policy-Gated Deployment Gates, and Automated Validation Evidence Generation — through case evidence from three healthcare organizations that have achieved both compliance and DevOps maturity. Our analysis demonstrates that all four conflict types are addressable through thoughtful toolchain design, and that compliance instrumentation can be largely automated without sacrificing delivery speed. We provide a compliance-aware DevOps implementation blueprint validated against the identified regulatory requirements.

Adaeze Obi, Patrick Steinmann, Lisa Johansson, Manish Gupta· Jan 2021· 412 citations

Journal Article Open Access Edge Computing

Edge-Cloud Continuum Computing: Task Offloading Optimization, Latency-Aware Scheduling, and Mobility-Driven Workload Migration in 5G-Enabled Mobile Edge Environments

Mobile Edge Computing (MEC) in 5G-enabled environments introduces a computational continuum spanning ultra-low-latency edge nodes, regional fog nodes, and centralized cloud data centers, enabling latency-sensitive applications -- including augmented reality, industrial automation, and autonomous vehicle coordination -- that cannot tolerate cloud-only round-trip latencies. Optimal task placement across this continuum requires dynamic offloading decisions that account for task computational requirements, data transfer costs, edge node capacity, user mobility patterns, and SLA constraints simultaneously. This paper presents EdgeOpt, a multi-objective task offloading optimization framework for 5G MEC environments that employs a Deep Q-Network (DQN) agent trained to balance execution latency, energy consumption, and edge resource utilization in real-time. EdgeOpt is evaluated in a 5G MEC testbed comprising three edge nodes, one fog aggregation layer, and simulated cloud infrastructure, processing workloads representative of AR rendering, industrial sensor fusion, and V2X communication scenarios. EdgeOpt achieves 38% lower mean task execution latency and 27% lower edge energy consumption compared to greedy offloading baselines, while maintaining edge utilization above 78% under high mobility scenarios. We characterize the mobility-induced workload migration problem and introduce the Mobility-Aware Migration Cost Model (MAMCM) to quantify handover-induced service disruption risk. This work provides architectural and algorithmic foundations for latency-optimized edge-cloud continuum orchestration.

Adekunle Fashola, Emma Svensson, Yusuke Watanabe, Heba Mansour· Oct 2020· 318 citations

Journal Article Subscription Software Engineering

Value Stream Mapping for DevOps: Identifying and Eliminating Waste in Software Delivery Pipelines Using Lean Principles

Value Stream Mapping (VSM), a lean manufacturing technique, has been increasingly advocated as a tool for visualizing and optimizing software delivery pipelines in DevOps contexts. However, empirical evidence on its effectiveness and practical application nuances in software organizations remains sparse. This paper reports an action research study conducted across two organizations (one in telecommunications and one in retail banking) in which cross-functional teams applied VSM to their end-to-end software delivery processes over a 12-month period. We adapt traditional VSM notation to account for software-specific waste categories: unplanned work, context switching, approval bottlenecks, environment contention, and test instability. Our findings reveal that approval bottlenecks and environment contention account for 58% of total lead time waste across both organizations. Following VSM-guided interventions, the telecommunications organization reduced pipeline lead time from 34 days to 9 days, while the banking organization reduced its lead time from 48 days to 14 days. We derive eight VSM adaptation principles for software delivery contexts and propose a Digital VSM notation standard compatible with DevOps toolchain data extraction. This work demonstrates that lean thinking remains powerfully applicable in digital delivery contexts when appropriately adapted.

Chioma Ezenwachi, Henrik Lindqvist, Rahul Bose, Theresa MacGregor· Oct 2020· 303 citations
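
The lead time figures reported in the abstract above can be expressed as relative reductions with a short calculation. The sketch below uses only the before and after values stated in the abstract; the function name `percent_reduction` is a hypothetical helper introduced for this example.

```python
def percent_reduction(before_days: float, after_days: float) -> float:
    """Relative reduction in lead time, as a percentage of the starting value."""
    if before_days <= 0:
        raise ValueError("starting lead time must be positive")
    return 100.0 * (before_days - after_days) / before_days


# Before/after pipeline lead times in days, as reported in the abstract.
lead_times = {
    "telecommunications": (34, 9),
    "retail banking": (48, 14),
}

for org, (before, after) in lead_times.items():
    print(f"{org}: {before} -> {after} days "
          f"({percent_reduction(before, after):.1f}% reduction)")
# telecommunications: 34 -> 9 days (73.5% reduction)
# retail banking: 48 -> 14 days (70.8% reduction)
```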

Journal Article Subscription Cloud Computing

Multi-Cloud DevOps: Portability Architectures, Vendor Lock-in Mitigation Strategies, and Operational Complexity in Heterogeneous Cloud Environments

As organizations distribute workloads across multiple cloud providers to optimize cost, latency, regulatory compliance, and vendor risk, their DevOps pipelines must increasingly operate across heterogeneous cloud environments. Multi-cloud DevOps introduces significant engineering challenges: toolchain fragmentation, authentication model divergence, network topology complexity, and observability aggregation across provider-siloed data streams. This paper presents the first systematic empirical study of multi-cloud DevOps, examining 11 organizations operating production workloads across two or more major cloud providers. Through interviews (n=67), pipeline architecture analysis, and an industry survey (n=412), we identify and characterize three multi-cloud DevOps architectural patterns — Cloud-Agnostic Abstraction, Provider-Native Federation, and Workload Partitioning — and evaluate their trade-offs across portability, operational complexity, and performance dimensions. We find that the Cloud-Agnostic Abstraction pattern, typically implemented through Terraform with provider-agnostic modules and cloud-neutral container orchestration, achieves the highest portability score but incurs a 34% higher operational complexity rating than single-cloud environments. We introduce the Multi-Cloud Operational Overhead Index (MOOI) and provide a decision framework for selecting multi-cloud architecture patterns based on organizational maturity, compliance requirements, and engineering capacity.

Adaeze Okonkwu, Marcus Lindblom, Akihiro Watanabe, Clara Rodrigues· Jul 2020· 298 citations