Programmable Networks at Scale: P4-Based Data Plane Programming, In-Network Computation, and Offloading Strategies for Stateful Network Functions in 400G Data Center Fabrics
Programmable data planes, built on the P4 domain-specific language and supporting hardware including Barefoot Tofino and Intel IPU platforms, have enabled a paradigm shift in network function implementation: from software running on general-purpose servers to compiled programs executing at line rate within switch ASICs. This paper presents a systematic investigation of in-network computation design patterns and their performance benefits for a representative set of stateful network functions in data center fabric environments. We design and evaluate P4 implementations of five network functions (a consistent-hashing load balancer, a stateful DDoS rate limiter, a network telemetry collector with sketch-based traffic analytics, a key-value cache for hot-object acceleration, and congestion control assistance with ECN marking) on Barefoot Tofino hardware and in a software P4 emulation environment using the P4PI framework. In-network load balancing achieves 97 percent of hardware line rate at 400 Gbps per port with sub-microsecond latency, versus a maximum throughput of 12 Gbps for the equivalent Linux kernel IPVS implementation. The in-network DDoS rate limiter enforces per-flow rate limits across 100,000 simultaneous flows with a false positive rate below 0.3 percent. We introduce the In-Network Offload Suitability Framework (INOSF), which characterizes how four network function properties (state size, update rate, operation complexity, and match-action fit) determine the feasibility and performance benefit of P4 offloading, giving data center network architects principled guidance for offloading decisions.
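To make the stateful rate limiter concrete, the following is a minimal software sketch of one way per-flow limits can be kept in a fixed-size, hash-indexed register array, the kind of state a match-action pipeline typically exposes. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the class name `RegisterRateLimiter`, the fixed-window policy, the CRC32 index hash, and all parameter values are hypothetical. In this model, two flows that hash to the same slot share one budget, which is one possible source of false positives.

```python
import zlib


class RegisterRateLimiter:
    """Software model of a per-flow limiter backed by fixed-size register arrays.

    Hypothetical sketch: fixed-window byte budget per hash slot. Colliding
    flows share a slot, so a well-behaved flow can be dropped (false positive).
    """

    def __init__(self, num_slots: int, limit_bytes: int, window_ns: int):
        self.num_slots = num_slots
        self.limit_bytes = limit_bytes
        self.window_ns = window_ns
        # Two parallel "register arrays": bytes seen and window start time.
        self.byte_count = [0] * num_slots
        self.window_start = [0] * num_slots

    def _slot(self, flow_key: bytes) -> int:
        # Assumed index function; real targets offer their own hash units.
        return zlib.crc32(flow_key) % self.num_slots

    def allow(self, flow_key: bytes, pkt_len: int, now_ns: int) -> bool:
        """Return True to forward the packet, False to drop it."""
        i = self._slot(flow_key)
        # Start a new window once the current one has expired.
        if now_ns - self.window_start[i] >= self.window_ns:
            self.window_start[i] = now_ns
            self.byte_count[i] = 0
        # Drop if this packet would exceed the per-window byte budget.
        if self.byte_count[i] + pkt_len > self.limit_bytes:
            return False
        self.byte_count[i] += pkt_len
        return True


if __name__ == "__main__":
    limiter = RegisterRateLimiter(num_slots=1 << 17, limit_bytes=3000,
                                  window_ns=1_000_000)
    flow = b"10.0.0.1:1234->10.0.0.2:80/tcp"
    for t in (0, 100, 200, 1_000_000):
        print(t, limiter.allow(flow, 1500, t))
```

In this sketch the third packet is dropped because it would exceed the 3000-byte budget, and the fourth is forwarded because a new window has begun. Sizing `num_slots` well above the expected number of concurrent flows reduces collisions at the cost of on-chip memory, which is the kind of state-size trade-off INOSF is meant to capture.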