Adversarial Machine Learning Attacks on Network Intrusion Detection Systems: Threat Taxonomy, Evasion Techniques, and Robust Defense Architectures
Machine learning-based Network Intrusion Detection Systems (NIDS) have been widely adopted as a replacement for signature-based approaches, yet their vulnerability to adversarial perturbations (inputs crafted to deceive the classifier while preserving malicious functionality) remains a critical and under-addressed threat. This paper presents a comprehensive adversarial ML threat taxonomy for NIDS that categorizes attacks along four dimensions: perturbation scope (feature-space vs. problem-space), attacker knowledge (white-box, grey-box, black-box), attack timing (training-time poisoning vs. inference-time evasion), and adversarial goal (evasion, impersonation, denial of service against the detector). We implement and evaluate 14 adversarial attack strategies against six representative NIDS architectures, including Random Forest, LSTM, and Transformer-based classifiers, using the CICIDS2017 and NSL-KDD benchmark datasets supplemented by proprietary traffic captures from a financial institution. Problem-space attacks, which modify actual network packets rather than feature vectors, reduce NIDS detection rates by 34-71% depending on the classifier architecture. We evaluate five defense strategies: adversarial training, ensemble diversity, feature randomization, input preprocessing, and certified robustness bounds. Adversarial training combined with ensemble diversity achieves the best robustness profile but requires 3.4x the training compute of undefended baselines. We release a standardized adversarial NIDS evaluation framework to facilitate reproducible future research.
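The four taxonomy dimensions named in the abstract can be encoded as a small data structure for tagging attacks in an evaluation harness. The following is a minimal Python sketch under stated assumptions, not the API of the released framework: the class names (`PerturbationScope`, `AttackerKnowledge`, `AttackTiming`, `AdversarialGoal`, `AttackProfile`), the helper `taxonomy_size`, and the example attack label are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


# Each enum mirrors one taxonomy dimension listed in the abstract.
class PerturbationScope(Enum):
    FEATURE_SPACE = "feature-space"
    PROBLEM_SPACE = "problem-space"


class AttackerKnowledge(Enum):
    WHITE_BOX = "white-box"
    GREY_BOX = "grey-box"
    BLACK_BOX = "black-box"


class AttackTiming(Enum):
    TRAINING_TIME_POISONING = "training-time poisoning"
    INFERENCE_TIME_EVASION = "inference-time evasion"


class AdversarialGoal(Enum):
    EVASION = "evasion"
    IMPERSONATION = "impersonation"
    DETECTOR_DOS = "denial of service against the detector"


@dataclass(frozen=True)
class AttackProfile:
    """One attack, positioned along all four taxonomy dimensions."""
    name: str
    scope: PerturbationScope
    knowledge: AttackerKnowledge
    timing: AttackTiming
    goal: AdversarialGoal


def taxonomy_size() -> int:
    """Number of cells obtained by crossing every value of every dimension."""
    return (
        len(PerturbationScope)
        * len(AttackerKnowledge)
        * len(AttackTiming)
        * len(AdversarialGoal)
    )


# Hypothetical example: an attack that edits real packets without model access.
example = AttackProfile(
    name="packet-level evasion (hypothetical label)",
    scope=PerturbationScope.PROBLEM_SPACE,
    knowledge=AttackerKnowledge.BLACK_BOX,
    timing=AttackTiming.INFERENCE_TIME_EVASION,
    goal=AdversarialGoal.EVASION,
)
```

Crossing the listed values gives 2 x 3 x 2 x 3 = 36 cells, although not every combination necessarily corresponds to a realistic attack. A frozen dataclass is used here so that profiles are hashable and can serve as dictionary keys when grouping evaluation results.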