Digital Library

Research Archive

Search across 2.4 million peer-reviewed documents from journals, conferences, and standards.

Showing 2 of 2,418,902 results

Journal Article · Open Access · Artificial Intelligence

AI-Assisted Code Review in DevOps Pipelines: Empirical Evaluation of Large Language Model Integration for Automated Quality Gates

The integration of large language models (LLMs) into software engineering workflows has generated significant practitioner interest, yet rigorous empirical evaluation of LLM-assisted code review within DevOps pipelines remains limited. This paper presents a controlled empirical study of GPT-4 and Code Llama as automated code review agents in CI/CD quality gate implementations at three enterprise organizations. Our evaluation uses a benchmark corpus of 8,400 pull requests (PRs) from production codebases spanning Python, Java, and TypeScript, with ground-truth labels established by senior engineers who reviewed each PR under a blinded protocol. LLM-based review agents achieved 84.2% precision and 79.7% recall in identifying security vulnerabilities, outperforming static application security testing (SAST) tools on logical vulnerability classes while underperforming them on injection-type vulnerabilities. For maintainability feedback, the agents produced actionable suggestions in 71% of cases, with engineer acceptance rates of 63% for GPT-4 and 54% for Code Llama. We introduce the Code Review Quality Score (CRQS) to standardize evaluation across review dimensions. We also analyze prompt engineering strategies, context window management, and the cost-latency trade-offs relevant to CI/CD integration constraints. Our findings provide the most comprehensive empirical assessment of LLM code review integration in DevOps environments to date and offer actionable deployment guidance for practitioners.

Emeka Chukwu, Lara Hoffmann, Takeshi Morikawa, Pooja Nair · Mar 2024 · 341 citations
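The reported security figures (84.2% precision, 79.7% recall) can be combined into a single F1 score. As a sketch only: the abstract does not report an F1 value, and the `ReviewCounts`, `precision`, `recall`, and `f1` names are hypothetical illustrations, not the authors' evaluation code. The standard definitions, computed from confusion counts against ground-truth labels, look like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewCounts:
    """Hypothetical confusion counts for one finding category (e.g. security issues)."""
    true_positives: int   # agent flagged it and the ground-truth label agrees
    false_positives: int  # agent flagged it but the ground-truth label does not
    false_negatives: int  # ground truth contains it but the agent missed it


def precision(c: ReviewCounts) -> float:
    """Share of agent-flagged findings that are real; 0.0 if nothing was flagged."""
    flagged = c.true_positives + c.false_positives
    return c.true_positives / flagged if flagged else 0.0


def recall(c: ReviewCounts) -> float:
    """Share of real findings the agent caught; 0.0 if there were none."""
    actual = c.true_positives + c.false_negatives
    return c.true_positives / actual if actual else 0.0


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Derived from the abstract's precision and recall; not a figure the paper reports.
    print(f"F1 = {f1(0.842, 0.797):.3f}")
```

With the reported precision and recall this gives an F1 of about 0.819. The zero-division guards return 0.0 instead of raising, a common convention when a category has no flagged or no actual findings.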

Journal Article · Open Access · Computer Vision

Vision-Language Models for Industrial Quality Control: Zero-Shot and Few-Shot Defect Detection Using CLIP, GPT-4V, and Gemini Vision in Manufacturing Inspection Pipelines

Industrial visual quality control (the automated detection and classification of surface defects, dimensional anomalies, and assembly errors in manufactured components) has traditionally required large labeled training datasets for each new product and defect category, creating deployment friction in high-mix manufacturing environments where products change frequently. Vision-language foundation models, including CLIP, GPT-4V, and Gemini Vision, enable zero-shot and few-shot defect detection from natural-language defect descriptions, which could eliminate dataset collection for new inspection tasks. This paper presents the first systematic evaluation of vision-language models for industrial defect detection, using a benchmark suite of 14,400 images across six manufactured component categories (printed circuit boards, machined metal parts, woven textiles, glass panels, silicon wafers, and food products) with ground-truth defect annotations from domain expert inspectors. CLIP-based zero-shot classification with carefully engineered text prompts achieves 74.3 percent mean detection accuracy across categories, compared to 94.1 percent for a fine-tuned ResNet50 on the same categories. GPT-4V in a few-shot setting with five defect exemplars achieves 88.7 percent accuracy, narrowing the gap to supervised learning while requiring no training pipeline. We characterize the prompt engineering patterns that most strongly influence zero-shot detection performance and introduce the Industrial Vision-Language Benchmark (IVLB) as an open evaluation resource. We also analyze the latency and cost profiles of API-based vision-language model deployment in production inspection pipelines.

Emeka Okafor, Sofia Svensson, Keiko Yamamoto, Rania El-Amin · Feb 2024 · 234 citations
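CLIP-style zero-shot classification embeds the image and one text prompt per candidate label, then picks the label whose prompt embedding is most similar to the image embedding. The sketch below assumes the embeddings already exist. The three-dimensional vectors and prompt strings are hypothetical stand-ins for real encoder outputs, and `cosine`, `zero_shot_label`, and `gap_closed` are illustrative helpers, not the paper's pipeline or any CLIP library API. The sketch also uses the abstract's reported accuracies to compute how much of the zero-shot-to-supervised gap the five-shot GPT-4V result recovers.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def zero_shot_label(image_emb: list[float], prompt_embs: dict[str, list[float]]) -> str:
    """Return the label whose text-prompt embedding is most similar to the image embedding."""
    return max(prompt_embs, key=lambda label: cosine(image_emb, prompt_embs[label]))


def gap_closed(zero_shot: float, few_shot: float, supervised: float) -> float:
    """Fraction of the zero-shot-to-supervised accuracy gap recovered by few-shot prompting."""
    return (few_shot - zero_shot) / (supervised - zero_shot)


if __name__ == "__main__":
    # Hypothetical embeddings standing in for real image/text encoder outputs.
    prompts = {
        "a photo of a PCB with no defects": [0.9, 0.1, 0.0],
        "a photo of a PCB with a solder bridge": [0.1, 0.9, 0.2],
    }
    image = [0.2, 0.8, 0.1]
    print(zero_shot_label(image, prompts))
    # Accuracies from the abstract: zero-shot CLIP, 5-shot GPT-4V, fine-tuned ResNet50.
    print(f"{gap_closed(74.3, 88.7, 94.1):.1%} of the gap closed")
```

With the reported figures, few-shot GPT-4V recovers roughly 72.7 percent of the 19.8-point gap between zero-shot CLIP (74.3 percent) and fine-tuned ResNet50 (94.1 percent). This derived figure assumes the three accuracies are measured on the same basis, which the abstract does not state explicitly. Real CLIP scores normalized embeddings through a scaled softmax, but its top label matches the highest-cosine label chosen here.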