Research Archive

Journal Article · Open Access · Bioinformatics

Deep Learning Architectures for Genomic Variant Pathogenicity Prediction: Evaluation of CNN, LSTM, and Attention-Based Models on ClinVar and gnomAD Population Databases

The clinical interpretation of genomic variants of uncertain significance (VUS) is one of the most pressing bottlenecks in genomic medicine, with over 60 percent of variants identified in clinical sequencing classified as being of uncertain significance in the ClinVar database. Machine learning approaches to variant pathogenicity prediction offer the potential to reduce this uncertainty, but the relative merits of different deep learning architectures for this task, and the generalizability of published models across diverse populations, remain incompletely understood. This paper presents a systematic evaluation of four deep learning architectures for variant pathogenicity prediction: a one-dimensional CNN with nucleotide sequence context, a bidirectional LSTM with epigenomic feature integration, a transformer with self-attention over genomic windows, and a novel hybrid CNN-Transformer architecture we term VariantNet. Evaluation uses a benchmark dataset of 48,000 pathogenic and benign variants curated from ClinVar with gnomAD population frequency annotations, stratified by variant type (SNV, indel, splice-site) and ancestry group. VariantNet achieves the highest AUC (0.943) on the combined benchmark, with particularly strong performance on splice-site variants (AUC 0.961), where sequential context is most informative. A critical finding is significant performance degradation for all models on African ancestry variants (mean AUC drop of 0.041) due to underrepresentation in training data, which we address through ancestry-stratified training with transfer learning. We release VariantNet weights, training code, and the curated benchmark dataset as open-source resources for the bioinformatics community.

Adaeze Obi, Frida Magnusson, Hiromi Yamamoto, Yasmin Hassan · Aug 2018 · 467 citations

Journal Article · Subscription · Software Architecture

Microservices Decomposition Strategies and Their Operational Consequences in DevOps Environments: Domain-Driven Design, Bounded Contexts, and Service Granularity

The decision to decompose a system into microservices is one of the most consequential architectural choices a DevOps organization makes, yet the criteria governing appropriate service granularity remain poorly defined in both academic and practitioner literature. This paper examines microservices decomposition strategies and their downstream operational consequences for DevOps pipeline complexity, observability overhead, inter-service coordination cost, and deployment independence. We conduct a retrospective analysis of decomposition decisions at six organizations over three-year time horizons, supplemented by a survey of 267 software architects and DevOps engineers. Domain-Driven Design (DDD) bounded contexts are used as the theoretical lens, and we evaluate how closely organizations' decomposition decisions align with DDD principles and how that alignment correlates with operational outcomes. Organizations with DDD-aligned decompositions report 47% lower inter-service incident rates and 38% fewer deployment pipeline interdependencies than organizations using ad hoc decomposition heuristics. We identify five granularity anti-patterns (Nano-service Proliferation, Shared Database Coupling, Chatty Service Mesh, God Service Regression, and Temporal Coupling Latency) and provide detection heuristics and refactoring guidance for each. The paper provides a practical decomposition decision framework integrating DDD, operational complexity, and DevOps pipeline cost dimensions.

Emeka Nwosu, Annika Bergman, Takashi Suzuki, Ana Cristina Pires · Aug 2018 · 362 citations
