A $35,000 interdisciplinary pilot study harnessing 4 terabytes of Virginia health data to combat severe maternal morbidity — using machine learning to unlock answers traditional data sources cannot provide.
Severe maternal morbidity is one of the most urgent and under-addressed challenges in U.S. healthcare. The numbers demand action.
Women of color face a disproportionately high risk of severe maternal morbidity, including increased mortality, compared with white women. Systemic inequities in access to risk-appropriate care are a root cause that our research directly targets.
Traditional hospital discharge data captures only a snapshot of the delivery hospitalization — missing the full prenatal history that shapes maternal risk. All-Payer Claims Databases (APCDs) change everything.
APCDs aggregate all medical claims, pharmacy claims, dental claims, and eligibility records across all insurers into a single database — providing the most complete picture of healthcare delivery ever assembled at a population scale.
"Up to 40% of cases of severe maternal morbidity could be prevented with timely identification and treatment."
This project pilots the use of Virginia's APCD — five years of data covering approximately 500,000 pregnancies — to establish the team's competency with this novel data source and lay the groundwork for two major federal funding proposals.
Introduced in 2011, APCDs are statewide databases mandated by 18 states (including Nevada) to consolidate all insurance claims. They provide a longitudinal, population-level record from prenatal care through delivery and beyond.
Virginia's APCD slice is 4 terabytes of records. Machine learning and deep learning algorithms running on UNLV's $500K NSF-funded GPU cluster (housed at Switch Cloud) are the only viable path to extracting actionable patterns at this scale.
The UNLV Data Analytics Lab hosts 18 machines, each with NVIDIA RTX™ A4000 GPUs, plus a high-end cluster at Switch. Together they provide the backbone for all deep learning analyses in this project.
Each question builds our team's competency with APCD data and generates publishable findings to support our R01 federal grant proposals.
What proportion of risk factors identified on claims during the prenatal period can also be identified in the delivery hospitalization claim record? This demonstrates our ability to link prenatal claims to delivery records across the APCD.
Do women with high-risk delivery complications have lower costs of care when their high-risk conditions are identified in the delivery hospitalization claim record? This tests our ability to work with the APCD cost data.
Do women with high-risk complications have lower odds of severe maternal morbidity when they receive risk-appropriate care? This links APCD data to the American Hospital Association Database to measure hospital level of care.
Can machine learning and data mining identify risk factors for severe maternal morbidity not yet recognized in the clinical literature? This pilot leverages UNLV's GPU cluster for deep learning-based multinomial categorization on the full 4TB dataset.
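The first question above reduces to a linkage-and-agreement calculation. The following is a minimal sketch of that idea, not the team's actual pipeline: the function name, the `(patient_id, condition)` tuple layout, and the toy records are all hypothetical, and a real analysis would work from ICD codes in the APCD rather than plain condition labels.

```python
from collections import defaultdict


def prenatal_delivery_agreement(prenatal_claims, delivery_claims):
    """Share of prenatally identified (patient, condition) pairs that also
    appear on the same patient's delivery hospitalization claim.

    Both inputs are iterables of (patient_id, condition) tuples.
    Returns None when no prenatal risk factors are present.
    """
    # Index delivery-claim conditions by patient for fast lookup.
    delivery_by_patient = defaultdict(set)
    for patient_id, condition in delivery_claims:
        delivery_by_patient[patient_id].add(condition)

    # Repeated prenatal claims for the same condition count once.
    prenatal_pairs = set(prenatal_claims)
    if not prenatal_pairs:
        return None

    matched = sum(
        1
        for patient_id, condition in prenatal_pairs
        if condition in delivery_by_patient.get(patient_id, set())
    )
    return matched / len(prenatal_pairs)


# Toy records, invented for illustration only.
toy_prenatal = [
    ("A", "chronic_hypertension"),
    ("A", "chronic_hypertension"),  # duplicate claim, counted once
    ("A", "diabetes"),
    ("B", "asthma"),
    ("C", "diabetes"),
]
toy_delivery = [
    ("A", "chronic_hypertension"),
    ("B", "asthma"),
    ("B", "anemia"),
]

print(prenatal_delivery_agreement(toy_prenatal, toy_delivery))
```

Counting unique patient-condition pairs, rather than raw claim lines, keeps patients with many prenatal visits from dominating the proportion. That is one possible design choice, not necessarily the one the study will adopt.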
Uses ICD codes to identify 20 conditions that elevate risk for maternal end-organ damage. Our team developed a validated binary classifier (high-risk yes/no) applied to the full sample.
Uses 21 ICD diagnostic and procedure codes to identify conditions where maternal deterioration requires intervention to prevent death — the gold standard outcome measure.
A validated dichotomous variable indicating whether a high-risk woman delivered at a facility with obstetric critical care services, using our team's validated hospital-level-of-care method.
Costs of care derived from insurance claim charges using the Agency for Healthcare Research and Quality's standardized Cost-to-Charge Ratio for Inpatient Files.
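The cost measure above amounts to multiplying billed charges by a hospital-level cost-to-charge ratio. Here is a minimal sketch of that arithmetic under stated assumptions: the claim layout, the hospital identifiers, and the ratio values are invented for illustration, and the layout of the actual AHRQ ratio files is not reproduced here.

```python
def estimate_costs(claims, ccr_by_hospital):
    """Estimate cost per claim as charges * hospital cost-to-charge ratio.

    claims: iterable of (claim_id, hospital_id, total_charges) tuples.
    ccr_by_hospital: dict mapping hospital_id to its ratio.
    Claims from hospitals with no known ratio map to None instead of a guess.
    """
    costs = {}
    for claim_id, hospital_id, total_charges in claims:
        if total_charges < 0:
            raise ValueError(f"negative charges on claim {claim_id}")
        ratio = ccr_by_hospital.get(hospital_id)
        costs[claim_id] = None if ratio is None else round(total_charges * ratio, 2)
    return costs


# Hypothetical values, chosen only to make the arithmetic easy to follow.
toy_claims = [
    ("c1", "H1", 20000.0),
    ("c2", "H2", 8000.0),
    ("c3", "H9", 5000.0),  # hospital missing from the ratio table
]
toy_ratios = {"H1": 0.25, "H2": 0.5}

print(estimate_costs(toy_claims, toy_ratios))
```

Returning `None` for unmatched hospitals makes missing ratios visible during the data-completeness assessment instead of silently imputing a value.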
The primary statistical framework, guided by research question, underlying theory, data availability, and nature of the data. Substantial attention to measurement error, estimation bias, and model assumption failures.
The 4TB Virginia APCD slice requires advanced deep learning for multinomial categorization. The CS team will clean and prepare data for GPU-cluster-based algorithms to surface previously unidentified risk patterns.
Women receiving healthcare in Virginia during pregnancy, 2019–2023. ~95,000 births/year → ~500,000 pregnancies over 5 years. Identified by any ICD or DRG code indicating pregnancy.
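The sample definition above ("any ICD or DRG code indicating pregnancy") can be sketched as a simple claim filter. This is an illustration under stated assumptions, not the study's code list: the ICD-10-CM prefixes below are an illustrative subset (chapter O codes, Z34, Z3A), the DRG set is passed in by the caller and shown with a placeholder value, and the claim dictionary layout is hypothetical.

```python
# Assumption: illustrative ICD-10-CM prefixes, not the study's validated list.
PREGNANCY_ICD10_PREFIXES = ("O", "Z34", "Z3A")


def is_pregnancy_claim(icd_codes, drg_code, pregnancy_drgs):
    """True if any ICD code or the DRG code on the claim indicates pregnancy."""
    has_icd = any(
        code.upper().replace(".", "").startswith(PREGNANCY_ICD10_PREFIXES)
        for code in icd_codes
    )
    return has_icd or drg_code in pregnancy_drgs


def pregnancy_cohort(claims, pregnancy_drgs):
    """Return the set of patient IDs with at least one pregnancy-indicating claim."""
    return {
        claim["patient_id"]
        for claim in claims
        if is_pregnancy_claim(claim["icd_codes"], claim.get("drg"), pregnancy_drgs)
    }


# Toy claims; "DELIVERY_DRG" is a placeholder, not a real DRG code.
toy_claims = [
    {"patient_id": "P1", "icd_codes": ["O14.10"], "drg": None},
    {"patient_id": "P2", "icd_codes": ["J45.909"], "drg": "DELIVERY_DRG"},
    {"patient_id": "P3", "icd_codes": ["I10"], "drg": None},
    {"patient_id": "P4", "icd_codes": ["z3a.38"], "drg": None},
]

print(sorted(pregnancy_cohort(toy_claims, {"DELIVERY_DRG"})))
```

Normalizing case and stripping the decimal point before matching guards against formatting differences between payers, which is a common source of missed matches in pooled claims data.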
A purposeful pairing of clinical nursing expertise and computational power.
A certified nurse-midwife with a Master's in Public Health and a PhD in Nursing (Emory University), Dr. Vanderlaan sits at the intersection of clinical care, health policy, and data science. She holds an AHRQ R36 grant for risk-appropriate care research and has developed and validated novel measures of hospital maternal level of care, along with a sample selection methodology for high-risk women. She also leads the American College of Nurse-Midwives Midwifery Workforce Study (Johnson & Johnson Foundation).
ROLE IN THIS PROJECT
Conceptualization · Funding Acquisition · Project Administration · Formal Analysis · Writing
A pioneer in large-scale information retrieval and database systems, Dr. Taghva led development of the U.S. Department of Energy's Licensing Support Network database from 1989 to 2009 — one of the largest legal-discovery databases ever built. Known internationally for model-based retrieval from noisy text (featured in Croft, Metzler & Strohman's Search Engines), he brings foundational expertise in data engineering, OCR, and NLP to the team's AI pipeline.
ROLE IN THIS PROJECT
Data Curation · Resources & Infrastructure · Software Development · Machine Learning Lead · Writing
Purchase 5 years of Virginia APCD data (~$35,000 for 25M records over 2 years). Data stored on UNLV's Computer Science servers. Timeline and cost learnings will directly inform our R01 budget proposals.
Assess data completeness; apply the Obstetric Comorbidity Index to every record; add hospital level of care using our validated method; identify data slices for each research question.
GLM-based analyses for questions 1–3; GPU-accelerated deep learning for question 4. Prepare manuscripts and conference abstracts demonstrating competency with APCD data.
Submit two federal R01 proposals, one to AHRQ (February 2026) and one to the NIH IMPROVE Initiative (June 2026), using pilot data as proof of competency and feasibility.
Figure: 24-month project timeline. Each bar shows a phase's approximate start point and duration.
AHRQ R01 (February 2026): Missed opportunities to diagnose and treat high-risk maternal conditions.
NIH R01 (June 2026): System-level factors in disparities in risk-appropriate care (PAR-24-059).
The project will produce at least two peer-reviewed publications demonstrating APCD competency, plus two federal grant proposals targeting $2M+ in follow-on funding.
| Type | Topic | Lead | Status |
|---|---|---|---|
| Manuscript | Risk for severe maternal morbidity: Agreement between prenatal and hospital claims data | Dr. Vanderlaan | In Progress |
| Manuscript | Machine learning to identify previously unrecognized risks for severe maternal morbidity | Dr. Taghva | In Progress |
| Grant | Missed opportunities to diagnose and treat high-risk maternal conditions (AHRQ R01) | Dr. Vanderlaan | Due Feb 2026 |
| Grant | System-level factors in disparities in risk-appropriate care (NIH R01, PAR-24-059) | Dr. Vanderlaan | Due Jun 2026 |
We welcome collaborations with clinicians, health systems, state health departments, and researchers who share our commitment to reducing maternal morbidity disparities.