Medicine

A Supervised Learning Framework for Stroke Hospitalization Factors Selection Using the Lasso-MIDAS Model

AI Insight

This study introduces a Lasso-MIDAS statistical framework to identify key drivers of stroke hospitalizations by combining meteorological, air quality, and socioeconomic variables drawn from different data frequencies. Among 21 candidate variables, 11 were retained as significant: wind speed, carbon monoxide, sulfur dioxide, and temperature variability were positively associated with stroke admissions, while nitrogen dioxide showed a negative correlation, possibly due to behavioral changes during high-pollution periods. The Consumer Price Index for food, tobacco, and alcohol emerged as a notable socioeconomic risk factor, suggesting that cost-of-living pressures may influence stroke risk through lifestyle-related pathways.


These findings offer a data-driven foundation for public health authorities to develop early warning systems that integrate environmental and economic indicators for stroke risk monitoring. The methodological approach may also be applicable to other acute conditions where data come from heterogeneous sources at varying time scales.


⚠️ Preprint: Not yet peer-reviewed

This article has not yet been reviewed by independent experts. The results are preliminary and should be interpreted with caution.

Stroke, an acute cerebrovascular disease with significant public health implications, is influenced by a complex interplay of meteorological conditions, air quality, and socioeconomic factors. However, the relevant data come from diverse sources at mixed frequencies and span a high-dimensional variable space, which limits the effectiveness of traditional regression models. This study develops a Lasso-MIDAS model framework to identify the key multidimensional drivers of stroke admissions. Using this approach, 21 candidate variables encompassing meteorological, environmental, and economic indicators were screened, and the empirical results identified 11 core influencing factors. In the meteorological and environmental dimensions, wind speed, carbon monoxide (CO), and sulfur dioxide (SO2) were identified as significant positive drivers, and temperature difference also correlated positively with admission risk. Conversely, nitrogen dioxide (NO2) exhibited a negative correlation, potentially reflecting behavioral adaptation and reduced exposure during peak pollution periods. In the socioeconomic dimension, the Consumer Price Index (CPI) for food, tobacco, and alcohol emerged as a major risk factor, highlighting the impact of living cost pressures on public health. The findings demonstrate the advantages of the Lasso-MIDAS model in handling large-scale healthcare data: it addresses the frequency mismatch problem while enhancing the robustness of causal identification through variable shrinkage. These conclusions provide a scientific basis for health authorities to establish early warning systems and optimize public health policy interventions.
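The abstract does not give the model specification, so the following is only a minimal sketch of how a Lasso-MIDAS style screening step could work, not the authors' implementation. It assumes that each high-frequency series is collapsed to one value per low-frequency period with normalized exponential Almon weights (with fixed, arbitrarily chosen parameters), and that the Lasso is then fitted by plain coordinate descent. All names (`almon_weights`, `midas_aggregate`, `lasso_cd`, `hf_0`, `lf_0`), the penalty value, and the data are hypothetical; the data are synthetic and unrelated to the study.

```python
import math
import random


def almon_weights(n_lags, theta1=0.0, theta2=-0.05):
    """Normalized exponential Almon lag weights (assumed form; they sum to 1)."""
    raw = [math.exp(theta1 * j + theta2 * j * j) for j in range(1, n_lags + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def midas_aggregate(series, weights):
    """Collapse each period's high-frequency readings into one weighted value."""
    return [sum(w * x for w, x in zip(weights, period)) for period in series]


def standardize(col):
    """Center a column and scale it to unit (population) standard deviation."""
    mean = sum(col) / len(col)
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
    return [(v - mean) / sd for v in col]


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(columns, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1.

    Assumes standardized columns and a centered response.
    """
    n = len(y)
    beta = [0.0] * len(columns)
    resid = list(y)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Covariance of column j with the partial residual (column j added back).
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                diff = new - beta[j]
                resid = [r - c * diff for c, r in zip(col, resid)]
                beta[j] = new
    return beta


# Synthetic example: 5 high-frequency variables and 1 low-frequency variable.
random.seed(0)
N_PERIODS, N_LAGS, N_HF = 120, 4, 5

weights = almon_weights(N_LAGS)
# hf[k][t] holds the N_LAGS high-frequency readings of variable k in period t.
hf = [[[random.gauss(0, 1) for _ in range(N_LAGS)] for _ in range(N_PERIODS)]
      for _ in range(N_HF)]
lf = [random.gauss(0, 1) for _ in range(N_PERIODS)]

agg = [midas_aggregate(series, weights) for series in hf]
# Only hf_0, hf_2, and lf_0 drive the synthetic response.
y = [2.0 * agg[0][t] - 1.5 * agg[2][t] + 1.0 * lf[t] + random.gauss(0, 0.1)
     for t in range(N_PERIODS)]

names = [f"hf_{k}" for k in range(N_HF)] + ["lf_0"]
columns = [standardize(c) for c in agg + [lf]]
y_mean = sum(y) / len(y)
y_centered = [v - y_mean for v in y]

beta = lasso_cd(columns, y_centered, lam=0.1)
selected = [name for name, b in zip(names, beta) if b != 0.0]
print(selected)
```

In this sketch the MIDAS weighting resolves the frequency mismatch by putting every regressor on the low-frequency time index, and the L1 penalty then sets the coefficients of uninformative regressors exactly to zero. A full implementation would typically estimate the weighting parameters and choose the penalty by cross-validation rather than fixing them as done here.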

Source: A Supervised Learning Framework for Stroke Hospitalization Factors Selection Using the Lasso-MIDAS Model