Developing and validating machine learning models to predict vaccine hesitancy and literacy among adults in the United States
Frontiers in Public Health, 2026
Background
Vaccine hesitancy and literacy are multifaceted and context-specific phenomena that affect vaccination uptake. Comprehensively examining the simultaneous effects of various factors influencing vaccine hesitancy and literacy remains a challenge. This study aimed to better understand key determinants of adults' vaccination decision-making, regarding both their own vaccinations and those of their children, using different machine learning algorithms to analyze survey data.
Methods
A cross-sectional survey of US adults was conducted in 2022. Participants were categorized based on whether they had children under age 18 ("parents," N = 692) or not ("adults," N = 1,183). Survey responses were analyzed using multiple machine learning algorithms (logistic regression, decision tree, random forest, extreme gradient boosting [XGBoost], support vector machine, and neural network) to predict adults' and parents' hesitancy toward vaccination, and literacy about their own or their children's vaccination. Potential predictors included demographics, health literacy, information-seeking behavior, attitudes, and beliefs. Model performance was evaluated using the F1 score and the area under either the receiver operating characteristic curve (AUROC) or the precision-recall curve (AUPRC). Feature importance was evaluated using Shapley values.
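The three evaluation metrics named above can be illustrated with a minimal sketch in plain Python. The labels and scores below are made-up toy values, not study data, and the function names are hypothetical. The abstract does not say how AUPRC was computed; average precision, used here, is one common estimator and is an assumption.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve; assumes no tied scores."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / rank  # precision at each rank where a positive is found
    return total / tp


# Toy example (hypothetical values): 1 = hesitant, 0 = not hesitant.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

f1 = f1_score(y_true, y_pred)
roc = auroc(y_true, scores)
ap = average_precision(y_true, scores)
print(f"F1={f1:.3f}  AUROC={roc:.3f}  AUPRC(AP)={ap:.3f}")
```

F1 depends on the chosen classification threshold, whereas AUROC and AUPRC summarize ranking quality across all thresholds, which is why studies of this kind commonly report a threshold-based and a threshold-free metric together.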
Results
Among parents making vaccination decisions for their children, the random forest model achieved the highest predictive performance for vaccine hesitancy (F1 = 0.86, AUROC = 93.0%), and the XGBoost model performed best when predicting vaccine literacy (F1 = 0.64, AUPRC = 81.3%). Based on these models, the belief that "there is no need for my child to get vaccinated because everybody else does" emerged as the strongest predictor of hesitancy among parents, whereas low familiarity with the pediatric vaccination schedule was the main predictor of low literacy. Among adults making vaccination decisions for themselves, the XGBoost model outperformed the other models for both vaccine hesitancy (F1 = 0.77, AUROC = 90.3%) and vaccine literacy (F1 = 0.80, AUPRC = 86.0%). According to these models, having received an influenza vaccine was the strongest predictor of non-hesitancy among adults, and low familiarity with the adult vaccination schedule was the strongest predictor of low literacy.
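Predictor rankings of this kind rest on Shapley values, which split a model's prediction for one respondent into per-feature contributions. The sketch below computes exact Shapley values for a tiny hypothetical scoring function by enumerating all feature subsets. The function, feature names, and values are illustrative assumptions, not the study's models or data, and the abstract does not say which Shapley implementation the authors used. Exact enumeration is feasible only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline instance.

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model: a score with one interaction term (not from the study).
def toy_score(z):
    free_rider_belief, low_schedule_familiarity, no_flu_vaccine = z
    return (0.5 * free_rider_belief
            + 0.2 * low_schedule_familiarity
            + 0.3 * free_rider_belief * no_flu_vaccine)


x = [1, 1, 1]         # respondent with all three toy indicators present
baseline = [0, 0, 0]  # reference respondent with none of them

phi = shapley_values(toy_score, x, baseline)
print([round(p, 3) for p in phi])
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term's effect is shared equally between the two features involved. Averaging absolute contributions across respondents is one common way to obtain a global importance ranking.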
Conclusion
This study demonstrated the effectiveness of machine learning approaches in analyzing robust survey data. These models identified key determinants of vaccine hesitancy and literacy, offering valuable insights into the behavioral and informational factors influencing vaccination decisions among US adults.
Authors
Zheng Y, Frew PM, Wang D, Song Y, Patterson-Lomba O, Feizi A, Li T, Eiden AL