Abstract
Background: Cardiovascular disease (CVD) is the leading cause of death worldwide. This study aimed to predict CVD from some of its most important indicators and to present a tree-based statistical framework for detecting CVD patients according to these indicators.
Methods: We used data from the baseline phase of the Fasa Cohort Study (FACS). The outcome variable was the presence of CVD. An ordinary tree and a generalized linear mixed model (GLMM) were fitted to the data, and their predictive power for detecting CVD was compared with that of the GLMM tree. Statistical analysis was performed using the RStudio software.
Results: Data from 9499 participants aged 35‒70 years were analyzed. The multivariable mixed-effects logistic regression model showed that participants’ age, total cholesterol, marital status, smoking status, glucose, history of cardiac disease or myocardial infarction (MI) in first- and second-degree relatives, and presence of other diseases (such as hypertension, depression, chronic headaches, and thyroid disease) were significantly related to the presence of CVD (P<0.05). In the study population, the ordinary tree, GLMM, and GLMM tree yielded area under the curve (AUC) values of 0.58 (0.56, 0.61), 0.81 (0.77, 0.84), and 0.80 (0.76, 0.83), respectively. In addition, the tree model had the highest specificity (81%) but the lowest sensitivity (65%) among the models.
Conclusion: Because the GLMM tree outperformed the standard tree and did not differ significantly from the GLMM, we suggest using the GLMM tree, given its simpler interpretation and fewer assumptions. Using up-to-date statistical models for more accurate CVD prediction can yield more precise frameworks that support planning for the proactive detection of patients.