Abstract
Background. Compared with coronary artery bypass grafting (CABG) under cardiopulmonary bypass, off-pump coronary artery bypass (OPCAB) is minimally invasive and reduces the risk of intraoperative blood transfusion and acute kidney injury. Nonetheless, OPCAB-related complications remain a threat. Machine learning can analyze large amounts of clinical data, build risk prediction models and help clinicians make early and correct clinical decisions.
Objectives. Risk prediction models are available for mortality and morbidity after cardiac surgery, but they are not specific to OPCAB. This study aimed to develop a predictive model of severe complications after OPCAB, based on machine learning.
Materials and methods. Anesthesia records of OPCAB from the General Hospital of the Northern Theater Command (Shenyang, China) collected between January 1, 2019, and June 15, 2020, were analyzed. The endpoint of the study was the occurrence of serious complications after OPCAB (postoperative unplanned intra-aortic balloon pump assistance, secondary surgery and death). The features entered into the models were as follows: intraoperative ventricular fibrillation, number of saphenous vein grafts, nerve block (NeB), mixed venous oxygen saturation (SvO2), skin incision-bypass time, and hypertension. A total of 8 machine learning algorithms were tested: logistic regression analysis (LRA), k-nearest neighbor (KNN), naïve Bayes (NB), support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and categorical features gradient boosting (CatBoost).
Results. Among the 506 patients found in the records, 27 met the endpoint. The highest area under the curve (AUC) value was achieved with the XGBoost model (AUC = 0.94), and the lowest with the SVM model (AUC = 0.75). The highest and lowest accuracy were observed with the XGBoost and NB models, respectively, while the highest and lowest precision were achieved using the SVM and NB models, respectively. Based on the receiver operating characteristic (ROC) curves, the XGBoost model was selected as the most useful in this study.
Conclusions. This study suggests using the XGBoost model to predict the risk of complications after OPCAB.
Key words: complications, machine learning, off-pump coronary artery bypass grafting, prediction model
Background
Revascularization is central to the management of acute coronary syndrome (ACS). It aims to improve blood flow to the myocardium1 and is performed using percutaneous coronary intervention or coronary artery bypass grafting (CABG).1, 2, 3 CABG can be performed either off-pump (i.e., without the assistance of a heart-lung machine) or on-pump. On-pump CABG is associated with more severe surgical trauma, whereas off-pump coronary artery bypass (OPCAB) can reduce perioperative bleeding and allogeneic blood transfusion, as well as the risk of acute kidney injury (AKI) in patients with kidney dysfunction.4 OPCAB does not appear to increase 30-day mortality compared with on-pump CABG, and an extensive systematic review of observational studies suggested that OPCAB might even reduce short-term mortality.5, 6 Therefore, OPCAB is probably a good option for selected patients.7, 8
Nevertheless, OPCAB carries risks,4, 9, 10 including perioperative death and complications such as stroke, kidney failure, respiratory failure, and blood loss.11, 12, 13 Furthermore, compared with on-pump CABG, OPCAB appears to be associated with higher 10-year rates of incomplete revascularization, repeat revascularization and mortality.10 Additionally, OPCAB has been associated with more adverse events at 1 year and higher mortality at 5 years.14, 15 Although the risk of myocardial infarction after OPCAB is similar to that after on-pump CABG, the data on stroke risk are inconsistent.5, 6, 16 A decreased left ventricular ejection fraction (LVEF) is observed in about 22% of patients after OPCAB and can compromise their short- and long-term outcomes.17, 18, 19
Some tools are available for estimating the risk of mortality and morbidity after CABG. The Society of Thoracic Surgeons (STS) score can be used to determine the risk of mortality and morbidity after cardiac surgery, but it is not specific to OPCAB.20 The European System for Cardiac Operative Risk Evaluation (EuroSCORE) can overestimate the risk of complications in the highest-risk and lowest-risk patients undergoing CABG,21 as well as in patients undergoing OPCAB.22 Other risk models are available but they are not specific to OPCAB.23, 24, 25
Machine learning algorithms can analyze data and build risk models that are often more accurate than those built with traditional statistical methods.26, 27 Indeed, machine learning has been used to create models that predict mortality after cardiac surgery,28, 29, 30, 31 and to estimate the length of hospital stay after CABG.32 However, these models are not specific to OPCAB.
Objectives
Using machine learning, this study aimed to build a model predicting serious early postoperative complications after OPCAB. The results could inform the optimization of the OPCAB clinical pathway and of anesthesia strategies aimed at maintaining vital signs, regulating the circulation, balancing myocardial oxygen supply and demand, and reducing complications.
Materials and methods
Study design
All data were taken from the Do-care anesthesia record system (v. 5.0, MEDICAL SYSTEM Co., Ltd., Suzhou, China). All records of OPCAB procedures performed from January 1, 2019, to June 15, 2020, at the 2nd Ward of the Department of Anesthesiology of the General Hospital of the Northern Theater Command (Shenyang, China) were included. OPCAB has been performed at this hospital for 20 years. Three teams comprising 15 surgeons were involved in this study; all held the qualification of chief surgeon, and each performed 300–450 operations per year.
This study was approved by the Ethics Committee of the hospital (Approval No. k(2020)01). The requirement for individual informed consent was waived by the Committee due to the retrospective nature of this study.
Inclusion and exclusion criteria
All patients who underwent CABG were screened. The exclusion criteria were: 1) CABG under cardiopulmonary bypass; 2) CABG combined with other surgical procedures; 3) cancellation of the operation; or 4) intraoperative change of the initial surgical plan (e.g., intraoperative decision for valve replacement).
Data collection and definitions
Demographic data (sex, age and body mass index (BMI)), comorbidities and intraoperative parameters (heart rate (HR), mean arterial pressure (MAP), respiratory rate, and mixed venous oxygen saturation (SvO2)) were collected retrospectively. SvO2 was continuously measured and recorded using a Swan–Ganz catheter (Edwards Lifesciences, Irvine, USA) placed through the internal jugular vein.
The endpoint was the occurrence of serious complications after OPCAB, defined as postoperative unplanned intra-aortic balloon pump (IABP) assistance, secondary surgery (e.g., thoracotomy and repeat revascularization), intraoperative emergency conversion to on-pump CABG, or death. Repeat revascularization was defined as revascularization for acute graft failure during the same hospital stay, or emergency revascularization for bleeding (i.e., hemorrhagic shock caused by bleeding from a graft anastomotic site during postoperative hospitalization, with repeat revascularization after 2 emergency operations). A patient in whom any of these events occurred after OPCAB and before hospital discharge was considered to have met the endpoint.
Feature selection and model evaluation
Feature screening combined the feature importance derived from machine learning models with Pearson’s correlation analysis and statistical tests of between-group differences, as follows:
1. Four algorithms that provide a feature importance measure were selected: logistic regression analysis (LRA), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost).
2. The feature importance of the standard features was calculated and ranked based on the above 4 models.
3. The top 10 features of each model were selected as the feature groups (a total of 4 feature groups).
4. Pearson’s correlation analysis was carried out on the standard features, and the 10 features with the highest correlation coefficients were selected as the fifth feature group.
5. The χ2 test was performed on the standard features, and the features with statistically significant differences (p < 0.05) were selected as the sixth feature group.
6. All 6 feature groups were compared, and the features that appeared in at least 4 of the groups were selected as the main features.
7. Finally, Pearson’s correlation coefficient was used to distinguish the variables that might affect the endpoint.
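The voting rule in steps 3 to 6 can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ code. The function name `vote_features`, the feature names f1 to f6 and their group memberships are all hypothetical. Only the threshold of at least 4 out of 6 groups comes from the text above.

```python
from collections import Counter


def vote_features(feature_groups, min_votes=4):
    """Return the features that appear in at least `min_votes` feature groups.

    `feature_groups` is a list of iterables of feature names, e.g. the four
    model-importance top-10 lists, the Pearson top-10 list and the chi-squared
    (p < 0.05) list described in steps 1-6.
    """
    votes = Counter()
    for group in feature_groups:
        votes.update(set(group))  # a feature is counted at most once per group
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical feature names and group memberships, for illustration only.
groups = [
    ["f1", "f2", "f3"],  # e.g. LRA importance
    ["f1", "f2", "f4"],  # e.g. SVM importance
    ["f1", "f3", "f4"],  # e.g. RF importance
    ["f1", "f2", "f5"],  # e.g. XGBoost importance
    ["f2", "f3", "f5"],  # e.g. Pearson correlation
    ["f1", "f6"],        # e.g. chi-squared test
]
print(vote_features(groups))  # ['f1', 'f2']
```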
After removing meaningless features (Supplementary Table 1), imputing missing values (Supplementary Table 2), discretizing the numerical variables, and selecting features, 6 features remained and were entered into the models: intraoperative ventricular fibrillation (VF), number of saphenous vein grafts (SVG), nerve block (NeB), mixed venous oxygen saturation (SvO2), skin incision-bypass time (T1), and hypertension (HBP). Results of the tests used for feature selection are shown in Supplementary File 1 and Supplementary Figure 3. Results of verifying the assumptions for the application of the preferred tests are shown in Supplementary Tables 3–10.
Missing data were imputed using SimpleImputer from the scikit-learn (sklearn) module. Mean imputation was used for numerical variables and mode imputation for binary and hierarchical variables. Meaningless features were deleted first. Each remaining feature was then imputed individually according to its own characteristics and distribution, rather than jointly across all 54 features.
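Per-column imputation can be sketched in plain Python. The function `impute_column` is hypothetical. It mimics what scikit-learn’s `SimpleImputer` does with `strategy="mean"` for numerical columns and `strategy="most_frequent"` for binary or hierarchical columns, including returning the smallest value when several values are equally frequent. Each column is imputed from its own observed values only. The example values are made up.

```python
from collections import Counter
from statistics import mean


def impute_column(values, kind):
    """Fill missing entries (None) of one feature column from its own observed values.

    kind="numerical":   column mean (cf. SimpleImputer(strategy="mean")).
    kind="categorical": most frequent value; ties go to the smallest value,
                        as with SimpleImputer(strategy="most_frequent").
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    if kind == "numerical":
        fill = mean(observed)
    elif kind == "categorical":
        counts = Counter(observed)
        top = max(counts.values())
        fill = min(v for v, c in counts.items() if c == top)
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    return [fill if v is None else v for v in values]


print(impute_column([70.0, None, 80.0], "numerical"))  # [70.0, 75.0, 80.0]
print(impute_column([1, None, 0, 1], "categorical"))   # [1, 1, 0, 1]
```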
All numerical features were discretized by dividing continuous values into discrete intervals. Common segmentation principles include equal frequency, equal width and optimization-based methods. Many algorithms require or benefit from discretized data, because converting continuous variables into categorical variables can speed up model training and make the model more robust. To apply a uniform segmentation to features measured on different scales, this study used the mean ±standard deviation (M ±SD) as the segmentation principle. Specifically, all values of a feature column were divided into 4 segments at the cut points M − SD, M and M + SD, and the segments were coded 0, 1, 2 and 3. After removal of meaningless features and discretization of numerical features, all features were defined as standard features.
Eight machine learning algorithms were tested in this study: LRA, k-nearest neighbor (KNN), naïve Bayes (NB), SVM, RF, XGBoost, light gradient boosting machine (LightGBM), and categorical features gradient boosting (CatBoost).
Statistical analyses
Statistical analysis was performed using the SciPy v. 1.4.1 scientific computing module within the Python 3.8 environment (https://pypi.org/project/scipy/1.4.1/). Normality was assessed using the Shapiro–Wilk test. Normally distributed continuous data were presented as M ±SD and compared using the independent samples t test. Non-normally distributed data were presented as median (range) and compared using the Mann–Whitney U test. Categorical data were presented as n (%) and compared using the χ2 test. Correlations were assessed using Pearson’s correlation analysis. Training and validation sets were generated using 5-fold cross-validation. The k-fold module in sklearn (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html) was used to randomly divide the dataset into 5 equal, non-overlapping groups, with the same proportion of negative and positive samples in each group. In each round, 4 groups were used as the training set and the remaining group as the validation set. Precision, recall, F1-score (the harmonic mean of precision and recall33), and the area under the curve (AUC) were calculated. The process was repeated 5 times so that each group served once as the validation set. The model was retrained from scratch in each round to avoid overfitting to a single split, and the mean score across the 5 folds was used as the final performance score of the model. A p-value < 0.05 was considered statistically significant.
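The cross-validation scheme can be sketched in plain Python. Keeping the same proportion of positive and negative samples in every group corresponds to stratified splitting. scikit-learn provides this as `StratifiedKFold`; plain `KFold` does not stratify. The sketch below implements the same idea without external libraries. The names `stratified_folds` and `cross_validate` are hypothetical, as is the callback `fit_and_score`. That callback stands for training a model on the training indices and returning its score on the validation indices.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with a similar class ratio in each fold.

    The indices of each class are shuffled and dealt round-robin into the
    folds, which is the idea behind scikit-learn's StratifiedKFold.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validate(labels, fit_and_score, k=5, seed=0):
    """Use each fold once as the validation set and return the mean score."""
    folds = stratified_folds(labels, k, seed)
    scores = []
    for v, val_idx in enumerate(folds):
        # Training set: all indices from the other k - 1 folds.
        train_idx = [i for f, fold in enumerate(folds) if f != v for i in fold]
        scores.append(fit_and_score(train_idx, val_idx))
    return sum(scores) / k


labels = [0] * 479 + [1] * 27  # class sizes reported in this study
folds = stratified_folds(labels)
print([sum(labels[i] for i in fold) for fold in folds])  # [6, 6, 5, 5, 5]
```

With the 27 positive cases in this study, each validation fold would contain only 5 or 6 complications. This is one reason fold-averaged scores are preferable to a single split.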
Results
Patient selection
Figure 1 presents the patient selection process. Among the 11,495 patients in the database, 1238 underwent CABG, of whom 506 met the eligibility criteria. They were divided into a training set (n = 405) and a validation set (n = 101). Table 1 presents the characteristics of the patients. Of the 506 patients, 27 met the endpoint (positive group). These comprised postoperative emergency IABP assistance (n = 10), secondary surgery (n = 2), death without other outcomes (n = 3), postoperative emergency IABP assistance with secondary surgery (n = 2), death after postoperative emergency IABP assistance (n = 6), death after secondary surgery (n = 2), and death after postoperative emergency IABP assistance and secondary surgery (n = 2). The in-hospital mortality rate was 2.6% (13/506). Compared with the controls, patients who met the endpoint had a lower LVEF (52 ±7% compared to 55 ±6%, p = 0.027), lower fractional shortening (26 ±5% compared to 28 ±4%, p = 0.013), worse New York Heart Association (NYHA) class (p = 0.041), and a lower frequency of preoperative diabetes mellitus (DM; 22% compared to 46%, p = 0.016). They also had lower intraoperative urine output (639 ±2512 mL compared to 771 ±426 mL, p = 0.018), shorter T1 (62.0 ±17.8 min compared to 69.3 ±18.7 min, p = 0.047), higher SvO2 (74% compared to 53%, p = 0.036), and fewer grafts (p < 0.001). A left internal mammary artery to left anterior descending artery anastomosis was performed in all patients. Radial arteries and other arteries were not used as grafts.
Feature selection
Figure 2 presents the feature selection process. Of the initial 60 features, 6 were removed as meaningless. After missing data imputation and discretization of the continuous variables, the remaining 54 clean features were tested, and 6 were retained (intraoperative VF, number of SVG, NeB, SvO2, T1, and HBP). The results of the correlation analysis of the 6 features are presented in Figure 3.
Algorithms
The predictive performance of the 8 machine learning models is presented in Table 2. The highest AUC was achieved with the XGBoost model (AUC = 0.94) and the lowest with the SVM model (AUC = 0.75). The highest and lowest accuracy were observed with the XGBoost and NB models, respectively. The highest and lowest precision were achieved with the SVM and NB models, respectively. Based on receiver operating characteristic (ROC) curve analysis, the XGBoost model was selected as the final model for the study (Figure 4). Figure 5 shows the importance of each variable in the different models. Table 3 displays all of the variables evaluated in this study.
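For reference, the metrics reported in Table 2 can be computed as below. This is a plain-Python sketch with toy data, not the study’s values. The 0.5 threshold used to turn scores into class labels is an assumption, since the threshold is not stated. The AUC is computed in its rank form: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting one half. This equals the area under the ROC curve.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = complication)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Rank-based AUC: P(score of a random positive > score of a random negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]           # toy labels
scores = [0.1, 0.4, 0.35, 0.8]  # toy predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
print(roc_auc(y_true, scores))  # 0.75
print(classification_metrics(y_true, y_pred))
```

With 27 positive cases among 506 patients, a model that always predicted no complication would reach an accuracy of 479/506 ≈ 94.7% while detecting no complications at all. This illustrates why a threshold-independent metric such as the AUC is more informative than accuracy for imbalanced data of this kind.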
Discussion
These results suggest that machine learning algorithms can be used to predict the risk of complications after OPCAB. The XGBoost model based on VF, SVG, NeB, SvO2, T1, and HBP achieved the highest predictive value, as shown by the AUC, which can serve as the main metric for selecting the optimal classifier.34
Previous studies have used machine learning to predict mortality and morbidity after cardiac surgery. In a study by Kartal, mortality risk was predicted using the EuroSCORE and the C4.5 algorithm. Both the EuroSCORE and the C4.5 algorithm included age, serum creatinine, LVEF, and mean pulmonary artery pressure (mPAP).28 The algorithm was then used to develop a web application for risk prediction after cardiac surgery. Castela Forte et al. used machine learning to evaluate 88 perioperative variables for predicting 5-year mortality after cardiac surgery. They observed that postoperative urea concentration, age and creatinine concentration achieved the best predictive values across different types of cardiac surgery.29 Kim et al. examined a deep neural network, a gradient boosting machine (GBM) and a generalized linear model for predicting major adverse cardiovascular events 1, 6 and 12 months after cardiac surgery, and achieved accuracies >95%.30 Zhong et al. used deep learning to predict the risk of septic shock, thrombocytopenia and liver dysfunction after open-heart surgery.31 They compared XGBoost, RF, KNN and logistic regression, and found that XGBoost achieved the best predictive value for complications. Alshakhs et al. used machine learning to predict the length of hospital stay after CABG, which might be considered a surrogate for postoperative complications.32 They showed that an RF model including age, height, EuroSCORE II, and the use of IABP achieved the best predictive value.
In the present study, features were screened from two directions: machine learning feature importance and statistical analysis, including Pearson’s correlation analysis. This was intended to make the selected features more convincing. The XGBoost model using VF, SVG, NeB, SvO2, T1, and HBP achieved the best predictive value. XGBoost is an advanced implementation of gradient boosting.35, 36 Its training objective includes regularization terms that help control overfitting.35, 36 The parameters set by the user (i.e., the hyperparameters) usually have a strong effect on the performance of a machine learning algorithm.37, 38 However, XGBoost has been reported to fit well across a range of hyperparameter settings,35, 36, 39 which may explain its good performance in the present study.
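For context, regularization in XGBoost refers to penalty terms added to its training objective in the standard formulation of the method. This is general background, not something specific to this study. For an ensemble of K trees f_k fitted to n samples with a differentiable loss l, the objective is:

```latex
\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\,\lambda \lVert w \rVert^{2}
```

Here T is the number of leaves in a tree and w is its vector of leaf weights. γ and λ are user-set penalties, exposed in the XGBoost library as `gamma` and `lambda` (`reg_lambda` in its scikit-learn interface). Larger values favor simpler trees with smaller leaf weights, which is how the method limits overfitting.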
Comparisons among studies are difficult, because previous studies have examined different machine learning models using a wide variety of variables. In addition, endpoints and definitions of complications vary, and study populations have included various types of surgery. In the present study, only patients who underwent OPCAB were included, and the endpoint was IABP assistance, secondary surgery or death. Nonetheless, several studies have shown that XGBoost achieves good predictive value. As noted above, Zhong et al. used XGBoost to predict complications after open-heart surgery.31 Kilic et al. used XGBoost to predict operative mortality (AUC = 0.771), renal failure (AUC = 0.776), prolonged ventilation (AUC = 0.739), reoperation (AUC = 0.637), stroke (AUC = 0.684), and deep wound infection (AUC = 0.599) after aortic valve replacement.40 Additionally, Lee et al. showed that XGBoost had the highest predictive power for AKI after cardiac surgery.41
Apart from comparison with other machine learning models, the model established here should be compared with well-known, established models. The EuroSCORE II and the original EuroSCORE have been used for decades to predict mortality risk after cardiac surgery and help improve patient outcomes.42, 43 The STS score can also be used to estimate the risk of CABG.20 However, neither score is specific to OPCAB. Furthermore, the data used in the present study were taken directly from the anesthesia monitoring system and did not include some of the components of the EuroSCORE II and STS scores. Future studies should be designed to allow direct comparisons in the same set of patients.
Patients who met the study endpoint had a lower frequency of DM and a higher SvO2. The DM finding is unexpected, because diabetes mellitus is associated with poor outcomes after CABG or cardiac surgery.44, 45, 46, 47 For SvO2, poor outcomes after CABG have been associated with both high SvO2,48 and low SvO2.49 Given the small number of patients who met the endpoint in the present study, no conclusion can be drawn on these points.
Limitations
This study has several limitations. The data were unbalanced, with only a small proportion of patients meeting the complication endpoint. Although the class imbalance was corrected at the data and algorithm levels, it inevitably affected the fit of the model. Follow-up studies are required to adapt the algorithm to class-imbalanced data and reduce the effect of imbalance on model performance. Although we selected predictors that were as closely related to the endpoint as possible, some predictors that have not yet been discovered or confirmed might have been omitted. In the future, more predictors could be added through in-depth study of OPCAB-related risk factors to improve model performance.
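The text does not specify which corrections for class imbalance were applied. As an illustration of two common algorithm-level options, not necessarily those used here, the sketch computes two quantities. The first is the set of inverse-frequency class weights, using the formula scikit-learn documents for `class_weight="balanced"`: n_samples / (n_classes × count of the class). The second is the negative-to-positive ratio that the XGBoost documentation suggests as a typical value for its `scale_pos_weight` parameter. The class sizes 479 and 27 are those reported in this study. The function names are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count of each class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def scale_pos_weight(labels):
    """Ratio of negative (0) to positive (1) samples."""
    counts = Counter(labels)
    return counts[0] / counts[1]


labels = [0] * 479 + [1] * 27  # class sizes reported in this study
w = balanced_class_weights(labels)
print(round(w[0], 3), round(w[1], 3))      # 0.528 9.37
print(round(scale_pos_weight(labels), 2))  # 17.74
```

With these weights, each class contributes the same total weight to the training loss (479 × 0.528 ≈ 27 × 9.37 ≈ 253).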
The sample size of this study was small, and it was a single-center retrospective study. Because the data came from a single center, generalizability was limited, and some bias related to the varying experience of the surgeons and anesthesiologists was probably introduced. Future studies should be extended to multiple centers. As a retrospective study, this investigation collected the data of all patients who met the criteria at our center during the study period. Data from this period are relatively complete, whereas data quality could not be guaranteed for earlier cases. After June 2020, the number of operations decreased due to the coronavirus pandemic, which might have introduced bias. An independent validation dataset was also lacking, so the final model might generalize poorly. Continuous iteration of the model, using large multicenter samples and prospective validation studies, should improve its generalizability. Because this study predicted only specific complications of OPCAB rather than all of them, it did not aim to compare the performance of the final model with the EuroSCORE. Data were insufficient to allow separate analyses of patients undergoing total arterial bypass grafting.
Conclusions
This study verified the effectiveness of different machine learning models and identified the best-performing model for predicting the risk of complications after OPCAB. This knowledge could be used to continuously optimize the model and integrate it into the clinical electronic medical record system, allowing clinicians to optimize treatment strategies in real time.
Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Supplementary files
The Supplementary Files are available at https://doi.org/10.5281/zenodo.7063461. The package contains the following files:
Supplementary Fig. 1. Results of the Pearson’s correlation analysis.
Supplementary File 2. Feature selection results of the Wilcoxon and χ2 tests.
Supplementary Table 1. The list of meaningless features.
Supplementary Table 2. The list of missing data.
Supplementary Table 3. Verification results of LRA.
Supplementary Table 4. Verification results of KNN.
Supplementary Table 5. Verification results of naive Bayes (NB).
Supplementary Table 6. Verification results of SVM.
Supplementary Table 7. Verification results of RF.
Supplementary Table 8. Verification results of XGBoost.
Supplementary Table 9. Verification results of LightGBM.
Supplementary Table 10. Verification results of CatBoost.