## Abstract

Background. Off-pump coronary artery bypass grafting-associated acute kidney injury (OPCAB-AKI) is related to 30-day perioperative mortality. Existing mathematical models cannot be applied to help clinicians make early diagnosis and intervention decisions.

Objectives. This study used an interpretable machine learning method to establish and screen an optimized OPCAB-AKI prediction model.

Materials and methods. Clinical data of 1110 patients who underwent OPCAB in the Department of Cardiac Surgery of General Hospital of Northern Theater Command (Shenyang, China) from January 2018 to December 2020 were collected retrospectively. Four machine learning models were used, including logistic regression (LR), decision tree (DT), random forest (RF), and eXtreme Gradient Boosting (XGBoost). The SHapley Additive exPlanation (SHAP) tool was used for explanatory analysis of the black-box model. The mean absolute value of the characteristic SHAP parameter was defined and sorted. The correlation between the characteristic parameters and OPCAB-AKI was determined based on the SHAP value. A quantitative analysis of a single characteristic and an interaction analysis of multiple characteristics were carried out for the main risk factors.

Results. The RF prediction model had the best performance, with an area under the curve (AUC) of 0.90, a precision rate of 0.80, an accuracy rate of 0.83, a recall rate of 0.74, and an F1 score of 0.78 for positive samples. The SHAP interpretation analysis showed that intraoperative urine volume contributed most to the RF model, followed by intraoperative sufentanil dosage, intraoperative dexmedetomidine dosage, the coefficient of variation of circulation during the induction period, intraoperative hypotension duration, age, preoperative baseline serum creatinine, body mass index (BMI), and Acute Physiology and Chronic Health Evaluation (APACHE) II score.

Conclusions. The model constructed by the RF ensemble learning algorithm predicted OPCAB-AKI, and indicators such as intraoperative urine volume were closely related to OPCAB-AKI.

Key words: coronary artery bypass grafting, acute kidney injury, machine learning, interpretability research

## Background

Cardiac surgery-associated acute kidney injury (CSA-AKI) incidence is approx. 22–30%, and 1% of cases must be treated with emergency dialysis.^{1} Off-pump coronary artery bypass grafting (OPCAB) avoids the cardiopulmonary bypass risk factors of non-physiological perfusion and ischemia-reperfusion injury. However, circulatory fluctuations caused by surgical procedures can still cause cardiac and renal insufficiency, while low cardiac output syndrome further increases the risk of OPCAB-AKI. A previous study demonstrated that transient minor serum creatinine (sCr) elevation after cardiac surgery is associated with 30-day mortality.^{2} Patients with stage I AKI have a 56% higher risk of death, and patients with stage II or III AKI have a mortality risk up to 3.5 times higher than that in the general population. Even after CSA-AKI symptoms resolve, the risk of progression to chronic kidney disease (CKD) and death remains increased.^{3} Early intervention could prevent AKI from progressing to a severe stage, which is crucial for reducing perioperative mortality.

Current mathematical models used to predict CSA-AKI are regression models based on preoperative data, including demographic variables, such as the European System for Cardiac Operative Risk Evaluation (EuroSCORE II) published in 2012, the Society of Thoracic Surgeons Score (STS score)^{4} published in 2008, and the Sino-System for Coronary Operative Risk Evaluation (SinoSCORE) published in 2009 in China.^{5} A common limitation of these models is that they only include the analysis of a few intraoperative risk factors. Since these factors correspond to multiple surgical types or complications, their single-risk prediction ability has to be improved. Clinicians must be able to understand and interpret the correlation between risk factors based on the accurate prediction of AKI risk to make correct decisions. However, achieving good predictability and interpretability is challenging because the computational process of most models is almost a “black-box” for researchers. Machine learning combined with SHapley Additive exPlanation (SHAP) could explain the output results of the prediction model, thereby solving this issue.^{6}^{, }^{7}^{, }^{8}^{, }^{9}

## Objectives

This study combined preoperative features with various intraoperative clinical parameters, such as key surgical decisions and hemodynamic fluctuations. The aim of the study was to establish a risk prediction model with intraoperative features as the primary predictor variables and OPCAB-AKI as the sole outcome. The study hypothesized that an OPCAB-AKI prediction model based on machine learning would exhibit good predictive performance, and that the SHAP explanatory toolkit, used to analyze the weight and clinical significance of single or multiple risk factors, may be helpful for accurate early prediction of OPCAB-AKI and precise clinical decision-making.

## Materials and methods

### Data collection

The Ethics Committee of the General Hospital of Northern Theater Command (Shenyang, China) approved the retrospective data analysis (approval No. k (2020) 01) and exempted it from informed consent. The clinical data of 1110 patients undergoing elective OPCAB in the above hospital from January 2018 to December 2020 were retrospectively collected from the Do-Care automatic anesthesia recording system and the electronic medical record system (EMRS). Figure 1 illustrates the data collection process. The sCr level measured within the 24 h before surgery was defined as the baseline. The single outcome was AKI within 7 days post-surgery. The diagnostic criteria were defined according to the Kidney Disease: Improving Global Outcomes (KDIGO) 2012 guidelines:^{10} 1) increased sCr level ≥26.5 μmol/L (≥0.3 mg/dL) within 48 h, or 2) sCr level increased >1.5 times compared to the baseline value within 7 days, or 3) urine volume <0.5 mL/kg/h for 6 consecutive hours. The patients were divided into AKI-negative and AKI-positive groups according to whether AKI occurred after the operation.
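The three diagnostic criteria listed above can be expressed as a simple labeling rule. The following is a minimal sketch only, not the authors' code: the function name, the argument names, and the assumption that the caller has already extracted the peak sCr values and the urine output rate over a 6-hour window are all illustrative.

```python
from typing import Optional

# Absolute sCr rise within 48 h that satisfies criterion 1 (in μmol/L).
SCR_RISE_THRESHOLD_UMOL_L = 26.5


def meets_aki_criteria(
    baseline_scr: float,
    peak_scr_48h: float,
    peak_scr_7d: float,
    urine_ml_per_kg_h_over_6h: Optional[float] = None,
) -> bool:
    """Return True if any of the three criteria in the text is met.

    All sCr values are in μmol/L. The urine argument is assumed to be the
    output rate over 6 consecutive hours (hypothetical pre-computed input).
    """
    # Criterion 1: sCr rise of at least 26.5 μmol/L within 48 h.
    rise_48h = (peak_scr_48h - baseline_scr) >= SCR_RISE_THRESHOLD_UMOL_L
    # Criterion 2: sCr more than 1.5 times baseline within 7 days.
    ratio_7d = peak_scr_7d > 1.5 * baseline_scr
    # Criterion 3: urine output below 0.5 mL/kg/h for 6 consecutive hours.
    oliguria = (
        urine_ml_per_kg_h_over_6h is not None
        and urine_ml_per_kg_h_over_6h < 0.5
    )
    return rise_48h or ratio_7d or oliguria
```

A patient is assigned to the AKI-positive group when the function returns True and to the AKI-negative group otherwise.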

### Statistical hypothesis testing and characteristic parameter selection

According to the EuroSCORE II value and the results of previous studies on CSA-AKI risk factors,^{11}^{, }^{12}^{, }^{13}^{, }^{14} 87 characteristic parameters were included and evaluated. Supplementary Table 1 provides a detailed description of the distribution of all parameters in different groups and datasets. The variance inflation factor (VIF) and the Box–Tidwell test were used to check the predictors. Predictors of linear relationships with logit functions of outcomes and collinearity predictors were excluded. The remaining predictors were used to establish a logistic regression model, with 3 methods implemented to select feature predictors in the development cohort. First, predictors with p < 0.05 in the univariate analysis were chosen. Second, the least absolute shrinkage and selection operator (LASSO) regularization algorithm selected potential predictors with non-zero coefficients. Third, the random forest recursive feature elimination (RF-RFE) algorithm combined with backward stepwise selection produced a compact model. In the univariate analysis of sample characteristics, the χ^{2} test was used for categorical variables and the Mann–Whitney U rank-sum test for numerical variables. The prediction performance of the traditional logistic regression model and the machine learning models was then compared.

### Data preprocessing

#### Treatment of vital signs in a perioperative time series

Two time windows were selected for the study: 1) anesthesia induction duration (t1), from the time the patient entered the operating room and vital signs monitoring was established to 10 min after anesthesia induction, for which the collected indexes were heart rate (HR) and mean arterial pressure (MAP); and 2) operation duration (t2), from skin incision to intravenous infusion of protamine, for which the collected indexes were HR, MAP and mean pulmonary artery pressure (mPAP). A polynomial curve function $Y(X, W) = W_0 + W_1 X + W_2 X^2 + W_3 X^3 + \dots + W_M X^M$ was used to fit the continuous vital signs of patients (the collection interval was 1 min). In the formula, $X$ was the timepoint within a time window (t1 or t2), counted in minutes from the 1^{st} minute; $Y$ was the patient’s vital sign (HR/MAP/mPAP) at the corresponding timepoint; and $W_0, \dots, W_M$ were the coefficients of the polynomial function. The absolute values of the coefficients ($W$) of the function were summed to obtain 5 characteristic parameters (Supplementary Fig. 1): the coefficient of variation of HR during anesthesia induction, the coefficient of variation of MAP during anesthesia induction, the coefficient of variation of HR during the operation, the coefficient of variation of MAP during the operation, and the coefficient of variation of mPAP during the operation.
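The curve-fitting step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the polynomial degree M is not reported in the text, so `degree=3` is a placeholder, the function name `fluctuation_index` is hypothetical, and NumPy's least-squares `polyfit` stands in for whatever fitting routine was actually used.

```python
import numpy as np


def fluctuation_index(values, degree=3):
    """Fit Y(X, W) to a 1-min vital-sign series and sum |W_0| .. |W_M|.

    `values` holds one reading per minute (HR, MAP or mPAP) within a single
    time window. `degree` (M) is an assumed placeholder.
    """
    y = np.asarray(values, dtype=float)
    x = np.arange(1, len(y) + 1, dtype=float)  # minutes 1..N in the window
    # Coefficients are returned lowest order first: W_0, W_1, ..., W_M.
    coeffs = np.polynomial.polynomial.polyfit(x, y, degree)
    return float(np.sum(np.abs(coeffs)))
```

Note that, read literally, summing the absolute values of all coefficients includes the intercept W_0, so the index reflects the baseline level of the signal as well as its variation over time.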

Acute hypotensive episodes (AHEs)^{15} and hypotension duration^{16} were defined and estimated according to the relevant literature: the total duration of MAP < 65 mm Hg from the moment of entering the operation room for establishing circulation monitoring to leaving the operating room and acute hypotension incidence (MAP < 65 mm Hg for >5 min).
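Both hypotension features can be derived from the per-minute MAP record. The sketch below assumes one MAP reading per minute (the 1-min collection interval stated above) and uses a hypothetical function name; it counts the total minutes with MAP < 65 mm Hg and the number of uninterrupted runs lasting more than 5 min.

```python
def hypotension_summary(map_per_minute, threshold=65.0, min_episode_min=5):
    """Return (total hypotensive minutes, number of acute episodes).

    An acute episode is an uninterrupted run of readings below `threshold`
    that lasts longer than `min_episode_min` minutes.
    """
    total_minutes = 0
    episodes = 0
    run = 0  # length of the current run of hypotensive readings
    for value in map_per_minute:
        if value < threshold:
            total_minutes += 1
            run += 1
        else:
            if run > min_episode_min:
                episodes += 1
            run = 0
    # Close a run that is still open when the record ends.
    if run > min_episode_min:
        episodes += 1
    return total_minutes, episodes
```

For example, a record with one 6-min run and one 2-min run below 65 mm Hg yields 8 hypotensive minutes and 1 acute episode.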

#### Handling missing values

For characteristic parameters with a missing ratio <10%, imputation of missing values was considered not to bias the results.^{17} Deep learning technology was used to fill in the missing values.^{18} Ten parameters had missing values; the 3 parameters with ≥10% of values missing were excluded from the models (Supplementary Table 2).

#### Characteristic parameter determination

After handling missing values, VIF and Box–Tidwell test were used to verify the remaining 84 predictors, resulting in the removal of 5 predictors of linear relationships or collinearity (Supplementary Table 3). The remaining 79 predictors were used to establish the logistic regression model. One-way analysis of variance (ANOVA) and Recursive Feature Elimination and LASSO regression were applied to select feature predictors in the development cohort. Finally, 39 predictors were selected to establish machine learning models (Supplementary Fig. 1 and Supplementary Table 4), and 21 statistically significant clinical characteristics were tested (p < 0.05).

### Machine learning model establishment

This study examined a small-sample, high-dimensional dataset using 4 common machine learning algorithms: logistic regression (LR), classification decision tree (DT), RF, and eXtreme Gradient Boosting (XGBoost). The parameters (options activated) for each analysis are listed in Supplementary Table 5. The sample size of the dataset conformed to the rule of “10 events per variable” for characteristic parameters, which meets the sample size demands of machine learning. The train_test_split tool in the sklearn module randomly divided the preprocessed data into training and test sets at a ratio of 7:3, and the 70% training set was used to fit the models. Cross-validation reduced overfitting to some extent and allowed critical information to be obtained from the limited data. The training set was randomly divided into 5 equal parts using five-fold cross-validation, with 4 parts used for model training and 1 for model verification; the cycle was repeated 5 times. The model parameters were tuned according to the area under the receiver operating characteristic (ROC) curve (AUC) to prevent overfitting, and the remaining 30% test set was used for internal verification to evaluate the performance of the trained models on new data.
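The partitioning scheme can be illustrated with index arithmetic alone. The sketch below uses only the Python standard library and is not the sklearn call used in the study; the function name, the seed, and the round-robin fold assignment are assumptions made for illustration.

```python
import random


def split_indices(n_samples, test_fraction=0.3, n_folds=5, seed=0):
    """Shuffle sample indices, hold out a test set, and cut the rest into folds."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    test, train = indices[:n_test], indices[n_test:]
    # Round-robin assignment gives folds whose sizes differ by at most one.
    folds = [train[i::n_folds] for i in range(n_folds)]
    return train, test, folds


train, test, folds = split_indices(1110)
print(len(train), len(test), [len(fold) for fold in folds])
```

In each of the 5 cross-validation cycles, one fold serves as the verification part and the other four are pooled for training; the held-out test indices are not touched until the final internal verification.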

### Model performance evaluation

Accuracy, recall and precision rates evaluated the prediction results. The F1 score was used to balance the model precision and recall, and to evaluate the performance of the binary model. The F1 score ranges from 0 to 1, with larger values indicating better results. The ROC curve and AUC were also used to evaluate model performance, and the calibration curve was used to represent the accuracy of the model prediction probability.
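These rates follow directly from the confusion matrix of the positive (AKI) class. The sketch below is a generic illustration with hypothetical names; the counts passed in are placeholders, not results from this study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute precision, recall, accuracy and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1,
    }
```

Because F1 is a harmonic mean, it is pulled toward the lower of precision and recall, which is why it is used to balance the two.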

### Interpretive analysis

The interpretation analysis of the black-box model with the best predictive performance was done using the Python SHAP model interpretation package. Based on game theory and local interpretation, SHAP is a classic post-hoc interpretation framework that provides values to estimate the contribution of each characteristic. A SHAP value describes the weight or importance of a specific characteristic in the model's prediction for a particular data point. Compared to traditional characteristic importance methods, SHAP has better consistency and presents a positive/negative correlation of each predictor relative to the target variable, which can be used for local and global interpretation. For local interpretability, each characteristic had its own set of Shapley values that explain and quantify the contribution of each characteristic of each sample to the prediction, increasing the transparency and allowing clinicians to analyze the reliability of the prediction model. The global interpretation was obtained from the mean absolute Shapley value of each variable across all samples, used as the importance value of that characteristic.
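To make the local and global readings concrete, the sketch below computes exact Shapley values for a toy model by enumerating feature subsets. It is an illustration only: it does not use the SHAP package or its tree-specific algorithm, it replaces "absent" features with a single baseline vector (an assumption), and all names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature of `x` relative to `baseline`."""
    n = len(x)

    def value(present):
        # Features outside `present` are replaced by their baseline value.
        mixed = [x[j] if j in present else baseline[j] for j in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def global_importance(predict, samples, baseline):
    """Mean absolute Shapley value per feature across all samples."""
    per_sample = [shapley_values(predict, x, baseline) for x in samples]
    n = len(baseline)
    return [sum(abs(p[i]) for p in per_sample) / len(per_sample) for i in range(n)]
```

The per-sample values give the local interpretation, and they sum to the difference between the prediction for the sample and the prediction for the baseline. Averaging their absolute values over samples gives the global ranking. Exhaustive enumeration grows exponentially with the number of features, so it is practical only for toy examples.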

## Results

### Dataset description

After preprocessing the original electronic medical record data, 1110 samples were divided into AKI-positive (405 cases) and AKI-negative (705 cases) groups, according to whether AKI occurred postoperatively. The patients in the positive group had a higher mean age and a prolonged mean duration of intraoperative hypotension (MAP < 65 mm Hg). The incidence of abnormal preoperative sCr, preoperative electrocardiogram ventricular premature beat, intraoperative sudden atrial fibrillation, ventricular fibrillation, intraoperative use of intra-aortic balloon counterpulsation-assisted circulation, and intraoperative acute hypotension (MAP < 65 mm Hg for >5 min) was higher in the AKI-positive group than in the AKI-negative group. Dexmedetomidine dosage and urine volume were lower in the AKI-positive group than in the AKI-negative group.

### Model prediction results and performance comparison

The test set (n = 333) results showed that the AUC of the RF model for positive samples (0.90, 95% confidence interval (95% CI): 0.86–0.94) was better than that of the other models (LR-AUC: 0.73, 95% CI: 0.67–0.79; DT-AUC: 0.75, 95% CI: 0.69–0.81; XGBoost-AUC: 0.86, 95% CI: 0.82–0.90) (Supplementary Table 6). However, the recall rate (0.74) and F1 score (0.78) of the RF model did not differ significantly from those of the other ensemble algorithms (Supplementary Table 6 and Figure 2A), and the calibration curve indicated that the prediction probability of the RF model was rather accurate (Figure 2B). Compared with the traditional statistical binary logistic regression model (positive prediction accuracy: 0.71; AUC: 0.73, 95% CI: 0.70–0.76), the RF model and the other ensemble algorithms showed better predictive ability for OPCAB-AKI (Figure 3).

### Interpretative analysis of the random forest model

The ranking results of the characteristic parameters showed that intraoperative urine volume contributed most to the RF model, followed by intraoperative sufentanil dosage, intraoperative dexmedetomidine dosage, the coefficient of variation of circulation during the induction period, the duration of intraoperative hypotension, age, preoperative baseline sCr, body mass index (BMI), and Acute Physiology and Chronic Health Evaluation (APACHE) II score (Figure 4A).

Further analysis established a positive correlation between the coefficient of variation of circulation during the induction period, the dosage of sufentanil, duration of intraoperative hypotension, preoperative baseline sCr, APACHE II score, age, and postoperative AKI occurrence. As such, the higher the standard values corresponding to these characteristics, the greater the possibility of AKI in the model samples. On the other hand, intraoperative urine volume and intraoperative dexmedetomidine dosage correlated negatively with OPCAB-AKI incidence (Figure 4B).

In the SHAP summary of the top 20 characteristics, the ordinate showed the characteristics, sorted according to the mean absolute characteristic parameter value, and the abscissa showed the SHAP value. The higher the SHAP value of a characteristic, the greater the predicted OPCAB-AKI risk. Each row represented a characteristic, each point represented a sample, and the color represented the characteristic value (red was high and blue was low). The positive/negative correlation between each characteristic and OPCAB-AKI was determined based on the distribution of the actual characteristic value and the SHAP value.

The SHAP value was used to analyze how the top-ranked characteristics in the RF black-box model affected the prediction results by comparing and quantifying the correlation between the SHAP values of each characteristic and the risk outcomes. The results showed that the OPCAB-AKI risk significantly increased when 3 continuous characteristics reached specific thresholds: age >55 years, APACHE II score >19 points and BMI >28 kg/m^{2} (Figure 5).

The SHAP dependency analysis revealed the importance and direction of the influence of the 2 pairs of characteristics on the model output, and their complex nonlinear effects were obtained and described. The results showed that the risk of OPCAB-AKI increased significantly with prolonged intraoperative hypotension duration and decreased intraoperative urine volume. Accordingly, the OPCAB-AKI risk was low at a short intraoperative hypotension duration (SHAP < 0) and high intraoperative urine volume (approx. 700 mL) (Figure 6A). A high dexmedetomidine dose was positively associated with increased intraoperative urine volume, which corresponded to a low risk of OPCAB-AKI (Figure 6B).

## Discussion

This retrospective cohort study employed a machine learning method to establish a risk prediction model for OPCAB-AKI using a small sample (1110 patients) of perioperative data collected from a single center over the course of 3 years. The results showed that the ensemble machine learning models predicted better than a traditional LR model, and the RF model showed the best prediction performance after integrating the intraoperative hemodynamic parameters (AUC = 0.90, 95% CI: 0.86–0.94). Such a model could help clinicians make an early prediction and choose an appropriate AKI intervention before the end of surgery.

Among the screened OPCAB-AKI influencing factors, the top 5 items (intraoperative urine volume, intraoperative sufentanil dosage, intraoperative dexmedetomidine dosage, the coefficient of variation of MAP during the induction period, and intraoperative hypotension duration) are the intraoperative indicators that are not a primary concern in the classical prediction models, and the remaining items, such as age, preoperative baseline sCr, BMI, and APACHE II score, were the known CSA-AKI influencing factors.^{19} A single-center cohort study of patients undergoing any surgery showed that 40% of them were assessed as low risk for AKI by classical models but reassessed as high risk by machine learning models after incorporating intraoperative factors.^{20} Compared to classical models, the results of this study highlight the impact of acute intraoperative pathophysiological reactions on renal function and the potential benefits of close monitoring and timely intervention.^{21}

Intraoperative urine volume (with a mean SHAP value weight of 2.87%) was a major influencing factor in the OPCAB-AKI prediction model established in this study. Previous studies on CSA-AKI have shown that urine volume predicts AKI after cardiopulmonary bypass surgery.^{22} This phenomenon is consistent with the results of the present study, suggesting that real-time monitoring and maintenance of adequate intraoperative urine volume could protect renal function in OPCAB patients.^{23}

Two other intraoperative influencing factors in the model were the coefficient of variation of circulation during induction and intraoperative hypotension duration. Some studies have shown that the risk of AKI is independently associated with intraoperative hypotension,^{24}^{, }^{25} and hemodynamic fluctuation is a major risk factor for inducing postoperative AKI.^{26}^{, }^{27}^{, }^{28} Perioperative supportive care for MAP could reduce the risk of postoperative complications such as AKI.^{29} Off-pump coronary artery bypass grafting has unique hemodynamic characteristics and is prone to severe hemodynamic fluctuations during specific periods, such as anesthesia induction, fixation and compression of coronary arteries.^{30} Since the blood pressure or HR at a single timepoint could not reflect the significance of the patient’s hemodynamic fluctuations over time, this study used a polynomial higher-order function fitting curve to represent continuous intraoperative circulation indicators. The results suggest that intraoperative hemodynamic fluctuations represented by the coefficient of variation of circulation can accurately predict OPCAB-AKI.

The impact of general anesthetics on postoperative AKI was rarely considered in previous models. In this study, dexmedetomidine was shown to be a critical influencing factor (negatively correlated) of OPCAB-AKI. A randomized controlled trial by Zhai et al. demonstrated that dexmedetomidine reduced CSA-AKI incidence and severity in patients undergoing cardiac surgery.^{31} Another meta-analysis concluded that dexmedetomidine infusion could be used as a preventive strategy for CSA-AKI. However, they did not specify the optimal dose or duration of intravenous dexmedetomidine infusion.^{32}

The model established in this study suggested that an excessive intraoperative sufentanil dosage might be a risk factor for inducing OPCAB-AKI. Based on the concept of Enhanced Recovery After Surgery (ERAS®), low-opioid anesthesia regimens have been widely accepted by clinicians.^{33}^{, }^{34} Although there was no evidence that low-opioid anesthesia reduced the risk of CSA-AKI or OPCAB-AKI, the results of the present study suggest that reducing intraoperative opioid dosage exerts a protective effect on the renal function of OPCAB patients.

Several studies have identified a high BMI and advanced age as major OPCAB-AKI risk factors.^{35} Moreover, the APACHE II score is prognostic for critically ill patients immediately after admission to the intensive care unit (ICU), and was used in this study to replace laboratory indicators, such as hemoglobin, to predict postoperative AKI risk. After conducting the interpretive analysis of the black-box model with the use of the classic SHAP tool, the present study identified 3 continuous characteristics, including age, APACHE II score and BMI, that affected the critical threshold of the RF model for predicting the risk of OPCAB-AKI. These factors helped the clinicians to understand the influence of characteristics on the prediction outcome.

Although novel serum and urine biomarkers can predict AKI,^{36} there are disadvantages to expensive tests, repeated tests during diagnosis and increased hospitalization costs. The main risk factors involved in this research model were routine items that were easy to collect and did not increase the medical burden.

The interaction analysis showed mutual influences among several OPCAB-AKI factors. For example, high intraoperative dexmedetomidine dosage and high intraoperative urine volume were associated with a low risk of postoperative AKI. Although the results cannot determine causality among factors, they suggested putative changing trends in specific environments, which were not captured by most analytical models. Combining this information with the clinical experience of the doctors aided in making individualized clinical decisions at an early stage.

The advantage of this study was that five-fold cross-validation was used to construct a stable performance model. Predictive variables used in the study are readily available in clinical practice, ensuring model applicability. The model exhibits clinical interpretability and predictive reliability, which could help doctors understand the interaction between variables and targets as well as between 2 variables.

### Limitations

The limitations of this study include its retrospective design, a small sample of subjects from the same tertiary general hospital, and lack of evaluation of different AKI stages. As such, the results require further external validation before they can be generalized, and additional prospective trials are needed to assess their clinical utility.

The major characteristics and inflection points found in this study could be used as early signs of OPCAB-AKI risk. However, whether these could act as a reference for clinical diagnosis in the recommended range needs further substantiation. If found to be externally valid, clinicians might incorporate the available web-based application into clinical practice to aid decision-making and optimize preoperative prevention efforts.

## Conclusions

The ensemble learning algorithm represented by RF predicted OPCAB-AKI. Intraoperative urine volume, circulatory fluctuation during the induction period, intraoperative dexmedetomidine dosage, intraoperative hypotension duration, preoperative baseline sCr, APACHE II score, BMI, and age were the main factors influencing OPCAB-AKI. An explanatory framework increased model transparency, allowing clinicians to analyze the reliability of the predictive models.

### Data availability statement

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

### Supplementary data

The supplementary materials are available at https://doi.org/10.5281/zenodo.8128783. The package contains the following files:

Supplementary Fig. 1. Polynomial curve fitting of vital signs.

Supplementary Table 1. Distribution of each characteristic in the base dataset.

Supplementary Table 2. Parameters lacking values and parameters excluded from the models.

Supplementary Table 3. Predictors of linear relationships or collinearity that were removed.

Supplementary Table 4. Predictors selected and used in machine learning modeling.

Supplementary Table 5. The parameters (options activated) of each analysis.

Supplementary Table 6. Machine learning model performance results.