Abstract
The updated statistical reporting guidelines for Advances in Clinical and Experimental Medicine (ACEM) streamline the way in which authors describe their statistical analyses and results. In response to the rising popularity of advanced techniques – such as Bayesian statistics and machine learning – the revisions balance methodological rigor with article readability.
Key words: statistical analysis guidelines, test assumptions, machine learning, Bayesian statistics
Introduction
With the growth of data in medical and other sciences and the development of techniques for their analysis (e.g., nonparametric tests and models, machine learning and methods used in bioinformatics), the importance of carrying out statistical analyses correctly and of describing the methods and results appropriately is also growing. Unfortunately, many studies show numerous errors and deficiencies in this area.1, 2 Therefore, various efforts are made with the ultimate goal of achieving a sufficiently high standard in the description of the statistical analysis methods used and the presentation of their results. A sufficiently high standard means meeting 2 basic requirements: enabling any reader to verify the findings and to repeat the research in an identical manner. These efforts include:
– publishing articles evaluating scientific articles in terms of the statistical analyses used therein,2, 3, 4
– publishing general recommendations and guides on statistical analysis addressed to all interested parties, including authors, reviewers and editors of journals,5, 6, 7, 8
– formulating guidelines by individual editorial offices for authors who intend to publish articles,9, 10
– integrating interested parties into organizations aimed at coordinating activities in the field of setting standards for the use of statistical analyses in scientific publications, e.g., the World Association of Medical Editors (WAME) (https://wame.org/index.php) and International Committee of Medical Journal Editors (ICMJE) (https://www.icmje.org).
The Advances in Clinical and Experimental Medicine (ACEM) statistical guidelines provide a set of clear instructions, supported by examples, to ensure the journal’s requirements are as understandable as possible and aligned with the recommendations outlined in the above sources. Among other things, emphasis is placed on presenting the results of verifying test or model assumptions, which is the basis for assessing the reliability of the results obtained with them. However, a need for changes occasionally arises, driven by the development of data analysis methods as well as by critical feedback on these requirements from authors and members of the editorial board. This article presents new, slightly modified guidelines for statistical analyses in articles published in ACEM.
The primary objective of the statistical analysis guidelines is to help authors prepare manuscripts that use methods appropriate for the given research problem. The changes to the guidelines were made to make it easier for authors to prepare manuscripts that are correct in terms of statistical analysis, and to enable editors to prepare the final version of articles more quickly.
Accordingly, the requirement to publish test statistic values has been removed. In certain cases, rather than requiring the inclusion of ‘technical’ details of a given analysis, authors may be asked to submit unedited, raw reports generated automatically by their statistical analysis software. These reports will not be included in the publication but will allow statistical reviewers to assess the accuracy of the relevant part of the statistical analysis.
The guidelines have also been supplemented with requirements for the use of Bayesian statistics and machine learning.
List of ACEM’s requirements concerning statistical analyses
Statistical analysis description
Introductory remarks
a) All of the information regarding statistical analyses should be collected in the “Statistical analyses” section and enable a reader to repeat the analyses in exactly the same way.
b) A set of the essential statistical analysis results (see Statistical analysis description and Introductory remarks for details) should be presented in the main paper body.
c) The authors may be requested to provide additional detailed results of the analyses to assess the correctness of the analysis. In such cases, raw reports from the statistical software are acceptable, as they are not intended for publication.
d) Results of checking test or model assumptions must also be included in the form of simplified supplementary materials (point I.3), which should present only essential information, such as a list of variables that meet the given assumption and those that do not, along with p-values or small-sized graphs (e.g., histograms, scatterplots, etc.). Other details may be requested by reviewers, if necessary, solely for the purpose of verifying the appropriateness of a given test, etc.
Providing a list of statistical analysis tools and explaining how a given tool was used, e.g.:
a) χ2 test: specify which test type was used – a test of independence or goodness of fit, Pearson’s χ2 test or the maximum likelihood (log-likelihood ratio) variant – and whether Yates’s correction was applied; see the sketch after item d).
b) Analysis of variance (ANOVA): explain which type was used (e.g., one-way, two-way, repeated measures, nested) and which correction was applied when the assumption of sphericity/homogeneity of variance was not met.
c) Student’s t-test: state whether the test was used for matched pairs or for independent groups and whether a correction for heterogeneity of variances (e.g., Welch’s) was applied; this is also illustrated in the sketch after item d).
d) Multivariate regression: explain whether and how predictor selection was performed, including the procedure and threshold values used. If a selection procedure was applied, also provide the initial set of predictors.
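As an illustration of points a) and c), the following minimal Python sketch (assuming the scipy package; the contingency table and samples are placeholders) shows how the exact test variant is fixed by explicit function parameters:

```python
# Minimal sketch, assuming scipy; the data are placeholders.
import numpy as np
from scipy.stats import chi2_contingency, ttest_ind, ttest_rel

table = np.array([[12, 5], [8, 15]])  # illustrative 2x2 contingency table
# Pearson's chi-squared test of independence with Yates's correction (point a).
chi2, p, df, _ = chi2_contingency(table, correction=True)
# Maximum likelihood (log-likelihood ratio) variant, without the correction.
g, p_g, _, _ = chi2_contingency(table, correction=False, lambda_="log-likelihood")

x, y = np.random.default_rng(0).normal(size=(2, 20))  # illustrative samples
res_welch = ttest_ind(x, y, equal_var=False)  # independent groups, Welch's correction (point c)
res_paired = ttest_rel(x, y)                  # matched pairs
```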
e) Meta-analysis should include the following items:
• a table of the selected publications along with effect sizes or other measures;
• the type of model used (fixed- or random-effects) along with the criteria for this choice (a priori analysis);
• a summary of the pooled effect sizes (forest plot);
• heterogeneity analysis;
• sensitivity analysis;
• paper quality evaluation;
• publication bias assessment using funnel plots (in every case) and Begg and Mazumdar’s rank correlation test or Egger’s test (when n > 10, in both cases).
f) Statistical analysis tools available in programming languages such as Python and R:
• provide a complete list of the packages used along with their version numbers;
• when a given analysis can be run in many ways, provide the key part of the script in a supplementary file or list the values of the relevant function’s parameters (see the sketch below).
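As a hedged illustration, the sketch below (assuming the numpy and scipy packages, with scipy ≥1.7 for the method parameter; the data are placeholders) shows how package versions and the parameter values that fix a test variant can be reported:

```python
# Minimal sketch: record package versions and explicit parameter values.
import numpy
import scipy
from scipy.stats import mannwhitneyu

print(f"numpy {numpy.__version__}, scipy {scipy.__version__}")

x = [1.2, 3.4, 2.2, 4.1]  # placeholder data
y = [2.8, 5.0, 3.9, 4.4]
# The alternative and method parameters determine which M-W variant is run,
# so their values should be reported.
res = mannwhitneyu(x, y, alternative="two-sided", method="exact")
print(f"M-W test: n = (4, 4); p = {res.pvalue:.3f}")
```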
g) Bayesian statistics:
• Clearly state the Bayesian approach – explicitly mention that Bayesian inference was used and justify why Bayesian methods were chosen over frequentist alternatives. Explain whether a full or empirical Bayes approach was used.
• Define the prior distributions (Gaussian, beta, gamma, Dirichlet, Laplace, etc.), stating the type of prior (informative/weakly informative/non-informative) for each parameter; when informative or weakly informative priors are used, cite the literature or domain knowledge on which the prior was based. State whether the results of the analysis changed under different prior assumptions; the Statistical Editor can request the results of a sensitivity analysis.
• Specify the likelihood function assumed for the data.
• Present full posterior distributions instead of only point estimates. Use credible intervals instead of confidence intervals (CIs).
• Show key Bayesian metrics (posterior mean, median or mode, etc.). Use statements such as “the probability of the effect being >0 is X%” or “with probability p, the parameter is between X1 and X2”. Abstain from frequentist language (“statistical significance”, etc.).
• Perform model diagnostics, including posterior predictive checks: simulate new data from the posterior, compare them with the observed data, and check model fit using posterior predictive plots. For Markov chain Monte Carlo (MCMC) methods, declare the software used (Stan, JAGS, PyMC3, etc.), the number of chains and iterations, and the burn-in period. Report R-hat (convergence) and the effective sample size (see the sketch after this list).
• For visualization, use appropriate types of plots (density plots, histograms, ridge plots, violin plots, trace plots).
• Compare candidate models using, e.g., the Bayes factor, the Watanabe–Akaike information criterion or leave-one-out cross-validation, and select the most suitable one for further description in the manuscript.
• Discuss the clinical implications of the model, referring to real-life situations.
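A minimal sketch of these points, assuming the PyMC and ArviZ packages (the model, priors and data are illustrative, not a recommended default):

```python
# Minimal sketch, assuming pymc and arviz; priors, likelihood and data are illustrative.
import numpy as np
import pymc as pm
import arviz as az

y_obs = np.random.default_rng(0).normal(1.0, 2.0, size=50)  # placeholder data

with pm.Model() as model:
    # Weakly informative priors - report the type and rationale for each.
    mu = pm.Normal("mu", mu=0, sigma=10)
    sigma = pm.HalfNormal("sigma", sigma=5)
    # Likelihood assumed for the data.
    pm.Normal("y", mu=mu, sigma=sigma, observed=y_obs)
    idata = pm.sample(draws=2000, tune=1000, chains=4, random_seed=42)

# R-hat, effective sample size and credible intervals for each parameter.
print(az.summary(idata, hdi_prob=0.95))
az.plot_trace(idata)      # convergence diagnostics
az.plot_posterior(idata)  # full posterior distributions
```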
h) Machine learning methods:
• Explicitly state the purpose of the model: Was it used for visualization, prediction, feature engineering or modelling of a phenomenon? For methods used solely for visualization or exploratory purposes (e.g., t-distributed stochastic neighbor embedding (t-SNE) and similar techniques), the same level of methodological rigor and reporting detail is not required as for predictive or inferential models. However, the purpose of the method must be stated explicitly, and overinterpretation of the visualizations should be avoided.
• Explain the rationale for choosing a particular machine learning model. In the case of less-explainable (“black box”) models, use techniques to interpret the results, such as Shapley additive explanations (SHAP) values. If multiple models are available for the task, the optimal one should be selected for further interpretation based on the goodness of fit or performance metrics (depending on the aim of the study).
• Describe how the data were used: How were they split into training/validation/test sets? Was stratification or another kind of sampling used? (See the sketch after this list.)
• Specify the data transformations leading to the creation of the input vector. Provide information on how each variable was scaled.
• Check for potential data leakage or the “perfect separation” problem, and address these issues if they arise.
• Provide a list of hyperparameters for the models and a rationale for setting them at the chosen levels. Describe calibration procedures if they were performed.
• Each model presented in the manuscript should be evaluated using the appropriate performance metrics: R-squared or mean absolute percentage error (MAPE) for regression models; a confusion matrix for classification models; and mutual information or the Calinski–Harabasz (C–H) index for clustering models. Additionally, the receiver operating characteristic (ROC) curve should be used to describe the performance of binary classifiers.
• When presenting multiple models, compare them using a common metric. Ideally, this metric should be clinically relevant.
• Upload the final model as a set of weights or as a supplementary file. Alternatively, especially when comparing multiple models, provide the code (including the random seed used) that was used to train the models. If neither of these solutions is possible, please provide an explanation.
• Where possible (e.g., for decision trees, regression trees and simple artificial neural networks), provide a visualization of the model architecture.
• We do not accept manuscripts regarding automated data mining, natural language processing, complex agent systems or generative artificial intelligence.
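The following minimal scikit-learn sketch illustrates several of the points above (stratified splitting, leakage-free scaling, a fixed random seed and the required binary-classification metrics); the data and model are placeholders:

```python
# Minimal sketch, assuming scikit-learn; data and model are placeholders.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, random_state=42)

# Stratified split with a fixed seed for reproducibility.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Fit the scaler on the training set only, to avoid data leakage.
scaler = StandardScaler().fit(X_tr)
model = LogisticRegression(max_iter=1000).fit(scaler.transform(X_tr), y_tr)

# Performance metrics required for a binary classifier.
proba = model.predict_proba(scaler.transform(X_te))[:, 1]
print(confusion_matrix(y_te, proba > 0.5))
print(f"ROC AUC = {roc_auc_score(y_te, proba):.3f}")
```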
Checking the assumptions of the models or statistical tests used
The results of the tests used to verify whether assumptions are met should be included in the supplementary materials. The list of test examples provided below contains only the most commonly used methods and should be expanded as appropriate to reflect any additional tests employed.
a) Testing the normality of distributions: The choice of test should be related to the sample size (N) (Table 1).
b) Student’s t-test, parametric ANOVA: normal distribution of the data and homogeneity of variances.
c) Parametric ANOVA for repeated measurements: normal distribution of the data and sphericity (homogeneity of the variances of the differences between conditions).
d) Pearson correlation: normal distribution of the data and a linear relationship between the variables.
e) Linear regression (see the sketch after this list):
• linear relationship between predictors and the response variable (small-sized scatter plots);
• no multicollinearity, e.g., using variance inflation factor (VIF);
• homoscedasticity, i.e., the residual constant variance (Breusch–Pagan test);
• normal distribution of model residuals.
f) Logistic regression:
• a linear relationship between predictors and the logit of the response variable (Box–Tidwell test or visual assessment based on an appropriate plot);
• no multicollinearity among explanatory variables e.g., using VIF or generalized variance inflation factor (GVIF);
• no extreme outliers.
g) Cox regression:
• proportional hazard assumption;
• linearity: the log-hazard function is linearly related to all predictors;
• no multicollinearity among the predictors.
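For point e), a minimal sketch (assuming the statsmodels and scipy packages; the data are placeholders) of how the linear-regression assumptions can be checked:

```python
# Minimal sketch of checking linear-regression assumptions; data are placeholders.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor
from scipy.stats import shapiro

rng = np.random.default_rng(42)
X = sm.add_constant(rng.normal(size=(100, 2)))   # placeholder predictors
y = X @ [1.0, 2.0, -0.5] + rng.normal(size=100)  # placeholder response
fit = sm.OLS(y, X).fit()

# Multicollinearity: VIF for each predictor (excluding the constant).
vif = [variance_inflation_factor(X, i) for i in range(1, X.shape[1])]
# Homoscedasticity: Breusch-Pagan test on the residuals.
bp_stat, bp_p, _, _ = het_breuschpagan(fit.resid, X)
# Normality of residuals (Shapiro-Wilk; choose the test by sample size).
sw_stat, sw_p = shapiro(fit.resid)
print(f"VIF = {vif}, Breusch-Pagan p = {bp_p:.3f}, Shapiro-Wilk p = {sw_p:.3f}")
```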
Family of hypotheses
When multiple tests address a single family of hypotheses, apply an appropriate correction to the p-values (e.g., Bonferroni, Holm or Benjamini–Hochberg) and state which method was used.
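A minimal sketch of such a correction, assuming the statsmodels package (the p-values and the choice of Holm’s method are illustrative):

```python
# Minimal sketch: correcting a family of p-values.
from statsmodels.stats.multitest import multipletests

p_values = [0.012, 0.030, 0.041, 0.210]  # illustrative raw p-values
reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method="holm")
print(p_adj)  # report corrected p-values alongside the raw ones
```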
Data presentation in tables or graphs
For a normal distribution, the correct summary measures are the mean and standard deviation (SD). Otherwise, the median should be reported with the first and third quartiles (Q1 and Q3) when n is greater than 8, or with the min–max range otherwise.
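A minimal sketch of this rule (assuming the numpy and scipy packages; the data and the 0.05 normality threshold are illustrative):

```python
# Minimal sketch: choose the summary statistics by distribution and sample size.
import numpy as np
from scipy.stats import shapiro

x = np.random.default_rng(7).lognormal(size=30)  # placeholder data
if shapiro(x).pvalue > 0.05:                     # approximately normal
    print(f"{x.mean():.2f} ± {x.std(ddof=1):.2f} (mean ± SD)")
elif x.size > 8:
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    print(f"{med:.2f} ({q1:.2f}–{q3:.2f}) (median, Q1–Q3)")
else:
    print(f"{np.median(x):.2f} ({x.min():.2f}–{x.max():.2f}) (median, min–max)")
```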
Presentation of statistical analysis results
General rules
a) Presentation of a given test result should include (excluding post hoc tests – see below):
• test name,
• degrees of freedom (df) or sample/group sizes (n) when df is not applicable,
• statistical significance (p) with an accuracy of 3 decimal places; however, if p is less than 0.001, we use the expression “p < 0.001”, and when p is greater than 0.999, we use the expression “p > 0.999”.
b) Post hoc test results:
• only p-values are required;
• present all p-values to 3 decimal places, both below and above the alpha level (a formatting helper is sketched after point d);
• when the table is large or the number of tables with post hoc results is large, they can be included as supplementary materials.
c) Include effect size measures, always with their CIs, such as the odds ratio (OR), hazard ratio (HR), beta coefficients, etc. If applicable, explain how the CIs were calculated.
d) Avoid using unreliable methods, such as stepwise regression, selecting predictors for multiple regression based on the results of univariate regressions, and post hoc least significant difference (LSD) tests (which are too liberal). The use of such methods requires justification.
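A hypothetical helper implementing the p-value formatting rules in points a) and b) above:

```python
# Hypothetical helper implementing the journal's p-value formatting rule.
def format_p(p: float) -> str:
    if p < 0.001:
        return "p < 0.001"
    if p > 0.999:
        return "p > 0.999"
    return f"p = {p:.3f}"

print(format_p(0.0004))  # -> "p < 0.001"
print(format_p(0.0758))  # -> "p = 0.076"
```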
The test results should be reported following the examples below:
a) For Student’s t-test: “t-test: df = 26; p = 0.004”;
b) For analysis of variance test: “ANOVA: df = (2, 150); p = 0.076”;
c) For Mann–Whitney test (Wilcoxon rank sum test): “M–W test: n = (12, 15); p = 0.010”;
d) For Kruskal–Wallis test: “K–W test: n = (14, 15, 18), p = 0.211”;
e) For χ2 test: “χ2 test: df = 1, p = 0.032”;
f) For Pearson’s correlation: “r = 0.62, n = 45, p = 0.004”;
g) For multivariate regression models:
• Coefficients (B) and/or standardized coefficient (β), their CIs, and p-values, e.g.: Table 2
• Overall model statistics, including the model name and specification, the p-value, and an appropriate goodness-of-fit measure (e.g., adjusted R2, Nagelkerke R2, concordance index, etc.). An example: p = 0.005, adj-R2 = 0.23.
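A minimal sketch (assuming the statsmodels package; the data are placeholders) of how these values can be read from a fitted model:

```python
# Minimal sketch, assuming statsmodels; data are placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(80, 2)))   # placeholder predictors
y = X @ [0.5, 1.2, -0.8] + rng.normal(size=80)  # placeholder response
fit = sm.OLS(y, X).fit()

# Coefficients (B), their 95% CIs and p-values for the coefficient table.
print(fit.params, fit.conf_int(alpha=0.05), fit.pvalues, sep="\n")
# Overall model statistics.
print(f"model p = {fit.f_pvalue:.3f}, adj-R2 = {fit.rsquared_adj:.2f}")
```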
Tables concerning statistical analysis in the published supplementary materials
a) In accordance with general guidelines, they should contain only necessary content (i.e., variable names, df or n, and p).
b) Raw reports produced automatically by statistical programs are unacceptable because they contain many unnecessary elements.
Rules for preparing figures (graphs)
Figures should be titled and explained (preferably in the footer) so that they are understandable without referring to other parts of the publication.
Detailed rules
a) Each axis label should contain the units when applicable.
b) Bar plots can only be used to present frequency or count data.
c) If the sample size is less than 10, all data should be presented as dots, along with the measure of central tendency appropriate for the given test (usually the median); see the sketch after this list.
d) A line representing the relationship should be presented alongside the CI.
e) If statistical significance is indicated in the chart by symbols (e.g., * for p < 0.05), the tables containing all the elements required for a given statistical analysis must be referenced.
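A minimal matplotlib sketch of rules a) and c) (the group names, units and data are placeholders):

```python
# Minimal sketch, assuming matplotlib; groups, units and data are placeholders.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
groups = {"Control": rng.normal(5, 1, 8), "Treated": rng.normal(7, 1, 9)}

fig, ax = plt.subplots()
for i, (name, vals) in enumerate(groups.items()):
    ax.plot(np.full(vals.size, i), vals, "o", alpha=0.6)  # n < 10: show all data as dots
    ax.hlines(np.median(vals), i - 0.2, i + 0.2)          # appropriate central tendency
ax.set_xticks(range(len(groups)))
ax.set_xticklabels(groups.keys())
ax.set_ylabel("Concentration [mg/L]")  # axis label with units
plt.show()
```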
Conclusions
As data analysis methods continue to evolve, so too do the expectations for precision – both in the description of methods used and in the presentation of results. The updated guidelines, informed by past experience with submitted manuscripts, aim to strike a sensible balance. The goal is to reconcile the demand for methodological rigor with the need for clarity and readability – essential for a fair and effective peer review process. This balance involves limiting the scope of results presented in the main text, while allowing draft versions of full results to be submitted to the editors at the reviewers’ request – without requiring them to be fully polished for publication.


