The credibility of the results from a randomized trial depends strongly on the quality of the trial’s study design, endpoint definitions, and statistical analysis. Pre-specification in a study protocol (and in a public trial register) is crucial. Without this, endpoints and analyses can be modified, exploiting random variation to produce statistical results that appear more favourable to the investigator.

The phenomenon is, of course, well known, and many guidelines and recommendations have been published (1, 2). Nevertheless, many published reports show that this practice of embellishing results is surprisingly common (3, 4). In many cases, the pre-specified documentation is inadequate (5): unclear definitions that confuse outcomes (variables) with endpoints (parameters), unspecified length of follow-up, vague or ambiguous descriptions of measurements, failure to address multiplicity issues, etc. Without clear and specific operational definitions in the statistical analysis plan (a part of the study protocol), the analysis requires decisions based on post-hoc considerations that undermine the trial’s credibility and cast doubt on the accuracy of the findings.

**References**

1. Evans S. When and How Can Endpoints Be Changed after Initiation of a Randomized Clinical Trial? PLoS Clin Trials. 2007;2:e18.

2. Kahan BC, Forbes G, Cro S. How to design a pre-specified statistical analysis approach to limit p-hacking in clinical trials: the Pre-SPEC framework. BMC Med 2020;18:253.

3. Serpas VJ, Raghav KP, Halperin DM, Yao J, Overman MJ. Discrepancies in endpoints between clinical trial protocols and clinical trial registration in randomized trials in oncology. BMC Med Res Methodol 2018;18:169.

4. Shepshelovich D, Yahav D, Tibau A, Amir E. Assessment of frequency and reporting of design changes among clinical drug trials published in influential medical journals. Eur J Intern Med 2020;71:45-49.

5. Greenberg L, Jairath V, Pearse R, Kahan BC. Pre-specification of statistical analysis approaches in published clinical trial protocols was inadequate. J Clin Epidemiol 2018;101:53-60.

Technical terms are becoming more and more frequent in medical research reports, especially terms developed for use in randomised trials. Without methodological insight, it is becoming increasingly hard to distinguish between observational and experimental studies (1). Part of this usage probably reflects spin, but not all.

For example, terms such as “correlation”, “endpoint”, and “adverse event” are used more or less indiscriminately, without adherence to their proper definitions, even though technical terms usually have very precise meanings. An adverse event refers to any untoward medical occurrence in a patient during a trial; the event does not need to be causally related to the treatment. An endpoint refers to a parameter, not to a variable. A correlation describes a linear relationship, not a general association. For example, the variables x and y, where x = {-3, -2, -1, 0, 1, 2, 3} and y = x², are completely related but not correlated; the correlation coefficient is 0.
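This claim is easy to verify. Here is a quick check in Python (a minimal sketch using only the standard library, with the numbers from the example above):

```python
# Pearson correlation of x and y = x^2 over a symmetric range is zero,
# even though y is completely determined by x.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x**2 for x in xs]

def mean(v):
    return sum(v) / len(v)

mx, my = mean(xs), mean(ys)

# The covariance (the numerator of Pearson's r) vanishes by symmetry,
# so the correlation coefficient itself is exactly 0.
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
print(cov)  # → 0.0
```

Because the covariance is exactly zero, so is the correlation coefficient, whatever the variances are.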

The best way to avoid misunderstanding technical terms is to use their proper technical definition. The ICMJE (2) therefore recommends avoiding nontechnical uses of technical terms. Check your terminology.

**References**

1. Koletsi D, Pandis N, Polychronopoulou A, Eliades T. What’s in a title? An assessment of whether randomized controlled trial in the title means that it is one. Am J Orthod Dentofacial Orthop 2012;141:679-685.

2. International Committee of Medical Journal Editors. Recommendations for the Conduct, Reporting, Editing and Publication of Scholarly Work in Medical Journals, May 25 2021. Available from: http://www.ICMJE.org.

When programming Stata, it is in some cases necessary to call an ado-file from within a do-file, for example to calculate p-values and confidence intervals from the information returned by an estimation command. An alternative is to include a Python function in the do-file. Here is an example.

```stata
set more off
******************************************************************
* pcalc, a Python function called with an estimate, its standard *
* error and degrees of freedom, returning the p-value and 95%    *
* confidence interval                                            *
******************************************************************
python:
from scipy.stats import t
def pcalc(b, se, df):
    tval = abs(b/se)
    p = "p = " + str(round(2*(1 - t.cdf(tval, df)), 4))
    l1 = round(b - t.ppf(0.025, df)*se, 2)
    l2 = round(b + t.ppf(0.025, df)*se, 2)
    ll = min(l1, l2)
    ul = max(l1, l2)
    ci = "(" + str(ll) + " to " + str(ul) + ")"
    return (p, ci)
end
****************
* Get the data *
****************
sysuse auto
**************************
* Fit a regression model *
**************************
regress price length weight
*******************************************
* Collect information about the estimates *
*******************************************
mat b1 = e(b)
mat v1 = e(V)
local b = b1[1,1]
local se = v1[1,1]^.5
local df = e(df_r)
****************************************************************
* Call the Python function and retrieve the p-value for length *
****************************************************************
python: print(pcalc(`b',`se',`df')[0])
*****************************************************************
* Call the Python function and retrieve the confidence interval *
*****************************************************************
python: print(pcalc(`b',`se',`df')[1])
************
* Finished *
************
```

For those familiar with SQL, Stata’s lack of SQL support (apart from ODBC) can be perceived as a problem. However, Stata’s integration with Python offers a solution. Here are two examples: one Stata script that exports data to an SQLite database, and another that queries an existing SQLite database. In each case, only a few lines of code are required. First, to read a Stata data file and export the data in the form of an SQLite database, this code can be used.

```stata
/* Convert a Stata datafile to an SQLite database */
local filename "/home/ranstam/test.dta"
local dbname "/home/ranstam/test.db"
local tablename "file1"
python:
import sqlite3, pandas as pd
conn = sqlite3.connect('`dbname'')
df = pd.read_stata('`filename'')
df.to_sql('`tablename'', conn, if_exists='replace', index=False)
end
```

In Stata, this code reads and queries an SQLite database.

```stata
/* Import SQLite data */
local dbname "/home/ranstam/test.db"
local filename "/home/ranstam/test.dta"
local query "select * from file1"
python:
import sqlite3, pandas as pd
conn = sqlite3.connect('`dbname'')
df = pd.read_sql('`query'', conn)
df.to_stata('`filename'')
end
use "`filename'", clear
```
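For readers without Stata or pandas at hand, the same round trip (write a table, then query it back) can be sketched in pure Python with the standard library’s sqlite3 module. The table name, column names, and rows below are made up for illustration only:

```python
import sqlite3

# Write rows to an SQLite table, then query them back.
# An in-memory database is used here so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute("create table file1 (make text, price integer)")
rows = [("AMC Concord", 4099), ("AMC Pacer", 4749)]
conn.executemany("insert into file1 values (?, ?)", rows)
conn.commit()

# This step corresponds to the "select * from file1" query above.
result = conn.execute("select * from file1").fetchall()
print(result)  # → [('AMC Concord', 4099), ('AMC Pacer', 4749)]
conn.close()
```

In the Stata scripts, pandas handles the conversion between the .dta file and the SQL table; the SQL itself is the same.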

It is quite easy to get a manuscript accepted in a scientific journal if only a few basic requirements are met, i.e. a reasonably interesting research question, a sound study design, and generally acceptable results. Unfortunately, the chance of succeeding with this in a randomised trial is not great. Randomised trials are expensive, time-consuming, and require extensive knowledge of regulatory provisions and scientific guidelines. Even worse, as the outcome cannot be predicted, there may not be any return on the invested time and money.

A much more successful, simpler, cheaper, and quicker alternative is to use a convenience sample: data already collected for another purpose, such as register data or a cohort of patients already studied for something else. Only two additional tasks are then required: a) identifying a couple of statistically significant differences, which usually is not difficult with access to statistical software, and b) formulating an interesting, preferably politically correct, explanation of the observed differences. The result is a generated hypothesis. When the scientific report is written, the reasoning is simply reversed: the hypothesis is presented first, and the results of the statistical analysis are used to “demonstrate” that the hypothesis is correct.

This approach to scientific research is known as HARKing (1), and it is surprisingly common. One drawback of the approach is that the findings tend primarily to reflect bias or sampling variation and usually cannot be reproduced in well-designed and well-performed confirmatory studies. However, many authors do not wish to engage in reproducing results, as they consider it more meritorious to generate new hypotheses.

**References**

1. Kerr NL. HARKing: hypothesizing after the results are known. Pers Soc Psychol Rev 1998;2:196-217.

Two variables are associated, i.e. statistically dependent, if one of the variables says something about the other; what it says can range from nothing to everything. Correlated variables are always dependent, but dependent variables are not necessarily correlated, because correlation refers to linear relationships. It is, for example, easy to show that the two mathematically coupled variables X and Y, where Y = X², are completely dependent but not correlated when X = -3, -2, -1, 0, 1, 2, 3.

Many authors use the word association when describing relationships between risk factors and disease or between treatments and recovery. The reason for this may be that they wish to avoid being accused of interpreting observed correlations in terms of cause and effect. It is well known that this is a controversial subject.

However, in order to study risk-factor and treatment effects, the statistical analysis needs to be based on assumptions regarding cause and effect, i.e. in terms of a statistical model Y = f(X), that the studied outcome, Y, is an effect of exposure to a factor X. Moreover, if Y = f(X, Z), where Z is a biasing factor, the effect estimate must be adjusted for the influence of Z, and this requires further cause-effect assumptions. Otherwise, attempts to reduce confounding bias may induce adjustment bias. This happens when a mediator or collider is included in the statistical model instead of a confounder (mediators, colliders, and confounders being defined by different cause-effect relationships).
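The collider case is easy to demonstrate by simulation. In the following sketch (illustrative only, with made-up variables and using only the Python standard library), X and Y are generated independently, and C is a collider, a common effect of both. Marginally X and Y are uncorrelated, but "adjusting" for C by stratifying on it induces a strong spurious association:

```python
import random

random.seed(1)

def pearson(a, b):
    # Pearson correlation coefficient, computed from first principles.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# X and Y are independent; C is a collider (a common effect of both).
n = 50_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 1) for _ in range(n)]
cs = [x + y + random.gauss(0, 0.5) for x, y in zip(xs, ys)]

# Marginally, X and Y are (nearly) uncorrelated.
r_marginal = pearson(xs, ys)

# Stratifying on the collider (keeping C near zero) induces a
# strong negative association between X and Y: adjustment bias.
stratum = [(x, y) for x, y, c in zip(xs, ys, cs) if abs(c) < 0.2]
r_adjusted = pearson([x for x, _ in stratum], [y for _, y in stratum])

print(round(r_marginal, 3), round(r_adjusted, 3))
```

The marginal correlation comes out near zero, while the within-stratum correlation is strongly negative (around -0.8 in theory for these parameters), although no causal link between X and Y exists.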

The word association is often used as camouflage, a way to avoid discussing technical issues concerning the effect estimates used (RR, HR, OR, IR, SMR, etc.) and to avoid presenting and motivating the assumptions underlying the confounding adjustment.

It is important to check the statistical terminology in manuscripts, as this often reveals confusion and misunderstandings. However, terminology is often not prioritised by reviewers, and corrections are not always appreciated. Nevertheless, ideal scientific writing is clear, specific, and unambiguous. The reader should not have to guess what the author really means, for example when terms such as variable, parameter, and quartile are used. The correct definitions of statistical terms can be found in The Oxford Dictionary of Statistical Terms (The International Statistical Institute, Oxford University Press, New York, 2003).

In order to facilitate my own checking of the statistical terminology in manuscripts, I use the Linux command-line utility pdfgrep, a program that scans one or more pdf documents for defined keywords and returns information on detected occurrences. A Windows version of the program exists, but the Linux version can also be run directly using the Windows Subsystem for Linux (WSL).

I have written a short bash shell script to facilitate my terminology checks. The routine calls pdfgrep and searches the pdf manuscripts in the specified folder for the keywords defined in a separate text file.

```bash
#!/bin/bash
if [ "$1" == "-h" ] || [ "$1" == "--help" ]; then
    echo "Usage: statrev.sh [OPTION]"
    echo
    echo "With no option, matching terms are listed."
    echo "  -c, --context  describe each separate occurrence of the matched term in its context"
    echo "  -t, --terms    list matching terms"
    echo "  -h, --help     display this help and exit"
    echo
    echo "This program matches the terms specified in $HOME/statrevterms.cfg with the contents of the pdf files located in the $HOME/Downloads/ directory, writing to OUTPUT (or standard output)."
    echo
    exit 1
fi
cd "$HOME/Downloads" || exit 1
if ! ls *.pdf >/dev/null 2>&1; then
    echo "There are no pdf files in $HOME/Downloads/."
    echo
    exit 1
fi
echo
echo "Statrev Terminology Check"
echo "========================="
if [ -z "$1" ] || [ "$1" == "-t" ] || [ "$1" == "--terms" ]; then
    echo
    echo "Matched terms"
    echo "-------------"
    echo
    pdfgrep -Hio -f "$HOME/statrevterms.cfg" *.pdf | sort | uniq -ic | tr '[:upper:]' '[:lower:]'
    echo
else
    echo
    echo "Context"
    echo "-------"
    echo
    pdfgrep -FHin -C1 -f "$HOME/statrevterms.cfg" *.pdf
fi
echo
echo "Finished"
echo
```

The keywords I wish to check are kept in a separate text file, statrevterms.cfg. The content of this file is currently:

```
### Descriptive ###
range
tertile
quartile
quintile
### Models ###
multivariate
R2
predictor
prediction
demonstrate
strongly
univariate
stepwise
propensity
### Study design ###
cohort
cross-sectional
case-control
### Parameters ###
odds ratio
### Epidemiology ###
prevalence
incidence
mortality rate
fatality rate
### Randomised trials ###
primary endpoint
primary outcome
adverse event
crossover
cross-over
multiplicity
non-inferiority
equivalence
### Results ###
not differ
no difference
no effect
no significant difference
a significant difference
statistical difference
normally distributed
normal distribution
### Miscellaneous ###
goodness-of-fit
R2
α
β
association
predictor
prediction
demonstrate
strongly
### Finished ###
```

Running the shell file with a pdf manuscript in the folder provides useful output for a simple and quick check that the manuscript is based on correct terminology. The files can be downloaded here.

The statistical analysis of a dataset is always dependent on how the data have been collected. An experimental study, e.g. a randomised clinical trial, can be designed in such a way that validity problems (selection bias, misclassification bias, confounding bias) are prevented, for example by using concealed treatment allocation, randomisation of patients to treatment, and masking of the treatment. The statistical analysis can then focus entirely on precision issues such as estimating sample size and statistical power, testing null hypotheses, and estimating effect sizes. Such interventions are, however, impossible in an observational study. Validity issues instead need to be addressed in the statistical analysis, usually by adjustments, but this can be successful only when the analyst knows what to adjust for and when the necessary data are available in the dataset. This is generally not the case.

One consequence of these differences in study design is that the results from experimental studies are considered more accurate and reliable than results from observational studies. Another consequence is that, even with the same statistical methods, different analysis strategies may be necessary. For example, regression analysis is often used in an experimental study to account for randomisation stratification factors and for the baseline value when estimating change from baseline, whereas the purpose of using regression analysis in an observational study is usually to adjust estimated effect sizes for confounding by association with competing risk factors.

Common mistakes in manuscripts include the use of trial terminology (e.g. primary and secondary outcomes and intention-to-treat) in observational studies, where these terms have no relevant definition, and the use of observational analysis strategies (confounding adjustments) in randomised trials, where confounding has already been prevented by randomisation.

In order to communicate without misunderstandings, it is important to realise that, in contrast to spoken language, which combines words, gestures, and facial expressions and offers possibilities for immediate questioning and correction, written language relies entirely on the use of the right words and compliance with the grammatical rules for combining them into understandable sentences at the time of writing. Furthermore, scientific communication is based on the tenet that author and reader have identical definitions of the words used. Otherwise, confusion can be expected.

Statistical terminology is well defined. The terms are described in statistical dictionaries, the best known probably being The Oxford Dictionary of Statistical Terms, published for The International Statistical Institute (Oxford University Press, New York, 2003). However, statistical terminology is systematically misused in medical research. A sort of pidgin terminology has arisen. For example, terms such as range, quartile, and correlation are almost always used incorrectly. The range is not the smallest and greatest values but the difference between them. A quartile is one of the three, not four, values (Q1, Q2, Q3) that divide the ordered observations into four parts of equal size (Q2 is also the median). A correlation is not a general association but a measure of the strength of the linear relationship between two variables. When used incorrectly, the terminology needs to be corrected.
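The points about quartiles and range can be checked directly in Python (a sketch with made-up data, using only the standard library):

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]

# The three quartiles Q1, Q2, Q3 divide the ordered observations
# into four parts of equal size; Q2 is also the median.
q1, q2, q3 = statistics.quantiles(data, n=4)
print(q1, q2, q3)  # → 2.0 4.0 6.0

# The range is a single number, the difference between the largest
# and smallest observations, not the pair (min, max).
value_range = max(data) - min(data)
print(value_range)  # → 6
```

Note that `statistics.quantiles` supports several computation methods (here the default, "exclusive"), so software packages may report slightly different quartile values for the same data.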

Other terms, such as primary endpoint, adverse event, and intention to treat, play important roles in randomised trials but are increasingly often used in observational studies, where their meaning is unclear. The primary endpoint is part of a strategy for addressing multiplicity issues in confirmatory trials, but observational studies are not confirmatory. An adverse event refers to any untoward medical occurrence in a patient; the event need not have a causal relationship with the studied treatment, and such information is usually not collected in an observational study. Intention to treat refers to a feature of the study design that cannot be achieved in an observational study. The use of these trial-related terms in manuscripts presenting observational studies is therefore obscure; an explanation of how the terms should be interpreted and a motivation for their use is needed. The ICMJE recommendation is to “avoid nontechnical uses of technical terms in statistics”.

Correcting the terminology is not generally popular among physicians, but it is an important part of a statistical review.
