Inadequate pre-specification of the statistical analysis The credibility of the results from a randomized trial depends strongly on the quality of the trial's study design, endpoint definitions, and statistical analysis. Pre-specification in a study protocol (and in a public trial register) is crucial. Without this, endpoints and analyses can be modified, using random variation in…

Nontechnical use of technical terms Technical terms are becoming more and more frequent in medical research reports, especially terms developed for use in randomised trials. Without methodological insight, it is becoming increasingly hard to distinguish between observational and experimental studies (1). Some of this usage probably reflects spin, but not all. For example, terms such…

Including a Python function in a Stata do-file When programming in Stata, it is sometimes necessary to call an ado-file from within a do-file, for example to calculate p-values and confidence intervals from the results of an estimation command. An alternative is to include a Python function in…
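
As a rough illustration of the kind of calculation mentioned above, the following minimal sketch computes a two-sided p-value and a confidence interval from a point estimate and its standard error, assuming a normal (Wald) approximation. The function name `wald_summary` and its parameters are hypothetical and not taken from the post. Inside a do-file, such a function would sit between `python:` and `end`; how the estimate and standard error are passed in from Stata is left out here.

```python
from statistics import NormalDist


def wald_summary(estimate, std_err, level=0.95):
    """Return (z, two-sided p-value, lower limit, upper limit).

    Assumes the estimate is approximately normally distributed,
    which is an assumption of this sketch, not a general rule.
    """
    if std_err <= 0:
        raise ValueError("std_err must be positive")
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    z = estimate / std_err
    # Two-sided p-value for the null hypothesis that the parameter is 0
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Critical value for the requested confidence level
    crit = std_normal.inv_cdf(1 - (1 - level) / 2)
    lower = estimate - crit * std_err
    upper = estimate + crit * std_err
    return z, p_value, lower, upper


# Illustrative numbers only, not results from any real analysis
z, p_value, lower, upper = wald_summary(estimate=0.42, std_err=0.15)
print(f"z = {z:.2f}, p = {p_value:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

For small samples a t-distribution would usually be more appropriate than the normal approximation used in this sketch.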

Stata and SQLite For those familiar with SQL, Stata's lack of SQL support (apart from ODBC) can be perceived as a problem. However, Stata's integration with Python offers a solution. Here are two examples: one Stata script that exports data to an SQLite database and another that queries an existing SQLite database. In each…
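
The Python side of such an export and query can be sketched with the standard-library `sqlite3` module. This is only a sketch under stated assumptions: the helper names `export_rows` and `query_rows`, the table `patients`, and the data are all made up for illustration, an in-memory database stands in for a database file, and the step of moving data between Stata and Python is not shown.

```python
import sqlite3


def export_rows(conn, table, columns, rows):
    """Create the table if needed and insert the rows.

    Table and column names are interpolated into the SQL text, so they
    must come from trusted code, never from user input.
    """
    cols = ", ".join(f'"{c}"' for c in columns)
    marks = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    # Values are passed as parameters, not pasted into the SQL string
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)
    conn.commit()


def query_rows(conn, sql, params=()):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql, params).fetchall()


# Hypothetical data; ":memory:" avoids writing a file in this sketch
conn = sqlite3.connect(":memory:")
export_rows(
    conn,
    "patients",
    ["id", "age", "treated"],
    [(1, 54, 1), (2, 61, 0), (3, 47, 1)],
)
result = query_rows(
    conn,
    "SELECT treated, AVG(age) FROM patients GROUP BY treated ORDER BY treated",
)
print(result)
```

Replacing `":memory:"` with a file path would make the database persist between sessions.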


HARKing It is quite easy to get a manuscript accepted in a scientific journal if only a few basic requirements are met, i.e. a reasonably interesting research question, a sound study design, and generally acceptable results. Unfortunately, the chance of succeeding with this in a randomised trial is not great. Randomised trials are expensive, time-consuming,…


Associations Two variables are associated (statistically dependent) if one of them says something about the other, and how much it says can range from nothing to everything. Correlated variables are always dependent, but dependent variables are not necessarily correlated because correlation refers to linear relationships. It is, for example, easy to show that the two mathematically coupled…
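
The distinction between dependence and correlation can be illustrated with a small sketch. The excerpt is cut off before naming its own example, so the one below is a standard textbook case chosen for illustration and not necessarily the author's: `y` is completely determined by `x`, yet the Pearson correlation is zero because the relationship is symmetric rather than linear. The function name `pearson_r` and the data are assumptions of this sketch.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# x is symmetric around zero; y depends on x through y = x squared
x = [-2, -1, 0, 1, 2]
y = [value ** 2 for value in x]

r_dependent = pearson_r(x, y)  # perfectly dependent, but not linearly
r_linear = pearson_r(x, [2 * value + 1 for value in x])  # exactly linear

print(f"r for y = x^2:    {r_dependent:.3f}")
print(f"r for y = 2x + 1: {r_linear:.3f}")
```

Knowing `x` gives `y` exactly in both cases, but only the linear relationship is picked up by the correlation coefficient.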


pdfgrep It is important to check the statistical terminology in manuscripts, as this often reveals confusion and misunderstandings. However, terminology is often not prioritized by reviewers, and corrections are not always appreciated. Nevertheless, ideal scientific writing is clear, specific, and unambiguous. The reader does not have to guess what the author really means,…

Experimental and observational studies The statistical analysis of a dataset is always dependent on how the data have been collected. An experimental study, e.g. a randomised clinical trial, can be designed in such a way that validity problems (selection bias, misclassification bias, confounding bias) are prevented, for example using concealed treatment allocation, randomisation of patients…


Terminology In order to communicate without misunderstandings, it is important to realise that, in contrast to spoken language, which combines words, gestures, and facial expressions with possibilities for immediate questioning and correction, written language relies entirely on the use of the right words and compliance with the grammatical rules for combining these into understandable sentences…