Official opening of 2KMM's blog and our participation in the *R in Pharma 2020* conference

2KMM showed up at the 'R in Pharma 2020' conference!

“A journey of a thousand miles begins with a single step”, says a Chinese proverb attributed to Lao-Tzu.

With this blog post we start a unique series on our company's blog. From this day forward, we will be sharing stories we find interesting from the fields of clinical biostatistics and data management in clinical trials, as well as news from a pioneering journey we started a few years ago: bringing the R statistical package and other Open Source solutions to Clinical Research. We will show how we approach everyday statistical work with R. This is not as straightforward as many might think. Quite often the odds were against us, and quite often the solutions required serious effort.

Why do we find our work pioneering? Admittedly, more and more pharmaceutical companies use R in some way and share their experience openly, but this does not extend to regular submissions to regulatory agencies, like the FDA or EMA. Most of the tasks intended for a submission are still done with acclaimed commercial software, like SAS, SPSS, Stata, nQuery, WinNonlin and others. R plays only a supporting role there, far from replacing the commercial software entirely.

At 2KMM we use R for everything, from trial design to the final statistical report. This includes, to name a few:

  • trial design
  • data querying and making working data sets (in our own proprietary format and, experimentally, in CDISC)
  • the full-coverage analysis of the trial data along with the validation of the programs
  • producing relevant tabular and graphical output, combined into the final report in an automated manner
  • documenting the process of the analysis (LOG)
  • automating tasks and developing supportive tools
  • performing ad hoc analyses, data reviews and investigations at Sponsors' request

Our next goal is clear: the first complete submission of an experimental, randomized controlled trial to a regulatory agency, done exclusively with R. That will make us one of the pioneering CROs (in 2020) based entirely on the R statistical package.

But it is not as easy as installing R, starting to use it, and forgetting about the commercial software once and for all. The pharmaceutical industry is thoroughly regulated in practically every aspect. The use of statistical software is no exception. Validation of the software used for the statistical analysis is a major requirement raised by the regulatory agencies.

And with R, fulfilling this requirement is challenging. R is a great tool, the unquestionable leader among the Open Source statistical packages. Actually, it is the only such package that can compete with SAS, Stata, SPSS and other commercial software that constitute the industry standards. Although the core of R is developed and maintained by a serious, respectable team of experienced statisticians, the ecosystem of its numerous packages (16.5k and counting) delivered by external contributors is something of a "wild creature", which often does not match the quality of the core in terms of stability and validity of calculations. This does not sit well with the strict, conservative world of the pharmaceutical industry.

In recent years, a number of companies have independently attempted to fulfill the validation requirement and bring R to controlled environments. Over time, some of these companies started to cooperate. Through the combined efforts of the biggest pharmaceutical companies, the R Validation Hub project has been set up.

But while most of the initiatives seem to focus on documentation and package quality assessment, relying on the results of unit tests delivered “as is” by the authors of R packages, we at 2KMM set different priorities: we believe that exhaustive numerical validation must come first. Without it, there is a risk that all the effort put into documentation and quality assurance will apply to routines whose results differ from those obtained with other trusted software in a way that cannot be adequately justified.

While we do not deny the importance of documentation and early unit testing, we believe that numerical validation, going far beyond running those tests, is mandatory to achieve a satisfactory level of reliability. In particular, one of the key challenges here is to explain all the (numerous!) discrepancies between R and other, commercial software, to be 100% sure the results are trustworthy.

Driven by the above considerations, we showed up at the R in Pharma 2020 virtual conference to raise awareness and start a discussion that has been avoided for too long. Below you will find a video of the talk given by 2KMM's principal biostatistician, Adrian Olszewski, and a link to his presentation in PDF format.

"It is not easy to be a pioneer - but oh, it is fascinating!" said Dr. Elizabeth Blackwell, the first woman in the United States to earn a medical degree. We could not agree more with these words. If we have caught your attention and interest, please stay tuned for more news on our efforts in this difficult, yet highly satisfying area!


Presentation slides: