Should I Use Your R Package?

The answer to this simple, innocuous question is: it depends.
It depends on the package in question, of course. Perhaps less obviously, but just as importantly, it depends on who’s asking the question.
We’re sure that if we asked you about “package quality”, we could all come up with a list of what makes a good package:
- Documentation
- Unit tests
- Author credibility
- A package web page
- Security vulnerabilities
- Bug closure rate
- Multiple maintainers
- Reverse dependencies
We could (and have) come up with another twenty of these attributes. With 95% confidence, we’re sure that most people would agree that everything we’ve listed is important. But with 100% confidence, we are certain we would disagree on how much each of these characteristics matters. Surely unit testing is more important than the popularity of the package? But how important is documentation quality relative to the number of maintainers?
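To make the disagreement concrete, here is a minimal sketch in R. The attribute scores and the two sets of weights are entirely made up for illustration; the point is only that the same package data can yield different overall scores depending on who is weighting them.

```r
# Hypothetical per-attribute scores for one package (all values invented)
scores <- c(documentation = 0.9, unit_tests = 0.4,
            maintainers = 0.8, popularity = 0.95)

# Two invented weighting schemes, reflecting different risk appetites
risk_seeker <- c(documentation = 0.1, unit_tests = 0.2,
                 maintainers = 0.2, popularity = 0.5)
risk_averse <- c(documentation = 0.3, unit_tests = 0.4,
                 maintainers = 0.2, popularity = 0.1)

# Same package, two different overall scores
weighted.mean(scores, risk_seeker)
weighted.mean(scores, risk_averse)
```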
It all depends on why we are asking. It’s all about your risk appetite.
What is Risk Appetite?
Risk appetite is all about the risks you are and aren’t willing to take. It ranges from “our packages need to be vaguely sensible, not compromise our systems, and have a place where I can log bugs” to “if our packages aren’t thoroughly tested and proven fit for purpose, I can’t use them in production”. The former is fairly easy to report on; the latter is quite a bit more complicated.

The Risk Seekers!
Who amongst us wouldn’t want a top-quality R package? So who are the risk seekers? Most of us, at some point or another. If you are experimenting with building Shiny applications, then as long as a package is “secure”, any old package is fine - you just want to experiment. Likewise, if you are an academic and you want to compare your method to one already published, then as long as the package is “correct”, that’s good enough.
During our training courses, we are often asked this question about quality. How bad can a package be and still be usable? A thought experiment we like to run is: “suppose you had an R package with only one version. It’s never been updated, and no one has heard from the maintainer in ten years. But it provides code for an algorithm you want to use. What would you do?” The obvious answers for those with a high risk appetite are “something is better than nothing” and “proceed with caution”.
Risk Averse
There are lots of examples of where we are (and should be) risk-averse when it comes to R packages. For example:
- In the pharmaceutical industry, we need reassurance that the statistics used in reporting are correct. It’s vital that these packages are highly regulated!
- Accuracy and stability are crucial for official Government reports on the state of the economy. A minor bug could have significant consequences.
- Banks also work in a regulated environment and run complex models, so they have to be careful about the accuracy of their data.
Crucially, these organisations need not only to consider which packages they are using, but also to demonstrate this thinking in an auditable manner. This is not dissimilar to the ISO 9001 process. In the pharmaceutical industry, the holy grail is using R packages in FDA submissions for new therapies.
The R Validation Hub is Paving the Way
The pharmaceutical industry is the first to address these requirements in a meaningful way. The R Validation Hub put out a white paper which addresses the use of R and its packages for statistical analysis in pharmaceutical regulatory submissions, proposing a risk-based approach for validating R packages within validated infrastructure. The paper suggests that base R packages present minimal risk, whilst contributed packages require risk assessment based on their purpose, maintenance practices, community usage, and testing protocols.
The proposed framework classifies packages as either “Intended for Use” (loaded directly by users) or “Imports” (supporting dependencies), focusing validation efforts primarily on the former. Risk assessment should evaluate whether packages are statistical or non-statistical in nature, examine development practices, consider community adoption metrics, and review testing coverage. Organisations can use this assessment to determine package inclusion in validated systems and identify additional testing requirements, with high-risk packages needing more rigorous validation.
The approach required by those not working in regulated industries will probably not need to be as rigorous as this, but it gives an idea of the gold standard for R package validation, which we can draw inspiration from for less strict applications. They’ve also created some helpful tools, such as {riskmetric}, which allows us to pull metadata about packages and create quality scores from these data.
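As a quick taste of what {riskmetric} offers, its documented workflow builds a package reference, gathers assessments, and turns them into scores. A minimal sketch (assuming {riskmetric} is installed; assessing a package you have locally avoids any network access):

```r
library(riskmetric)

# Build a reference to an installed package, gather metadata-based
# assessments (news, vignettes, bug closure rate, ...), then convert
# each assessment into a numeric score
pkg_ref("riskmetric") |>
  pkg_assess() |>
  pkg_score()
```

The result is a row of per-metric scores that can then be weighted and summarised according to your own risk appetite.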
How Do We Enable Risk Assessment for Everyone Across the Risk Spectrum?
This is the question we have been grappling with over the past few months. How do we gather all of the information required to make informed decisions about including packages in production environments, using a flexible framework that meets the needs of everyone on the risk appetite spectrum? Especially considering…
There are so many packages on CRAN!
This is both a blessing and a curse, as anyone who’s ever worked in a regulated environment can tell you. The obvious answer is to automate, automate, automate! This is exactly what we’ve done in the creation of the Litmus package validation framework.
Our process relies on automation wherever possible:
- We have written code based on {riskmetric} that pulls package metadata from CRAN, git repositories and Posit Package Manager to provide a comprehensive overview of the package’s qualities
- We have created a framework to analyse and score packages based on these data
- We have created reporting and dashboarding workflows that allow us to generate package- and collection-level overviews of the scores for each package
- We’ve implemented automatic acceptance/rejection of a package based on client-specified criteria
- Our process also enables automated reporting of any additional manual steps taken to save a package from the bin, for example, writing additional remedial tests or documentation
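The acceptance/rejection step above can be sketched as a simple rule over a package’s scores. This is a hypothetical illustration only - the metric names and thresholds are invented, not Litmus’s actual criteria:

```r
# Accept a package only if its overall score clears a client-specified
# threshold AND every required individual metric clears a minimum.
# All names and numbers here are illustrative assumptions.
accept_package <- function(pkg_scores,
                           overall_threshold = 0.7,
                           required = c("has_vignettes", "has_news"),
                           required_min = 0.5) {
  meets_overall <- pkg_scores$overall >= overall_threshold
  meets_required <- all(unlist(pkg_scores[required]) >= required_min)
  meets_overall && meets_required
}

accept_package(list(overall = 0.82, has_vignettes = 1, has_news = 1))
```

In practice the thresholds would come from the client’s risk appetite, and a rejection would trigger the manual remediation steps described above.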
Keep an eye out for future blogs on this topic, as we dive a little deeper into the underlying principles driving our approach to package validation.
Does Your Package Pass the Litmus Test?
Ready to find out how we can help you validate your R package collection? Check out the Litmusverse and get in touch.
