
Litmus FAQ

Want to know more about R package validation with Litmus? Check out the FAQs below. If you still have questions, please don't hesitate to get in touch.

Service Offering

How can Jumping Rivers help me validate my package collection?

Our pre-sales process works like this: You provide us with a list of your packages and their versions, built package files, or a link to a specific repository. We then perform a preliminary analysis of your collection to let you know how many packages would be high, medium and low risk using our standard scoring framework. Based on this preliminary risk analysis, we can discuss a way forward to meet your organisation's needs. This could involve creating custom assessments, scoring, reporting, and dashboards to support your decision-making.

Why is the Litmus approach more flexible and lightweight than other approaches?

Unlike our competitors, we do not provide you with a pre-validated list of packages, so you do not pay for packages that you have no interest in using. Instead, you provide your own package list and, based on the risk assessment, we make recommendations on how to proceed: accepting packages into your environment, rejecting them outright, or remediating them, wherein we address issues with a package (for example, test coverage, statistical reproducibility or documentation). We do not dictate what your risk appetite should be; instead we collaborate with you to establish your priorities, and use them to target interventions where your package collection carries the most risk.

What are the outputs of the Litmusverse package validation offering?

We believe in flexibility and something of a 'choose-your-own-adventure' model: we tailor the outputs to your needs. We can provide technical documentation detailing the risk assessment process, output as PDFs for long-term storage; compact one-page reports summarising our findings per package; and collection-level reports, as well as collection-level dashboards that let you record decisions about your package collection (see our demo version to give it a spin). Our goal is to make risk assessment as versatile and painless as possible, while retaining the rigour demanded by statistical programming. We also provide all additional code written to remediate a package.

What is remediation and how does it work?

Remediation refers to the process defined by the R Validation Hub in their white paper on package validation. It is the process they recommend for all statistical packages, as well as packages that are insufficiently tested or documented. Our approach to remediation follows the initial risk assessment detailed above, which leaves us with a list of packages, their overall risk scores and the specific metrics that contribute to their risk. With this data, we can implement acceptance/rejection decisions for the collection, based on criteria we establish with you according to your risk appetite. If a package is rejected by these criteria but is business critical, it is a candidate for remediation. Remediation activities include performing additional research (e.g. citing published examples of the package's use), writing additional unit tests for exported functions, writing reproducible statistical tests and writing additional documentation to facilitate its use within your organisation.
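
As an illustration, a reproducible statistical test written during remediation might pin a function's output against an independently computed reference value. The toy data and the use of testthat below are our own hypothetical sketch, not an excerpt from a real remediation:

```r
library(testthat)

# Sketch: a reproducible statistical test of the kind written during
# remediation. The reference value is recomputed from the textbook
# definition rather than trusted implicitly. Toy data only.
test_that("t.test() matches a hand-computed one-sample t statistic", {
  x <- c(5.1, 4.9, 6.2, 5.8, 5.5, 4.7)  # fixed toy data
  res <- t.test(x, mu = 5)

  # One-sample t statistic, computed from first principles
  manual_t <- (mean(x) - 5) / (sd(x) / sqrt(length(x)))

  expect_equal(unname(res$statistic), manual_t)
})
```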

Scoring Package Quality

Do you validate packages and their dependencies? Which dependencies?

Not all R package dependencies are created equal. We recommend risk assessing the so-called 'main' packages (i.e. the packages that users are likely to import into their workflows), as well as their 'Imports', 'Depends' and 'LinkingTo' dependencies.
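
As a sketch of what this means in practice, the hard dependencies of a set of main packages can be enumerated with base R's tools package (the package names below are illustrative placeholders):

```r
# Sketch: enumerate the 'Depends', 'Imports' and 'LinkingTo'
# dependencies of a set of 'main' packages. Package names are
# illustrative placeholders.
main_pkgs <- c("dplyr", "ggplot2")

db <- available.packages(repos = "https://cloud.r-project.org")

deps <- tools::package_dependencies(
  main_pkgs,
  db = db,
  which = c("Depends", "Imports", "LinkingTo"),
  recursive = TRUE  # include dependencies of dependencies
)

# The unique set of packages that would also need assessing
sort(unique(unlist(deps)))
```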

What scores do you use? How are these scores calculated?

We have created a custom scoring framework called litmus.score, which provides individual metrics for package risk in four categories: code quality, maintenance, popularity and documentation quality. These metrics are combined into a scoring strategy where each metric is weighted according to its relative importance within its category. Each category is then assigned a weight within the total score, so that each package assessed receives an overall score out of 1.
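
As a rough sketch of how such a weighted score might be computed, here is an aggregation with invented metric values and weights (these are not litmus.score's actual metrics or defaults):

```r
# Per-metric scores in [0, 1], grouped by category. All values and
# weights below are invented for illustration only.
metrics <- list(
  code_quality  = c(check_passes   = 1.00, test_coverage = 0.65),
  maintenance   = c(recent_release = 0.80, open_bugs     = 0.50),
  popularity    = c(downloads      = 0.90, reverse_deps  = 0.70),
  documentation = c(examples       = 0.60, vignettes     = 0.40)
)

# Within-category weights (each set sums to 1) ...
metric_weights <- list(
  code_quality  = c(0.6, 0.4),
  maintenance   = c(0.5, 0.5),
  popularity    = c(0.5, 0.5),
  documentation = c(0.7, 0.3)
)
# ... and the weight of each category within the total score
category_weights <- c(code_quality = 0.4, maintenance = 0.3,
                      popularity = 0.1, documentation = 0.2)

category_scores <- mapply(function(m, w) sum(m * w),
                          metrics, metric_weights)

# Overall score out of 1 for the package
sum(category_scores * category_weights[names(category_scores)])
#> 0.727
```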

How are scores used to determine package risk?

Ultimately this depends on your own risk appetite. We recommend that some metrics be considered 'show-stoppers'. For example, if your organisation has a low tolerance for risk, we would pump the brakes if a package fails R CMD check or is shown to have security vulnerabilities. The overall score for the package might be relatively high, but if the package qualitatively misses the mark, we would consider it high risk anyway. The overall and within-category scoring is something we establish together based on your risk appetite, along with any qualitative show-stoppers that need to be considered.
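
A hypothetical decision rule of this kind might look as follows; the inputs and the 0.7 threshold are placeholders, not an actual Litmus rule:

```r
# Sketch: a qualitative 'show-stopper' that overrides the numeric
# score. Field names and the 0.7 threshold are hypothetical.
classify_risk <- function(score, check_passes, has_vulnerability,
                          threshold = 0.7) {
  # Show-stoppers make the package high risk regardless of its score
  if (!check_passes || has_vulnerability) {
    return("high")
  }
  if (score >= threshold) "low" else "medium"
}

# A package can score well overall yet still be classed high risk
classify_risk(score = 0.85, check_passes = FALSE,
              has_vulnerability = FALSE)
#> "high"
```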

I don't want to think about scoring too much. Can you give us a recommendation based on your expertise?

Yes! We totally understand that not all organisations have the time or desire to nerd out over the nitty-gritty of scoring strategies and calculations. So, we can provide pre-set strategies that allow for high, medium or low risk appetites, based on our extensive research into this subject and following the guidelines set out by the R Validation Hub.

Who performs the risk assessment? How is it performed?

The risk assessment itself is performed programmatically, using the Litmus suite of R packages. A reviewer, usually an MSc or PhD statistics graduate, then reviews the data produced by the risk assessment and makes a recommendation on the further steps required to include the package in a production environment, based on your organisation's risk appetite.

Can I double-check your findings?

Absolutely! All clients will be provided with access to the code used to assess and remediate their package collection.

What Operating Systems and R versions are supported?

We currently support most Linux systems, and any R version above 4.0.0.

How do you deal with issues of package compatibility?

We require that a snapshot date is specified when the package list is provided; this pins every package to the versions available on that date, ensuring that the collection is mutually compatible.
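
For example, a dated CRAN snapshot (here via Posit Public Package Manager; the date is a placeholder) resolves every installation to the versions that were current, and therefore mutually compatible, on that day:

```r
# Sketch: pin all package installations to a single CRAN snapshot
# date using Posit Public Package Manager. The date is a placeholder.
options(repos = c(
  CRAN = "https://packagemanager.posit.co/cran/2024-06-01"
))

# Installs the version of dplyr that was current on that date
install.packages("dplyr")
```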

Do you support packages that are not hosted on CRAN?

Yes! We really mean it when we say 'bring your own package'. Some assessments will not run (e.g. risk of removal from CRAN), but these simply will not contribute to the overall score for the package.
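
One way a score can stay on a 0 to 1 scale when some assessments cannot run is to drop the missing metrics and renormalise the remaining weights. The sketch below illustrates that idea only; it is not necessarily how litmus.score implements it:

```r
# Sketch: drop metrics that did not run (NA) and renormalise the
# remaining weights so the score stays on a 0-1 scale. Illustrative
# only; not necessarily litmus.score's actual behaviour.
score_with_missing <- function(metrics, weights) {
  ok <- !is.na(metrics)
  sum(metrics[ok] * weights[ok]) / sum(weights[ok])
}

m <- c(cran_removal_risk = NA, test_coverage = 0.8, docs = 0.6)
w <- c(0.2, 0.5, 0.3)
score_with_missing(m, w)
#> 0.725
```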