Porting a Shiny App to Observable Framework: Part 1
Preamble
This post, Part 1 in a series of two, looks at porting the functional code of a Shiny app - written in R - into JavaScript code to be used in an Observable Framework application. Part 2 will look at styling and deploying the ported application.
Background and Motivation
If you’re interested in interactive data visualisation you’ve probably heard of the d3 JavaScript library, even if you’ve never used it or don’t know any JavaScript. Mike Bostock, the creator of d3, and colleagues followed this up with d3.express, which was quickly renamed to Observable. In Mike’s words:
It’s for exploratory data analysis, for understanding systems and algorithms, for teaching and sharing techniques in code, and for sharing interactive visual explanations. To make visualization easier—to make discovery easier—we first need to make coding easier.
If you’re not familiar with Observable, think of Jupyter notebooks or Mathematica but with JavaScript (sort of).
And following on from Observable came Observable Plot:
Observable Plot is a free, open-source, JavaScript library for visualizing tabular data, focused on accelerating exploratory data analysis. It has a concise, memorable, yet expressive interface, featuring scales and layered marks in the grammar of graphics style popularized by Leland Wilkinson and Hadley Wickham and inspired by the earlier ideas of Jacques Bertin.
If you like ggplot2 and like the look of d3 but are put off by the idea of having to dive deep into hardcore JavaScript and low-level SVG primitives, Observable Plot could be just the thing for you. Even if you don’t know JavaScript, if you can read JSON and have some experience with reactive programming in a notebook, then I suspect you could probably pick up Observable Plot in the Observable environment fairly quickly.
More recently, the Observable team released Observable Framework (often shortened to just “Framework” with a capital “F”). In their own words:
Observable Framework is an open-source static site generator for data apps, dashboards, reports, and more. Framework includes a preview server for local development, and a command-line interface for automating builds & deploys.
You write simple Markdown pages — with interactive charts and inputs in reactive JavaScript, and with data snapshots generated by loaders in any programming language (SQL, Python, R, and more) — and Framework compiles it into a static site with instant page loads for a great user experience. Since everything is just files, you can use your preferred editor and source control, write unit tests, share code with other apps, integrate with CI/CD, and host projects anywhere.
Having a background in both data science and web development, I’ve spent many hours with the {ggplot2} and {shiny} packages and many more wrangling and visualising data using d3. I’ve also dabbled with the Observable environment but, until now, never used Observable Plot. With the addition of Observable Framework, this seemed like an opportune time to take a look at both and see how they compare to Shiny.
The Shiny App
To pick a suitable app to experiment with I scoured the Shiny gallery page. I wanted a “Goldilocks” example: not really simple but not highly complex, either. And, obviously, something with a chart. The Movie explorer seemed to fit the bill perfectly: a single chart but with lots of permitted modifications. Perfect for some reactive programming. A zoomed-out screenshot of the app (below) shows that it is, perhaps, too tall. This means that users would have to scroll to see the controls at the bottom, putting the top of the chart out of view.
GitHub
You can follow along with this blog post yourself by adding and removing code, step-by-step. You can also clone our repository from GitHub:
git clone https://github.com/jumpingrivers/observable-framework-movie-explorer.git
The “main” branch here is in the “final” state of the app, but there are also tags marking the commit at the end of each step we take. You can switch to these easily using the short git commands shown at the end of each section.
Creating the Default Framework App
The website for Observable Framework has an excellent Getting started guide. Here we’ll just steal from step 1 of that. You’ll need a fairly recent version (version 18 or above at the time of writing) of Node.js installed.
At the command line, in the parent directory for your future project, run
npx @observablehq/framework@latest create
and simply accept all the default values by pressing Enter.
To get a live-updating preview of the site run
npm run dev
This will launch your default browser and you’ll now have something that looks like the image below.
git switch --detach start
Generating the Data File
From following the above we end up with a bunch of stuff we don’t actually need for our app. Some of it is useful for pointing us in the right direction, though, and we’ll clear the rest out later.
In the “src/data” directory there’s a file with the slightly odd-looking “.csv.js” extension. The .js extension tells Observable Framework that the content of the file is JavaScript, so Framework knows to execute the file using the node CLI. The .csv extension is used for the generated file name: Observable Framework sees launches.csv.js and passes it to node, and the output from the script is then saved to a file called launches.csv.
But it’s not just the .js extension Framework knows what to do with. It also understands that .py is Python, .rs is Rust and .go is Go. And, most importantly for us, it knows to use Rscript when a file has the extension .R. To work, these scripts (called data loaders in the Framework documentation) need to write to standard output. In an R script we can do that explicitly with the print function.
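To make the “write to standard output” idea concrete, here is a minimal sketch of what a JavaScript data loader could look like. The file name (src/data/example.json.js) and the two records are made up purely for illustration; they are not part of the generated project or the movies dataset.

```javascript
// Hypothetical loader: src/data/example.json.js
// Framework would run this with node and save whatever is written to
// standard output as example.json

// Made-up records standing in for real data
const rows = [
  { title: 'Example A', year: 1999 },
  { title: 'Example B', year: 2005 },
];

// Serialise the data to a JSON string
const output = JSON.stringify(rows);

// Write the string to standard output, which is what a loader must do
process.stdout.write(output);
```

The R loader we write below follows the same pattern: build the data, convert it to a JSON string, then print it.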
All we need now is our own data. Helpfully the data and code for the Shiny app is MIT-licensed and on GitHub.
The top of the server.R file looks like this
Top of the Shiny app's server.R file
library(ggvis)
library(dplyr)
if (FALSE) {
  library(RSQLite)
  library(dbplyr)
}
# Set up handles to database tables on app start
db <- src_sqlite("movies.db")
omdb <- tbl(db, "omdb")
tomatoes <- tbl(db, "tomatoes")
# Join tables, filtering out those with <10 reviews, and select specified columns
all_movies <- inner_join(omdb, tomatoes, by = "ID") %>%
  filter(Reviews >= 10) %>%
  select(ID, imdbID, Title, Year, Rating_m = Rating.x, Runtime, Genre, Released,
    Director, Writer, imdbRating, imdbVotes, Language, Country, Oscars,
    Rating = Rating.y, Meter, Reviews, Fresh, Rotten, userMeter, userRating, userReviews,
    BoxOffice, Production, Cast)
We can use this as a starting point but:
- We don’t need the {ggvis} library;
- We have to reference our own copy of the movies.db SQLite database;
- The src_sqlite function is deprecated;
- It turns out there’s more data in the all_movies object than we actually need.
The following code, which I put in a file called movies.json.R and placed in the “src/data” directory alongside the movies.db database, deals with all these issues:
src/data/movies.json.R
library(dplyr)
library(RSQLite)
library(dbplyr)
# Hack to find the database path
script_directory = gsub("--file=", "", commandArgs()[4])
db_path = file.path(dirname(script_directory), "movies.db")
# Updated code to no longer use deprecated function
conn = dbConnect(RSQLite::SQLite(), db_path)
omdb = tbl(conn, "omdb")
tomatoes = tbl(conn, "tomatoes")
# Removed films without a BoxOffice value
# Select only the variables we actually use
all_movies = inner_join(omdb, tomatoes, by = "ID") %>%
  filter(Reviews >= 10 & !is.na(BoxOffice)) %>%
  select(Title, Runtime, Genre, Released, Director, Oscars,
    Rating = Rating.y, Meter, Reviews, BoxOffice, Cast)
# Convert data to a JSON string
json = all_movies %>%
  collect() %>%
  jsonlite::toJSON()
# Tidy up database connection
dbDisconnect(conn)
# Print data
print(json)
If you followed the server.R code from the original Shiny app then hopefully most of these changes make sense. The exception is probably the “Hack to find the database path”:
script_directory = gsub("--file=", "", commandArgs()[4])
db_path = file.path(dirname(script_directory), "movies.db")
I know I’ve put the database in the same directory as my R script but the script needs to know the path relative to where it’s executed from. This isn’t actually obvious at this point. But we can find the path from the execution location to the script using the commandArgs function. The rest then is just some ugly code to take the output of the commandArgs function, find the script relative to the execution location and then replace the script file name with the database file name that we know lives in the same directory.
We can test our script by installing dplyr, RSQLite and dbplyr as necessary and then running (from the root of the project):
Rscript src/data/movies.json.R > movies.json
This will create a JSON file, movies.json, with our data in the root of the project. You can delete this as it’s not needed.
We also no longer need the initial files in the “src/data” directory — events.json and launches.csv.js — and can delete them.
That’s everything we want to do in R covered. Now to actually build the movies app.
git switch --detach data
The Markdown File
For a simple app made of a single page the expectation is that the content of the app is placed inside a markdown file called index.md directly inside of the “src” directory of the project. This already exists in our generated project, alongside another couple of markdown files we can safely delete.
So now we write the “content” of our app in the index.md file in place of the original content we generated in the “Getting Started” section. As it’s a markdown file, you may think this would end up containing a load of markdown syntax. It turns out that in our case the file mostly looks like blocks of JavaScript… because that’s what it is.
The page starts, however, with explicit HTML markup: <h1></h1>. That’s because Observable Framework automatically turns headings created using # markdown syntax into anchor points (i.e. links to that specific part of the page). This is useful for writing “Help” or other documentation, as you can easily link to specific parts of the page, but isn’t particularly useful here.
As already noted, most of the markdown file is “fenced” blocks of JavaScript using the syntax ```js…```. The critical thing here to understand is that these blocks are actually executed in the browser. They are not simply there for displaying code to the user. Framework is reactive by default and the bit I had (and still have, if we’re honest) to get my head around is that each fenced block forms a “cell”. The thing that made most sense to me was thinking of cells in Excel: you change the value in a cell and the values of other cells that depend on it automatically update regardless of where the cell is positioned in the two-dimensional grid of the spreadsheet. Still, with Framework, I’m not sure how much “stuff” should go in a single cell: What is the best practice here? Does it matter so long as the output is correct in terms of both value and position on the page? Is there any significant effect on performance? How do the answers to the previous questions change when we go from creating notebooks to creating dashboards?
My current thinking on this can be summarised roughly as “create blocks of stuff that looks like it goes together and seems to work, with some cells dealing with the UI and some cells responsible for the graphic”. So let’s cover each block/cell in turn.
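Before doing so, the spreadsheet analogy can be made a little more concrete with a toy model in plain JavaScript. To be clear, this is not how Framework is implemented: the ToyRuntime class, its method names and the two example cells are all invented for illustration, and the sketch makes no attempt to handle circular dependencies.

```javascript
// Toy model only: Framework's real runtime is far more sophisticated
class ToyRuntime {
  constructor() {
    this.cells = new Map();
  }

  // Define (or redefine) a named cell from its input names and a function
  define(name, inputs, compute) {
    this.cells.set(name, { inputs, compute, value: undefined });
    this.refresh(name);
  }

  // Recompute one cell, then every cell that lists it as an input
  refresh(name) {
    const cell = this.cells.get(name);
    // Do nothing until every input has been defined
    if (!cell.inputs.every(input => this.cells.has(input))) return;
    cell.value = cell.compute(
      ...cell.inputs.map(input => this.cells.get(input).value)
    );
    for (const [otherName, other] of this.cells) {
      if (other.inputs.includes(name)) this.refresh(otherName);
    }
  }

  // Read the current value of a cell
  value(name) {
    return this.cells.get(name).value;
  }
}

const runtime = new ToyRuntime();

// The dependent cell is deliberately defined first: position doesn't matter
runtime.define('label', ['reviewsMin'], n => `At least ${n} reviews`);
runtime.define('reviewsMin', [], () => 80);
const before = runtime.value('label');

// Changing the upstream cell automatically re-runs the downstream one
runtime.define('reviewsMin', [], () => 120);
const after = runtime.value('label');
```

Here `before` holds the label built from 80 and `after` the label built from 120, even though the label cell itself was never touched again.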
Building the UI
The first cell covers the loading of the data and some basic processing of it:
// Load the data from the file we generated
const movies = await FileAttachment('./data/movies.json').json();
// Sort the data by number of oscars won. This ends up putting the
// multi-oscar-winning movies at the end of the data array so that
// they get drawn last in our scatter plot and thus appear on top
movies.sort((a, b) => a.Oscars - b.Oscars);
// Modify/extend our data objects for easier future use
movies.forEach(function(d) {
  // Add a Boolean stating whether or not the movie won any Oscars
  d.OscarWinner = d.Oscars > 0;
  // Convert the release date string to a JS Date object
  d.Released = new Date(d.Released);
  // Add a property that is just the four-digit year of release
  d.YearReleased = d.Released.getFullYear();
  // Add an array of Genres and remove any excess whitespace
  d.Genres = d.Genre?.split(',').map(s => s.trim()) || [];
  // Convert the Director string to lowercase for simpler searching
  d.Director = d.Director?.toLowerCase() || '';
  // Convert the Cast string to lowercase for simpler searching
  d.Cast = d.Cast?.toLowerCase() || '';
  // Turn the BoxOffice revenue figures into millions of dollars
  d.BoxOffice = (d.BoxOffice || 0) / 1e6;
});
// Create an array containing all the different genres found in the
// dataset and sort alphabetically
const genres = Array.from(
  movies.reduce(function(set, d) {
    d.Genres.forEach(g => set.add(g));
    return set;
  }, new Set())
)
  .sort((a, b) => a.localeCompare(b));
// Extract a two-element array giving the earliest and latest
// release years of films in the dataset
const yearExtent = d3.extent(movies, d => d.YearReleased);
Most of this is “vanilla” JavaScript but there are a couple of functions that aren’t: FileAttachment and d3.extent. FileAttachment is a function created specifically for Observable notebooks that also works with Observable Framework. It simplifies the code required to load data files like JSON, CSV and XLSX. It doesn’t need to be explicitly imported into a Framework markdown file. The same is true of d3.extent (and all other methods of the d3 library). This method takes an input array of data and an “accessor function” that is applied to each element of the array. The return value is then a two-element array of the minimum and maximum values returned when the accessor function is applied to each element of the input array.
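If the accessor idea is new to you, the following rough vanilla JavaScript stand-in shows what is going on. It is a simplified sketch rather than d3’s actual source, and both the extent function defined here and the sample records are my own illustration.

```javascript
// Simplified stand-in for d3.extent: returns [min, max] of the values
// produced by applying the accessor to each element of the array
function extent(values, accessor = d => d) {
  let min;
  let max;
  for (const item of values) {
    const value = accessor(item);
    // Skip missing values rather than letting them pollute the result
    if (value == null || Number.isNaN(value)) continue;
    if (min === undefined || value < min) min = value;
    if (max === undefined || value > max) max = value;
  }
  return [min, max];
}

// Made-up records with the same YearReleased property we added above
const sample = [
  { YearReleased: 1999 },
  { YearReleased: 1970 },
  { YearReleased: 2010 },
];

// The accessor pulls out the property we want the range of
const sampleExtent = extent(sample, d => d.YearReleased);
```

With this sample, `sampleExtent` is the two-element array of 1970 and 2010, the same shape of result we get from d3.extent for yearExtent.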
The second JavaScript cell creates some, not especially interesting, utility functions and an object that are used later on in the construction of the controls and graphics. This is all vanilla JavaScript.
// Create function for defining a middle grey of varying opacity
const gy = 150;
const getGrey = opacity => `rgba(${gy},${gy},${gy},${opacity})`;
// Create function for converting a Boolean value to a text label
const getWonOscarText = bool => bool ? 'Won Oscar(s)' : 'Didn\'t Win an Oscar';
// Create an array of objects that can be used to map between data properties
// and their more human-friendly labels and vice-versa
const axisVariables = [
{name: 'Tomatometer', prop: 'Meter'},
{name: 'Numeric Rating', prop: 'Rating'},
{name: 'Number of Reviews', prop: 'Reviews'},
{name: 'Box-office revenue ($million)', prop: 'BoxOffice'},
{name: 'Year', prop: 'Released'},
{name: 'Length (minutes)', prop: 'Runtime'},
];
In the third block we finally start to build the user interface, adding all the controls for our sidebar. This is the point where we start utilising the power of Framework through the in-built view function and Inputs object.
In Observable, a view is a user interface element that directly controls a value in the notebook. A view consists of two parts:
- The view, which is typically an interactive DOM element […].
- The value, which is any JavaScript value.
For the Inputs methods the first argument typically represents the allowed values for the control and a second argument provides additional details using an object. The declaration order transfers to the order in which the corresponding UI elements appear in the HTML and thus the ordering, top to bottom, in the sidebar panel. We change the order here from the original Shiny example to something that seems a bit more logical. Specifically, the select menus for choosing the two axes are moved from the bottom of the controls to the top.
const xVariable = view(
  Inputs.select(axisVariables, {
    label: 'X-axis Variable',
    format: d => d.name,
    value: axisVariables.find((d) => d.prop === 'Meter')
  })
);
const yVariable = view(
  Inputs.select(axisVariables, {
    label: 'Y-axis Variable',
    format: d => d.name,
    value: axisVariables.find((d) => d.prop === 'Reviews')
  })
);
const reviewsMin = view(
  Inputs.range(
    [10, 300],
    { label: 'Minimum number of reviews on Rotten Tomatoes', step: 1, value: 80 }
  )
);
const yearMin = view(
  Inputs.number(
    yearExtent,
    { label: 'Earliest release year', step: 1, value: 1970 }
  )
);
const yearMax = view(
  Inputs.number(
    yearExtent,
    { label: 'Latest release year', step: 1, value: yearExtent[1] }
  )
);
const dollarsMin = view(
  Inputs.number(
    [0, 800],
    { label: 'Minimum box-office revenue ($million)', step: 10, value: 0 }
  )
);
const dollarsMax = view(
  Inputs.number(
    [0, 800],
    { label: 'Maximum box-office revenue ($million)', step: 10, value: 800 }
  )
);
const oscarsMin = view(
  Inputs.radio(
    [0, 1, 2, 3, 4],
    { label: 'Minimum number of Oscars won', value: 0 }
  )
);
const selectedGenre = view(
  Inputs.select(['All'].concat(genres), {
    label: 'Genre',
    value: 'All'
  })
);
const directorText = view(
  Inputs.text({
    label: 'Director name contains',
    value: ''
  })
);
const castText = view(
  Inputs.text({
    label: 'Cast contains',
    value: ''
  })
);
Inside a block in which a variable (or const) is declared using view, that variable will be an object representing that view. In other code blocks, however, that variable name can be used to directly retrieve the value associated with that view: there’s no requirement to de-reference the object. This can help make code look a lot nicer but can also be confusing. For instance, a view can be declared as a const (it is an object whose properties are still mutable) but the value of the variable with the same name changes in other blocks.
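For the two select menus it may help to see what that value actually looks like. The snippet below is a plain JavaScript stand-in that doesn’t use view or Inputs at all: it uses a cut-down copy of axisVariables and simply assigns the default option to a variable, to mimic what another cell would see when it refers to xVariable.

```javascript
// A cut-down copy of the axisVariables array from earlier
const axisVariables = [
  { name: 'Tomatometer', prop: 'Meter' },
  { name: 'Number of Reviews', prop: 'Reviews' },
];

// What the format option does: turn each option into the text shown in the menu
const menuLabels = axisVariables.map(d => d.name);

// The value option picks the default by passing one of the array's own objects
const defaultX = axisVariables.find(d => d.prop === 'Meter');

// Stand-in for what a different cell sees when it references xVariable:
// the selected object itself, not the view that produced it
const xVariable = defaultX;
const xLabel = xVariable.name;   // human-friendly text for an axis label
const xChannel = xVariable.prop; // name of the data property to plot
```

Because the value is the whole object rather than just a string, a downstream cell can read both the label and the property name from it.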
You may also notice that minimum and maximum values are set with separate range controls. This is because, despite the name, browser-native range inputs only support a single handle.
Our page now has a title and our controls, plus a footer we’ll get rid of later.
git switch --detach ui
Building the Graphic
We then add a block to process our data based on the values of our inputs:
const data = movies.filter(function(d) {
  return (
    d.Reviews >= reviewsMin
    && d.Oscars >= oscarsMin
    && d.YearReleased >= yearMin && d.YearReleased <= yearMax
    && (selectedGenre === 'All' || d.Genres.includes(selectedGenre))
    && d.Director.includes(directorText.toLowerCase())
    && d.Cast.includes(castText.toLowerCase())
    && d.BoxOffice >= dollarsMin && d.BoxOffice <= dollarsMax
  );
});
const xLabel = xVariable.name;
const yLabel = yVariable.name;
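To check how the filter behaves without the UI in the way, the same conditions can be wrapped in an ordinary function and run against a couple of made-up records. The filterMovies function, the inputs object and the sample movies below are all hypothetical; in the app itself the values come straight from the views, and I’ve guessed at 2014 for the latest year.

```javascript
// Same conditions as the cell above, but with the control values passed in
function filterMovies(movies, inputs) {
  return movies.filter(d =>
    d.Reviews >= inputs.reviewsMin
    && d.Oscars >= inputs.oscarsMin
    && d.YearReleased >= inputs.yearMin && d.YearReleased <= inputs.yearMax
    && (inputs.selectedGenre === 'All' || d.Genres.includes(inputs.selectedGenre))
    && d.Director.includes(inputs.directorText.toLowerCase())
    && d.Cast.includes(inputs.castText.toLowerCase())
    && d.BoxOffice >= inputs.dollarsMin && d.BoxOffice <= inputs.dollarsMax
  );
}

// Two invented records in the shape produced by our data processing cell
const sampleMovies = [
  { Title: 'Example One', Reviews: 120, Oscars: 0, YearReleased: 1995,
    Genres: ['Comedy'], Director: 'a. director',
    Cast: 'first actor, second actor', BoxOffice: 45.2 },
  { Title: 'Example Two', Reviews: 200, Oscars: 2, YearReleased: 2003,
    Genres: ['Drama', 'War'], Director: 'b. director',
    Cast: 'third actor', BoxOffice: 310 },
];

// Stand-ins for the initial values of the controls
const defaults = {
  reviewsMin: 80, oscarsMin: 0, yearMin: 1970, yearMax: 2014,
  selectedGenre: 'All', directorText: '', castText: '',
  dollarsMin: 0, dollarsMax: 800,
};

// An empty search string matches everything, so both films pass
const everything = filterMovies(sampleMovies, defaults);
// Narrowing by genre, or by a mixed-case cast search, leaves one film each
const dramas = filterMovies(sampleMovies, { ...defaults, selectedGenre: 'Drama' });
const withSecond = filterMovies(sampleMovies, { ...defaults, castText: 'SECOND' });
```

Note that the text searches work regardless of case only because we lowercased Director and Cast when loading the data.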
Finally, we can add the code to render our scatter chart using Observable Plot:
Plot.plot({
  width: 500,
  height: 500,
  color: {
    type: 'categorical',
    range: [getGrey(1), 'orange'],
    domain: [getWonOscarText(false), getWonOscarText(true)], // Required for when filtering on oscar wins
    legend: true,
  },
  grid: true,
  marks: [
    Plot.axisX({ labelAnchor: 'center', labelArrow: 'none', label: xLabel }),
    Plot.axisY({ labelAnchor: 'center', labelArrow: 'none', label: yLabel }),
    Plot.dot(
      data,
      {
        x: xVariable.prop,
        y: yVariable.prop,
        stroke: d => getWonOscarText(d.OscarWinner),
        fill: getGrey(0.4),
        r: 4,
        channels: {
          filmTitle: { value: 'Title', label: '' },
          year: { value: 'YearReleased', label: '' },
          revenue: { value: 'BoxOffice', label: '' },
        },
        tip: {
          format: {
            filmTitle: true,
            year: d => `Year of release: ${d}`,
            revenue: d => `Revenue: $${d.toFixed(d < 10 ? 1 : 0)} million`,
            x: false, y: false, stroke: false
          }
        }
      }
    ),
  ]
})
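The revenue formatter in the tooltip is a little dense, so here it is pulled out on its own with a couple of invented values. The name formatRevenue is mine; in the chart the function is passed inline.

```javascript
// Show one decimal place for revenues under $10 million, none otherwise
const formatRevenue = d => `Revenue: $${d.toFixed(d < 10 ? 1 : 0)} million`;

// Made-up revenues, already in millions of dollars
const small = formatRevenue(3.456); // 'Revenue: $3.5 million'
const large = formatRevenue(123.4); // 'Revenue: $123 million'
```

The doubled dollar sign is not a typo: the first is a literal dollar character and the second begins the template literal placeholder.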
We finish with a line of markdown that includes inline JavaScript using ${} syntax:
Number of movies selected: ${ d3.format(',')(data.length) }
This line simply prints the number of movies plotted at any given time.
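If you haven’t met d3.format before, the ',' specifier just adds thousands separators. As a rough point of comparison, and assuming an en-US locale, the built-in Intl.NumberFormat gives the same result for whole numbers. The formatCount name and the example count are made up.

```javascript
// Vanilla stand-in for d3.format(','): group digits with commas
const formatCount = n => new Intl.NumberFormat('en-US').format(n);

// A made-up count of selected movies
const countText = `Number of movies selected: ${formatCount(2538)}`;
```

Here `countText` reads “Number of movies selected: 2,538”.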
Our app is now fully interactive but everything is arranged down a single column, regardless of screen width!
Next Up
We’ve now got a functioning app but the layout isn’t great and we haven’t yet deployed it anywhere useful. We’ll cover both of these things in Part 2.