I have yet to see an interactive data visualization in a scientific article. I think this is a lost opportunity for journals and researchers to let plots give additional insights. With e-publishing now standard for most journals, incorporating dynamic visualizations would not require much.
Interactive plots can be saved as HTML widgets to a file, which could be included in e-publications. The reader does not need R or additional software to interact with the plot. I think there are many cases where interactive (sometimes called dynamic) plots are useful, such as displaying a tooltip when the reader hovers over an individual data point (see Fig. 1 below). This removes the need for cluttered annotations (a challenge which to some extent can be overcome using ggrepel::geom_text_repel()).
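As a minimal sketch of the saving step, the snippet below builds a toy interactive plot from the built-in mtcars data and writes it to an HTML file with htmlwidgets::saveWidget(). The toy plot and the file name are assumptions made for illustration; they are not part of the analyses in this post.

```r
library(ggplot2)
library(ggiraph)
library(htmlwidgets)

# Toy interactive scatter plot built from the built-in mtcars data
cars <- mtcars
cars$model <- rownames(mtcars)

p <- ggplot(cars, aes(x = wt, y = mpg, tooltip = model, data_id = model)) +
  geom_point_interactive()

widget <- girafe(ggobj = p)

# Write the widget to an HTML file (the file name is arbitrary).
# selfcontained = FALSE avoids the pandoc requirement and places the
# JavaScript dependencies in a folder next to the HTML file.
out_file <- file.path(tempdir(), "interactive_plot.html")
saveWidget(widget, file = out_file, selfcontained = FALSE)
```

Opening the resulting file in a browser should show the plot with working tooltips, without R running in the background.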
Let’s load some packages and define some reusables:
"%ni%" <- Negate("%in%")

# Load packages
# ---------------------------------------------------------------------------->
library(scholar)
library(ggiraph)
library(tidyverse)
library(lubridate)
library(ggpubr)
library(patchwork)
library(ggtext)
library(glue)

# Define a theme
# ---------------------------------------------------------------------------->
theme_simple <- function() {
  theme_minimal(base_family = "Roboto") +
    theme(
      axis.title = element_blank(),
      panel.grid.minor = element_blank(),
      plot.title = element_text(face = "bold", hjust = 0),
      plot.subtitle = element_text(hjust = 0),
      strip.text = element_text(hjust = 0, size = 5, color = "#444444")
    )
}

# Set a color palette
# ---------------------------------------------------------------------------->
palette <- c(
  `Cofactors and Vitamins` = "#d4e080",
  Lipid = "#f17367",
  Nucleotide = "#b3646c",
  Peptide = "#ff85a5",
  `Partially Characterized Molecules` = "#e06bdf",
  Unknown = "#9db98b",
  Energy = "#71dcca",
  Carbohydrate = "#66c4ec",
  `Amino Acid` = "#7392ba",
  Xenobiotics = "#ffd38b"
)
Using an integrated metagenome–metabolome dataset
The data used in this part were downloaded from a Nature Communications article by Dekkers et al., where the authors used paired analyses of blood and stool to learn more about the associations between circulating (blood) metabolites and the fecal microbiome. In the supplement to the article, the authors generously supply data on both the variance of individual metabolite levels explained by the microbiome and the associations between individual metabolites and the Shannon diversity index, which is a measure of microbial diversity (a higher Shannon index indicates a more diverse microbiome).
We load these data, as well as metabolite annotation data from Metabolon (the platform Dekkers et al. used to analyze the circulating metabolome), into R and generate an interactive plot showing the bivariate association between each metabolite’s variance explained by the microbiome and its association with Shannon diversity.
I’ve used the ggiraph package, which is available on CRAN. In brief, this package allows you to add interactive geoms to your ggplot. These geoms accept additional aesthetics, most importantly tooltip and data_id. The tooltip aesthetic defines what should be displayed when you hover over a plot element, while data_id defines the ID associated with the element, which is used for hover and selection effects. There are loads of ways you can customize the appearance with CSS and JavaScript, but since this is my first attempt at making interactive plots I’ve tried to keep it simple.
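To make the roles of tooltip and data_id concrete, here is a minimal sketch using the built-in mtcars data (a stand-in chosen for illustration, not the metabolite data used below). The column name model is an assumption introduced for this example.

```r
library(ggplot2)
library(ggiraph)

# Toy data: one row per car, with the car name as a regular column
cars <- mtcars
cars$model <- rownames(mtcars)

# tooltip: text shown on hover; data_id: ID used to apply the hover effect
p <- ggplot(cars, aes(x = wt, y = mpg, tooltip = model, data_id = model)) +
  geom_point_interactive(size = 3)

# girafe() turns the ggplot into an interactive HTML widget
g <- girafe(
  ggobj = p,
  options = list(opts_hover(css = "fill: orange;"))
)
g
```

Elements that share a data_id are highlighted together on hover, which is what later makes it possible to link several plots.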
Loading data and data wrangling
# Load and wrangle data
# ---------------------------------------------------------------------------->
# Load metabolite annotation data from Metabolon
met_ann <- read_delim("data/met_ann.txt") %>%
  mutate(SuperPathway = ifelse(is.na(SuperPathway), "Unknown", SuperPathway))

# Load Supplementary Data 5. Variance explained of plasma metabolite levels by variation in the gut microbiota
var_expl <- read_delim("data/var_expl_dekkers.txt")

# Load Supplementary Data 4. Association between Shannon diversity index and plasma metabolites
alpha_div <- read_delim("data/alpha_div_dekkers.txt")

# Join the data
df <- var_expl %>%
  left_join(alpha_div, by = "biochem") %>%
  left_join(
    met_ann %>%
      select(biochem = Biochemical, super_pathw = SuperPathway, sub_pathw = SubPathway),
    by = "biochem"
  ) %>%
  filter(!is.na(super_pathw)) %>%
  # For some reason the tooltip cannot contain the ' symbol
  mutate(biochem = gsub("'", "", biochem))
Now we can create a tooltip and produce the plot (Fig. 1).
Creating the interactive plot
dekkers_p <- df %>%
  mutate(tooltip = glue(
    "<b>{biochem}</b><br>",
    "R<sup>2</sup> = {r2}<br>",
    "Shannon diversity, rho = {partial_spearman}"
  )) %>%
  # Arrange to get points with higher r2 appear in upper layers
  arrange(r2) %>%
  ggplot(aes(
    x = partial_spearman, y = r2, fill = super_pathw, size = r2,
    tooltip = tooltip, data_id = tooltip
  )) +
  geom_point_interactive(shape = 21, color = "white") +
  theme_simple() +
  theme(
    legend.position = "top",
    legend.title = element_blank(),
    legend.key.height = unit(2, "mm"),
    legend.key.width = unit(1, "mm"),
    axis.title.x = element_text(color = "#444444"),
    axis.title.y = element_text(color = "#444444")
  ) +
  labs(
    x = "Partial correlation with alpha diversity (Spearman)",
    y = bquote("Variance explained by gut microbiota, " ~ R^2)
  ) +
  guides(
    fill = guide_legend(label.position = "right", nrow = 2, override.aes = list(size = 4)),
    size = "none"
  ) +
  scale_size(range = c(1, 4)) +
  scale_fill_manual(values = palette)

# Define CSS theming
# ---------------------------------------------------------------------------->
tooltip_css <- "
  background-color: #F9F9F9;
  color: #FFFFFF;
  padding: 5px;
  border-radius: 2px;
  font-family: 'Arial', sans-serif;
  font-size: 13px;
"

# Use ggiraph::girafe() to create the interactive plot and apply effects and CSS
# theming
# ---------------------------------------------------------------------------->
dekkers_p_int <- girafe(
  ggobj = dekkers_p,
  options = list(
    opts_hover(css = "fill: black;"),
    opts_tooltip(css = tooltip_css, use_fill = TRUE),
    opts_sizing(rescale = FALSE)
  ),
  height_svg = 5.5,
  width_svg = 7.5
)
Fig. 1. Bivariate association between metabolite’s variance explained by the gut microbiota and their association with Shannon diversity. Data sourced from https://www.nature.com/articles/s41467-022-33050-0.
Metabolite enrichment
Enrichment plots are ubiquitous in the scientific literature. What if you could see the elements which contribute to the enrichment of a given pathway? Let’s explore how we can use ggiraph to add more information to these plots, using the Dekkers et al. dataset from above. We can, for example, explore which metabolic pathway is most enriched for microbial metabolites and, when hovering over the individual pathways, display the names of the 10 metabolites which contribute the most to each pathway’s enrichment (Fig. 2).
# Load the fgsea package (available on BioConductor)
# if (!require("BiocManager", quietly = TRUE))
#   install.packages("BiocManager")
# BiocManager::install("fgsea")
library(fgsea)

# We define the "metabolite set" (the GSEA method is most frequently used in
# transcriptomics data, where you would call this a "gene set")
# ----------------------------------------------------------------------------->
metabo_set_df <- df %>%
  select(biochem, super_pathw)

metabo_set <- split(metabo_set_df$biochem, metabo_set_df$super_pathw)

# Rank metabolites by R2 (variance explained by the gut microbiome)
# ----------------------------------------------------------------------------->
ranked_biochems <- df %>%
  arrange(desc(r2)) %>%
  select(biochem, r2)

r2 <- ranked_biochems$r2
biochems <- ranked_biochems$biochem

# Create a named vector where the values are from the 'r2' vector and the names
# are from the 'biochems' vector.
stats <- setNames(r2, biochems)

# Run FGSEA
# ----------------------------------------------------------------------------->
metabo_gsea <- fgsea(pathways = metabo_set, stats = stats, gseaParam = 0)

# Create a GSEA bar plot indicating which metabolites contribute the most to the enrichment
# ----------------------------------------------------------------------------->
pal <- c("Negative enrichment" = "#66c4ec", "Positive enrichment" = "#f17367")

fgsea_interactive <- metabo_gsea %>%
  mutate(color = ifelse(NES > 0, "Positive enrichment", "Negative enrichment")) %>%
  # Keep the first 10 leading-edge metabolites of each pathway for the tooltip
  mutate(leading_edge_metabs = sapply(
    leadingEdge,
    function(x) paste(head(x, 10), collapse = "\n")
  )) %>%
  ggplot(aes(
    x = NES, y = reorder(pathway, NES), fill = color,
    tooltip = leading_edge_metabs, data_id = leading_edge_metabs
  )) +
  geom_col_interactive(width = 0.5) +
  theme_simple() +
  theme(
    panel.grid.major.y = element_blank(),
    legend.position = "top",
    legend.title = element_blank(),
    legend.key.height = unit(1.5, "mm")
  ) +
  scale_fill_manual(values = pal) +
  guides(fill = guide_legend(label.position = "bottom"))

fgsea_interactive <- fgsea_interactive + plot_spacer()

fgsea_interactive <- girafe(
  ggobj = fgsea_interactive,
  options = list(
    opts_hover(css = "fill: black;"),
    opts_tooltip(css = tooltip_css, use_fill = TRUE),
    opts_sizing(rescale = FALSE)
  ),
  height_svg = 3,
  width_svg = 6
)
Fig. 2. Interactive enrichment plot. Positive enrichment scores indicate that the metabolic pathway is enriched for metabolites whose variance is explained by the gut microbiome.
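The shape of the inputs and of the tooltip text may be easier to see on a toy example. The sketch below uses base R only, with made-up metabolite names and R2 values, and a hand-written stand-in for the leadingEdge list column that fgsea returns; it does not run the enrichment itself.

```r
# Hypothetical toy data: metabolite names, pathways and R2 values are made up
toy <- data.frame(
  biochem     = c("met_a", "met_b", "met_c", "met_d", "met_e"),
  super_pathw = c("Lipid", "Lipid", "Peptide", "Peptide", "Lipid"),
  r2          = c(0.30, 0.05, 0.12, 0.01, 0.20)
)

# Metabolite sets: one character vector of metabolite names per pathway
toy_sets <- split(toy$biochem, toy$super_pathw)

# Ranking statistic: named numeric vector sorted by decreasing R2
toy_sorted <- toy[order(-toy$r2), ]
toy_stats <- setNames(toy_sorted$r2, toy_sorted$biochem)

# Stand-in for the leadingEdge list column (assumed here, not computed)
leading_edge <- list(Lipid = c("met_a", "met_e"), Peptide = "met_c")

# One tooltip string per pathway: at most 10 names, one per line
tooltips <- sapply(leading_edge, function(x) paste(head(x, 10), collapse = "\n"))
```

The named list and the named vector correspond to the pathways and stats arguments of fgsea() in the code above.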
Using Google Scholar citations data
I am always hopeful that my research will have an impact. Citations are one measure of this. I’ve previously written about how you can scrape Google Scholar data to gain more insight into these metrics than what is directly available on Google Scholar’s website. I now wanted to create an interactive plot which gives some more information, including a bar plot with a hover effect. The tooltip gives a text-based visualisation of citations over time.
Data wrangling
# Scrape scholar data
# ---------------------------------------------------------------------------->
author_id <- scholar::get_scholar_id(last_name = "Braadland", first_name = "Peder")

pubs <- scholar::get_publications(author_id) %>%
  # Remove data with no cid (except for preprints)
  filter(!(is.na(cid))) %>%
  # One article had been duplicated albeit with different titles
  mutate(title = ifelse(
    title == "Ex vivo metabolic fingerprinting identifies biomarkers predictive of prostate cancer recurrence",
    "Ex vivo metabolic fingerprinting identifies biomarkers predictive of prostate cancer recurrence following radical prostatectomy",
    title
  )) %>%
  # For duplicated articles, choose just the first occurrence
  group_by(title) %>%
  slice(1) %>%
  filter(!str_detect(title, "Back Cover")) %>%
  rename(
    cites_total = cites,
    year_published = year
  )

# We can also collect data on each article ('pubid') and the number of citations
# received any year after the article was published
pubs_cit_per_yr <- tibble(year = numeric(), cites = numeric(), pubid = character())

articles <- pubs %>% pull(pubid)

# seq_along() is safe even if 'articles' is empty
for (i in seq_along(articles)) {
  article <- articles[i]
  article_info <- get_article_cite_history(id = author_id, article = article)
  pubs_cit_per_yr <- bind_rows(pubs_cit_per_yr, article_info)
}

# Append publication titles
pubs_cit_per_yr <- pubs_cit_per_yr %>%
  left_join(pubs, by = "pubid")
We can produce an interactive plot, again using ggiraph (Fig. 3).
Producing an interactive plot (click to view code)
Fig. 3. Google Scholar citations data. Interactive barplot with hover effect - the mouse-over reveals additional information in the data.
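The text-based visualisation of citations over time can be sketched in a few lines of base R. The citation counts below are made up for illustration, and joining the rows with an HTML line break is one possible way to lay them out in a tooltip, not necessarily how the plot above was built.

```r
# Hypothetical citation history for one article (numbers are made up)
cite_history <- data.frame(
  year  = c(2021, 2022, 2023),
  cites = c(2, 5, 3)
)

# One line per year: the year followed by one "|" per citation
bars <- paste0(cite_history$year, ": ", strrep("|", cite_history$cites))

# Join the lines with <br> so an HTML tooltip renders them on separate rows
tooltip <- paste(bars, collapse = "<br>")

cat(bars, sep = "\n")
```

The same strrep() idea is used in the scatterplot code further down, where each publication gets one such row per year.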
Interacting plots
Dynamic plots can also be connected, so that interacting with one plot leads to a visual change in another. This can be handy when two plots convey different sets of information from the same data. I use the Google Scholar data to illustrate publications “over-performing” in terms of citations since the year of publication, alongside the plot from above (Fig. 4).
Creating two interacting plots
# Create a scatterplot that will interact with the bar plot
# ---------------------------------------------------------------------------->
p2 <- pubs_cit_per_yr %>%
  group_by(title) %>%
  mutate(total_cites = sum(cites)) %>%
  mutate(yearly_cites = paste0(year, ": ", strrep("|", cites))) %>%
  select(year_published, yearly_cites, total_cites, title, journal) %>%
  group_by(year_published, total_cites, title, journal) %>%
  summarize(yearly_cites_list = list(yearly_cites), .groups = "drop") %>%
  mutate(ann = glue("{journal} (<b>{year_published}</b>)")) %>%
  ggplot(aes(x = year_published, y = total_cites)) +
  geom_point_interactive(aes(size = total_cites, tooltip = total_cites, data_id = ann)) +
  stat_smooth(
    method = "loess", span = 1, alpha = 0.25,
    color = "#999999", lwd = 0.5, linetype = "dashed"
  ) +
  theme_simple() +
  theme(
    axis.title.y = element_text(color = "#444444"),
    axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
    legend.position = "none"
  ) +
  labs(
    x = "",
    y = "N citations"
  ) +
  scale_x_continuous(breaks = seq(2016, 2024, 2)) +
  scale_size(range = c(2, 4.5))

# Combine the two plots using the patchwork package
# ---------------------------------------------------------------------------->
combi <- (p2 + p_cites_per_yr) +
  plot_layout(widths = c(2, 2.5))

p_combi <- girafe(
  ggobj = combi,
  options = list(
    opts_hover(css = "fill: orange;"),
    opts_tooltip(css = tooltip_css, use_fill = TRUE)
  ),
  height_svg = 3.8,
  width_svg = 8
)
Fig. 4. Google Scholar citations data. The individual plots in the plot composite interact with each other when you hover your mouse over plot elements.
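The linking relies on both plots using the same data_id values. A minimal sketch of this idea, using the built-in mtcars data rather than the citation data (the column name model and the two toy plots are assumptions for illustration), could look like this:

```r
library(ggplot2)
library(ggiraph)
library(patchwork)

# Toy data: the car name serves as the shared ID across both plots
cars <- mtcars
cars$model <- rownames(mtcars)

# Scatter plot: one point per car
p_left <- ggplot(cars, aes(x = wt, y = mpg, tooltip = model, data_id = model)) +
  geom_point_interactive(size = 3)

# Bar plot: one bar per car, using the same data_id values
p_right <- ggplot(cars, aes(x = mpg, y = reorder(model, mpg),
                            tooltip = model, data_id = model)) +
  geom_col_interactive()

# Combine with patchwork and render both plots in a single widget
linked <- girafe(
  ggobj = p_left + p_right,
  options = list(opts_hover(css = "fill: orange;"))
)
linked
```

Hovering over a point should highlight the bar with the matching data_id, and vice versa, because the hover effect applies to every element sharing that ID within the widget.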
Interactive tile plot
During the April 2024 30 day chart challenge I had a look at my favorite football club Lyn’s seasons since 1990. I thought it would be nice if you could get information about the individual matches when you hover your pointer over them. I’ve also added an interactive effect on the bumper plot shown below the tile plot.