<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>LIBD rstats club</title>
    <link>http://LieberInstitute.github.io/rstatsclub/</link>
      <atom:link href="http://LieberInstitute.github.io/rstatsclub/index.xml" rel="self" type="application/rss+xml" />
    <description>LIBD rstats club</description>
    <generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>© 2018-2024, under CC BY-NC-SA 4.0. All thoughts and opinions here are our own. The icon is a modified version of the [R logo](https://www.r-project.org/logo/).</copyright><lastBuildDate>Wed, 01 Nov 2023 00:00:00 +0000</lastBuildDate>
    <image>
      <url>http://LieberInstitute.github.io/rstatsclub/images/logo_huedd053945c8b535337cc8577c9427bba_3096_300x300_fit_lanczos_3.png</url>
      <title>LIBD rstats club</title>
      <link>http://LieberInstitute.github.io/rstatsclub/</link>
    </image>
    
    <item>
      <title>L. Collado-Torres</title>
      <link>http://LieberInstitute.github.io/rstatsclub/about/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/about/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Our first adventure with Visium Spatial Proteogenomics</title>
      <link>http://LieberInstitute.github.io/rstatsclub/2023/11/01/our-first-adventure-with-visium-spatial-proteogenomics/</link>
      <pubDate>Wed, 01 Nov 2023 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/2023/11/01/our-first-adventure-with-visium-spatial-proteogenomics/</guid>
      <description>&lt;p&gt;&lt;em&gt;By &lt;a href=&#34;https://twitter.com/sanghokwon17&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Sang Ho Kwon&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Recent advancements in spatially-resolved transcriptomics (SRT) technologies have ushered in a new era of possibilities for biological research. These technologies offer the unique ability to map biomolecular information within the native tissue architecture. Preserving the spatial resolution of genome-wide gene expression allows researchers to obtain a more holistic view of the tissue microenvironment, particularly the underlying molecular and cellular dynamics in a spatial-anatomical context, which is useful to understand the composition, states, and function of individual cell types, as well as their interactions with one another in a defined microenvironment. Visium Spatial Gene Expression from 10x Genomics is a widely used and validated next generation sequencing (NGS)-based SRT platform.&lt;/p&gt;
&lt;p&gt;In 2021, 10x Genomics extended the capabilities of the Visium Spatial platform, creating the &lt;a href=&#34;https://www.10xgenomics.com/products/spatial-gene-and-protein-expression&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Visium Spatial Proteogenomics (Visium-SPG)&lt;/a&gt; platform by introducing immunofluorescence protein staining into the workflow. This integration of spatial transcriptomics and immunofluorescence-based protein identification significantly enhanced the power of spatial -omics. This capability provides a more comprehensive breadth of biomolecular information, bridging genome-wide gene expression with the specific protein expression of interest within undissociated tissue sections at a high level of spatial resolution.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.1089/genbio.2023.0019&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;img src=&#34;images/GenBiotech_Image_SpatialAD.png&#34; alt=&#34;&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Our study serves as a proof-of-concept for the power of spatial proteogenomic profiling in characterizing human brain pathology. Leveraging the Visium-SPG platform, we first mapped local brain microenvironments bearing Alzheimer’s disease pathology by identifying the presence of amyloid-beta and phosphorylated tau. We then investigated the transcriptional signatures surrounding amyloid-beta pathology in postmortem human brain tissue from donors diagnosed with Alzheimer&amp;rsquo;s disease. We further conducted a comprehensive computational analysis that allowed us to deconvolute Visium data at the level of individual expression spots, enabling us to predict the relative enrichment of astrocyte and microglia populations surrounding amyloid plaques in comparison to neurons. Additionally, we employed an orthogonal RNA detection technology, RNAscope single molecule Fluorescence In Situ Hybridization (smFISH), to finely resolve gene expression changes of a selected subset of differentially expressed genes (DEGs) identified with Visium-SPG at cellular resolution.&lt;/p&gt;
&lt;p&gt;Overall, our study provides a roadmap for a comprehensive data analysis workflow that encompasses various experimental platforms, such as Visium-SPG and RNAscope smFISH, along with a diverse range of computational software tools, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;VistoSeg&lt;/code&gt; for image-based preprocessing/segmentation,&lt;/li&gt;
&lt;li&gt;&lt;code&gt;spatialLIBD&lt;/code&gt;, &lt;code&gt;scran&lt;/code&gt;, &lt;code&gt;limma&lt;/code&gt;, and other Bioconductor packages,&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Harmony&lt;/code&gt; for batch correction,&lt;/li&gt;
&lt;li&gt;&lt;code&gt;BayesSpace&lt;/code&gt; for unsupervised clustering,&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Cell2location&lt;/code&gt; for spot deconvolution,&lt;/li&gt;
&lt;li&gt;&lt;code&gt;MAGMA&lt;/code&gt; for genetic risk enrichment analysis.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This integrated approach empowered us to delve into spatial proteogenomic analysis and examine the local tissue microenvironment harboring neuropathological lesions enriched with amyloid-beta pathology at the genome-wide gene expression level. We anticipate that our work will lay the groundwork for the next frontiers of spatial multi-omics and contribute to a more comprehensive understanding of complex human brain biology and pathology.&lt;/p&gt;
&lt;p&gt;For a more in-depth exploration of our recent work, we share a link to our paper officially published in a special issue of &lt;em&gt;GEN Biotechnology&lt;/em&gt; focusing on spatial -omics: &lt;a href=&#34;https://doi.org/10.1089/genbio.2023.0019&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;https://doi.org/10.1089/genbio.2023.0019&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;After 10+ years of research, my FIRST 1st-author paper &amp;amp; &lt;a href=&#34;https://twitter.com/biorxivpreprint?ref_src=twsrc%5Etfw&#34;&gt;@biorxivpreprint&lt;/a&gt; is finally OUT🥹I delved deep into the human inferior temporal cortex🧠in &lt;a href=&#34;https://twitter.com/hashtag/Alzheimersdisease?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#Alzheimersdisease&lt;/a&gt;, using &lt;a href=&#34;https://twitter.com/hashtag/VisiumSPG?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#VisiumSPG&lt;/a&gt; &lt;a href=&#34;https://twitter.com/hashtag/VectraPolaris?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#VectraPolaris&lt;/a&gt; &lt;a href=&#34;https://twitter.com/hashtag/RNAscope?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#RNAscope&lt;/a&gt; &amp;amp; &lt;a href=&#34;https://twitter.com/hashtag/HALO?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#HALO&lt;/a&gt; &lt;a href=&#34;https://twitter.com/Indica_Labs?ref_src=twsrc%5Etfw&#34;&gt;@Indica_Labs&lt;/a&gt;. Check out our📜at &lt;a href=&#34;https://t.co/QQEy1ZWc1V&#34;&gt;https://t.co/QQEy1ZWc1V&lt;/a&gt;🔥 &lt;a href=&#34;https://t.co/dBZNzL1S96&#34;&gt;pic.twitter.com/dBZNzL1S96&lt;/a&gt;&lt;/p&gt;&amp;mdash; Sang Ho (Sangho) Kwon (@sanghokwon17) &lt;a href=&#34;https://twitter.com/sanghokwon17/status/1650589385379962881?ref_src=twsrc%5Etfw&#34;&gt;April 24, 2023&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;


&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;The inaugural SPECIAL ISSUE on &lt;a href=&#34;https://twitter.com/hashtag/SpatialOmics?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#SpatialOmics&lt;/a&gt; is here!&lt;br&gt;-Profiling Alzheimer’s using &lt;a href=&#34;https://twitter.com/10xGenomics?ref_src=twsrc%5Etfw&#34;&gt;@10xGenomics&lt;/a&gt; &lt;a href=&#34;https://twitter.com/lcolladotor?ref_src=twsrc%5Etfw&#34;&gt;@lcolladotor&lt;/a&gt;&lt;br&gt;-Spatial proteome of head and neck tumors &lt;a href=&#34;https://twitter.com/AkoyaBio?ref_src=twsrc%5Etfw&#34;&gt;@AkoyaBio&lt;/a&gt;&lt;br&gt;-Hyperspectral imaging &lt;a href=&#34;https://twitter.com/yaojj02?ref_src=twsrc%5Etfw&#34;&gt;@yaojj02&lt;/a&gt;&lt;br&gt;-Spatial omics and &lt;a href=&#34;https://twitter.com/hashtag/organoids?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#organoids&lt;/a&gt; &lt;a href=&#34;https://twitter.com/ahmetfcoskun?ref_src=twsrc%5Etfw&#34;&gt;@ahmetfcoskun&lt;/a&gt;&lt;br&gt;...and more!&lt;a href=&#34;https://t.co/ERpXTlih08&#34;&gt;https://t.co/ERpXTlih08&lt;/a&gt; &lt;a href=&#34;https://t.co/Moar3K3T3Q&#34;&gt;pic.twitter.com/Moar3K3T3Q&lt;/a&gt;&lt;/p&gt;&amp;mdash; GEN Biotechnology (@GENBiotechJrnl) &lt;a href=&#34;https://twitter.com/GENBiotechJrnl/status/1714304038320349394?ref_src=twsrc%5Etfw&#34;&gt;October 17, 2023&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;


&lt;p&gt;&lt;em&gt;Original draft by &lt;a href=&#34;https://twitter.com/sanghokwon17&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Sang Ho Kwon&lt;/a&gt; M.S. Reviewed and edited by &lt;a href=&#34;https://twitter.com/martinowk&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Keri Martinowich&lt;/a&gt; Ph.D and &lt;a href=&#34;https://twitter.com/lcolladotor&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Leonardo Collado-Torres&lt;/a&gt; Ph.D.&lt;/em&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Introduction to BiocMAP</title>
      <link>http://LieberInstitute.github.io/rstatsclub/2023/09/27/introduction-to-biocmap/</link>
      <pubDate>Wed, 27 Sep 2023 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/2023/09/27/introduction-to-biocmap/</guid>
      <description>&lt;p&gt;By &lt;a href=&#34;https://nick-eagles.github.io/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Nick Eagles&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Over the past few years, I&amp;rsquo;ve had the opportunity to work with a lot of whole-genome bisulfite-sequencing (WGBS) datasets. They provide a powerful way to look at DNA methylation on a genome-wide scale, in contrast to microarrays, which target a narrower set of important CpG sites across the genome. But for this same reason, the data is often unwieldy and can feel difficult to tackle even with access to powerful computational resources. At LIBD, we were excited by the opportunity to better characterize the role of methylation in development and psychiatric disorders like schizophrenia, and we&amp;rsquo;ve performed WGBS on thousands of samples in just a few years.&lt;/p&gt;
&lt;h2 id=&#34;the-challenges-of-wgbs&#34;&gt;The Challenges of WGBS&lt;/h2&gt;
&lt;p&gt;Despite the massive research opportunity, we had a huge computational challenge in the way. How could we turn thousands of raw sequencing files into methylation proportions for each gene? It&amp;rsquo;s not like the basic logistics of this preprocessing task are unsolved&amp;ndash; in fact, some great tools like &lt;a href=&#34;https://github.com/nf-core/methylseq&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;nf-core/methylseq&lt;/a&gt; exist to chain together the various steps (alignment to a reference genome, counting methylated and unmethylated reads of each gene, etc.) into a fairly easy-to-use workflow. Could we just use something like &lt;code&gt;nf-core/methylseq&lt;/code&gt;?&lt;/p&gt;
&lt;p&gt;At the scale of our datasets, existing pipeline tools could take &lt;em&gt;years&lt;/em&gt; of (wall clock!) computational time, even with access to a high-performance computing cluster. We also noticed that many existing solutions would simply run out of memory, even when allocated gigantic (hundreds of GBs) amounts of RAM. We knew that our situation was unique&amp;ndash; and we&amp;rsquo;d need to carefully implement a workflow that was optimized for speed and efficient memory use.&lt;/p&gt;
&lt;h2 id=&#34;our-solution&#34;&gt;Our Solution&lt;/h2&gt;
&lt;p&gt;We developed &lt;a href=&#34;https://doi.org/10.1186/s12859-023-05461-3&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;BiocMAP&lt;/a&gt; after refining our internal preprocessing workflow. Much of the speed gain came simply from using &lt;a href=&#34;https://doi.org/10.1093/bioinformatics/bty167&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Arioc&lt;/a&gt;, a GPU-based tool for alignment to the reference genome, at a time when the standard in the field was to use &lt;a href=&#34;https://doi.org/10.1093/bioinformatics/btr167&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Bismark&lt;/a&gt; or other CPU-based tools. We limited memory usage with tricks like splitting data by chromosome and using disk-based backends where possible, details we describe in the manuscript.&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;Bioc&amp;rdquo; in BiocMAP stands for &lt;em&gt;Bioconductor-friendly&lt;/em&gt;&amp;ndash; BiocMAP collects all the methylation counts and proportions into &lt;code&gt;SummarizedExperiment&lt;/code&gt;-based objects in R, since these objects are how the Bioconductor community likes to represent experimental data of all kinds. A whole ecosystem of R packages is built around performing statistical analyses on &lt;code&gt;SummarizedExperiment&lt;/code&gt;-based objects.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2023-09-27-introduction-to-biocmap/SummarizedExperiment_structure.svg&#34; alt=&#34;Image credit: Morgan et al, retrieved from https://bioconductor.org/packages/release/bioc/vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html&#34;&gt;&lt;/p&gt;
&lt;p&gt;So I&amp;rsquo;m excited that we&amp;rsquo;ve now published the paper and that the software is ready to share with the world!&lt;/p&gt;
&lt;h2 id=&#34;using-biocmap&#34;&gt;Using BiocMAP&lt;/h2&gt;
&lt;p&gt;We aimed to make BiocMAP simple to install and use on a variety of computing environments, while allowing a good deal of customization for interested users. I&amp;rsquo;ll show examples of running BiocMAP on a SLURM-managed cluster, though running on an SGE-managed cluster or just a single machine is possible too.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;#   Install BiocMAP with singularity
git clone git@github.com:LieberInstitute/BiocMAP.git
cd BiocMAP
bash install_software.sh singularity

sbatch run_first_half_slurm.sh
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That &lt;code&gt;install_software.sh&lt;/code&gt; script allows you to use Docker or Singularity to set BiocMAP up, and then we have shell scripts and configuration files that make it easy to run on computing clusters that have job schedulers like SLURM or SGE.&lt;/p&gt;
&lt;p&gt;We split the BiocMAP pipeline into two pieces because our experience with processing WGBS data involved collaboration and the use of more than one computing cluster. Since GPUs are still a relatively recent addition to many computing environments, some clusters have more impressive GPU resources, while others have more CPUs or overall memory. We found it useful to allow the flexibility of running the GPU-intensive alignment in a different location than the remaining analysis steps. Nothing&amp;rsquo;s stopping you from running everything on one machine or cluster though:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;sbatch run_first_half_slurm.sh

#   Once the first module finishes:
sbatch run_second_half_slurm.sh
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Do you have a lot of WGBS data and access to GPUs? BiocMAP may be helpful in powering through the preprocessing so you can focus on the interesting part of your research&amp;ndash; the statistical analysis.&lt;/p&gt;
&lt;p&gt;Check out our &lt;a href=&#34;https://doi.org/10.1186/s12859-023-05461-3&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;manuscript&lt;/a&gt;, &lt;a href=&#34;https://github.com/LieberInstitute/BiocMAP&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;code&lt;/a&gt;, and &lt;a href=&#34;http://research.libd.org/BiocMAP/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;documentation&lt;/a&gt;!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Lessons Learned Applying Tangram on Visium Data</title>
      <link>http://LieberInstitute.github.io/rstatsclub/2021/03/09/lessons-learned-applying-tangram-on-visium-data/</link>
      <pubDate>Tue, 09 Mar 2021 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/2021/03/09/lessons-learned-applying-tangram-on-visium-data/</guid>
      <description>
&lt;script src=&#34;http://LieberInstitute.github.io/rstatsclub/rstatsclub/rmarkdown-libs/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;By &lt;a href=&#34;https://github.com/Nick-Eagles&#34;&gt;Nick Eagles&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We’ve recently been interested in exploring the (largely python-based) tools others have published to process spatial transcriptomics data for various end goals. A common goal is to integrate data from platforms like &lt;a href=&#34;https://www.10xgenomics.com/products/spatial-gene-expression&#34;&gt;Visium&lt;/a&gt;, which provides some information about how gene expression is spatially organized, with other approaches with potentially better spatial resolution or gene throughput. In particular, we came across a &lt;a href=&#34;https://www.biorxiv.org/content/10.1101/2020.08.29.272831v3&#34;&gt;paper&lt;/a&gt; by Biancalani, Scalia et al. presenting a tool called &lt;a href=&#34;https://github.com/broadinstitute/Tangram&#34;&gt;Tangram&lt;/a&gt;, and were particularly interested in a component of the tool which could map individual cells from single cell gene expression data onto the spatial voxels probed by Visium. I encourage you to check out the paper for a more detailed description of their approach, as well as the other capabilities of their software which I won’t be covering.&lt;/p&gt;
&lt;p&gt;There’s a lot to talk about around these topics– integrating spatial gene expression data with other forms of data, installing and running external software, and much more– but this blog post will focus on data science and machine learning lessons I learned while trying to apply Tangram on some private data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; I will regularly refer to &lt;a href=&#34;https://www.biorxiv.org/content/10.1101/2020.08.29.272831v3&#34;&gt;the Tangram manuscript&lt;/a&gt;, and several conceptual points I make (especially the role of data sparsity, trusting scores of well-selected training genes, and descriptions of the mapping learned by Tangram) are inspired by or are paraphrased from the manuscript. I intend this blog post in part to discuss these ideas, for which the authors deserve credit.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2021-03-09-lessons-learned-applying-tangram-on-visium-data_files/tangram_puzzle.jpg&#34; width=&#34;400&#34; alt=&#34;&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Image credit: By Nevit Dilmen - Own work, CC BY-SA 3.0, &lt;a href=&#34;https://commons.wikimedia.org/w/index.php?curid=1798693&#34; class=&#34;uri&#34;&gt;https://commons.wikimedia.org/w/index.php?curid=1798693&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;the-initial-plan&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The Initial Plan&lt;/h2&gt;
&lt;p&gt;OK, so I had read through the paper and felt I had a basic understanding of what Tangram was doing and some sense of how it worked. Admittedly, I had to skip over some of the more technical and advanced biological details, but I felt I had enough context to get started. We could take single cell expression data and learn where to “place” individual cells onto the voxels containing Visium measurements. Deep learning, I assumed, was used to infer this mapping.&lt;/p&gt;
&lt;p&gt;My coworkers and I got started with &lt;a href=&#34;https://github.com/broadinstitute/Tangram/blob/master/example/1_tutorial_tangram.ipynb&#34;&gt;Tangram’s main tutorial&lt;/a&gt;, where we saw that Tangram trained its mapping using a subset of genes, and that the proper selection of training genes was crucial for a robust mapping. I had prepared &lt;code&gt;AnnData&lt;/code&gt; objects for some private data, containing single cell and Visium data, as required by the tutorial, and was ready to follow along with the code. Then, I realized that none of the marker genes selected for use in the example tutorial were present in our own data.&lt;/p&gt;
&lt;p&gt;Somewhat naively, I was ready to experiment with different gene selection approaches, and rank them by average similarity score achieved in the test genes. Other metrics might be preferable to evaluate performance (the paper makes use of “spatial correlation”), but cosine similarity by gene was readily available as output from the tangram function &lt;code&gt;tg.compare_spatial_geneexp&lt;/code&gt;, to compare mapped and actual expression. For each gene selection method, I took the spatially-mapped single cell expression object &lt;code&gt;ad_ge&lt;/code&gt;, and the Visium expression object &lt;code&gt;ad_sp&lt;/code&gt;, and computed the following metrics:&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;df_all_genes = tg.compare_spatial_geneexp(ad_ge, ad_sp)

#  Compute average cosine similarity for test genes
np.mean(df_all_genes.score[np.logical_not(df_all_genes.is_training)])

#  Compute average cosine similarity for training genes
np.mean(df_all_genes.score[df_all_genes.is_training])&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Strangely, regardless of the selection approach used, I achieved training and test scores around 0.9 and 0.16, respectively, of course with some variation. Even for the example tutorial’s data and gene set, I achieved scores with a similarly large performance gap. What was going on here?&lt;/p&gt;
&lt;p&gt;I more carefully reviewed the paper, and thought some more.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;the-importance-of-understanding-your-data&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The Importance of Understanding your Data&lt;/h2&gt;
&lt;p&gt;As was well described in the manuscript, sparsity of gene expression, especially in the spatial data, could negatively impact similarity scores. In a particularly extreme hypothetical case, we might have a gene expressed somewhat in many cells, but whose expression in the spatial data is nonzero in only a few voxels. No matter the arrangement of cells onto voxels, it will receive a poor score owing to a relative lack of data needed to demonstrate the “true expression profile”. Figure 4f in the manuscript provides a visual display of the impact of sparsity on model performance by gene.&lt;/p&gt;
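&lt;p&gt;To make this concrete, here’s a toy sketch (with made-up numbers, not data from the manuscript) of how sparsity alone can depress a gene’s cosine similarity score, even when every nonzero measurement agrees with the mapped expression:&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;import numpy as np

def cosine_similarity(a, b):
    #  Cosine similarity between two expression vectors over spatial voxels
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

#  Toy expression for one gene measured over 8 spatial voxels
dense_truth = np.array([2., 3., 1., 4., 2., 3., 1., 2.])   # detected in every voxel
sparse_truth = np.array([0., 0., 0., 4., 0., 0., 0., 2.])  # same gene, mostly dropped out
predicted = np.array([2., 3., 1., 4., 2., 3., 1., 2.])     # a mapping recovering the dense profile

print(cosine_similarity(predicted, dense_truth))   # perfect score (1.0)
print(cosine_similarity(predicted, sparse_truth))  # roughly 0.65, despite a perfect arrangement&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The mapping is no worse in the second case; the score drops purely because the spatial measurement is sparse.&lt;/p&gt;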
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2021-03-09-lessons-learned-applying-tangram-on-visium-data_files/fig4f.jpg&#34; width=&#34;400&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.biorxiv.org/content/biorxiv/early/2020/09/24/2020.08.29.272831/F4.large.jpg&#34;&gt;Image source&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In my case, I was disproportionately selecting genes with large and diverse expression levels as training genes, and not considering expression at all when selecting test genes. We were dealing with data that &lt;strong&gt;fundamentally exhibited sparsity&lt;/strong&gt;, a feature significantly influencing the model’s performance, and by ignoring it, I had created imbalanced training and test sets.&lt;/p&gt;
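&lt;p&gt;In hindsight, one simple guard against this imbalance (a sketch of what I’d try now, not a procedure from the Tangram tutorial) is to stratify the train/test split by per-gene sparsity, so that both sets carry comparable sparsity profiles:&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;import numpy as np

rng = np.random.default_rng(0)

#  Hypothetical per-gene sparsity: the fraction of spatial voxels with zero counts
n_genes = 100
gene_ids = np.arange(n_genes)
sparsity = rng.uniform(0.0, 1.0, size=n_genes)

#  Bin genes by sparsity, then split each bin roughly in half, so that
#  training and test genes follow similar sparsity distributions
bins = np.digitize(sparsity, [0.25, 0.5, 0.75])
train, test = [], []
for b in np.unique(bins):
    members = rng.permutation(gene_ids[bins == b])
    half = len(members) // 2
    train.extend(members[:half])
    test.extend(members[half:])&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With a split like this, a training/test score gap is more likely to reflect the mapping itself rather than a difference in how sparse the two gene sets are.&lt;/p&gt;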
&lt;/div&gt;
&lt;div id=&#34;the-importance-of-understanding-your-model&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The Importance of Understanding your Model&lt;/h2&gt;
&lt;p&gt;My first instinct when observing the training/test performance gap was that serious overfitting was occurring. I was under the impression that a deep neural network was used as the model that was to learn the mapping from cell to spatial voxel. However, the more I thought about it, the less it made sense that a basic arrangement task would require or even benefit from a deep neural network; this felt almost like a regression problem. What features could a neural network even learn to solve a problem like this?&lt;/p&gt;
&lt;p&gt;I dug deeper into the methods of the paper, and discovered the model at hand was simply a matrix, assigning probabilities for each cell to each spatial voxel. These probabilities were “directly” optimized by gradient descent to maximize a similarity score (cosine similarity) between assigned and observed gene expression levels. I had made an assumption, perhaps based on the title of the manuscript, but the reality was that a very “shallow”, relatively simple model was being used. In this sense, we should already be less worried about the potential for overfitting.&lt;/p&gt;
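&lt;p&gt;Here’s a deliberately tiny numpy caricature of that idea, my own reconstruction for intuition rather than Tangram’s actual implementation (which relies on automatic differentiation, not the finite differences below): the entire model is a single matrix of logits, softmaxed across voxels, nudged uphill on mean cosine similarity.&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;import numpy as np

rng = np.random.default_rng(1)
n_cells, n_voxels, n_genes = 6, 4, 5

#  Single-cell expression (cells x genes) and observed spatial expression (voxels x genes);
#  a small offset keeps every gene norm strictly positive
S = rng.poisson(3.0, size=(n_cells, n_genes)).astype(float) + 0.1
G = rng.poisson(3.0, size=(n_voxels, n_genes)).astype(float) + 0.1

#  The whole model: one logit per (cell, voxel) pair
M = rng.normal(size=(n_cells, n_voxels))

def objective(M):
    #  Softmax across voxels turns each row of M into placement probabilities
    P = np.exp(M) / np.exp(M).sum(axis=1, keepdims=True)
    predicted = P.T @ S  # predicted spatial expression (voxels x genes)
    #  Mean cosine similarity across genes between predicted and observed expression
    num = (predicted * G).sum(axis=0)
    den = np.linalg.norm(predicted, axis=0) * np.linalg.norm(G, axis=0)
    return float((num / den).mean())

#  Crude gradient ascent via finite differences, feasible only at this toy scale
score_before = objective(M)
eps, lr = 1e-5, 0.5
for _ in range(50):
    base = objective(M)
    grad = np.zeros_like(M)
    for idx in np.ndindex(M.shape):
        M_step = M.copy()
        M_step[idx] += eps
        grad[idx] = (objective(M_step) - base) / eps
    M += lr * grad
score_after = objective(M)
print(score_before, score_after)  # the score should increase&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There’s no hidden network in there at all, which is why, with well-selected training genes, a training/test gap doesn’t have to signal classic overfitting.&lt;/p&gt;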
&lt;p&gt;Thinking more about the task Tangram’s model was learning to perform, I found it more helpful to consider the single cell–spatial mapping to be a complete jigsaw puzzle, and cells to be pieces (yes, I know the whole purpose of the name “Tangram” is as a similar metaphor). Well, imagine these pieces could hypothetically fit together in any arrangement so long as the complete picture’s shape was fixed. In this metaphor, using carefully selected training genes would be like erasing small, uninformative parts of the image on each puzzle piece. When we complete the puzzle, we will see the underlying big picture fairly well. Then, we wouldn’t particularly be worried that the erased segments might contradict what we already have in place, since we already have solid visual evidence our arrangement is good. Analogously, provided our training genes are well-selected, good training scores can give us confidence a robust mapping was found, in which case we can trust test genes to be well-placed.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;takeaways&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Takeaways&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Consider the nature of your data when interpreting results, such as the performance of a model on subsets of that data&lt;/li&gt;
&lt;li&gt;Take time to understand your model, so that you know how to interpret its performance&lt;/li&gt;
&lt;li&gt;Some papers need more than a brief skim :)&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Using tidymodels to Predict Health Insurance Cost</title>
      <link>http://LieberInstitute.github.io/rstatsclub/2021/02/15/using-tidymodels-to-predict-health-insurance-cost/</link>
      <pubDate>Mon, 15 Feb 2021 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/2021/02/15/using-tidymodels-to-predict-health-insurance-cost/</guid>
      <description>
&lt;script src=&#34;http://LieberInstitute.github.io/rstatsclub/rstatsclub/rmarkdown-libs/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;

&lt;p&gt;By &lt;a href=&#34;https://twitter.com/artaseyedian&#34;&gt;Arta Seyedian&lt;/a&gt;&lt;/p&gt;
&lt;div id=&#34;medical-cost-personal-datasets&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Medical Cost Personal Datasets&lt;/h1&gt;
&lt;div id=&#34;insurance-forecast-by-using-linear-regression&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Insurance Forecast by using Linear Regression&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://www.kaggle.com/mirichoi0218/insurance&#34;&gt;Link to Kaggle Page&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/stedy/Machine-Learning-with-R-datasets/blob/master/insurance.csv&#34;&gt;Link to GitHub Source&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Around the end of October 2020, I attended the Open Data Science Conference primarily for the workshops and training sessions that were offered. The first workshop I attended was a demonstration by &lt;a href=&#34;https://www.jaredlander.com/&#34;&gt;Jared Lander&lt;/a&gt; on how to implement machine learning methods in R using a new package named &lt;em&gt;tidymodels&lt;/em&gt;. I went into that training knowing almost nothing about machine learning, and have since then drawn exclusively from free online materials to understand how to analyze data using this “meta-package.”&lt;/p&gt;
&lt;p&gt;As a brief introduction, tidymodels is, like tidyverse, not a single package but rather a collection of data science packages designed according to &lt;a href=&#34;https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html&#34;&gt;tidyverse principles&lt;/a&gt;. Many of the packages present in tidymodels are also present in tidyverse. What makes tidymodels different from tidyverse, however, is that many of these packages are meant for predictive modeling and provide a universal standard interface for all of the different machine learning methods available in R.&lt;/p&gt;
&lt;p&gt;Today, we are using a data set of health insurance information from ~1300 customers of a health insurance company. This data set is sourced from a book titled &lt;em&gt;Machine Learning with R&lt;/em&gt; by Brett Lantz.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse)
library(tidymodels)
library(data.table)

download.file(&amp;quot;https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/insurance.csv&amp;quot;, 
              &amp;quot;insurance.csv&amp;quot;)

insur_dt &amp;lt;- fread(&amp;quot;insurance.csv&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_dt %&amp;gt;% colnames()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;age&amp;quot;      &amp;quot;sex&amp;quot;      &amp;quot;bmi&amp;quot;      &amp;quot;children&amp;quot; &amp;quot;smoker&amp;quot;   &amp;quot;region&amp;quot;   &amp;quot;charges&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_dt$age %&amp;gt;% summary()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   18.00   27.00   39.00   39.21   51.00   64.00&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_dt$sex %&amp;gt;% table()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## .
## female   male 
##    662    676&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_dt$bmi %&amp;gt;% summary()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   15.96   26.30   30.40   30.66   34.69   53.13&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_dt$smoker %&amp;gt;% table()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## .
##   no  yes 
## 1064  274&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_dt$charges %&amp;gt;% summary()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1122    4740    9382   13270   16640   63770&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Above, you’ll notice I loaded &lt;code&gt;tidymodels&lt;/code&gt;, a meta-package that bundles packages such as &lt;code&gt;parsnip&lt;/code&gt; and &lt;code&gt;recipes&lt;/code&gt; for modeling and statistical analysis. You can learn more about it &lt;a href=&#34;https://www.tidymodels.org/&#34;&gt;here&lt;/a&gt;. Usually, you can simply call &lt;code&gt;library(tidymodels)&lt;/code&gt;, but Kaggle R notebooks seem unable to install and/or load it for the time being, which is fine.&lt;/p&gt;
&lt;p&gt;As you can see, there are 7 relatively self-explanatory variables in this data set, some of which are presumably used by the benevolent private health insurance company in question to determine how much a given individual is ultimately charged. &lt;code&gt;age&lt;/code&gt;, &lt;code&gt;sex&lt;/code&gt; and &lt;code&gt;region&lt;/code&gt; appear to be demographics: age runs from 18 to 64 with a mean of about 39, and the two factor levels in &lt;code&gt;sex&lt;/code&gt; are roughly balanced in count.&lt;/p&gt;
&lt;p&gt;Assuming that the variable &lt;code&gt;bmi&lt;/code&gt; corresponds to Body Mass Index, according to the &lt;a href=&#34;https://www.cdc.gov/healthyweight/assessing/bmi/adult_bmi/index.html&#34;&gt;CDC&lt;/a&gt;, a BMI of 30 or above is considered clinically obese. In our present data set, the average is just over the cusp of obese.&lt;/p&gt;
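&lt;p&gt;For reference, BMI is weight in kilograms divided by height in meters squared. A quick illustration in R (the helper and values below are made up for illustration, not part of the data set):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# hypothetical helper, for illustration only
bmi &amp;lt;- function(weight_kg, height_m) weight_kg / height_m^2
bmi(95, 1.75)  # about 31, just over the obesity cutoff of 30&lt;/code&gt;&lt;/pre&gt;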
&lt;p&gt;Next we have the number of smokers vs non-smokers. As someone who has filled out at least one form in my life, I can definitely tell you that &lt;code&gt;smoker&lt;/code&gt; is going to be important going forward in determining the &lt;code&gt;charges&lt;/code&gt; of each given health insurance customer.&lt;/p&gt;
&lt;p&gt;Lastly, we have &lt;code&gt;charges&lt;/code&gt;. The average annual charge for health insurance here is a modest $13,270.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;exploratory-data-analysis&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Exploratory Data Analysis&lt;/h2&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;skimr::skim(insur_dt)&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:unnamed-chunk-3&#34;&gt;Table 1: &lt;/span&gt;Data summary&lt;/caption&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Name&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;insur_dt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Number of rows&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1338&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Number of columns&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;_______________________&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Column type frequency:&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;character&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;numeric&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;________________________&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Group variables&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Variable type: character&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;skim_variable&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;n_missing&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;complete_rate&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;min&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;max&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;empty&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;n_unique&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;whitespace&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;sex&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;6&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;smoker&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;3&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;region&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;9&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;9&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Variable type: numeric&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;skim_variable&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;n_missing&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;complete_rate&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;mean&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;sd&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;p0&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;p25&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;p50&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;p75&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;p100&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;hist&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;age&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;39.21&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;14.05&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;18.00&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;27.00&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;39.00&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;51.00&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;64.00&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;▇▅▅▆▆&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;bmi&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;30.66&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;6.10&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;15.96&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;26.30&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;30.40&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;34.69&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;53.13&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;▂▇▇▂▁&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;children&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1.09&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1.21&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.00&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.00&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1.00&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.00&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;5.00&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;▇▂▂▁▁&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;charges&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;13270.42&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;12110.01&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1121.87&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4740.29&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;9382.03&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;16639.91&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;63770.43&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;▇▂▁▁▁&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(insur_dt$sex)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
## female   male 
##    662    676&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I want to note that this data set is pretty clean; you will probably never encounter a data set like this in the wild. There are no &lt;code&gt;NA&lt;/code&gt;s and, as I mentioned before, no class imbalance along &lt;code&gt;sex&lt;/code&gt;. Let’s look at the distribution of children:&lt;/p&gt;
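&lt;p&gt;You can verify the absence of missing values with a one-line check (a quick sketch):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sum(is.na(insur_dt))  # returns 0 when no values are missing&lt;/code&gt;&lt;/pre&gt;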
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(insur_dt$children)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##   0   1   2   3   4   5 
## 574 324 240 157  25  18&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Pretty standard: the plurality of people in this data set have no children, and the counts fall off steadily as the number of children increases.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;options(repr.plot.width=15, repr.plot.height = 10)

insur_dt %&amp;gt;%
    select(age, bmi, children, smoker, region, charges) %&amp;gt;%
    GGally::ggpairs(mapping = aes(color = region))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/rstatsclub/post/2021-02-15-Using-tidymodels-to-predict-medical-insurance-costs_files/figure-html/unnamed-chunk-5-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GGally&lt;/code&gt; is a package that facilitates exploratory data analysis by automatically generating &lt;code&gt;ggplot2&lt;/code&gt; plots of the variables present in the input data frame, to help you get a better understanding of the relationships that might exist between them. Most of these plots are just noise, but there are a few interesting ones, such as the two on the bottom left assessing &lt;code&gt;charges&lt;/code&gt; vs &lt;code&gt;age&lt;/code&gt; and &lt;code&gt;charges&lt;/code&gt; vs &lt;code&gt;bmi&lt;/code&gt;. Further to the right, there is also &lt;code&gt;charges&lt;/code&gt; vs &lt;code&gt;smoker&lt;/code&gt;. Let’s take a closer look at some of these relationships:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_dt %&amp;gt;% ggplot(aes(color = region)) + facet_wrap(~ region)+
  geom_point(mapping = aes(x = bmi, y = charges))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/rstatsclub/post/2021-02-15-Using-tidymodels-to-predict-medical-insurance-costs_files/figure-html/unnamed-chunk-6-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;I wanted to see if any regions are somehow charged at a different rate than the others, but these plots all look basically the same. If you look closely, there are roughly two different blobs projecting from (0, 0) toward the center of each plot. We’ll get back to that later.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_dt %&amp;gt;% ggplot(aes(color = region)) + facet_wrap(~ region)+
  geom_point(mapping = aes(x = age, y = charges))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/rstatsclub/post/2021-02-15-Using-tidymodels-to-predict-medical-insurance-costs_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Here, I wanted to see if there was any noticeable relationship between &lt;code&gt;age&lt;/code&gt; and &lt;code&gt;charges&lt;/code&gt;. Across the four &lt;code&gt;region&lt;/code&gt;s, most points lie along a baseline near the X-axis that increases modestly with &lt;code&gt;age&lt;/code&gt;. There is, however, a pattern that appears as two levels rising off of that baseline. Since we don’t have a variable for the type of health insurance plan these people are on, we should probably hold off on any judgments about what this could be for now.&lt;/p&gt;
&lt;p&gt;Let’s move onto what is undoubtedly the pièce de résistance of health insurance coverage: smokers.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_dt %&amp;gt;%
    select(smoker, bmi, charges) %&amp;gt;%
    ggplot(aes(color = smoker)) +
    geom_point(mapping = aes(x = bmi, y = charges))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/rstatsclub/post/2021-02-15-Using-tidymodels-to-predict-medical-insurance-costs_files/figure-html/unnamed-chunk-8-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Wow. What a stark difference. Here, you can see that &lt;code&gt;smoker&lt;/code&gt; almost creates a whole new blob of points separate from non-smokers… and that blob sharply rises after &lt;code&gt;bmi = 30&lt;/code&gt;. Say, what was the CDC official cutoff for obesity again?&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_dt$age_bins &amp;lt;- cut(insur_dt$age,
                breaks = c(18,20,30,40,50,60,70,80,90),
                include.lowest = TRUE,
                right = TRUE)

insur_dt %&amp;gt;%
    select(bmi, charges, sex, age_bins) %&amp;gt;%
    ggplot(aes(color = age_bins)) +
    geom_point(mapping = aes(x = bmi, y = charges))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/rstatsclub/post/2021-02-15-Using-tidymodels-to-predict-medical-insurance-costs_files/figure-html/unnamed-chunk-9-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;You can see that &lt;code&gt;age&lt;/code&gt; does play a role in &lt;code&gt;charges&lt;/code&gt;, but the effect is stratified within the three-ish clusters of points: even among the high-&lt;code&gt;bmi&lt;/code&gt; smokers, younger people consistently pay less than older people, which makes sense. However, &lt;code&gt;age&lt;/code&gt; does not appear to interact with &lt;code&gt;bmi&lt;/code&gt; or &lt;code&gt;smoker&lt;/code&gt;, meaning that it affects &lt;code&gt;charges&lt;/code&gt; independently.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_dt %&amp;gt;%
    select(children, charges, sex) %&amp;gt;%
    ggplot(aes(x = children, y = charges, group = children)) +
    geom_boxplot(outlier.alpha = 0.5, aes(fill = children)) +
    theme(legend.position = &amp;quot;none&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/rstatsclub/post/2021-02-15-Using-tidymodels-to-predict-medical-insurance-costs_files/figure-html/unnamed-chunk-10-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Finally, &lt;code&gt;children&lt;/code&gt; does not appear to affect &lt;code&gt;charges&lt;/code&gt; significantly.&lt;/p&gt;
&lt;p&gt;I think we’ve done enough exploratory analysis to establish that &lt;code&gt;bmi&lt;/code&gt; and &lt;code&gt;smoker&lt;/code&gt; together have a synergistic effect on &lt;code&gt;charges&lt;/code&gt;, and that &lt;code&gt;age&lt;/code&gt; influences &lt;code&gt;charges&lt;/code&gt; as well.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;build-model&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Build Model&lt;/h2&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(123)

insur_split &amp;lt;- initial_split(insur_dt, strata = smoker)

insur_train &amp;lt;- training(insur_split)
insur_test &amp;lt;- testing(insur_split)

# we are going to do data processing and feature engineering with recipes

# below, we are going to predict charges using everything else(&amp;quot;.&amp;quot;)
insur_rec &amp;lt;- recipe(charges ~ bmi + age + smoker, data = insur_train) %&amp;gt;%
    step_dummy(all_nominal()) %&amp;gt;%
    step_normalize(all_numeric(), -all_outcomes()) %&amp;gt;%
    step_interact(terms = ~ bmi:smoker_yes)

test_proc &amp;lt;- insur_rec %&amp;gt;% prep() %&amp;gt;% bake(new_data = insur_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We first split our data into training and testing sets. We stratify the sampling by &lt;code&gt;smoker&lt;/code&gt; status because smokers are a minority class and we want them represented proportionally in both the training and testing sets; this is accomplished by conducting the random sampling within each class.&lt;/p&gt;
&lt;p&gt;An explanation of the &lt;code&gt;recipe&lt;/code&gt;:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;p&gt;We are going to model the effect of &lt;code&gt;bmi&lt;/code&gt;, &lt;code&gt;age&lt;/code&gt; and &lt;code&gt;smoker&lt;/code&gt; on &lt;code&gt;charges&lt;/code&gt;. We do not specify interactions in this step because &lt;code&gt;recipe&lt;/code&gt; handles interactions as a step.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We create dummy variables (&lt;code&gt;step_dummy&lt;/code&gt;) for all nominal predictors, so &lt;code&gt;smoker&lt;/code&gt; becomes &lt;code&gt;smoker_yes&lt;/code&gt;, and &lt;code&gt;smoker_no&lt;/code&gt; is “implied” through omission (a row with &lt;code&gt;smoker_yes == 0&lt;/code&gt; is a non-smoker), because some models cannot have every dummy level present as a column. To keep all dummy variables, you can use &lt;code&gt;one_hot = TRUE&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We then normalize all numeric predictors &lt;strong&gt;except&lt;/strong&gt; our outcome variable (&lt;code&gt;step_normalize(all_numeric(), -all_outcomes())&lt;/code&gt;), because you generally want to avoid transforming the outcome while training and developing a model, lest a new data set inconsistent with the current one comes along and breaks your model. It’s best to do transformations on the outcome before creating a &lt;code&gt;recipe&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We set an interaction term: &lt;code&gt;bmi&lt;/code&gt; and &lt;code&gt;smoker_yes&lt;/code&gt; (the dummy variable for &lt;code&gt;smoker&lt;/code&gt;) interact with each other in affecting the outcome. Earlier, we noticed that older patients are charged more, that patients with a higher &lt;code&gt;bmi&lt;/code&gt; are charged more still, and that high-&lt;code&gt;bmi&lt;/code&gt; patients who smoke are charged the most out of anyone in our data set. We observed this visually in the plots, so we are also going to test it in the model we develop.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
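&lt;p&gt;As a side note, here is what the one-hot variant mentioned above would look like (a hypothetical sketch, not the recipe we actually use):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# keeps both smoker_yes and smoker_no instead of dropping one level
insur_rec_onehot &amp;lt;- recipe(charges ~ bmi + age + smoker, data = insur_train) %&amp;gt;%
    step_dummy(all_nominal(), one_hot = TRUE) %&amp;gt;%
    step_normalize(all_numeric(), -all_outcomes())&lt;/code&gt;&lt;/pre&gt;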
&lt;p&gt;Let’s actually specify the model. We are going to work with a k-nearest neighbors model, which we will later compare with another model. The KNN model is &lt;a href=&#34;https://bookdown.org/tpinto_home/Regression-and-Classification/k-nearest-neighbours-regression.html&#34;&gt;defined as follows&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;KNN regression is a non-parametric method that, in an intuitive manner, approximates the association between independent variables and the continuous outcome by averaging the observations in the same neighbourhood. The size of the neighbourhood needs to be set by the analyst or can be chosen using cross-validation (we will see this later) to select the size that minimises the mean-squared error.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;To keep things simple, we are not going to use cross-validation to find the optimal &lt;code&gt;k&lt;/code&gt;. Instead, we are just going to say &lt;code&gt;k = 10&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;knn_spec &amp;lt;- nearest_neighbor(neighbors = 10) %&amp;gt;%
    set_engine(&amp;quot;kknn&amp;quot;) %&amp;gt;%
    set_mode(&amp;quot;regression&amp;quot;)

knn_fit &amp;lt;- knn_spec %&amp;gt;%
    fit(charges ~ age + bmi + smoker_yes + bmi_x_smoker_yes,
        data = juice(insur_rec %&amp;gt;% prep()))&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_wf &amp;lt;- workflow() %&amp;gt;%
    add_recipe(insur_rec) %&amp;gt;%
    add_model(knn_spec)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We specified the model &lt;code&gt;knn_spec&lt;/code&gt; by calling the model itself from &lt;code&gt;parsnip&lt;/code&gt;, then we &lt;code&gt;set_engine&lt;/code&gt; and set the mode to regression. Note the &lt;code&gt;neighbors&lt;/code&gt; parameter in &lt;code&gt;nearest_neighbor&lt;/code&gt;. That corresponds to the &lt;code&gt;k&lt;/code&gt; in &lt;code&gt;knn&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;We then fit the model using the model specification to our data. Because we already computed columns for the &lt;code&gt;bmi&lt;/code&gt; and &lt;code&gt;smoker_yes&lt;/code&gt; interaction, we do not need to represent the interaction formulaically again.&lt;/p&gt;
&lt;p&gt;Let’s evaluate this model to see how it does.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_cv &amp;lt;- vfold_cv(insur_train, prop = 0.9)

insur_rsmpl &amp;lt;- fit_resamples(insur_wf,
                           insur_cv,
                           control = control_resamples(save_pred = TRUE))&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_rsmpl %&amp;gt;% collect_metrics()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 2 x 6
##   .metric .estimator     mean     n  std_err .config             
##   &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;         &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt;    &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;               
## 1 rmse    standard   4916.       10 274.     Preprocessor1_Model1
## 2 rsq     standard      0.827    10   0.0194 Preprocessor1_Model1&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;summary(insur_dt$charges)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1122    4740    9382   13270   16640   63770&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We used &lt;code&gt;vfold_cv&lt;/code&gt;, the cross-validation most people are familiar with: the training data is split into V folds, the model is trained on V - 1 folds and used to make predictions on the held-out fold, and this is repeated so that every fold serves once as the assessment fold. With 10 folds, each iteration trains on 9 folds and assesses on 1 (all within our training data).&lt;/p&gt;
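&lt;p&gt;If you want to control the number of folds explicitly, &lt;code&gt;vfold_cv&lt;/code&gt; exposes this through its &lt;code&gt;v&lt;/code&gt; argument (10 is the default); a minimal sketch:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# the same 10-fold scheme, stated explicitly
insur_cv &amp;lt;- vfold_cv(insur_train, v = 10)&lt;/code&gt;&lt;/pre&gt;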
&lt;p&gt;We then finally run the cross validation by using &lt;code&gt;fit_resamples&lt;/code&gt;. As you can see, we used our workflow object as our input.&lt;/p&gt;
&lt;p&gt;Finally, we call &lt;code&gt;collect_metrics&lt;/code&gt; to examine the model’s effectiveness. We end up with an &lt;code&gt;rmse&lt;/code&gt; of about 4,916 and an &lt;code&gt;rsq&lt;/code&gt; of about &lt;code&gt;0.83&lt;/code&gt;. The RMSE suggests that, on average, our predictions deviated from the observed values by an absolute measure of about 4,916, in this case dollars in &lt;code&gt;charges&lt;/code&gt;. The R^2 suggests the regression explains roughly 83% of the variance in &lt;code&gt;charges&lt;/code&gt;, although a high R^2 doesn’t always mean the model fits well, and a low R^2 doesn’t always mean it fits poorly.&lt;/p&gt;
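&lt;p&gt;For intuition, both metrics can be computed by hand from a vector of observed values and predictions; a toy sketch (the numbers below are made up, not the model output):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;obs  &amp;lt;- c(1000, 5000, 20000)
pred &amp;lt;- c(1500, 4500, 18000)
sqrt(mean((obs - pred)^2))                           # RMSE
1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)   # R-squared&lt;/code&gt;&lt;/pre&gt;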
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_rsmpl %&amp;gt;%
    unnest(.predictions) %&amp;gt;%
    ggplot(aes(charges, .pred, color = id)) + 
    geom_abline(lty = 2, color = &amp;quot;gray80&amp;quot;, size = 1.5) + 
    geom_point(alpha = 0.5) + 
    theme(legend.position = &amp;quot;none&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/rstatsclub/post/2021-02-15-Using-tidymodels-to-predict-medical-insurance-costs_files/figure-html/unnamed-chunk-14-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Above is a plot of our regression fit against the ideal line. There is a large cluster of values that our model simply does not capture, and we could dig into those points further, but instead we are going to move on to applying our model to the test data, which we defined much earlier in this project.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_test_res &amp;lt;- predict(knn_fit, new_data = test_proc %&amp;gt;% select(-charges))&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_test_res &amp;lt;- bind_cols(insur_test_res, insur_test %&amp;gt;% select(charges))

insur_test_res&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 334 x 2
##     .pred charges
##     &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;
##  1  4339.   3757.
##  2 27038.  27809.
##  3  2231.   1837.
##  4  6500.   6204.
##  5  2794.   4688.
##  6  6057.   6314.
##  7 14335.  12630.
##  8  1663.   2211.
##  9  5655.   3580.
## 10 39401.  37743.
## # … with 324 more rows&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We’ve now applied our model to &lt;code&gt;test_proc&lt;/code&gt;, which is the test set after applying the &lt;code&gt;recipes&lt;/code&gt; preprocessing steps to transform it in the same way we transformed our training data. We bind the resulting predictions with the actual &lt;code&gt;charges&lt;/code&gt; found in the test data to create a two-column table of our predictions and the corresponding real values we attempted to predict.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ggplot(insur_test_res, aes(x = charges, y = .pred)) +
  # Create a diagonal line:
  geom_abline(lty = 2) +
  geom_point(alpha = 0.5) +
  labs(y = &amp;quot;Predicted Charges&amp;quot;, x = &amp;quot;Charges&amp;quot;) +
  # Scale and size the x- and y-axis uniformly:
  coord_obs_pred()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/rstatsclub/post/2021-02-15-Using-tidymodels-to-predict-medical-insurance-costs_files/figure-html/unnamed-chunk-16-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rmse(insur_test_res, truth = charges, estimate = .pred)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 1 x 3
##   .metric .estimator .estimate
##   &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;          &amp;lt;dbl&amp;gt;
## 1 rmse    standard       4985.&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_rsmpl %&amp;gt;% 
    collect_metrics()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 2 x 6
##   .metric .estimator     mean     n  std_err .config             
##   &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;         &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt;    &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;               
## 1 rmse    standard   4916.       10 274.     Preprocessor1_Model1
## 2 rsq     standard      0.827    10   0.0194 Preprocessor1_Model1&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Nice! The RMSE on the test data (about 4,985) is very close to the one generated by cross-validation (about 4,916), which means our model reproduces predictions on new data with approximately the same level of error.&lt;/p&gt;
&lt;p&gt;Another great thing about &lt;code&gt;tidymodels&lt;/code&gt; is that it streamlines the process of comparing predictive performance between two different models. Allow me to demonstrate.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;linear-regression&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Linear Regression&lt;/h2&gt;
&lt;p&gt;We already have the recipe. All we need now is to specify a linear model and cross-validate the fit to test it on the testing data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;lm_spec &amp;lt;- linear_reg() %&amp;gt;% 
    set_engine(&amp;quot;lm&amp;quot;)

lm_fit &amp;lt;- lm_spec %&amp;gt;%
    fit(charges ~ age + bmi + smoker_yes + bmi_x_smoker_yes,
        data = juice(insur_rec %&amp;gt;% prep()))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: partial match of &amp;#39;object&amp;#39; to &amp;#39;objects&amp;#39;

## Warning: partial match of &amp;#39;object&amp;#39; to &amp;#39;objects&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_lm_wf &amp;lt;- workflow() %&amp;gt;%
    add_recipe(insur_rec) %&amp;gt;%
    add_model(lm_spec)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We just repeat &lt;em&gt;some&lt;/em&gt; of the same steps that we did for KNN but for the linear model. We can even cross-validate by using (almost) the same command:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_lm_rsmpl &amp;lt;- fit_resamples(insur_lm_wf,
                           insur_cv,
                           control = control_resamples(save_pred = TRUE))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold01: preprocessor 1/1: partial match of &amp;#39;object&amp;#39; to &amp;#39;objects&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold01: preprocessor 1/1, model 1/1 (predictions): partial match of &amp;#39;object&amp;#39; to ...&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold02: preprocessor 1/1: partial match of &amp;#39;object&amp;#39; to &amp;#39;objects&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold02: preprocessor 1/1, model 1/1 (predictions): partial match of &amp;#39;object&amp;#39; to ...&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold03: preprocessor 1/1: partial match of &amp;#39;object&amp;#39; to &amp;#39;objects&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold03: preprocessor 1/1, model 1/1 (predictions): partial match of &amp;#39;object&amp;#39; to ...&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold04: preprocessor 1/1: partial match of &amp;#39;object&amp;#39; to &amp;#39;objects&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold04: preprocessor 1/1, model 1/1 (predictions): partial match of &amp;#39;object&amp;#39; to ...&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold05: preprocessor 1/1: partial match of &amp;#39;object&amp;#39; to &amp;#39;objects&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold05: preprocessor 1/1, model 1/1 (predictions): partial match of &amp;#39;object&amp;#39; to ...&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold06: preprocessor 1/1: partial match of &amp;#39;object&amp;#39; to &amp;#39;objects&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold06: preprocessor 1/1, model 1/1 (predictions): partial match of &amp;#39;object&amp;#39; to ...&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold07: preprocessor 1/1: partial match of &amp;#39;object&amp;#39; to &amp;#39;objects&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold07: preprocessor 1/1, model 1/1 (predictions): partial match of &amp;#39;object&amp;#39; to ...&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold08: preprocessor 1/1: partial match of &amp;#39;object&amp;#39; to &amp;#39;objects&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold08: preprocessor 1/1, model 1/1 (predictions): partial match of &amp;#39;object&amp;#39; to ...&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold09: preprocessor 1/1: partial match of &amp;#39;object&amp;#39; to &amp;#39;objects&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold09: preprocessor 1/1, model 1/1 (predictions): partial match of &amp;#39;object&amp;#39; to ...&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold10: preprocessor 1/1: partial match of &amp;#39;object&amp;#39; to &amp;#39;objects&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## ! Fold10: preprocessor 1/1, model 1/1 (predictions): partial match of &amp;#39;object&amp;#39; to ...&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_lm_rsmpl %&amp;gt;% 
    collect_metrics()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 2 x 6
##   .metric .estimator     mean     n  std_err .config             
##   &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;         &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt;    &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;               
## 1 rmse    standard   4866.       10 251.     Preprocessor1_Model1
## 2 rsq     standard      0.832    10   0.0162 Preprocessor1_Model1&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_rsmpl %&amp;gt;% 
    collect_metrics()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 2 x 6
##   .metric .estimator     mean     n  std_err .config             
##   &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;         &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt;    &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;               
## 1 rmse    standard   4916.       10 274.     Preprocessor1_Model1
## 2 rsq     standard      0.827    10   0.0194 Preprocessor1_Model1&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Fascinating! It appears that the good ol’ fashioned linear model beat k-nearest neighbors in terms of both RMSE and R&lt;sup&gt;2&lt;/sup&gt; across 10 cross-validation folds.&lt;/p&gt;
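&lt;p&gt;To compare the two sets of resampling metrics side by side, one option is a small sketch like the following, where the &lt;code&gt;model&lt;/code&gt; column is a label we add ourselves:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Label each model&amp;#39;s resampling metrics and stack them for comparison
bind_rows(
    collect_metrics(insur_rsmpl) %&amp;gt;% mutate(model = &amp;quot;knn&amp;quot;),
    collect_metrics(insur_lm_rsmpl) %&amp;gt;% mutate(model = &amp;quot;lm&amp;quot;)
) %&amp;gt;%
    arrange(.metric, model)&lt;/code&gt;&lt;/pre&gt;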
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;insur_test_lm_res &amp;lt;- predict(lm_fit, new_data = test_proc %&amp;gt;% select(-charges))

insur_test_lm_res &amp;lt;- bind_cols(insur_test_lm_res, insur_test %&amp;gt;% select(charges))

insur_test_lm_res&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 334 x 2
##     .pred charges
##     &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;
##  1  6335.   3757.
##  2 31938.  27809.
##  3  3171.   1837.
##  4  7878.   6204.
##  5  3081.   4688.
##  6  7815.   6314.
##  7 14070.  12630.
##  8  2656.   2211.
##  9  3498.   3580.
## 10 36293.  37743.
## # … with 324 more rows&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now that we have our predictions, let’s look at how well the linear model fared:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ggplot(insur_test_lm_res, aes(x = charges, y = .pred)) +
  # Create a diagonal line:
  geom_abline(lty = 2) +
  geom_point(alpha = 0.5) +
  labs(y = &amp;quot;Predicted Charges&amp;quot;, x = &amp;quot;Charges&amp;quot;) +
  # Scale and size the x- and y-axis uniformly:
  coord_obs_pred()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/rstatsclub/post/2021-02-15-Using-tidymodels-to-predict-medical-insurance-costs_files/figure-html/unnamed-chunk-21-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;It seems the bottom-left corner had the greatest concentration of charges, which explains most of the &lt;code&gt;lm&lt;/code&gt; fit. Looking at both of these plots makes me wonder whether there was a better model we could have used, but our model was sufficient given our purposes and level of accuracy.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;combined_dt &amp;lt;- insur_test_lm_res %&amp;gt;%
    rename(lm_pred = .pred) %&amp;gt;%
    add_column(knn_pred = insur_test_res$.pred)

ggplot(combined_dt, aes(x = charges)) +
    geom_line(aes(y = knn_pred, color = &amp;quot;kNN Fit&amp;quot;), size = 1) +
    geom_line(aes(y = lm_pred, color = &amp;quot;lm Fit&amp;quot;), size = 1) +
    geom_point(aes(y = knn_pred, alpha = 0.5), color = &amp;quot;#F99E9E&amp;quot;) +
    geom_point(aes(y = lm_pred, alpha = 0.5), color = &amp;quot;#809BF4&amp;quot;) +
    geom_abline(size = 0.5, linetype = &amp;quot;dashed&amp;quot;) +
    xlab(&amp;#39;Charges&amp;#39;) +
    ylab(&amp;#39;Predicted Charges&amp;#39;) +
    guides(alpha = FALSE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/rstatsclub/post/2021-02-15-Using-tidymodels-to-predict-medical-insurance-costs_files/figure-html/unnamed-chunk-22-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Above is a comparison of the two methods’ predictions, with the dashed line representing the “correct” values. Here the two models were not different enough for their differences to be readily observed when plotted against each other. There will be cases, however, where your two models differ substantially, and this sort of plot will bolster your case for using one model over another.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Here, we were able to build a KNN model with our training data and use it to predict values in our testing data. To do this, we:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;performed EDA&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;preprocessed our data using &lt;code&gt;recipes&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;specified our model to be KNN&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;fit it to our training data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ran cross-validation to produce accurate error statistics&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;predicted values in our test set&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;compared observed test set values with our predictions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;specified another model, lm&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;performed a cross-validation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;discovered lm to be the better model&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I’m very excited to continue using tidymodels in R as a way to apply machine learning methods. If you’re interested, I recommend checking out &lt;a href=&#34;https://www.tmwr.org/&#34;&gt;Tidy Modeling with R by Max Kuhn and Julia Silge&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Using VisiumExperiment at spatialLIBD package</title>
      <link>http://LieberInstitute.github.io/rstatsclub/2020/11/06/using-visiumexperiment-at-spatiallibd-package/</link>
      <pubDate>Fri, 06 Nov 2020 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/2020/11/06/using-visiumexperiment-at-spatiallibd-package/</guid>
      <description>
&lt;script src=&#34;http://LieberInstitute.github.io/rstatsclub/rstatsclub/rmarkdown-libs/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;
&lt;link href=&#34;http://LieberInstitute.github.io/rstatsclub/rstatsclub/rmarkdown-libs/anchor-sections/anchor-sections.css&#34; rel=&#34;stylesheet&#34; /&gt;
&lt;script src=&#34;http://LieberInstitute.github.io/rstatsclub/rstatsclub/rmarkdown-libs/anchor-sections/anchor-sections.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;By &lt;a href=&#34;https://github.com/bpardo99&#34;&gt;Brenda Pardo&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A month ago, I started an enriching adventure by joining Leonardo Collado-Torres’ team at the Lieber Institute for Brain Development. Since then, I have been working on modifying &lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.12/spatialLIBD&#34;&gt;spatialLIBD&lt;/a&gt;&lt;/em&gt;, a package to interactively visualize the LIBD human dorsolateral pre-frontal cortex (DLPFC) spatial transcriptomics data &lt;a id=&#39;cite-Maynard_2020&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://www.biorxiv.org/content/10.1101/2020.02.28.969931v1&#39;&gt;Maynard, Collado-Torres, Weber, Uytingco, et al., 2020&lt;/a&gt;). These modifications allow &lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.12/spatialLIBD&#34;&gt;spatialLIBD&lt;/a&gt;&lt;/em&gt; to use objects of the &lt;code&gt;VisiumExperiment&lt;/code&gt; class, which is designed specifically to store spatial transcriptomics data &lt;a id=&#39;cite-Righelli_2020&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;#bib-Righelli_2020&#39;&gt;Righelli and Risso, 2020&lt;/a&gt;). In this blog post, I describe the changes we carried out to the package and happily share a piece of my journey through my research internship at LIBD.&lt;/p&gt;
&lt;div id=&#34;starting-internship-at-lieber-institute&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Starting internship at Lieber Institute&lt;/h2&gt;
&lt;p&gt;As part of the Genomic Sciences undergraduate program at Universidad Nacional Autónoma de México (UNAM), I attended a single-cell data analysis course taught by Leo Collado. During the sessions, I found programming in R quite fun and useful and decided I wanted to go deeper into this programming language. My interest grew when Leo highlighted at the CDSB Workshop 2020 that we could be not just R users but developers, and generate helpful tools for biological data analysis. With this motivation, I reached out to Leo, and that’s how the adventure started: I joined Leonardo Collado’s team at LIBD, and my research internship was inaugurated with this tweet.&lt;/p&gt;
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;I&amp;#39;m super happy that I&amp;#39;m starting a research internship mentored by &lt;a href=&#34;https://twitter.com/fellgernon?ref_src=twsrc%5Etfw&#34;&gt;@fellgernon&lt;/a&gt;  at &lt;a href=&#34;https://twitter.com/LieberInstitute?ref_src=twsrc%5Etfw&#34;&gt;@LieberInstitute&lt;/a&gt; and &lt;a href=&#34;https://twitter.com/jhubiostat?ref_src=twsrc%5Etfw&#34;&gt;@jhubiostat&lt;/a&gt; 😆.&lt;a href=&#34;https://twitter.com/hashtag/CDSB2020?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#CDSB2020&lt;/a&gt; and the course at LIIGH definitely impacted me by increasing my interest in R. Attending was such an encouraging and enriching experience. &lt;a href=&#34;https://t.co/1iwqKwolTb&#34;&gt;https://t.co/1iwqKwolTb&lt;/a&gt;&lt;/p&gt;&amp;mdash; Brenda Pardo (@PardoBree) &lt;a href=&#34;https://twitter.com/PardoBree/status/1302095163116802048?ref_src=twsrc%5Etfw&#34;&gt;September 5, 2020&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;


&lt;p&gt;Since then, I have been working with Leo on adapting the package &lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.12/spatialLIBD&#34;&gt;spatialLIBD&lt;/a&gt;&lt;/em&gt; to use R objects structured specifically to store spatial transcriptomics data. This is work that I’ve been doing part-time while also attending the third &lt;a href=&#34;#fn1&#34; class=&#34;footnote-ref&#34; id=&#34;fnref1&#34;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt; year classes at LCG-UNAM-EJ.&lt;/p&gt;
&lt;iframe src=&#34;https://giphy.com/embed/lluj1cauAlO2vQEm8A&#34; width=&#34;480&#34; height=&#34;270&#34; frameBorder=&#34;0&#34; class=&#34;giphy-embed&#34; allowFullScreen&gt;
&lt;/iframe&gt;
&lt;p&gt;
&lt;a href=&#34;https://giphy.com/gifs/facethetruthtv-lluj1cauAlO2vQEm8A&#34;&gt;via GIPHY&lt;/a&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;what-is-spatiallibd&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;What is spatialLIBD?&lt;/h2&gt;
&lt;p&gt;Spatial transcriptomics makes it possible to measure the transcriptome of a small group of cells in a tissue sample and to map the exact location of the cells with that expression profile. This technology has generated the need for tools to visualize the data it produces. &lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.12/spatialLIBD&#34;&gt;spatialLIBD&lt;/a&gt;&lt;/em&gt; is a Bioconductor package to interactively inspect the DLPFC spatial transcriptomics 10x Genomics Visium data from Maynard, Collado-Torres et al., 2020, analyzed by LIBD researchers and collaborators.&lt;/p&gt;
&lt;p&gt;It contains functions for:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;accessing the spatial transcriptomics data,&lt;/li&gt;
&lt;li&gt;visualizing the spot-level spatial gene expression data and clusters, and&lt;/li&gt;
&lt;li&gt;inspecting the data interactively either on the user’s computer or through a &lt;a href=&#34;http://spatial.libd.org/spatialLIBD/&#34;&gt;web application&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The &lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.12/spatialLIBD&#34;&gt;spatialLIBD&lt;/a&gt;&lt;/em&gt; package used to employ R objects from the &lt;code&gt;SingleCellExperiment&lt;/code&gt; class to store the data; however, Righelli et al. created a class better suited to this data: &lt;code&gt;VisiumExperiment&lt;/code&gt;. This class lives in the &lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.12/SpatialExperiment&#34;&gt;SpatialExperiment&lt;/a&gt;&lt;/em&gt; package.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;the-visiumexperiment-class&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The VisiumExperiment class&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;VisiumExperiment&lt;/code&gt; class inherits from the &lt;code&gt;SingleCellExperiment&lt;/code&gt; class, but it includes new attributes and methods that allow the spatial transcriptomics data to be stored optimally. It contains specific slots to store:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The spatial coordinates for each small group of cells (contained in a spot).&lt;/li&gt;
&lt;li&gt;The paths to the tissue images.&lt;/li&gt;
&lt;li&gt;Information about which spots are covered by tissue.&lt;/li&gt;
&lt;li&gt;The scale factors to know the location of the spots in pixels.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;and methods to set and retrieve the information contained in these slots.&lt;/p&gt;
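&lt;p&gt;For example, given a &lt;code&gt;VisiumExperiment&lt;/code&gt; object &lt;code&gt;ve&lt;/code&gt;, these accessors look roughly like this (a sketch; the accessor names are assumptions based on the &lt;em&gt;SpatialExperiment&lt;/em&gt; documentation of that release):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Sketch of the slot accessors (names assumed from the SpatialExperiment docs)
SpatialExperiment::spatialCoords(ve) ## spot-level spatial coordinates
SpatialExperiment::scaleFactors(ve)  ## spot-to-pixel scale factors
SpatialExperiment::imagePaths(ve)    ## paths to the tissue images
SpatialExperiment::isInTissue(ve)    ## which spots are covered by tissue&lt;/code&gt;&lt;/pre&gt;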
&lt;p&gt;Now that I have introduced the context, I can describe my task: adapting the set of functions that make up &lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.12/spatialLIBD&#34;&gt;spatialLIBD&lt;/a&gt;&lt;/em&gt; so that they can work with &lt;code&gt;VisiumExperiment&lt;/code&gt; objects. Let me now walk you through the updates.&lt;/p&gt;
&lt;p&gt;Before starting, remember to install &lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.12/spatialLIBD&#34;&gt;spatialLIBD&lt;/a&gt;&lt;/em&gt; by using the following commands in your R session.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;if (!requireNamespace(&amp;quot;BiocManager&amp;quot;, quietly = TRUE)) {
      install.packages(&amp;quot;BiocManager&amp;quot;)
  }

BiocManager::install(&amp;quot;spatialLIBD&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, please load the package.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;spatialLIBD&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;downloading-dlpfc-data-contained-in-a-visiumexperiment-object&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Downloading DLPFC data contained in a VisiumExperiment object&lt;/h2&gt;
&lt;p&gt;Let’s start with the function &lt;a href=&#34;http://research.libd.org/spatialLIBD/reference/fetch_data.html&#34;&gt;&lt;code&gt;fetch_data()&lt;/code&gt;&lt;/a&gt;, which was previously designed to retrieve a &lt;code&gt;SingleCellExperiment&lt;/code&gt; object called &lt;code&gt;sce&lt;/code&gt; containing DLPFC spatial transcriptomics data. With our updates, the function can return &lt;code&gt;ve&lt;/code&gt;, an object of the &lt;code&gt;VisiumExperiment&lt;/code&gt; class, by calling a new function, &lt;a href=&#34;http://research.libd.org/spatialLIBD/reference/sce_to_ve.html&#34;&gt;&lt;code&gt;sce_to_ve()&lt;/code&gt;&lt;/a&gt;, which takes data from &lt;code&gt;sce&lt;/code&gt; and rearranges it into the structure of the &lt;code&gt;VisiumExperiment&lt;/code&gt; class (defined by Righelli et al.).&lt;/p&gt;
&lt;p&gt;Below, we obtain &lt;code&gt;ve&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Download ve object
ve &amp;lt;- fetch_data(type = &amp;quot;ve&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## snapshotDate(): 2020-10-02&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 2020-11-09 14:59:36 loading file /Users/lcollado/Library/Caches/BiocFileCache/c4f432e69d6_Human_DLPFC_Visium_processedData_sce_scran_spatialLIBD.Rdata%3Fdl%3D1&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once we have downloaded the object, we can explore it:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ve&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## class: VisiumExperiment 
## dim: 33538 47681 
## metadata(0):
## assays(2): counts logcounts
## rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
##   ENSG00000268674
## rowData names(9): source type ... gene_search is_top_hvg
## colnames(47681): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
##   TTGTTTCCATACAACT-1 TTGTTTGTGTAAATTC-1
## colData names(66): Cluster height ... pseudobulk_UMAP_spatial
##   markers_UMAP_spatial
## reducedDimNames(6): PCA TSNE_perplexity50 ... TSNE_perplexity80
##   UMAP_neighbors15
## altExpNames(0):
## spatialCoordinates(7): Cell_ID sample_name ... pxl_row_in_fullres
##   pxl_col_in_fullres
## inTissue(1): 47681
## imagePaths(12):
##   /Users/lcollado/Library/Caches/BiocFileCache/c4f3c2dc99_151507_tissue_lowres_image.png
##   /Users/lcollado/Library/Caches/BiocFileCache/c4f6e20c2bc_151508_tissue_lowres_image.png
##   ...
##   /Users/lcollado/Library/Caches/BiocFileCache/c4f2196b8e6_151675_tissue_lowres_image.png
##   /Users/lcollado/Library/Caches/BiocFileCache/c4f2e451544_151676_tissue_lowres_image.png&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Observe that our &lt;code&gt;ve&lt;/code&gt; object is a bit more complex than a regular &lt;code&gt;VisiumExperiment&lt;/code&gt; one because it contains multiple samples and, as a consequence, multiple images. This had an impact on how we arranged part of the data.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;storing-and-retrieving-scale-factors-from-a-visiumexperiment-object&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Storing and retrieving scale factors from a VisiumExperiment object&lt;/h2&gt;
&lt;p&gt;As previously mentioned, the &lt;code&gt;VisiumExperiment&lt;/code&gt; class has a new slot named &lt;code&gt;scaleFactors&lt;/code&gt;, created to store the values needed to convert spot coordinates into pixels. This slot has a storage limitation: it has to contain a list with the exact names of the four scale factors for a given sample (&lt;code&gt;tissue_lowres_scalef&lt;/code&gt;, &lt;code&gt;fiducial_diameter_fullres&lt;/code&gt;, &lt;code&gt;tissue_hires_scalef&lt;/code&gt;, &lt;code&gt;spot_diameter_fullres&lt;/code&gt;). Given that in the DLPFC data we have not one but multiple samples, we decided to create a list with the four required scale factor names, plus a fifth element holding the full list of scale factors for all our samples. It looks like this:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Get scale factors
facs &amp;lt;- SpatialExperiment::scaleFactors(ve)

## &amp;quot;current&amp;quot; scale factors
facs[1:4]&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## $spot_diameter_fullres
## [1] 96.37511
## 
## $tissue_hires_scalef
## [1] 0.150015
## 
## $fiducial_diameter_fullres
## [1] 144.5627
## 
## $tissue_lowres_scalef
## [1] 0.0450045&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Data for the rest of the 12 images
class(facs[5])&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;list&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;length(facs[[5]])&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 12&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In addition, we created a function called &lt;a href=&#34;http://research.libd.org/spatialLIBD/reference/update_scaleFactors.html&#34;&gt;&lt;code&gt;update_scaleFactors()&lt;/code&gt;&lt;/a&gt; that generates a &lt;code&gt;VisiumExperiment&lt;/code&gt; object with updated scale factors for a given input sample ID, in case the user wants only the scale factors for a single sample.&lt;/p&gt;
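&lt;p&gt;A minimal sketch of how that could look (the argument names here are an assumption based on the function description; see the linked reference page for the actual signature):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Assumed signature -- check the update_scaleFactors() reference for details
ve &amp;lt;- update_scaleFactors(ve, sample_id = &amp;quot;151507&amp;quot;)&lt;/code&gt;&lt;/pre&gt;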
&lt;/div&gt;
&lt;div id=&#34;visualizing-the-histology-image-from-a-visiumexperiment-object&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Visualizing the histology image from a VisiumExperiment object&lt;/h2&gt;
&lt;p&gt;Now let’s talk about the new location of the histology images in &lt;code&gt;ve&lt;/code&gt; and how to display them. The object &lt;code&gt;sce&lt;/code&gt; has a tibble in its metadata slot that contains a grob for each sample image. In contrast, &lt;code&gt;ve&lt;/code&gt; has a list of image paths contained in the &lt;code&gt;imagePaths&lt;/code&gt; slot.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;downloading-the-histology-images&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Downloading the histology images&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;VisiumExperiment&lt;/code&gt; class validity code checks that the image files, whose paths are in &lt;code&gt;imagePaths&lt;/code&gt;, exist locally rather than being available remotely through a URL. Thus, we decided to download the images at the moment the &lt;code&gt;ve&lt;/code&gt; object is created; this process happens in our function &lt;code&gt;sce_to_ve()&lt;/code&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;the-geom_spatial-function&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The geom_spatial() function&lt;/h2&gt;
&lt;p&gt;The function &lt;a href=&#34;http://research.libd.org/spatialLIBD/reference/geom_spatial.html&#34;&gt;&lt;code&gt;geom_spatial()&lt;/code&gt;&lt;/a&gt; was previously created to visualize the Visium histology image. It does so by defining a &lt;code&gt;ggplot2::layer()&lt;/code&gt; using the information from the metadata tibble of the &lt;code&gt;sce&lt;/code&gt; object. To make &lt;a href=&#34;http://research.libd.org/spatialLIBD/reference/geom_spatial.html&#34;&gt;&lt;code&gt;geom_spatial()&lt;/code&gt;&lt;/a&gt; accept &lt;code&gt;ve&lt;/code&gt; as an input, we created a new function, &lt;a href=&#34;http://research.libd.org/spatialLIBD/reference/read_image.html&#34;&gt;&lt;code&gt;read_image()&lt;/code&gt;&lt;/a&gt;, that takes the image paths of the desired samples, creates the grobs, and puts them in a tibble.&lt;/p&gt;
&lt;p&gt;Here is an example of how to use these functions.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Extract data from a sample (with ID 151507)
sample_id &amp;lt;- &amp;quot;151507&amp;quot;
ve_sub &amp;lt;- ve[, SpatialExperiment::spatialCoords(ve)$sample_name == sample_id]
sample_df &amp;lt;- as.data.frame(SpatialExperiment::spatialCoords(ve_sub))

## Plot with geom_spatial
ggplot2::ggplot(
    sample_df,
    ggplot2::aes(
        x = imagecol,
        y = imagerow,
    )
) +
geom_spatial(
    data = read_image(ve_sub, sample_id),
    ggplot2::aes(grob = grob),
    x = 0.5,
    y = 0.5
)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2020-11-06-using-visiumexperiment-at-spatiallibd-package_files/figure-html/geom_spatial-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;making-other-functions-compatible-with-visiumexperiment-class&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Making other functions compatible with VisiumExperiment class&lt;/h2&gt;
&lt;p&gt;Other functions, such as &lt;a href=&#34;http://research.libd.org/spatialLIBD/reference/sce_image_gene.html&#34;&gt;&lt;code&gt;sce_image_gene()&lt;/code&gt;&lt;/a&gt;, &lt;a href=&#34;http://research.libd.org/spatialLIBD/reference/sce_image_clus.html&#34;&gt;&lt;code&gt;sce_image_clus()&lt;/code&gt;&lt;/a&gt;, &lt;a href=&#34;http://research.libd.org/spatialLIBD/reference/sce_image_grid.html&#34;&gt;&lt;code&gt;sce_image_grid()&lt;/code&gt;&lt;/a&gt; and &lt;a href=&#34;http://research.libd.org/spatialLIBD/reference/sce_image_grid_gene.html&#34;&gt;&lt;code&gt;sce_image_grid_gene()&lt;/code&gt;&lt;/a&gt;, access the column containing the sample IDs, which lives in different slots depending on the object class: for &lt;code&gt;sce&lt;/code&gt;, &lt;code&gt;sample_name&lt;/code&gt; is in the &lt;code&gt;colData&lt;/code&gt; slot, while for &lt;code&gt;ve&lt;/code&gt; it is in the &lt;code&gt;spatialCoords&lt;/code&gt; slot. Hence, conditionals evaluating the object class were added in order to access the information correctly.&lt;/p&gt;
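&lt;p&gt;The pattern behind those conditionals looks roughly like this (a simplified sketch, not the actual package code):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Simplified sketch: locate the sample IDs based on the object class
sample_names &amp;lt;- if (methods::is(sce, &amp;quot;VisiumExperiment&amp;quot;)) {
    SpatialExperiment::spatialCoords(sce)$sample_name
} else {
    colData(sce)$sample_name
}&lt;/code&gt;&lt;/pre&gt;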
&lt;p&gt;Finally, the &lt;a href=&#34;http://research.libd.org/spatialLIBD/reference/sce_image_clus_p.html&#34;&gt;&lt;code&gt;sce_image_clus_p()&lt;/code&gt;&lt;/a&gt; and &lt;a href=&#34;http://research.libd.org/spatialLIBD/reference/sce_image_gene_p.html&#34;&gt;&lt;code&gt;sce_image_gene_p()&lt;/code&gt;&lt;/a&gt; functions also had important modifications. They visualize the gene expression (or any continuous variable) and the clusters for one given sample at the spot level, using the histology information as the background. Both functions receive a data frame with information residing in the &lt;code&gt;colData&lt;/code&gt; of the object &lt;code&gt;sce&lt;/code&gt;, including the spatial coordinates. Given that the &lt;code&gt;VisiumExperiment&lt;/code&gt; object &lt;code&gt;ve&lt;/code&gt; stores the spatial coordinates in the &lt;code&gt;spatialCoords&lt;/code&gt; slot, we created a function called &lt;a href=&#34;http://research.libd.org/spatialLIBD/reference/ve_image_colData.html&#34;&gt;&lt;code&gt;ve_image_colData()&lt;/code&gt;&lt;/a&gt; that generates the input data frame for &lt;code&gt;sce_image_clus_p()&lt;/code&gt; and &lt;code&gt;sce_image_gene_p()&lt;/code&gt; by joining the required columns.&lt;/p&gt;
&lt;p&gt;An example of how these functions can accept the &lt;code&gt;VisiumExperiment&lt;/code&gt; object is shown here:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Use the data previously extracted for a sample 
## Prepare the data for the plotting function
df &amp;lt;- colData(ve_sub)
df$COUNT &amp;lt;- df$expr_chrM_ratio

sce_image_gene_p(
    sce = ve_sub,
    d = df,
    sampleid = sample_id,
    title = &amp;quot;151507 chrM expr ratio&amp;quot;,
    spatial = FALSE
)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2020-11-06-using-visiumexperiment-at-spatiallibd-package_files/figure-html/sce_image_gene_p-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;That said, the goal is that the end user won’t need to use these lower-level functions and can just run code like this with both our original &lt;code&gt;SingleCellExperiment&lt;/code&gt; objects and the new &lt;code&gt;VisiumExperiment&lt;/code&gt; objects:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sce_image_gene(
    sce = ve_sub,
    sampleid = sample_id,
    spatial = TRUE
)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2020-11-06-using-visiumexperiment-at-spatiallibd-package_files/figure-html/sce_image_gene-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;wrapping-up&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Wrapping up&lt;/h2&gt;
&lt;p&gt;And that’s it: those are the main modifications the package went through to accept our &lt;code&gt;VisiumExperiment&lt;/code&gt; objects. These changes are part of &lt;a href=&#34;http://bioconductor.org/news/bioc_3_12_release/&#34;&gt;Bioconductor 3.12&lt;/a&gt;, which was just released, so you can try them out!&lt;/p&gt;
&lt;p&gt;I could make a long list of things that I have learned and gained during the process, but I will summarize it briefly. Firstly, I had so much fun coding in R; I became very familiar with the language and started feeling like a fish in water. Secondly, I learned how to write R packages and found them quite useful for organizing and documenting code. And lastly, I acquired a set of skills and good habits for organizing my code and projects in R. I want to emphasize that an element of great importance to this learning and growth process is having a mentor who is enthusiastic about sharing knowledge and always attentive and patient with the student. His teaching has driven and inspired me to program and to enjoy doing it. Undoubtedly, the road continues and there is much left to learn!&lt;/p&gt;
&lt;iframe src=&#34;https://giphy.com/embed/l3dj09hpsfuYkijDi&#34; width=&#34;480&#34; height=&#34;270&#34; frameBorder=&#34;0&#34; class=&#34;giphy-embed&#34; allowFullScreen&gt;
&lt;/iframe&gt;
&lt;p&gt;
&lt;a href=&#34;https://giphy.com/gifs/thegoldbergs--1990-the-goldbergs-l3dj09hpsfuYkijDi&#34;&gt;via GIPHY&lt;/a&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;future-plans&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Future Plans?&lt;/h2&gt;
&lt;p&gt;I’m looking forward to learning more about data analysis while working with 10x Genomics Visium data alongside Leo and colleagues at LIBD, all while potentially working on more R packages as I keep learning new skills that will help me get into grad school programs. In the meantime, I’m going to apply to present these updates at EuroBioC2020 and would love to meet you there!&lt;/p&gt;
&lt;p&gt;Thanks for reading this post, and you are welcome to keep exploring &lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.12/spatialLIBD&#34;&gt;spatialLIBD&lt;/a&gt;&lt;/em&gt; and the DLPFC data.&lt;/p&gt;
&lt;iframe src=&#34;https://giphy.com/embed/6wmz6Qo40eTDf4tW3Z&#34; width=&#34;480&#34; height=&#34;270&#34; frameBorder=&#34;0&#34; class=&#34;giphy-embed&#34; allowFullScreen&gt;
&lt;/iframe&gt;
&lt;p&gt;
&lt;a href=&#34;https://giphy.com/gifs/6wmz6Qo40eTDf4tW3Z&#34;&gt;via GIPHY&lt;/a&gt;
&lt;/p&gt;
&lt;div id=&#34;acknowledgements&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Acknowledgements&lt;/h3&gt;
&lt;p&gt;This blog post was made possible thanks to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Xie_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/rstudio/blogdown&#39;&gt;Xie, Hill, and Thomas, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;knitcitations&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Boettiger_2020&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/cboettig/knitcitations&#39;&gt;Boettiger, 2020&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=sessioninfo&#34;&gt;sessioninfo&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Csardi_2018&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=sessioninfo&#39;&gt;Csárdi, core, Wickham, Chang, et al., 2018&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.12/BiocStyle&#34;&gt;BiocStyle&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Oles_2020&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/Bioconductor/BiocStyle&#39;&gt;Oleś, Morgan, and Huber, 2020&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;References&lt;/h3&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Boettiger_2020&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Boettiger_2020&#34;&gt;[1]&lt;/a&gt;&lt;cite&gt;
C. Boettiger.
&lt;em&gt;knitcitations: Citations for ‘Knitr’ Markdown Files&lt;/em&gt;.
R package version 1.0.10.
2020.
URL: &lt;a href=&#34;https://github.com/cboettig/knitcitations&#34;&gt;https://github.com/cboettig/knitcitations&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Csardi_2018&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Csardi_2018&#34;&gt;[2]&lt;/a&gt;&lt;cite&gt;
G. Csárdi, R. core, H. Wickham, W. Chang, et al.
&lt;em&gt;sessioninfo: R Session Information&lt;/em&gt;.
R package version 1.1.1.
2018.
URL: &lt;a href=&#34;https://CRAN.R-project.org/package=sessioninfo&#34;&gt;https://CRAN.R-project.org/package=sessioninfo&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Maynard_2020&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Maynard_2020&#34;&gt;[3]&lt;/a&gt;&lt;cite&gt;
K. R. Maynard, L. Collado-Torres, L. M. Weber, C. Uytingco, et al.
“Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex”.
In: &lt;em&gt;bioRxiv&lt;/em&gt; (2020).
DOI: &lt;a href=&#34;https://doi.org/10.1101/2020.02.28.969931&#34;&gt;10.1101/2020.02.28.969931&lt;/a&gt;.
URL: &lt;a href=&#34;https://www.biorxiv.org/content/10.1101/2020.02.28.969931v1&#34;&gt;https://www.biorxiv.org/content/10.1101/2020.02.28.969931v1&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Oles_2020&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Oles_2020&#34;&gt;[4]&lt;/a&gt;&lt;cite&gt;
A. Oleś, M. Morgan, and W. Huber.
&lt;em&gt;BiocStyle: Standard styles for vignettes and other Bioconductor documents&lt;/em&gt;.
R package version 2.18.0.
2020.
URL: &lt;a href=&#34;https://github.com/Bioconductor/BiocStyle&#34;&gt;https://github.com/Bioconductor/BiocStyle&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Righelli_2020&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Righelli_2020&#34;&gt;[5]&lt;/a&gt;&lt;cite&gt;
D. Righelli and D. Risso.
&lt;em&gt;SpatialExperiment: S4 Class for Spatial Experiments handling&lt;/em&gt;.
R package version 1.0.0.
2020.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Xie_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Xie_2017&#34;&gt;[6]&lt;/a&gt;&lt;cite&gt;
Y. Xie, A. P. Hill, and A. Thomas.
&lt;em&gt;blogdown: Creating Websites with R Markdown&lt;/em&gt;.
ISBN 978-0815363729.
Boca Raton, Florida: Chapman and Hall/CRC, 2017.
URL: &lt;a href=&#34;https://github.com/rstudio/blogdown&#34;&gt;https://github.com/rstudio/blogdown&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.0.3 (2020-10-10)
##  os       macOS Catalina 10.15.7      
##  system   x86_64, darwin17.0          
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       America/New_York            
##  date     2020-11-09                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package                * version  date       lib source                                 
##  AnnotationDbi            1.52.0   2020-10-27 [1] Bioconductor                           
##  AnnotationHub            2.22.0   2020-10-27 [1] Bioconductor                           
##  assertthat               0.2.1    2019-03-21 [1] CRAN (R 4.0.2)                         
##  attempt                  0.3.1    2020-05-03 [1] CRAN (R 4.0.2)                         
##  backports                1.2.0    2020-11-02 [1] CRAN (R 4.0.3)                         
##  beachmat                 2.6.0    2020-10-27 [1] Bioconductor                           
##  beeswarm                 0.2.3    2016-04-25 [1] CRAN (R 4.0.2)                         
##  benchmarkme              1.0.4    2020-05-09 [1] CRAN (R 4.0.2)                         
##  benchmarkmeData          1.0.4    2020-04-23 [1] CRAN (R 4.0.2)                         
##  Biobase                * 2.50.0   2020-10-27 [1] Bioconductor                           
##  BiocFileCache            1.14.0   2020-10-27 [1] Bioconductor                           
##  BiocGenerics           * 0.36.0   2020-10-27 [1] Bioconductor                           
##  BiocManager              1.30.10  2019-11-16 [1] CRAN (R 4.0.2)                         
##  BiocNeighbors            1.8.0    2020-10-27 [1] Bioconductor                           
##  BiocParallel             1.24.0   2020-10-27 [1] Bioconductor                           
##  BiocSingular             1.6.0    2020-10-27 [1] Bioconductor                           
##  BiocStyle              * 2.18.0   2020-10-27 [1] Bioconductor                           
##  BiocVersion              3.12.0   2020-05-14 [1] Bioconductor                           
##  bit                      4.0.4    2020-08-04 [1] CRAN (R 4.0.2)                         
##  bit64                    4.0.5    2020-08-30 [1] CRAN (R 4.0.2)                         
##  bitops                   1.0-6    2013-08-17 [1] CRAN (R 4.0.2)                         
##  blob                     1.2.1    2020-01-20 [1] CRAN (R 4.0.2)                         
##  blogdown               * 0.21     2020-10-11 [1] CRAN (R 4.0.3)                         
##  bmp                      0.3      2017-09-11 [1] CRAN (R 4.0.2)                         
##  bookdown                 0.21     2020-10-13 [1] CRAN (R 4.0.3)                         
##  cli                      2.1.0    2020-10-12 [1] CRAN (R 4.0.2)                         
##  codetools                0.2-16   2018-12-24 [1] CRAN (R 4.0.3)                         
##  colorout                 1.2-2    2020-11-03 [1] Github (jalvesaq/colorout@726d681)     
##  colorspace               1.4-1    2019-03-18 [1] CRAN (R 4.0.2)                         
##  config                   0.3      2018-03-27 [1] CRAN (R 4.0.2)                         
##  cowplot                  1.1.0    2020-09-08 [1] CRAN (R 4.0.2)                         
##  crayon                   1.3.4    2017-09-16 [1] CRAN (R 4.0.2)                         
##  curl                     4.3      2019-12-02 [1] CRAN (R 4.0.1)                         
##  data.table               1.13.2   2020-10-19 [1] CRAN (R 4.0.2)                         
##  DBI                      1.1.0    2019-12-15 [1] CRAN (R 4.0.2)                         
##  dbplyr                   2.0.0    2020-11-03 [1] CRAN (R 4.0.3)                         
##  DelayedArray             0.16.0   2020-10-27 [1] Bioconductor                           
##  DelayedMatrixStats       1.12.0   2020-10-27 [1] Bioconductor                           
##  desc                     1.2.0    2018-05-01 [1] CRAN (R 4.0.2)                         
##  digest                   0.6.27   2020-10-24 [1] CRAN (R 4.0.2)                         
##  dockerfiler              0.1.3    2019-03-19 [1] CRAN (R 4.0.2)                         
##  doParallel               1.0.16   2020-10-16 [1] CRAN (R 4.0.2)                         
##  dotCall64                1.0-0    2018-07-30 [1] CRAN (R 4.0.2)                         
##  dplyr                    1.0.2    2020-08-18 [1] CRAN (R 4.0.2)                         
##  DT                       0.16     2020-10-13 [1] CRAN (R 4.0.2)                         
##  ellipsis                 0.3.1    2020-05-15 [1] CRAN (R 4.0.2)                         
##  evaluate                 0.14     2019-05-28 [1] CRAN (R 4.0.1)                         
##  ExperimentHub            1.16.0   2020-10-27 [1] Bioconductor                           
##  fansi                    0.4.1    2020-01-08 [1] CRAN (R 4.0.2)                         
##  farver                   2.0.3    2020-01-16 [1] CRAN (R 4.0.2)                         
##  fastmap                  1.0.1    2019-10-08 [1] CRAN (R 4.0.2)                         
##  fields                   11.6     2020-10-09 [1] CRAN (R 4.0.2)                         
##  foreach                  1.5.1    2020-10-15 [1] CRAN (R 4.0.2)                         
##  fs                       1.5.0    2020-07-31 [1] CRAN (R 4.0.2)                         
##  generics                 0.1.0    2020-10-31 [1] CRAN (R 4.0.2)                         
##  GenomeInfoDb           * 1.26.0   2020-10-27 [1] Bioconductor                           
##  GenomeInfoDbData         1.2.4    2020-11-03 [1] Bioconductor                           
##  GenomicRanges          * 1.42.0   2020-10-27 [1] Bioconductor                           
##  ggbeeswarm               0.6.0    2017-08-07 [1] CRAN (R 4.0.2)                         
##  ggplot2                  3.3.2    2020-06-19 [1] CRAN (R 4.0.2)                         
##  glue                     1.4.2    2020-08-27 [1] CRAN (R 4.0.2)                         
##  golem                    0.2.1    2020-03-05 [1] CRAN (R 4.0.2)                         
##  gridExtra                2.3      2017-09-09 [1] CRAN (R 4.0.2)                         
##  gtable                   0.3.0    2019-03-25 [1] CRAN (R 4.0.2)                         
##  htmltools                0.5.0    2020-06-16 [1] CRAN (R 4.0.2)                         
##  htmlwidgets              1.5.2    2020-10-03 [1] CRAN (R 4.0.2)                         
##  httpuv                   1.5.4    2020-06-06 [1] CRAN (R 4.0.2)                         
##  httr                     1.4.2    2020-07-20 [1] CRAN (R 4.0.2)                         
##  interactiveDisplayBase   1.28.0   2020-10-27 [1] Bioconductor                           
##  IRanges                * 2.24.0   2020-10-27 [1] Bioconductor                           
##  irlba                    2.3.3    2019-02-05 [1] CRAN (R 4.0.2)                         
##  iterators                1.0.13   2020-10-15 [1] CRAN (R 4.0.2)                         
##  jpeg                     0.1-8.1  2019-10-24 [1] CRAN (R 4.0.2)                         
##  jsonlite                 1.7.1    2020-09-07 [1] CRAN (R 4.0.2)                         
##  knitcitations          * 1.0.10   2020-11-03 [1] Github (cboettig/knitcitations@ea5d202)
##  knitr                    1.30     2020-09-22 [1] CRAN (R 4.0.2)                         
##  labeling                 0.4.2    2020-10-20 [1] CRAN (R 4.0.2)                         
##  later                    1.1.0.1  2020-06-05 [1] CRAN (R 4.0.2)                         
##  lattice                  0.20-41  2020-04-02 [1] CRAN (R 4.0.3)                         
##  lazyeval                 0.2.2    2019-03-15 [1] CRAN (R 4.0.2)                         
##  lifecycle                0.2.0    2020-03-06 [1] CRAN (R 4.0.2)                         
##  lubridate                1.7.9    2020-06-08 [1] CRAN (R 4.0.2)                         
##  magrittr                 1.5      2014-11-22 [1] CRAN (R 4.0.2)                         
##  maps                     3.3.0    2018-04-03 [1] CRAN (R 4.0.2)                         
##  Matrix                   1.2-18   2019-11-27 [1] CRAN (R 4.0.3)                         
##  MatrixGenerics         * 1.2.0    2020-10-27 [1] Bioconductor                           
##  matrixStats            * 0.57.0   2020-09-25 [1] CRAN (R 4.0.2)                         
##  memoise                  1.1.0    2017-04-21 [1] CRAN (R 4.0.2)                         
##  mime                     0.9      2020-02-04 [1] CRAN (R 4.0.2)                         
##  munsell                  0.5.0    2018-06-12 [1] CRAN (R 4.0.2)                         
##  pillar                   1.4.6    2020-07-10 [1] CRAN (R 4.0.2)                         
##  pkgconfig                2.0.3    2019-09-22 [1] CRAN (R 4.0.2)                         
##  pkgload                  1.1.0    2020-05-29 [1] CRAN (R 4.0.2)                         
##  plotly                   4.9.2.1  2020-04-04 [1] CRAN (R 4.0.2)                         
##  plyr                     1.8.6    2020-03-03 [1] CRAN (R 4.0.2)                         
##  png                      0.1-7    2013-12-03 [1] CRAN (R 4.0.2)                         
##  Polychrome               1.2.5    2020-03-29 [1] CRAN (R 4.0.2)                         
##  promises                 1.1.1    2020-06-09 [1] CRAN (R 4.0.2)                         
##  purrr                    0.3.4    2020-04-17 [1] CRAN (R 4.0.2)                         
##  R6                       2.5.0    2020-10-28 [1] CRAN (R 4.0.2)                         
##  rappdirs                 0.3.1    2016-03-28 [1] CRAN (R 4.0.2)                         
##  RColorBrewer             1.1-2    2014-12-07 [1] CRAN (R 4.0.2)                         
##  Rcpp                     1.0.5    2020-07-06 [1] CRAN (R 4.0.2)                         
##  RCurl                    1.98-1.2 2020-04-18 [1] CRAN (R 4.0.2)                         
##  readbitmap               0.1.5    2018-06-27 [1] CRAN (R 4.0.2)                         
##  RefManageR               1.3.0    2020-11-03 [1] Github (ropensci/RefManageR@ab8fe60)   
##  remotes                  2.2.0    2020-07-21 [1] CRAN (R 4.0.2)                         
##  rlang                    0.4.8    2020-10-08 [1] CRAN (R 4.0.2)                         
##  rmarkdown                2.5      2020-10-21 [1] CRAN (R 4.0.3)                         
##  roxygen2                 7.1.1    2020-06-27 [1] CRAN (R 4.0.2)                         
##  rprojroot                1.3-2    2018-01-03 [1] CRAN (R 4.0.2)                         
##  RSQLite                  2.2.1    2020-09-30 [1] CRAN (R 4.0.2)                         
##  rstudioapi               0.11     2020-02-07 [1] CRAN (R 4.0.2)                         
##  rsvd                     1.0.3    2020-02-17 [1] CRAN (R 4.0.2)                         
##  S4Vectors              * 0.28.0   2020-10-27 [1] Bioconductor                           
##  scales                   1.1.1    2020-05-11 [1] CRAN (R 4.0.2)                         
##  scater                   1.18.0   2020-10-27 [1] Bioconductor                           
##  scatterplot3d            0.3-41   2018-03-14 [1] CRAN (R 4.0.2)                         
##  scuttle                  1.0.0    2020-10-27 [1] Bioconductor                           
##  sessioninfo            * 1.1.1    2018-11-05 [1] CRAN (R 4.0.2)                         
##  shiny                    1.5.0    2020-06-23 [1] CRAN (R 4.0.2)                         
##  shinyWidgets             0.5.4    2020-10-06 [1] CRAN (R 4.0.2)                         
##  SingleCellExperiment   * 1.12.0   2020-10-27 [1] Bioconductor                           
##  spam                     2.5-1    2019-12-12 [1] CRAN (R 4.0.2)                         
##  sparseMatrixStats        1.2.0    2020-10-27 [1] Bioconductor                           
##  SpatialExperiment        1.0.0    2020-10-27 [1] Bioconductor                           
##  spatialLIBD            * 1.2.0    2020-10-29 [1] Bioconductor                           
##  stringi                  1.5.3    2020-09-09 [1] CRAN (R 4.0.2)                         
##  stringr                  1.4.0    2019-02-10 [1] CRAN (R 4.0.2)                         
##  SummarizedExperiment   * 1.20.0   2020-10-27 [1] Bioconductor                           
##  testthat                 3.0.0    2020-10-31 [1] CRAN (R 4.0.2)                         
##  tibble                   3.0.4    2020-10-12 [1] CRAN (R 4.0.2)                         
##  tidyr                    1.1.2    2020-08-27 [1] CRAN (R 4.0.2)                         
##  tidyselect               1.1.0    2020-05-11 [1] CRAN (R 4.0.2)                         
##  tiff                     0.1-5    2013-09-04 [1] CRAN (R 4.0.2)                         
##  usethis                  1.6.3    2020-09-17 [1] CRAN (R 4.0.2)                         
##  vctrs                    0.3.4    2020-08-29 [1] CRAN (R 4.0.2)                         
##  vipor                    0.4.5    2017-03-22 [1] CRAN (R 4.0.2)                         
##  viridis                  0.5.1    2018-03-29 [1] CRAN (R 4.0.2)                         
##  viridisLite              0.3.0    2018-02-01 [1] CRAN (R 4.0.1)                         
##  withr                    2.3.0    2020-09-22 [1] CRAN (R 4.0.2)                         
##  xfun                     0.19     2020-10-30 [1] CRAN (R 4.0.2)                         
##  xml2                     1.3.2    2020-04-23 [1] CRAN (R 4.0.2)                         
##  xtable                   1.8-4    2019-04-21 [1] CRAN (R 4.0.2)                         
##  XVector                  0.30.0   2020-10-28 [1] Bioconductor                           
##  yaml                     2.2.1    2020-02-01 [1] CRAN (R 4.0.2)                         
##  zlibbioc                 1.36.0   2020-10-28 [1] Bioconductor                           
## 
## [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&#34;footnotes&#34;&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id=&#34;fn1&#34;&gt;&lt;p&gt;I believe that it’s the junior year in the US.&lt;a href=&#34;#fnref1&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Using Space Ranger at JHPCE</title>
      <link>http://LieberInstitute.github.io/rstatsclub/2020/10/20/using-space-ranger-at-jhpce/</link>
      <pubDate>Tue, 20 Oct 2020 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/2020/10/20/using-space-ranger-at-jhpce/</guid>
      <description>
&lt;link href=&#34;http://LieberInstitute.github.io/rstatsclub/rstatsclub/rmarkdown-libs/anchor-sections/anchor-sections.css&#34; rel=&#34;stylesheet&#34; /&gt;
&lt;script src=&#34;http://LieberInstitute.github.io/rstatsclub/rstatsclub/rmarkdown-libs/anchor-sections/anchor-sections.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;By &lt;a href=&#34;https://github.com/Nick-Eagles&#34;&gt;Nick Eagles&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As part of recent LIBD work with spatial gene expression, I was recommended the tool &lt;a href=&#34;https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/what-is-space-ranger&#34;&gt;&lt;strong&gt;Space Ranger&lt;/strong&gt;&lt;/a&gt;, which provides software pipelines that walk Visium spatial RNA-seq samples through the steps we ultimately need to explore gene expression coupled with spatial information. In this blog post, I’ll explain how to start using Space Ranger at &lt;a href=&#34;https://jhpce.jhu.edu/&#34;&gt;&lt;strong&gt;JHPCE&lt;/strong&gt;&lt;/a&gt;, focusing heavily on the set-up details relevant to this cluster in particular.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2020-10-20-using-space-ranger-at-jhpce_files/space_ranger_icon.png&#34; width=&#34;250&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.teepublic.com/en-au/sticker/4448011-space-ranger&#34;&gt;Image source&lt;/a&gt;&lt;/p&gt;
&lt;div id=&#34;what-is-space-ranger&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;What is Space Ranger&lt;/h2&gt;
&lt;p&gt;In practice, there are a fairly large number of computational steps we’d need to perform to produce spatial information about gene expression for a multiple-sample experiment, given just microscope images and Visium RNA-seq output. To start, we’d want our data in FASTQ format; then we’d have to worry about aligning reads to a reference genome, producing gene counts, normalizing data, and so on. Thankfully, Space Ranger bundles together these steps into three simple utilities. We won’t focus too much on how to use these individual utilities or the various features of Space Ranger, documented in detail &lt;a href=&#34;https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/what-is-space-ranger&#34;&gt;here&lt;/a&gt;; rather, this blog post will describe how to get Space Ranger up and running at the JHPCE cluster.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;using-the-spaceranger-module-at-jhpce&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Using the &lt;code&gt;spaceranger&lt;/code&gt; module at JHPCE&lt;/h2&gt;
&lt;p&gt;We make regular use of &lt;a href=&#34;https://jhpce.jhu.edu/knowledge-base/environment-modules/&#34;&gt;lmod environment modules at JHPCE&lt;/a&gt;, as a means of loading and running software without worrying about user set-up differences, manually modifying your PATH, or other nasty considerations. While some sets of modules are available system-wide (for any user), others are not accessible unless you specifically “use” them. To make LIBD-specific modules like &lt;code&gt;spaceranger&lt;/code&gt; available, you must “use” the set of modules explicitly:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;module use /jhpce/shared/jhpce/modulefiles/libd&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you want to avoid typing this every time you want to use an LIBD module, consider the &lt;code&gt;.bashrc&lt;/code&gt; trick described &lt;a href=&#34;https://github.com/LieberInstitute/jhpce_module_config#recurrent-usage&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
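&lt;p&gt;A minimal sketch of that trick (assuming &lt;code&gt;bash&lt;/code&gt; is your login shell at JHPCE) is to append the &lt;code&gt;module use&lt;/code&gt; line to your &lt;code&gt;~/.bashrc&lt;/code&gt; once, so LIBD modules are available in every new session:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;#  Run once: make LIBD modules available automatically at login
echo &amp;quot;module use /jhpce/shared/jhpce/modulefiles/libd&amp;quot; &amp;gt;&amp;gt; ~/.bashrc&lt;/code&gt;&lt;/pre&gt;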
&lt;p&gt;Next, let’s load the &lt;code&gt;spaceranger&lt;/code&gt; module in particular.&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;module load spaceranger&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Note&lt;/em&gt;: the above code loads the default version of the &lt;code&gt;spaceranger&lt;/code&gt; module currently available. You can see which versions are available with:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;module avail spaceranger

# Example output may look like this: 
##-------------------------- /jhpce/shared/jhpce/modulefiles/libd ---------------------------
##   spaceranger/1.1.0
##

# You may also load a specific version of the module:
module load spaceranger/1.1.0&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;first-script&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;First script&lt;/h2&gt;
&lt;p&gt;Next, let’s run a test of the Space Ranger software on example data they provide. We will write a bash script to load the &lt;code&gt;spaceranger&lt;/code&gt; module as above, and call the executable. We could easily have &lt;code&gt;qrsh&lt;/code&gt;’d into a compute node and run the few lines of code interactively, but I recommend writing a bash script, which we will &lt;code&gt;qsub&lt;/code&gt;, for a few reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A script documents the code you have run, allowing others to see and reproduce the work you’ve done.&lt;/li&gt;
&lt;li&gt;When we &lt;code&gt;qsub&lt;/code&gt; the script, we include arguments regarding memory and other hardware resources, which you otherwise would have to remember or estimate each time you interactively run this or similar code.&lt;/li&gt;
&lt;li&gt;Using &lt;code&gt;qsub&lt;/code&gt; allows long-running code to continue without having to worry about keeping your session running and network-connected. This example won’t take long to run, but Space Ranger on real experiments likely will.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let’s start by writing the “skeleton” of our script, including only the basic required code before worrying about memory, logging, or other more complicated issues. Note that this will create a directory called “tiny” with the example outputs in the current working directory. I’m opening a new file I’ll call &lt;code&gt;spaceranger_test.sh&lt;/code&gt;, and the contents should look something like this:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;#  Make LIBD modules available, and load the &amp;quot;spaceranger&amp;quot; module
module use /jhpce/shared/jhpce/modulefiles/libd
module load spaceranger

#  Test Space Ranger on already-installed example data
spaceranger testrun --id=tiny&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you &lt;code&gt;qsub&lt;/code&gt; this script as-is, it will produce two log files in your home directory, containing verbose and somewhat cryptic errors. We’d prefer a single clearly-named log file written to the same directory as our bash script, and of course to fix the source of the Space Ranger error. In this case, we simply need to provide more memory to fix the main error.&lt;/p&gt;
&lt;p&gt;Below, we flesh out &lt;code&gt;spaceranger_test.sh&lt;/code&gt; with arguments to &lt;code&gt;qsub&lt;/code&gt; which will improve logging and provide sufficient memory. These arguments are indicated by lines beginning with &lt;code&gt;#$&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;#  Specify memory and other details below. In order:
#    &amp;quot;-cwd&amp;quot;: write the log file to the current working directory
#    &amp;quot;-o&amp;quot; and &amp;quot;-e&amp;quot;: point &amp;#39;STDOUT&amp;#39; and &amp;#39;STDERR&amp;#39; to the same file, combining all messages into one log
#    &amp;quot;-l mem_free=20G,h_vmem=20G&amp;quot;: request a node with 20G of memory free, and cap usage at 20G

#$ -cwd
#$ -o spaceranger_test.txt
#$ -e spaceranger_test.txt
#$ -l mem_free=20G,h_vmem=20G

#  Make LIBD modules available, and load the &amp;quot;spaceranger&amp;quot; module
module use /jhpce/shared/jhpce/modulefiles/libd
module load spaceranger

#  Test Space Ranger on already-installed example data
spaceranger testrun --id=tiny&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, we can actually submit the script and wait for the job to complete.&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;qsub spaceranger_test.sh&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you open &lt;code&gt;spaceranger_test.txt&lt;/code&gt; after the job completes, you should see that the test was successful. However, there is a worrying warning suggesting that Space Ranger is not properly made aware of the memory to which it should have access:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;Martian Runtime - v4.0.0
2020-10-19 15:48:59 [jobmngr] WARNING: configured to use 453GB of local memory, but only 331.3GB is currently available.
2020-10-19 15:48:59 [jobmngr] WARNING: The current virtual address space size
                              limit is too low.
    Limiting virtual address space size interferes with the operation of many
    common libraries and programs, and is not recommended.
    Contact your system administrator to remove this limit.&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Rather than using 20GB of memory, Space Ranger believes it has a whopping 453GB of memory to work with, though only ~331GB are actually free. In the next section we will communicate memory and even CPU constraints to Space Ranger with arguments to the &lt;code&gt;spaceranger&lt;/code&gt; command.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;exploring-memory-and-parallelization-options&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Exploring memory and parallelization options&lt;/h2&gt;
&lt;p&gt;Below, we will construct another bash script to submit with &lt;code&gt;qsub&lt;/code&gt;, demonstrating how to properly specify memory and number of CPUs for a hypothetical dataset. Suppose we have an experiment with multiple FASTQ files and a microscope slide image. We would like to call the &lt;code&gt;spaceranger count&lt;/code&gt; command on this input data, making use of parallelization for speed. Let’s use 5 CPU cores and a total of 60GB of memory. Following the documentation &lt;a href=&#34;https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/advanced/job-submission-mode&#34;&gt;here&lt;/a&gt;, we can create the template script we’ll call &lt;code&gt;SR_count_example.sh&lt;/code&gt;, appropriate for running at JHPCE:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;# Specify memory and other details. Note that &amp;#39;mem_free&amp;#39; and &amp;#39;h_vmem&amp;#39; specify
# per-core memory (12G * 5 cores = 60GB total, as we want), as indicated here:
# https://jhpce.jhu.edu/knowledge-base/how-to/#multicore

#$ -cwd
#$ -o SR_count_example.txt
#$ -e SR_count_example.txt
#$ -l mem_free=12G,h_vmem=12G
#$ -pe local 5

#  Make LIBD modules available, and load the &amp;quot;spaceranger&amp;quot; module
module use /jhpce/shared/jhpce/modulefiles/libd
module load spaceranger

#  The main Space Ranger command. Note the meaning of the last three flags:
#    &amp;quot;--jobmode=local&amp;quot;: we will use one &amp;quot;node&amp;quot; of the cluster, which has many cores available
#    &amp;quot;--localcores=5&amp;quot;: we requested 5 cores at the top
#    &amp;quot;--localmem=54&amp;quot;: 60GB * 0.9 = 54GB; using 90% of total memory requested is recommended
spaceranger count \
    --id=&amp;lt;SOME RUN ID HERE&amp;gt; \
    --fastqs &amp;lt;LIST OF FASTQ PATHS HERE&amp;gt; \
    --image &amp;lt;IMAGE PATH HERE&amp;gt; \
    --jobmode=local \
    --localcores=5 \
    --localmem=54&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In practice, you’d specify an &lt;code&gt;--id&lt;/code&gt;, the FASTQ paths for &lt;code&gt;--fastqs&lt;/code&gt;, and the microscope image for &lt;code&gt;--image&lt;/code&gt; in the above script for your experiment. Then simply submit the script as a job!&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;qsub SR_count_example.sh&lt;/code&gt;&lt;/pre&gt;
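&lt;p&gt;As a sanity check on the memory arithmetic above (per-core memory times cores for the total, then roughly 90% of the total for &lt;code&gt;--localmem&lt;/code&gt;), here is a minimal R sketch; the 60GB and 5-core figures are the ones used in the script:&lt;/p&gt;

```r
# Sanity-check the job memory arithmetic used in SR_count_example.sh
total_gb = 60                        # total memory we want for the job
cores = 5                            # number of CPU cores requested via -pe local
per_core_gb = total_gb / cores       # value for mem_free / h_vmem: 12 (i.e. 12G)
localmem_gb = floor(total_gb * 0.9)  # ~90% of the total, for --localmem: 54
c(per_core_gb, localmem_gb)
# [1] 12 54
```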
&lt;p&gt;&lt;em&gt;Note&lt;/em&gt;: you might also be interested in &lt;em&gt;&lt;a href=&#34;https://github.com/LieberInstitute/sgejobs&#34;&gt;sgejobs&lt;/a&gt;&lt;/em&gt;, which we explored in a LIBD rstats club session; you can use it to create SGE &lt;code&gt;bash&lt;/code&gt; scripts.&lt;/p&gt;
&lt;iframe width=&#34;560&#34; height=&#34;315&#34; src=&#34;https://www.youtube.com/embed/mw5aQFX12wQ&#34; frameborder=&#34;0&#34; allow=&#34;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&#34; allowfullscreen&gt;
&lt;/iframe&gt;
&lt;div id=&#34;acknowledgments&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Acknowledgments&lt;/h3&gt;
&lt;p&gt;This blog post was made possible thanks to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Xie_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/rstudio/blogdown&#39;&gt;Xie, Hill, and Thomas, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;knitcitations&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Boettiger_2019&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=knitcitations&#39;&gt;Boettiger, 2019&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=sessioninfo&#34;&gt;sessioninfo&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Csardi_2018&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=sessioninfo&#39;&gt;Csárdi, core, Wickham, Chang, et al., 2018&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;References&lt;/h3&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Boettiger_2019&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Boettiger_2019&#34;&gt;[1]&lt;/a&gt;&lt;cite&gt;
C. Boettiger.
&lt;em&gt;knitcitations: Citations for ‘Knitr’ Markdown Files&lt;/em&gt;.
R package version 1.0.10.
2019.
URL: &lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;https://CRAN.R-project.org/package=knitcitations&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Csardi_2018&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Csardi_2018&#34;&gt;[2]&lt;/a&gt;&lt;cite&gt;
G. Csárdi, R. core, H. Wickham, W. Chang, et al.
&lt;em&gt;sessioninfo: R Session Information&lt;/em&gt;.
R package version 1.1.1.
2018.
URL: &lt;a href=&#34;https://CRAN.R-project.org/package=sessioninfo&#34;&gt;https://CRAN.R-project.org/package=sessioninfo&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Xie_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Xie_2017&#34;&gt;[3]&lt;/a&gt;&lt;cite&gt;
Y. Xie, A. P. Hill, and A. Thomas.
&lt;em&gt;blogdown: Creating Websites with R Markdown&lt;/em&gt;.
ISBN 978-0815363729.
Boca Raton, Florida: Chapman and Hall/CRC, 2017.
URL: &lt;a href=&#34;https://github.com/rstudio/blogdown&#34;&gt;https://github.com/rstudio/blogdown&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.0.2 (2020-06-22)
##  os       macOS Catalina 10.15.7      
##  system   x86_64, darwin17.0          
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       America/New_York            
##  date     2020-10-21                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package       * version date       lib source                            
##  assertthat      0.2.1   2019-03-21 [1] CRAN (R 4.0.0)                    
##  bibtex          0.4.2.3 2020-09-19 [1] CRAN (R 4.0.2)                    
##  BiocManager     1.30.10 2019-11-16 [1] CRAN (R 4.0.0)                    
##  BiocStyle     * 2.17.1  2020-09-24 [1] Bioconductor                      
##  blogdown        0.21.19 2020-10-21 [1] Github (rstudio/blogdown@1a7ad52) 
##  bookdown        0.21    2020-10-13 [1] CRAN (R 4.0.2)                    
##  cli             2.1.0   2020-10-12 [1] CRAN (R 4.0.2)                    
##  colorout      * 1.2-2   2020-05-18 [1] Github (jalvesaq/colorout@726d681)
##  crayon          1.3.4   2017-09-16 [1] CRAN (R 4.0.0)                    
##  digest          0.6.26  2020-10-17 [1] CRAN (R 4.0.2)                    
##  evaluate        0.14    2019-05-28 [1] CRAN (R 4.0.0)                    
##  fansi           0.4.1   2020-01-08 [1] CRAN (R 4.0.0)                    
##  generics        0.0.2   2018-11-29 [1] CRAN (R 4.0.0)                    
##  glue            1.4.2   2020-08-27 [1] CRAN (R 4.0.2)                    
##  htmltools       0.5.0   2020-06-16 [1] CRAN (R 4.0.2)                    
##  httr            1.4.2   2020-07-20 [1] CRAN (R 4.0.2)                    
##  jsonlite        1.7.1   2020-09-07 [1] CRAN (R 4.0.2)                    
##  knitcitations * 1.0.10  2019-09-15 [1] CRAN (R 4.0.0)                    
##  knitr           1.30    2020-09-22 [1] CRAN (R 4.0.2)                    
##  lubridate       1.7.9   2020-06-08 [1] CRAN (R 4.0.2)                    
##  magrittr        1.5     2014-11-22 [1] CRAN (R 4.0.0)                    
##  plyr            1.8.6   2020-03-03 [1] CRAN (R 4.0.0)                    
##  R6              2.4.1   2019-11-12 [1] CRAN (R 4.0.0)                    
##  Rcpp            1.0.5   2020-07-06 [1] CRAN (R 4.0.2)                    
##  RefManageR      1.2.12  2019-04-03 [1] CRAN (R 4.0.0)                    
##  rlang           0.4.8   2020-10-08 [1] CRAN (R 4.0.2)                    
##  rmarkdown       2.5     2020-10-21 [1] CRAN (R 4.0.2)                    
##  sessioninfo   * 1.1.1   2018-11-05 [1] CRAN (R 4.0.0)                    
##  stringi         1.5.3   2020-09-09 [1] CRAN (R 4.0.2)                    
##  stringr         1.4.0   2019-02-10 [1] CRAN (R 4.0.0)                    
##  withr           2.3.0   2020-09-22 [1] CRAN (R 4.0.2)                    
##  xfun            0.18    2020-09-29 [1] CRAN (R 4.0.2)                    
##  xml2            1.3.2   2020-04-23 [1] CRAN (R 4.0.0)                    
##  yaml            2.2.1   2020-02-01 [1] CRAN (R 4.0.0)                    
## 
## [1] /Library/Frameworks/R.framework/Versions/4.0branch/Resources/library&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>R 101</title>
      <link>http://LieberInstitute.github.io/rstatsclub/2018/12/24/r_101/</link>
      <pubDate>Mon, 24 Dec 2018 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/2018/12/24/r_101/</guid>
      <description>


&lt;div id=&#34;happy-holidays&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;HAPPY HOLIDAYS!!!🎉⛄🎆🍾❄&lt;/h3&gt;
&lt;p&gt;In the spirit of the coming new year and new beginnings, we created a tutorial for getting started or restarted with R. If you are new to R or have dabbled in R but haven’t used it much recently, then this post is for you. We will focus on data classes and types, as well as data wrangling, and we will provide basic statistics and basic plotting examples using real data. Enjoy!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;https://carriewright11.github.io&#34;&gt;C.Wright&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As with most programming tutorials, let’s start with a good ol’ “Hello World”.&lt;/p&gt;
&lt;div id=&#34;first-command&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;1) First Command&lt;/h6&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;print(&amp;quot;Hello World&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;Hello World&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;install-and-load-packages-and-data&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;2) Install and Load Packages and Data&lt;/h6&gt;
&lt;p&gt;Now we need some data. Packages are collections of functions and/or data. There are published packages from the community that you can use, such as the two below, or you can make your own package for private use.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;install.packages(&amp;quot;babynames&amp;quot;)  
install.packages(&amp;quot;titanic&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now that we have installed the packages, we need to load them.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;babynames&amp;quot;)
library(&amp;quot;titanic&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each installation of R comes with quite a bit of data! Now we want to load the “quakes” dataset - there are lots of other options.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;data(&amp;quot;quakes&amp;quot;)
data() #this will list all of the datasets available&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;assigning-objects&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;3) Assigning Objects&lt;/h6&gt;
&lt;p&gt;Objects can be many different things, ranging from a simple number to a giant matrix; in general, they are the things that you can manipulate in R.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;myString &amp;lt;- &amp;quot;Hello World&amp;quot; #notice how we need &amp;quot;&amp;quot; around words, aka strings
myString #take a look at myString&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;Hello World&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;A &amp;lt;- 14 #now we do not need &amp;quot;&amp;quot; around numbers
A #take a look at A&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 14&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;A = 5 #can also use the equal sign to assign objects
A #notice how A has changed&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 5&lt;/code&gt;&lt;/pre&gt;
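&lt;p&gt;Once assigned, objects can be used directly in computations. A small sketch, reusing the value of &lt;code&gt;A&lt;/code&gt; from above:&lt;/p&gt;

```r
A = 5      # the value we assigned above
B = A * 2  # objects can be combined in expressions to make new objects
B
# [1] 10
```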
&lt;/div&gt;
&lt;div id=&#34;assigning-objects-with-multiple-elements&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;4) Assigning Objects with Multiple Elements&lt;/h6&gt;
&lt;p&gt;Now let’s assign a more complex object.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;height &amp;lt;- c(5.5, 4.5, 5, 5.6, 5.8, 5.2, 6, 6.2, 5.9, 5.8, 6, 5.9) #this is called a vector
colors_to_use &amp;lt;- c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)# a vector of strings&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;classes&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;5) Classes&lt;/h6&gt;
&lt;p&gt;There are a variety of object classes. We can use the function class() to tell us what class an object belongs to.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;class(height) #this is a numeric vector&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;numeric&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;class(colors_to_use) #this is a character vector&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;character&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;heightdf&amp;lt;-data.frame(height, gender =c(&amp;quot;F&amp;quot;, &amp;quot;F&amp;quot;, &amp;quot;F&amp;quot;, &amp;quot;F&amp;quot;, &amp;quot;F&amp;quot;, &amp;quot;F&amp;quot;, &amp;quot;M&amp;quot;, &amp;quot;M&amp;quot;, &amp;quot;M&amp;quot;, &amp;quot;M&amp;quot;, &amp;quot;M&amp;quot;, &amp;quot;M&amp;quot;))
heightdf #take a look at the dataframe&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    height gender
## 1     5.5      F
## 2     4.5      F
## 3     5.0      F
## 4     5.6      F
## 5     5.8      F
## 6     5.2      F
## 7     6.0      M
## 8     6.2      M
## 9     5.9      M
## 10    5.8      M
## 11    6.0      M
## 12    5.9      M&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;class(heightdf) #check the class&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;data.frame&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;heightdf$height # we can refer to individual columns based on the column name&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] 5.5 4.5 5.0 5.6 5.8 5.2 6.0 6.2 5.9 5.8 6.0 5.9&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;class(heightdf$gender) #here we see a factor (a categorical variable, stored in R with integer levels)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;factor&amp;quot;&lt;/code&gt;&lt;/pre&gt;
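&lt;p&gt;To see what “stored with integer levels” means, here is a small standalone sketch of a factor:&lt;/p&gt;

```r
# A factor stores the unique categories (levels) plus integer codes
gender = factor(c("F", "F", "M", "M"))
levels(gender)      # the unique categories
# [1] "F" "M"
as.integer(gender)  # the underlying integer codes
# [1] 1 1 2 2
```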
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;logical_variable&amp;lt;-height == heightdf$height #this shows that all the elements in the height column of the heightdf dataframe are equivalent to those of the height vector
class(logical_variable)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;logical&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;matrix_variable &amp;lt;- matrix(height, nrow = 2, ncol = 3)#now we will make a matrix
matrix_variable #take a look at the matrix&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##      [,1] [,2] [,3]
## [1,]  5.5  5.0  5.8
## [2,]  4.5  5.6  5.2&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;class(matrix_variable)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;matrix&amp;quot;&lt;/code&gt;&lt;/pre&gt;
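&lt;p&gt;Objects can also be converted (coerced) from one class to another; a quick sketch:&lt;/p&gt;

```r
x = c("1", "2", "3")  # a character vector
class(x)
# [1] "character"
y = as.numeric(x)     # coerce the strings to numbers
class(y)
# [1] "numeric"
sum(y)                # now we can do math with them
# [1] 6
```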
&lt;/div&gt;
&lt;div id=&#34;subsetting-data&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;6) Subsetting Data&lt;/h6&gt;
&lt;p&gt;Now that we can assign or instantiate objects, let’s try to look at or manipulate specific parts of more complex objects.&lt;/p&gt;
&lt;p&gt;Let’s create an object of male heights by grabbing rows from heightdf.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;maleIndex&amp;lt;-which(heightdf$gender == &amp;quot;M&amp;quot;) #lets try subsetting just the male data out of the heightdf - first we need to determine which rows of the dataframe are male
maleIndex # this is a vector of the matching row indices&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1]  7  8  9 10 11 12&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;heightmale&amp;lt;-heightdf[maleIndex,] #now we will use the brackets to grab these rows - we use the comma to indicate that we want rows not columns
heightmale # now this is just the males&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    height gender
## 7     6.0      M
## 8     6.2      M
## 9     5.9      M
## 10    5.8      M
## 11    6.0      M
## 12    5.9      M&lt;/code&gt;&lt;/pre&gt;
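&lt;p&gt;The same subset can also be grabbed in one step with a logical index, skipping &lt;code&gt;which()&lt;/code&gt;; a self-contained sketch that rebuilds the example data frame:&lt;/p&gt;

```r
# Rebuild the example data frame, then subset with a logical vector directly
height = c(5.5, 4.5, 5, 5.6, 5.8, 5.2, 6, 6.2, 5.9, 5.8, 6, 5.9)
heightdf = data.frame(height, gender = c(rep("F", 6), rep("M", 6)))
heightmale = heightdf[heightdf$gender == "M", ]  # rows where the test is TRUE are kept
nrow(heightmale)
# [1] 6
```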
&lt;p&gt;Here is another way using a package called dplyr:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;install.packages(&amp;quot;dplyr&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here we are creating an object of height data for males 6 feet or taller.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr) #load a useful package for subsetting data&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
## Attaching package: &amp;#39;dplyr&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## The following objects are masked from &amp;#39;package:stats&amp;#39;:
## 
##     filter, lag&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## The following objects are masked from &amp;#39;package:base&amp;#39;:
## 
##     intersect, setdiff, setequal, union&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#heightmale_over6feet &amp;lt;- dplyr::filter(heightdf, gender == &amp;quot;M&amp;quot; &amp;amp; height &amp;gt;= 6) #the dplyr version
heightmale_over6feet &amp;lt;- subset(heightdf, gender == &amp;quot;M&amp;quot; &amp;amp; height &amp;gt;= 6) #base R works too - we use column names to describe what we want to pull out of our data

heightmale_over6feet#now we just have the males 6 feet or over&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    height gender
## 7     6.0      M
## 8     6.2      M
## 11    6.0      M&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now let’s create an object by grabbing part of an object based on its columns.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;gender1&amp;lt;-heightdf[2]#notice how here we use the brackets but no comma
gender1&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    gender
## 1       F
## 2       F
## 3       F
## 4       F
## 5       F
## 6       F
## 7       M
## 8       M
## 9       M
## 10      M
## 11      M
## 12      M&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;gender2 &amp;lt;- heightdf$gender #this grabs the same column, but notice that this way we lose the data structure - it is no longer a dataframe
gender2&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] F F F F F F M M M M M M
## Levels: F M&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;gender2&amp;lt;-data.frame(gender =heightdf$gender)# this however stays as a dataframe
gender2&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    gender
## 1       F
## 2       F
## 3       F
## 4       F
## 5       F
## 6       F
## 7       M
## 8       M
## 9       M
## 10      M
## 11      M
## 12      M&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;genderindex &amp;lt;- which(colnames(heightdf) == &amp;quot;gender&amp;quot;) #now we will use which() to find the index of the column named gender
genderindex&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 2&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;gender3 &amp;lt;-heightdf[genderindex]#now we will use the brackets to grab just this column
gender3&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    gender
## 1       F
## 2       F
## 3       F
## 4       F
## 5       F
## 6       F
## 7       M
## 8       M
## 9       M
## 10      M
## 11      M
## 12      M&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;identical(gender1, gender2) #let&amp;#39;s see if they are identical - this is a helpful function, but it can only compare two variables at a time&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] TRUE&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;gender1==gender2 # are they the same? should say true if they are&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##       gender
##  [1,]   TRUE
##  [2,]   TRUE
##  [3,]   TRUE
##  [4,]   TRUE
##  [5,]   TRUE
##  [6,]   TRUE
##  [7,]   TRUE
##  [8,]   TRUE
##  [9,]   TRUE
## [10,]   TRUE
## [11,]   TRUE
## [12,]   TRUE&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;gender2==gender3 # are they the same? should say true if they are&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##       gender
##  [1,]   TRUE
##  [2,]   TRUE
##  [3,]   TRUE
##  [4,]   TRUE
##  [5,]   TRUE
##  [6,]   TRUE
##  [7,]   TRUE
##  [8,]   TRUE
##  [9,]   TRUE
## [10,]   TRUE
## [11,]   TRUE
## [12,]   TRUE&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now let’s try to look at/grab specific values.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;height2 &amp;lt;- c(6, 5.5, 6, 6, 6, 6, 4.3) #6 and 5.5 are in our original height vector but not 4.3
which(height %in% height2) # which of our original heights are also found in height2&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1]  1  7 11&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;heightdf[which(height %in% height2),] # here we skipped making another variable for the index&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    height gender
## 1     5.5      F
## 7     6.0      M
## 11    6.0      M&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#we can also use a function called grep
wanted_heights_index&amp;lt;-grep(5.9, heightdf$height)
heightdf[wanted_heights_index,] #now we just have the samples who are 5.9&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    height gender
## 9     5.9      M
## 12    5.9      M&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#say we want to know the value of an element at a particular location
heightdf$height[2] #second value in the height column&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 4.5&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;heightdf$height[1:3] # first three values in the height column&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 5.5 4.5 5.0&lt;/code&gt;&lt;/pre&gt;
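&lt;p&gt;Negative indices do the opposite - they drop elements; a tiny sketch:&lt;/p&gt;

```r
h = c(5.5, 4.5, 5.0)
h[-2]  # a negative index drops that element, keeping the rest
# [1] 5.5 5.0
```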
&lt;p&gt;You can also grab random data points.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sample(heightdf$height, 2) #takes a random sample of the specified number of elements from a vector&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 5.5 5.9&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sample.int(1000999, 2) #takes a random sample of integers from 1 to the first number specified. The number of random values to output is given by the second number.&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 161093 430020&lt;/code&gt;&lt;/pre&gt;
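&lt;p&gt;Random draws change every run; base R’s &lt;code&gt;set.seed()&lt;/code&gt; makes them reproducible. A quick sketch:&lt;/p&gt;

```r
set.seed(20181224)         # fix the random number generator state
draw1 = sample.int(100, 2)
set.seed(20181224)         # setting the same seed again...
draw2 = sample.int(100, 2)
identical(draw1, draw2)    # ...gives the same draw
# [1] TRUE
```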
&lt;/div&gt;
&lt;div id=&#34;plotting-data&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;7) Plotting Data&lt;/h6&gt;
&lt;p&gt;Now let’s try plotting some data and perform some statistical tests.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;boxplot(heightdf$height~heightdf$gender)#simple boxplot&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-12-24-R_101_files/figure-html/unnamed-chunk-14-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#lets make a fancy boxplot
boxplot(heightdf$height~heightdf$gender, col = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;), ylab = &amp;quot;Height&amp;quot;, xlab = &amp;quot;Gender&amp;quot;, main = &amp;quot;Relationship of gender and height&amp;quot;, cex.lab =2, cex.main = 2, cex.axis = 1.3, par(mar=c(5, 5, 5, 5)))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-12-24-R_101_files/figure-html/unnamed-chunk-14-2.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;hist(heightdf$height)#histogram&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-12-24-R_101_files/figure-html/unnamed-chunk-14-3.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;heightdf$age &amp;lt;- c(20, 30, 15, 20, 40, 14, 35, 40, 17, 16, 25, 16) #adding another variable to a dataframe
plot(heightdf$height)#one variable&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-12-24-R_101_files/figure-html/unnamed-chunk-14-4.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(y=heightdf$height, x=heightdf$age)#scatterplot of 2 variables
height_age&amp;lt;-lm(heightdf$height~heightdf$age)#perform a regression on the data - evaluate height and age relationship
summary(height_age)#shows the stats results from the regression&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
## Call:
## lm(formula = heightdf$height ~ heightdf$age)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1999 -0.1154  0.1348  0.3634  0.3943 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(&amp;gt;|t|)    
## (Intercept)   5.28384    0.39367  13.422 1.01e-07 ***
## heightdf$age  0.01387    0.01527   0.908    0.385    
## ---
## Signif. codes:  0 &amp;#39;***&amp;#39; 0.001 &amp;#39;**&amp;#39; 0.01 &amp;#39;*&amp;#39; 0.05 &amp;#39;.&amp;#39; 0.1 &amp;#39; &amp;#39; 1
## 
## Residual standard error: 0.4973 on 10 degrees of freedom
## Multiple R-squared:  0.07616,    Adjusted R-squared:  -0.01622 
## F-statistic: 0.8244 on 1 and 10 DF,  p-value: 0.3853&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;abline(height_age)#add regression line to plot&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-12-24-R_101_files/figure-html/unnamed-chunk-14-5.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cor.test(y=heightdf$height, x=heightdf$age)#shows the same p value when performing a correlation test&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##  Pearson&amp;#39;s product-moment correlation
## 
## data:  heightdf$age and heightdf$height
## t = 0.90797, df = 10, p-value = 0.3853
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3539944  0.7336745
## sample estimates:
##       cor 
## 0.2759734&lt;/code&gt;&lt;/pre&gt;
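&lt;p&gt;The numbers that &lt;code&gt;summary()&lt;/code&gt; and &lt;code&gt;cor.test()&lt;/code&gt; print can also be extracted programmatically, which is handy in scripts. A self-contained sketch refitting the same regression with the data above:&lt;/p&gt;

```r
# Rebuild the example data and refit the height ~ age regression
height = c(5.5, 4.5, 5, 5.6, 5.8, 5.2, 6, 6.2, 5.9, 5.8, 6, 5.9)
age = c(20, 30, 15, 20, 40, 14, 35, 40, 17, 16, 25, 16)
fit = lm(height ~ age)
coef(fit)    # intercept and slope, the same estimates summary() prints
ct = cor.test(height, age)
ct$p.value   # the p-value as a number, instead of printed output
```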
&lt;/div&gt;
&lt;div id=&#34;more-statistical-tests&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;8) More Statistical Tests&lt;/h6&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;t.test(heightdf$height~heightdf$gender)#try a t test between male height and female height - this is significant!&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##  Welch Two Sample t-test
## 
## data:  heightdf$height by heightdf$gender
## t = -3.4903, df = 5.8325, p-value = 0.01359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.194177 -0.205823
## sample estimates:
## mean in group F mean in group M 
##        5.266667        5.966667&lt;/code&gt;&lt;/pre&gt;
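&lt;p&gt;You can double-check the group means reported by the t-test with &lt;code&gt;tapply()&lt;/code&gt;; a quick self-contained sketch:&lt;/p&gt;

```r
# Compute the mean height within each gender group
height = c(5.5, 4.5, 5, 5.6, 5.8, 5.2, 6, 6.2, 5.9, 5.8, 6, 5.9)
gender = c(rep("F", 6), rep("M", 6))
means = tapply(height, gender, mean)
round(means, 6)  # matches the group means in the t-test output
#        F        M 
# 5.266667 5.966667
```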
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#if p&amp;lt;0.05 it is generally considered significant
fit &amp;lt;-aov(heightdf$height~heightdf$gender + heightdf$age)#now lets perform an anova or multiple regression
summary(fit)# here are the results&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##                 Df Sum Sq Mean Sq F value  Pr(&amp;gt;F)   
## heightdf$gender  1 1.4700  1.4700  12.167 0.00685 **
## heightdf$age     1 0.1193  0.1193   0.987 0.34638   
## Residuals        9 1.0874  0.1208                   
## ---
## Signif. codes:  0 &amp;#39;***&amp;#39; 0.001 &amp;#39;**&amp;#39; 0.01 &amp;#39;*&amp;#39; 0.05 &amp;#39;.&amp;#39; 0.1 &amp;#39; &amp;#39; 1&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;anova(fit)# same results&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Analysis of Variance Table
## 
## Response: heightdf$height
##                 Df  Sum Sq Mean Sq F value   Pr(&amp;gt;F)   
## heightdf$gender  1 1.47000 1.47000 12.1668 0.006851 **
## heightdf$age     1 0.11928 0.11928  0.9872 0.346383   
## Residuals        9 1.08739 0.12082                    
## ---
## Signif. codes:  0 &amp;#39;***&amp;#39; 0.001 &amp;#39;**&amp;#39; 0.01 &amp;#39;*&amp;#39; 0.05 &amp;#39;.&amp;#39; 0.1 &amp;#39; &amp;#39; 1&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fit &amp;lt;-lm(heightdf$height~heightdf$gender + heightdf$age)# performing as multiple regression
summary(fit) #gives the same result as above - this is an anova but the results are presented differently&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
## Call:
## lm(formula = heightdf$height ~ heightdf$gender + heightdf$age)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.83944 -0.07318  0.02918  0.12062  0.36706 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(&amp;gt;|t|)    
## (Intercept)       5.01995    0.28600  17.552 2.86e-08 ***
## heightdf$genderM  0.68225    0.20148   3.386  0.00805 ** 
## heightdf$age      0.01065    0.01072   0.994  0.34638    
## ---
## Signif. codes:  0 &amp;#39;***&amp;#39; 0.001 &amp;#39;**&amp;#39; 0.01 &amp;#39;*&amp;#39; 0.05 &amp;#39;.&amp;#39; 0.1 &amp;#39; &amp;#39; 1
## 
## Residual standard error: 0.3476 on 9 degrees of freedom
## Multiple R-squared:  0.5938, Adjusted R-squared:  0.5035 
## F-statistic: 6.577 on 2 and 9 DF,  p-value: 0.01736&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;anova(fit)#also gives the same result&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Analysis of Variance Table
## 
## Response: heightdf$height
##                 Df  Sum Sq Mean Sq F value   Pr(&amp;gt;F)   
## heightdf$gender  1 1.47000 1.47000 12.1668 0.006851 **
## heightdf$age     1 0.11928 0.11928  0.9872 0.346383   
## Residuals        9 1.08739 0.12082                    
## ---
## Signif. codes:  0 &amp;#39;***&amp;#39; 0.001 &amp;#39;**&amp;#39; 0.01 &amp;#39;*&amp;#39; 0.05 &amp;#39;.&amp;#39; 0.1 &amp;#39; &amp;#39; 1&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s do a more classic anova - using a categorical variable with more than two categories.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;heightdf$country &amp;lt;-c(&amp;quot;British&amp;quot;, &amp;quot;French&amp;quot;, &amp;quot;British&amp;quot;, &amp;quot;Dutch&amp;quot;, &amp;quot;Dutch&amp;quot;, &amp;quot;French&amp;quot;, &amp;quot;Dutch&amp;quot;, &amp;quot;Dutch&amp;quot;, &amp;quot;British&amp;quot;, &amp;quot;French&amp;quot;, &amp;quot;British&amp;quot;, &amp;quot;French&amp;quot;)
fit &amp;lt;-aov(heightdf$height~heightdf$gender + heightdf$age + heightdf$country)
summary(fit)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##                  Df Sum Sq Mean Sq F value  Pr(&amp;gt;F)   
## heightdf$gender   1 1.4700  1.4700  19.157 0.00325 **
## heightdf$age      1 0.1193  0.1193   1.554 0.25258   
## heightdf$country  2 0.5503  0.2751   3.586 0.08471 . 
## Residuals         7 0.5371  0.0767                   
## ---
## Signif. codes:  0 &amp;#39;***&amp;#39; 0.001 &amp;#39;**&amp;#39; 0.01 &amp;#39;*&amp;#39; 0.05 &amp;#39;.&amp;#39; 0.1 &amp;#39; &amp;#39; 1&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fit &amp;lt;-aov(heightdf$height~ heightdf$country)
summary(fit)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##                  Df Sum Sq Mean Sq F value Pr(&amp;gt;F)
## heightdf$country  2 0.6067  0.3033   1.319  0.315
## Residuals         9 2.0700  0.2300&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;anova(fit)# we see the overall effect of country but not comparisons between individual countries&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Analysis of Variance Table
## 
## Response: heightdf$height
##                  Df  Sum Sq Mean Sq F value Pr(&amp;gt;F)
## heightdf$country  2 0.60667 0.30333  1.3188 0.3146
## Residuals         9 2.07000 0.23000&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;TukeyHSD(fit)# this is how we get the pairwise country comparisons - none are significant&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = heightdf$height ~ heightdf$country)
## 
## $`heightdf$country`
##                 diff        lwr       upr     p adj
## Dutch-British   0.30 -0.6468152 1.2468152 0.6627841
## French-British -0.25 -1.1968152 0.6968152 0.7484769
## French-Dutch   -0.55 -1.4968152 0.3968152 0.2860337&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fit &amp;lt;-lm(heightdf$height~heightdf$gender + heightdf$age + heightdf$country)
summary(fit) #gives the same result as above - this is an anova just different output&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
## Call:
## lm(formula = heightdf$height ~ heightdf$gender + heightdf$age + 
##     heightdf$country)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.36474 -0.13454  0.03405  0.15333  0.33097 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(&amp;gt;|t|)    
## (Intercept)             5.46051    0.28225  19.346 2.46e-07 ***
## heightdf$genderM        0.71905    0.16131   4.458  0.00294 ** 
## heightdf$age           -0.01143    0.01263  -0.905  0.39547    
## heightdf$countryDutch   0.46574    0.26813   1.737  0.12596    
## heightdf$countryFrench -0.25286    0.19590  -1.291  0.23778    
## ---
## Signif. codes:  0 &amp;#39;***&amp;#39; 0.001 &amp;#39;**&amp;#39; 0.01 &amp;#39;*&amp;#39; 0.05 &amp;#39;.&amp;#39; 0.1 &amp;#39; &amp;#39; 1
## 
## Residual standard error: 0.277 on 7 degrees of freedom
## Multiple R-squared:  0.7993, Adjusted R-squared:  0.6847 
## F-statistic: 6.971 on 4 and 7 DF,  p-value: 0.01375&lt;/code&gt;&lt;/pre&gt;
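&lt;p&gt;The link between &lt;code&gt;aov()&lt;/code&gt; and &lt;code&gt;lm()&lt;/code&gt; can be checked directly: calling &lt;code&gt;anova()&lt;/code&gt; on an &lt;code&gt;lm()&lt;/code&gt; fit reproduces the sequential ANOVA table that &lt;code&gt;summary()&lt;/code&gt; reports for the matching &lt;code&gt;aov()&lt;/code&gt; fit. Here is a minimal sketch with made-up data (the variables below are hypothetical, not from &lt;code&gt;heightdf&lt;/code&gt;):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(123)
d &amp;lt;- data.frame(y = rnorm(12), group = rep(c(&amp;quot;A&amp;quot;, &amp;quot;B&amp;quot;, &amp;quot;C&amp;quot;), each = 4))
anova(lm(y ~ group, data = d))    # sequential ANOVA table
summary(aov(y ~ group, data = d)) # same sums of squares, F value, and p-value&lt;/code&gt;&lt;/pre&gt;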
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;lets-use-some-real-data&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Let’s use some &lt;strong&gt;real&lt;/strong&gt; data!&lt;/h3&gt;
&lt;div id=&#34;baby-name-data&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;Baby Name Data&lt;/h6&gt;
&lt;p&gt;This is a very fun package to check out. If you have ever wondered about the popularity of your name or of someone you know, you will find this very interesting. I even have some friends who have used it to help name their child.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#recall that we installed and loaded this data earlier
head(babynames)#this is a special data type called a tibble - it is basically a fancy dataframe&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 5
##    year sex   name          n   prop
##   &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;     &amp;lt;int&amp;gt;  &amp;lt;dbl&amp;gt;
## 1 1880. F     Mary       7065 0.0724
## 2 1880. F     Anna       2604 0.0267
## 3 1880. F     Emma       2003 0.0205
## 4 1880. F     Elizabeth  1939 0.0199
## 5 1880. F     Minnie     1746 0.0179
## 6 1880. F     Margaret   1578 0.0162&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;tail(babynames)# we can see the data goes up to 2015&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 5
##    year sex   name       n       prop
##   &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;  &amp;lt;int&amp;gt;      &amp;lt;dbl&amp;gt;
## 1 2015. M     Zyah       5 0.00000247
## 2 2015. M     Zykell     5 0.00000247
## 3 2015. M     Zyking     5 0.00000247
## 4 2015. M     Zykir      5 0.00000247
## 5 2015. M     Zyrus      5 0.00000247
## 6 2015. M     Zyus       5 0.00000247&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#how many unique baby names are there?
length(unique(babynames$name))# that&amp;#39;s a lot of baby names!&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 95025&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# check to see if your name is included
grep(&amp;quot;Bob&amp;quot;, unique(babynames$name)) # looks like Bob is in there&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1]  1148  2502  4948  6510  6573  6999  9443 10761 13598 13794 18059
## [12] 18701 19278 19812 20116 20921 22002 23289 26242 27453 30231 34262
## [23] 35357 37057 37702 38171 41808 42382 43247 44135 44778 46568 50097&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#Let&amp;#39;s look at the values
babynames$name [grep(&amp;quot;Bob&amp;quot;, unique(babynames$name))] # this is a vector so we don&amp;#39;t need to specify rows with a comma&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] &amp;quot;Scott&amp;quot;     &amp;quot;Tessie&amp;quot;    &amp;quot;Sadye&amp;quot;     &amp;quot;Una&amp;quot;       &amp;quot;Philomena&amp;quot;
##  [6] &amp;quot;Belva&amp;quot;     &amp;quot;Rufus&amp;quot;     &amp;quot;Dovie&amp;quot;     &amp;quot;Janette&amp;quot;   &amp;quot;Mammie&amp;quot;   
## [11] &amp;quot;Melinda&amp;quot;   &amp;quot;Honor&amp;quot;     &amp;quot;Arch&amp;quot;      &amp;quot;Denis&amp;quot;     &amp;quot;Orrie&amp;quot;    
## [16] &amp;quot;Floyd&amp;quot;     &amp;quot;Al&amp;quot;        &amp;quot;Selina&amp;quot;    &amp;quot;Clora&amp;quot;     &amp;quot;Elvin&amp;quot;    
## [21] &amp;quot;Lafayette&amp;quot; &amp;quot;Lovie&amp;quot;     &amp;quot;Armilda&amp;quot;   &amp;quot;Nola&amp;quot;      &amp;quot;Icy&amp;quot;      
## [26] &amp;quot;Mahalia&amp;quot;   &amp;quot;Gordon&amp;quot;    &amp;quot;Seth&amp;quot;      &amp;quot;Claudia&amp;quot;   &amp;quot;Glada&amp;quot;    
## [31] &amp;quot;Floyd&amp;quot;     &amp;quot;Theodora&amp;quot;  &amp;quot;Vella&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# oops! that didn&amp;#39;t work. why? the indices come from unique(babynames$name), so we need to subset that same vector
unique(babynames$name)[grep(&amp;quot;Bob&amp;quot;, unique(babynames$name))] # here we go&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] &amp;quot;Bob&amp;quot;        &amp;quot;Bobbie&amp;quot;     &amp;quot;Bobby&amp;quot;      &amp;quot;Bobie&amp;quot;      &amp;quot;Bobbye&amp;quot;    
##  [6] &amp;quot;Bobbe&amp;quot;      &amp;quot;Bobette&amp;quot;    &amp;quot;Bobetta&amp;quot;    &amp;quot;Boby&amp;quot;       &amp;quot;Bobbette&amp;quot;  
## [11] &amp;quot;Bobbi&amp;quot;      &amp;quot;Bobo&amp;quot;       &amp;quot;Bobra&amp;quot;      &amp;quot;Bobi&amp;quot;       &amp;quot;Bobbee&amp;quot;    
## [16] &amp;quot;Bobb&amp;quot;       &amp;quot;Bobbetta&amp;quot;   &amp;quot;Bobbyetta&amp;quot;  &amp;quot;Bobbijo&amp;quot;    &amp;quot;Bobbiejo&amp;quot;  
## [21] &amp;quot;Bobbyjo&amp;quot;    &amp;quot;Bobbiejean&amp;quot; &amp;quot;Bobbilynn&amp;quot;  &amp;quot;Boban&amp;quot;      &amp;quot;Bobijo&amp;quot;    
## [26] &amp;quot;Bobbyjoe&amp;quot;   &amp;quot;Bobak&amp;quot;      &amp;quot;Bobbilee&amp;quot;   &amp;quot;Bobbisue&amp;quot;   &amp;quot;Bobbiesue&amp;quot; 
## [31] &amp;quot;Boback&amp;quot;     &amp;quot;Bobbylee&amp;quot;   &amp;quot;Bobbielee&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# now we can see all the variations of Bob in the data
Bob &amp;lt;- subset(babynames, babynames$name == &amp;quot;Bob&amp;quot;)
# let&amp;#39;s see how much the name has been used in the past
plot(Bob$n ~ Bob$year) # Bob was popular but it isn&amp;#39;t anymore&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-12-24-R_101_files/figure-html/unnamed-chunk-17-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#what is the line of samples at the bottom?
plot(Bob$n~Bob$year, col= c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)[as.factor(Bob$sex)]) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-12-24-R_101_files/figure-html/unnamed-chunk-17-2.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# looks like most people named Bob were male; the line of points at the bottom represents females
# let&amp;#39;s try another name
Lori &amp;lt;- subset(babynames, babynames$name == &amp;quot;Lori&amp;quot;)
plot(Lori$n ~ Lori$year, col = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)[as.factor(Lori$sex)])&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-12-24-R_101_files/figure-html/unnamed-chunk-17-3.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;Lori_M &amp;lt;- subset(babynames, name == &amp;quot;Lori&amp;quot; &amp;amp; sex == &amp;quot;M&amp;quot;) # let&amp;#39;s see exactly when some males were named Lori
head(Lori_M)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 5
##    year sex   name      n       prop
##   &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt;      &amp;lt;dbl&amp;gt;
## 1 1954. M     Lori      5 0.00000242
## 2 1955. M     Lori      5 0.00000239
## 3 1956. M     Lori     14 0.00000653
## 4 1957. M     Lori     20 0.00000914
## 5 1958. M     Lori     25 0.0000116 
## 6 1959. M     Lori     29 0.0000134&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# let&amp;#39;s see how many names are recorded for each year in the data
table(babynames$year) # so there are 2000 names recorded for 1880 and 1935 for 1881&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##  1880  1881  1882  1883  1884  1885  1886  1887  1888  1889  1890  1891 
##  2000  1935  2127  2084  2297  2294  2392  2373  2651  2590  2695  2660 
##  1892  1893  1894  1895  1896  1897  1898  1899  1900  1901  1902  1903 
##  2921  2831  2941  3049  3091  3028  3264  3042  3731  3153  3362  3389 
##  1904  1905  1906  1907  1908  1909  1910  1911  1912  1913  1914  1915 
##  3561  3655  3633  3948  4018  4227  4629  4867  6351  6967  7964  9358 
##  1916  1917  1918  1919  1920  1921  1922  1923  1924  1925  1926  1927 
##  9696  9915 10401 10368 10756 10856 10757 10641 10869 10641 10460 10405 
##  1928  1929  1930  1931  1932  1933  1934  1935  1936  1937  1938  1939 
## 10159  9816  9788  9293  9383  9011  9181  9035  8893  8945  9030  8919 
##  1940  1941  1942  1943  1944  1945  1946  1947  1948  1949  1950  1951 
##  8960  9087  9424  9405  9153  9026  9702 10370 10237 10264 10309 10460 
##  1952  1953  1954  1955  1956  1957  1958  1959  1960  1961  1962  1963 
## 10654 10831 10963 11114 11339 11564 11521 11771 11924 12178 12206 12278 
##  1964  1965  1966  1967  1968  1969  1970  1971  1972  1973  1974  1975 
## 12394 11953 12148 12400 12930 13746 14782 15291 15414 15676 16243 16934 
##  1976  1977  1978  1979  1980  1981  1982  1983  1984  1985  1986  1987 
## 17395 18171 18224 19032 19439 19470 19680 19398 19501 20076 20642 21399 
##  1988  1989  1990  1991  1992  1993  1994  1995  1996  1997  1998  1999 
## 22360 23769 24715 25104 25421 25959 25998 26080 26420 26966 27894 28546 
##  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011 
## 29764 30264 30559 31179 32040 32539 34073 34941 35051 34689 34050 33880 
##  2012  2013  2014  2015 
## 33697 33229 33176 32952&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# OK, so more names were recorded in the more recent years
hist(babynames$year)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-12-24-R_101_files/figure-html/unnamed-chunk-17-4.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
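&lt;p&gt;As a quick follow-up, &lt;code&gt;which.max()&lt;/code&gt; makes it easy to find the year a name peaked. A small sketch using the &lt;code&gt;Bob&lt;/code&gt; subset created above:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# the year with the largest count of babies named Bob
Bob$year[which.max(Bob$n)]
# and how many Bobs were born that year
max(Bob$n)&lt;/code&gt;&lt;/pre&gt;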
&lt;p&gt;Let’s look at some other data…&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;titanic-data&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;Titanic Data&lt;/h6&gt;
&lt;p&gt;This package contains real data about the passengers who were aboard the Titanic. Note that base R also ships a built-in summary table named &lt;code&gt;Titanic&lt;/code&gt;, which we explore first.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;head(Titanic) # we can see that this may be an unusual data type&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1]  0  0 35  0  0  0&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;class(Titanic) # indeed this appears to be a table&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;table&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;dim(Titanic)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 4 2 2 2&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;dimnames(Titanic)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## $Class
## [1] &amp;quot;1st&amp;quot;  &amp;quot;2nd&amp;quot;  &amp;quot;3rd&amp;quot;  &amp;quot;Crew&amp;quot;
## 
## $Sex
## [1] &amp;quot;Male&amp;quot;   &amp;quot;Female&amp;quot;
## 
## $Age
## [1] &amp;quot;Child&amp;quot; &amp;quot;Adult&amp;quot;
## 
## $Survived
## [1] &amp;quot;No&amp;quot;  &amp;quot;Yes&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;str(Titanic) # shows the structure of the data&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
##  - attr(*, &amp;quot;dimnames&amp;quot;)=List of 4
##   ..$ Class   : chr [1:4] &amp;quot;1st&amp;quot; &amp;quot;2nd&amp;quot; &amp;quot;3rd&amp;quot; &amp;quot;Crew&amp;quot;
##   ..$ Sex     : chr [1:2] &amp;quot;Male&amp;quot; &amp;quot;Female&amp;quot;
##   ..$ Age     : chr [1:2] &amp;quot;Child&amp;quot; &amp;quot;Adult&amp;quot;
##   ..$ Survived: chr [1:2] &amp;quot;No&amp;quot; &amp;quot;Yes&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;help(&amp;quot;titanic_test&amp;quot;) # this shows more information about the data
help(&amp;quot;titanic_train&amp;quot;) # from the help page, Survived is coded as 1 for passengers who survived
# let&amp;#39;s see whether more males or females survived
boxplot(titanic_train$Survived ~ titanic_train$Sex)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-12-24-R_101_files/figure-html/unnamed-chunk-18-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(titanic_train$Survived, titanic_train$Sex) # it looks like males largely did not survive&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    
##     female male
##   0     81  468
##   1    233  109&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# this might be a better way to view the data - the width of each column represents the number of passengers, so there are more males overall
mosaicplot(Sex ~ Survived, data = titanic_train)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-12-24-R_101_files/figure-html/unnamed-chunk-18-2.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
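&lt;p&gt;To put numbers on what the mosaic plot shows, &lt;code&gt;prop.table()&lt;/code&gt; converts the counts into proportions. With &lt;code&gt;margin = 1&lt;/code&gt; each row sums to 1, giving the survival rate within each sex (a sketch building on the table above):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# row-wise proportions: survival rate within each sex
prop.table(table(titanic_train$Sex, titanic_train$Survived), margin = 1)&lt;/code&gt;&lt;/pre&gt;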
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# even though there were many more males, the female passengers were much more likely to survive&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;How about some more data…&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;earthquake-data&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;Earthquake Data&lt;/h6&gt;
&lt;p&gt;Here we will scrape data (that is, programmatically extract data from a website) from the USGS website about earthquake rates in different US states. See our previous &lt;a href=&#34;http://research.libd.org/rstatsclub/2018/03/19/introduction-to-scraping-and-wranging-tables-from-research-articles/#.W_MStJNKhR4&#34;&gt;post&lt;/a&gt; from S. Semick on scraping data from research articles for more information on how to do this.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;install.packages(&amp;quot;htmltab&amp;quot;)
install.packages(&amp;quot;reshape2&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(htmltab)
library(reshape2)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: package &amp;#39;reshape2&amp;#39; was built under R version 3.4.3&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;url &amp;lt;- &amp;quot;https://earthquake.usgs.gov/earthquakes/browse/stats.php&amp;quot;
eq &amp;lt;- htmltab(doc = url, which = 5) # grab the fifth table on the page&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## No encoding supplied: defaulting to UTF-8.&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rownames(eq) &amp;lt;- eq$States
eq &amp;lt;- eq[-1] # drop the States column now that it is stored in the row names
head(eq)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##            2010 2011 2012 2013 2014 2015
## Alabama       1    1    0    0    2    6
## Alaska     2245 1409 1166 1329 1296 1575
## Arizona       6    7    4    3   31   10
## Arkansas     15   44    4    4    1    0
## California  546  195  243  240  191  130
## Colorado      4   23    7    2   13    7&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;eq2 &amp;lt;- as.data.frame(sapply(eq, function(x) as.numeric(as.character(x)))) # convert each column to numeric
head(eq2)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   2010 2011 2012 2013 2014 2015
## 1    1    1    0    0    2    6
## 2 2245 1409 1166 1329 1296 1575
## 3    6    7    4    3   31   10
## 4   15   44    4    4    1    0
## 5  546  195  243  240  191  130
## 6    4   23    7    2   13    7&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;convertoChar &amp;lt;- function(x) as.numeric(as.character(x)) # or you could create a function to use multiple times
factor_to_fix &amp;lt;- as.factor(c(1, 2))
class(factor_to_fix)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;factor&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;class(trying_function&amp;lt;-convertoChar(x=factor_to_fix))# now the class is numeric&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;numeric&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;class(trying_function2&amp;lt;-convertoChar(factor_to_fix))# now the class is numeric&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;numeric&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rownames(eq2)&amp;lt;-rownames(eq)
head(eq2)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##            2010 2011 2012 2013 2014 2015
## Alabama       1    1    0    0    2    6
## Alaska     2245 1409 1166 1329 1296 1575
## Arizona       6    7    4    3   31   10
## Arkansas     15   44    4    4    1    0
## California  546  195  243  240  191  130
## Colorado      4   23    7    2   13    7&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;colSums(eq2)#look at the col sums&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 2010 2011 2012 2013 2014 2015 
## 3026 1955 1603 1899 2628 3225&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;colMeans(eq2)# look at the col means&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  2010  2011  2012  2013  2014  2015 
## 60.52 39.10 32.06 37.98 52.56 64.50&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rowMeans(eq2)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##        Alabama         Alaska        Arizona       Arkansas     California 
##      1.6666667   1503.3333333     10.1666667     11.3333333    257.5000000 
##       Colorado    Connecticut       Delaware        Florida        Georgia 
##      9.3333333      0.1666667      0.0000000      0.0000000      0.0000000 
##         Hawaii          Idaho       Illinois        Indiana           Iowa 
##     33.3333333     15.8333333      0.8333333      0.6666667      0.0000000 
##         Kansas       Kentucky      Louisiana          Maine       Maryland 
##     17.3333333      0.3333333      0.1666667      0.5000000      0.1666667 
##  Massachusetts       Michigan      Minnesota    Mississippi       Missouri 
##      0.0000000      0.3333333      0.1666667      0.5000000      2.1666667 
##        Montana       Nebraska         Nevada  New Hampshire     New Jersey 
##     14.8333333      1.0000000     85.5000000      0.1666667      0.0000000 
##     New Mexico       New York North Carolina   North Dakota           Ohio 
##      6.3333333      0.5000000      0.1666667      0.1666667      1.0000000 
##       Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina 
##    285.6666667      2.8333333      0.0000000      0.0000000      0.5000000 
##   South Dakota      Tennessee          Texas           Utah        Vermont 
##      1.0000000      1.3333333     13.8333333     11.5000000      0.0000000 
##       Virginia     Washington  West Virginia      Wisconsin        Wyoming 
##      2.1666667     10.0000000      0.3333333      0.0000000     84.6666667&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rowSums(eq2)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##        Alabama         Alaska        Arizona       Arkansas     California 
##             10           9020             61             68           1545 
##       Colorado    Connecticut       Delaware        Florida        Georgia 
##             56              1              0              0              0 
##         Hawaii          Idaho       Illinois        Indiana           Iowa 
##            200             95              5              4              0 
##         Kansas       Kentucky      Louisiana          Maine       Maryland 
##            104              2              1              3              1 
##  Massachusetts       Michigan      Minnesota    Mississippi       Missouri 
##              0              2              1              3             13 
##        Montana       Nebraska         Nevada  New Hampshire     New Jersey 
##             89              6            513              1              0 
##     New Mexico       New York North Carolina   North Dakota           Ohio 
##             38              3              1              1              6 
##       Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina 
##           1714             17              0              0              3 
##   South Dakota      Tennessee          Texas           Utah        Vermont 
##              6              8             83             69              0 
##       Virginia     Washington  West Virginia      Wisconsin        Wyoming 
##             13             60              2              0            508&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;max(eq2)# maximum value&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 2245&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;boxplot(eq2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-12-24-R_101_files/figure-html/unnamed-chunk-20-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;boxplot(eq2, ylim = c(0,40))# change the limit of the plot&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-12-24-R_101_files/figure-html/unnamed-chunk-20-2.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;boxplot(t(eq2), ylim = c(0, 2000)) # transpose the data with t() to group by state instead of year&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-12-24-R_101_files/figure-html/unnamed-chunk-20-3.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;eq3 &amp;lt;- melt(eq2) # this reshapes the data into long form&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## No id variables; using all as measure variables&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;names(eq3)&amp;lt;-c(&amp;quot;year&amp;quot;, &amp;quot;quakes&amp;quot;)
head(eq3)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   year quakes
## 1 2010      1
## 2 2010   2245
## 3 2010      6
## 4 2010     15
## 5 2010    546
## 6 2010      4&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fit &amp;lt;- aov(eq3$quakes ~ eq3$year)
summary(fit) # no significant difference in earthquake counts across years&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##              Df   Sum Sq Mean Sq F value Pr(&amp;gt;F)
## eq3$year      5    44161    8832   0.169  0.974
## Residuals   294 15405834   52401&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;in-addition-these-are-functions-that-the-members-of-libd-rstats-use-often&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;In addition, these are functions that the members of &lt;a href=&#34;http://research.libd.org/rstatsclub/#.W_MMz5NKhR4&#34;&gt;LIBD Rstats&lt;/a&gt; use often:&lt;/h6&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rdocumentation.org/packages/utils/versions/3.5.1/topics/head&#34;&gt;head()&lt;/a&gt; / &lt;a href=&#34;https://www.rdocumentation.org/packages/utils/versions/3.5.1/topics/head&#34;&gt;tail()&lt;/a&gt; – see the head and the tail - also check out the corner function of the &lt;a href=&#34;https://github.com/LieberInstitute/jaffelab&#34;&gt;jaffelab package&lt;/a&gt; created by LIBD Rstats founding member E. Burke&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/row%2Bcolnames&#34;&gt;colnames()&lt;/a&gt; / &lt;a href=&#34;https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/row%2Bcolnames&#34;&gt;rownames()&lt;/a&gt; – see and rename columns or row names&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rdocumentation.org/packages/gmatrix/versions/0.3/topics/colMeans&#34;&gt;colMeans()&lt;/a&gt; / &lt;a href=&#34;https://www.rdocumentation.org/packages/fame/versions/1.03/topics/rowMeans&#34;&gt;rowMeans()&lt;/a&gt; / &lt;a href=&#34;https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/colSums&#34;&gt;colSums()&lt;/a&gt; / &lt;a href=&#34;https://www.rdocumentation.org/packages/raster/versions/2.7-15/topics/rowSums&#34;&gt;rowSums()&lt;/a&gt; – get means and sums of columns and rows&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/dim&#34;&gt;dim()&lt;/a&gt; and &lt;a href=&#34;https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/length&#34;&gt;length()&lt;/a&gt; – determine the dimensions/size of a data set – need to use length() when evaluating a vector&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rdocumentation.org/packages/hyperSpec/versions/0.98-20140523/topics/ncol&#34;&gt;ncol()&lt;/a&gt; / &lt;a href=&#34;https://www.rdocumentation.org/packages/hyperSpec/versions/0.98-20140523/topics/ncol&#34;&gt;nrow()&lt;/a&gt; – number of columns and rows&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rdocumentation.org/packages/utils/versions/3.5.1/topics/str&#34;&gt;str()&lt;/a&gt; – displays the structure of an object - this is very useful with complex data structures&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/unique&#34;&gt;unique()&lt;/a&gt;/&lt;a href=&#34;https://www.rdocumentation.org/packages/data.table/versions/1.11.8/topics/duplicated&#34;&gt;duplicated()&lt;/a&gt; – find unique and duplicated values&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/order&#34;&gt;order()&lt;/a&gt;/&lt;a href=&#34;https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/sort&#34;&gt;sort()&lt;/a&gt;– order and sort your data&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/grep&#34;&gt;gsub()&lt;/a&gt; – replace values&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/table&#34;&gt;table()&lt;/a&gt; – summarize your data in table format&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rdocumentation.org/packages/stats/versions/3.5.1/topics/t.test&#34;&gt;t.test()&lt;/a&gt; – perform a t test&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rdocumentation.org/packages/stats/versions/3.5.1/topics/cor.test&#34;&gt;cor.test()&lt;/a&gt; – perform a correlation test&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rdocumentation.org/packages/stats/versions/3.5.1/topics/lm&#34;&gt;lm()&lt;/a&gt; – make a linear model&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/summary&#34;&gt;summary()&lt;/a&gt; – if you use the lm() output – this will give you the results&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/Random&#34;&gt;set.seed()&lt;/a&gt; – makes random permutations or randomly generated data come out the same every time you run your code&lt;/li&gt;
&lt;/ul&gt;
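&lt;p&gt;To see a few of these in action together, here is a small sketch using the built-in &lt;code&gt;mtcars&lt;/code&gt; data set:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;dim(mtcars)                          # rows and columns
head(mtcars, 3)                      # first few rows
table(mtcars$cyl)                    # counts per number of cylinders
sort(unique(mtcars$cyl))             # the distinct cylinder values, in order
colMeans(mtcars[, c(&amp;quot;mpg&amp;quot;, &amp;quot;hp&amp;quot;)])  # column means
summary(lm(mpg ~ wt, data = mtcars)) # a simple linear model&lt;/code&gt;&lt;/pre&gt;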
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;for-additional-help-take-a-look-at-these-links&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;For additional help, take a look at these links:&lt;/h3&gt;
&lt;div id=&#34;free-courses-and-tutorials&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;Free courses and tutorials&lt;/h6&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.coursera.org/courses?query=r&#34; class=&#34;uri&#34;&gt;https://www.coursera.org/courses?query=r&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.pluralsight.com/courses/r-programming-fundamentals&#34; class=&#34;uri&#34;&gt;https://www.pluralsight.com/courses/r-programming-fundamentals&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rstudio.com/online-learning/&#34; class=&#34;uri&#34;&gt;https://www.rstudio.com/online-learning/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;community-support&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;Community support&lt;/h6&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://community.rstudio.com/&#34; class=&#34;uri&#34;&gt;https://community.rstudio.com/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://support.bioconductor.org/&#34; class=&#34;uri&#34;&gt;https://support.bioconductor.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://resources.rstudio.com/webinars/help-me-help-you-creating-reproducible-examples-jenny-bryan&#34; class=&#34;uri&#34;&gt;https://resources.rstudio.com/webinars/help-me-help-you-creating-reproducible-examples-jenny-bryan&lt;/a&gt; – this webinar helps you create reproducible examples of the code errors that you need help with&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;tips-for-help&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;Tips for help&lt;/h6&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.r-project.org/help.html&#34; class=&#34;uri&#34;&gt;https://www.r-project.org/help.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Google! – Stack Overflow, Biostars&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rstudio.com/resources/cheatsheets/&#34; class=&#34;uri&#34;&gt;https://www.rstudio.com/resources/cheatsheets/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;also-follow-our-blog-for-more-helpful-posts.&#34; class=&#34;section level6&#34;&gt;
&lt;h6&gt;Also follow our blog for more helpful posts.&lt;/h6&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;thanks-for-reading-and-have-fun-getting-to-know-r&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Thanks for reading and have fun getting to know R!&lt;/h2&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-12-24-R_101_files/startnew.jpg&#34; width=&#34;500&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;This image came from: &lt;a href=&#34;https://www.pinterest.com/pin/89790586304535333/&#34; class=&#34;uri&#34;&gt;https://www.pinterest.com/pin/89790586304535333/&lt;/a&gt;&lt;/p&gt;
&lt;div id=&#34;acknowledgments&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Acknowledgments&lt;/h3&gt;
&lt;p&gt;This blog post was made possible thanks to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;http://bioconductor.org/packages/BiocStyle&#34;&gt;BiocStyle&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Oles_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/Bioconductor/BiocStyle&#39;&gt;Oleś, Morgan, and Huber, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Xie_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/rstudio/blogdown&#39;&gt;Xie, Hill, and Thomas, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;knitcitations&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Boettiger_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=knitcitations&#39;&gt;Boettiger, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=sessioninfo&#34;&gt;sessioninfo&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Csardi_2018&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=sessioninfo&#39;&gt;Csárdi, core, Wickham, Chang, et al., 2018&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;References&lt;/h3&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Boettiger_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Boettiger_2017&#34;&gt;[1]&lt;/a&gt;&lt;cite&gt; C. Boettiger. &lt;em&gt;knitcitations: Citations for ‘Knitr’ Markdown Files&lt;/em&gt;. R package version 1.0.8. 2017. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;https://CRAN.R-project.org/package=knitcitations&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Csardi_2018&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Csardi_2018&#34;&gt;[2]&lt;/a&gt;&lt;cite&gt; G. Csárdi, R. core, H. Wickham, W. Chang, et al. &lt;em&gt;sessioninfo: R Session Information&lt;/em&gt;. R package version 1.1.1. 2018. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=sessioninfo&#34;&gt;https://CRAN.R-project.org/package=sessioninfo&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Oles_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Oles_2017&#34;&gt;[3]&lt;/a&gt;&lt;cite&gt; A. Oleś, M. Morgan and W. Huber. &lt;em&gt;BiocStyle: Standard styles for vignettes and other Bioconductor documents&lt;/em&gt;. R package version 2.6.1. 2017. URL: &lt;a href=&#34;https://github.com/Bioconductor/BiocStyle&#34;&gt;https://github.com/Bioconductor/BiocStyle&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Xie_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Xie_2017&#34;&gt;[4]&lt;/a&gt;&lt;cite&gt; Y. Xie, A. P. Hill and A. Thomas. &lt;em&gt;blogdown: Creating Websites with R Markdown&lt;/em&gt;. ISBN 978-0815363729. Boca Raton, Florida: Chapman and Hall/CRC, 2017. URL: &lt;a href=&#34;https://github.com/rstudio/blogdown&#34;&gt;https://github.com/rstudio/blogdown&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.4.0 (2017-04-21)
##  os       macOS Sierra 10.12.6        
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       America/New_York            
##  date     2018-11-19                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package       * version   date       lib source                           
##  assertthat      0.2.0     2017-04-11 [1] CRAN (R 3.4.0)                   
##  babynames     * 0.3.0     2017-04-14 [1] CRAN (R 3.4.0)                   
##  backports       1.1.2     2018-04-18 [1] Github (r-lib/backports@cee9348) 
##  bibtex          0.4.2     2017-06-30 [1] CRAN (R 3.4.1)                   
##  bindr           0.1       2016-11-13 [1] CRAN (R 3.4.0)                   
##  bindrcpp        0.2       2017-06-17 [1] CRAN (R 3.4.0)                   
##  BiocStyle     * 2.6.1     2017-11-30 [1] Bioconductor                     
##  blogdown        0.5.9     2018-03-08 [1] Github (rstudio/blogdown@dc1f41c)
##  bookdown        0.7       2018-02-18 [1] CRAN (R 3.4.3)                   
##  cli             1.0.0     2017-11-05 [1] CRAN (R 3.4.2)                   
##  crayon          1.3.4     2017-09-16 [1] CRAN (R 3.4.1)                   
##  curl            3.2       2018-03-28 [1] CRAN (R 3.4.4)                   
##  digest          0.6.15    2018-01-28 [1] CRAN (R 3.4.3)                   
##  dplyr         * 0.7.4     2017-09-28 [1] CRAN (R 3.4.2)                   
##  evaluate        0.11      2018-07-17 [1] CRAN (R 3.4.4)                   
##  glue            1.3.0     2018-07-17 [1] CRAN (R 3.4.4)                   
##  htmltab       * 0.7.1     2016-12-29 [1] CRAN (R 3.4.0)                   
##  htmltools       0.3.6     2017-04-28 [1] CRAN (R 3.4.0)                   
##  httr            1.3.1     2017-08-20 [1] CRAN (R 3.4.1)                   
##  jsonlite        1.5       2017-06-01 [1] CRAN (R 3.4.0)                   
##  knitcitations * 1.0.8     2017-07-04 [1] CRAN (R 3.4.1)                   
##  knitr           1.20      2018-02-20 [1] CRAN (R 3.4.3)                   
##  lubridate       1.7.4     2018-04-11 [1] CRAN (R 3.4.4)                   
##  magrittr        1.5       2014-11-22 [1] CRAN (R 3.4.0)                   
##  pillar          1.2.1     2018-02-27 [1] CRAN (R 3.4.3)                   
##  pkgconfig       2.0.1     2017-03-21 [1] CRAN (R 3.4.0)                   
##  plyr            1.8.4     2016-06-08 [1] CRAN (R 3.4.0)                   
##  R6              2.2.2     2017-06-17 [1] CRAN (R 3.4.0)                   
##  Rcpp            0.12.16   2018-03-13 [1] CRAN (R 3.4.4)                   
##  RefManageR      1.2.0     2018-04-25 [1] CRAN (R 3.4.4)                   
##  reshape2      * 1.4.3     2017-12-11 [1] CRAN (R 3.4.3)                   
##  rlang           0.2.0     2018-02-20 [1] CRAN (R 3.4.3)                   
##  rmarkdown       1.10      2018-06-11 [1] CRAN (R 3.4.4)                   
##  rprojroot       1.3-2     2018-01-03 [1] CRAN (R 3.4.3)                   
##  sessioninfo   * 1.1.1     2018-11-05 [1] CRAN (R 3.4.4)                   
##  stringi         1.2.4     2018-07-20 [1] CRAN (R 3.4.4)                   
##  stringr         1.3.1     2018-05-10 [1] CRAN (R 3.4.4)                   
##  tibble          1.4.2     2018-01-22 [1] CRAN (R 3.4.3)                   
##  titanic       * 0.1.0     2015-08-31 [1] CRAN (R 3.4.0)                   
##  utf8            1.1.3     2018-01-03 [1] CRAN (R 3.4.3)                   
##  withr           2.1.2     2018-03-15 [1] CRAN (R 3.4.4)                   
##  xfun            0.3       2018-07-06 [1] CRAN (R 3.4.4)                   
##  XML             3.98-1.10 2018-02-19 [1] CRAN (R 3.4.3)                   
##  xml2            1.2.0     2018-01-24 [1] CRAN (R 3.4.3)                   
##  yaml            2.2.0     2018-07-25 [1] CRAN (R 3.4.4)                   
## 
## [1] /Library/Frameworks/R.framework/Versions/3.4/Resources/library&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Quality Surrogate Variable Analysis </title>
      <link>http://LieberInstitute.github.io/rstatsclub/2018/12/11/quality-surrogate-variable-analysis/</link>
      <pubDate>Tue, 11 Dec 2018 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/2018/12/11/quality-surrogate-variable-analysis/</guid>
      <description>


&lt;p&gt;By &lt;a href=&#34;https://amy-peterson.github.io/&#34;&gt;Amy Peterson&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Studying genetic differential expression using postmortem human brain tissue requires an understanding of the effect brain tissue degradation has on gene expression, particularly when degradation confounds&lt;a href=&#34;#fn1&#34; class=&#34;footnote-ref&#34; id=&#34;fnref1&#34;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt; the differences in gene expression levels between subject groups. This confounding necessitates measures from a control dataset of postmortem tissue from individuals who do not have the outcome of interest. Such a dataset provides a comparative measure of the impact of tissue degradation on expression, which can then be used in a case-control study examining the impact of the outcome of interest on gene expression. Incorporating these degradation measurements from control brains into the differential expression analysis of brains with the outcome of interest leads to more accurate results and reduces the number of genes falsely identified as differentially expressed between cases and controls.&lt;/p&gt;
&lt;div id=&#34;sva-background&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;SVA background&lt;/h3&gt;
&lt;p&gt;RNA-sequencing (RNA-seq) is a high-throughput method for quantifying gene expression levels that requires high-quality RNA. The effect of RNA quality on accurately detecting differential expression was previously addressed with surrogate variable analysis (SVA), which estimates surrogate variables for unmodeled sources of heterogeneity in expression studies, such as batch effects &lt;a id=&#39;cite-Leek_2007&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://doi.org/10.1371/journal.pgen.0030161&#39;&gt;Leek and Storey, 2007&lt;/a&gt;). The problem of confounding, however, requires a more robust approach to identifying genes that are differentially expressed.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;qsva&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;qSVA&lt;/h3&gt;
&lt;p&gt;The quality surrogate variable analysis (qSVA) algorithmic framework, an extension of SVA, was developed by Andrew Jaffe and colleagues &lt;a id=&#39;cite-Jaffe_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://doi.org/10.1073/pnas.1617384114&#39;&gt;Jaffe, Tao, Norris, Kealhofer, et al., 2017&lt;/a&gt;) to address confounding by brain tissue degradation. The qSVA framework reduces the number of false positives: without it, genes can be flagged as differentially expressed simply because RNA quality confounding is not adequately controlled for. This conservative approach uses stricter criteria: well-established processing methods, expression cutoffs, avoidance of potential batch effects, and adjustment for RNA quality degradation confounding using qSVA.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;datasets&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Datasets&lt;/h3&gt;
&lt;p&gt;The qSVA algorithm requires the use of two datasets. Here, the dataset of interest is part of BrainSeq, A Human Brain Genomics Consortium, which was initiated with the goal of generating a public database of gene expression in postmortem brain tissue to enhance the understanding of psychiatric disorders through neurogenomic data &lt;a id=&#39;cite-bsc2015&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://doi.org/10.1016/j.neuron.2015.10.047&#39;&gt;Schubert, O’Donnell, Quan, Wendland, et al., 2015&lt;/a&gt;). The other dataset is a control dataset, which can also be referred to as the degradation dataset, since it is the measure of the impact of degradation on gene expression in postmortem tissue for individuals who do not have the outcome of interest. The degradation dataset is a much smaller dataset and helps determine the genomic regions most associated with brain degradation. This addresses the concern of an association between the outcome of interest and genetic expression, and helps better understand metrics that demonstrate RNA quality through experimental approaches. Using these two datasets, and by extending qSVA to more than one brain region, we are able to examine the issue of RNA quality confounding using RNA-seq data from multiple brain regions in a case-control study comparing degradation of tissue in patients with schizophrenia to non-psychiatric controls using BrainSeq consortium data &lt;a id=&#39;cite-colladotorres2018&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://doi.org/10.1101/426213&#39;&gt;Collado-Torres, Burke, Peterson, Shin, et al., 2018&lt;/a&gt;). We focused on the hippocampus (HIPPO) and dorsolateral prefrontal cortex (DLPFC), two brain regions that have been identified as functionally-altered in schizophrenia &lt;a id=&#39;cite-Rasetti_2014&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://doi.org/10.1001/jamapsychiatry.2013.3911&#39;&gt;Rasetti, Mattay, White, Sambataro, et al., 2014&lt;/a&gt;).&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;algorithm-and-workflow&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Algorithm and Workflow&lt;/h3&gt;
&lt;p&gt;The algorithm has terms that account for measured covariates, including diagnosis, age, sex, mitochondrial rate, rRNA rate, gene assignment rate, RNA integrity number (RIN), ethnicity principal components (PCs)&lt;a href=&#34;#fn2&#34; class=&#34;footnote-ref&#34; id=&#34;fnref2&#34;&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;, and the region-specific quality surrogate variables, or qSVs, identified using the degradation dataset.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-12-11-quality-surrogate-variable-analysis_files/Picture1.png&#34; width=&#34;600&#34; /&gt;&lt;/p&gt;
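&lt;p&gt;As a rough sketch of how such a model can be fit (this is illustrative code only, not the actual analysis code; all object names, dimensions, and the reduced covariate list below are made up for the example), the adjustment amounts to appending the qSVs as extra columns of the design matrix:&lt;/p&gt;

```r
## Hedged sketch of a qSVA-style adjustment with limma; every object,
## dimension, and covariate here is simulated for illustration only.
library(limma)
set.seed(20181211)

n = 40                                         # samples
expr = matrix(rnorm(200 * n), ncol = n)        # gene x sample log2 expression
degMat = matrix(rpois(100 * n, 50), ncol = n)  # coverage, degradation regions
pd = data.frame(
  Dx  = factor(rep(c("Control", "SCZD"), each = n / 2)),
  Age = runif(n, 18, 80),
  RIN = runif(n, 5, 9)
)

## qSVs: top principal components of (log2) coverage on the
## degradation-susceptible regions identified in the degradation dataset.
k = 2  # number of qSVs; chosen more carefully in practice
qSVs = prcomp(t(log2(degMat + 1)))$x[, seq_len(k)]

## Fit: diagnosis plus measured covariates plus the qSVs.
mod = model.matrix(~ Dx + Age + RIN, data = pd)
fit = eBayes(lmFit(expr, cbind(mod, qSVs)))
topTable(fit, coef = "DxSCZD")
```

&lt;p&gt;In the real analysis the full covariate list from the paragraph above is used, and the qSVs are computed per brain region from the degradation dataset.&lt;/p&gt;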
&lt;/div&gt;
&lt;div id=&#34;results&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Results&lt;/h3&gt;
&lt;p&gt;After using qSVA to adjust for the confounding effect of RNA quality, differential expression quality (DEqual) plots are used to assess the effectiveness of the statistical correction. These plots compare the differential expression statistics from the degradation experiments on the y axis against the statistics for the outcome from the dataset of interest on the x axis. The plots shown here are for the HIPPO samples, looking at the log-fold change in expression per minute of degradation, with each point representing a gene. The goal is to assess the correlation between these two sets of statistics, and how that correlation changes after including the quality surrogate variables (qSVs) in the model. There should be no correlation between the degradation dataset and the schizophrenia case-control BrainSeq dataset, labeled as Dx on the axis for diagnosis, since they are independent datasets and the degradation dataset serves as a control. Model 1 is a naïve model that includes diagnosis only. Model 2 adds measures of RNA quality and demographic covariates. Model 3 includes all of the terms from the previous models plus the qSVs. The number of genes identified as differentially expressed is shown in parentheses next to each model, and it drops drastically from over 6,000 in model 1, to 63 in model 2, to 48 in model 3.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-12-11-quality-surrogate-variable-analysis_files/Picture2.png&#34; width=&#34;600&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-12-11-quality-surrogate-variable-analysis_files/Picture3.png&#34; width=&#34;600&#34; /&gt;&lt;/p&gt;
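&lt;p&gt;Conceptually, each DEqual plot is just a scatter of two vectors of differential expression statistics plus their correlation. A minimal base-R sketch, where both vectors are simulated stand-ins rather than real degradation or case-control results:&lt;/p&gt;

```r
## DEqual-style comparison sketch; both statistic vectors are simulated
## stand-ins for illustration only.
set.seed(20181211)
t_degradation = rnorm(1000)               # degradation experiment statistics
t_dx = 0.6 * t_degradation + rnorm(1000)  # case-control (Dx) statistics

## After a successful qSVA correction this correlation should be near zero.
cor(t_dx, t_degradation)
plot(t_dx, t_degradation,
     xlab = "Dx statistic", ylab = "Degradation statistic")
```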
&lt;/div&gt;
&lt;div id=&#34;conclusions&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Conclusions&lt;/h3&gt;
&lt;p&gt;Once we are confident that confounding has been removed from the samples of interest, we can assess differential expression between cases and controls. Using the 48 genes identified by model 3, we can then perform gene ontology (biological process) enrichment analysis to determine which biological processes are enriched, gaining clearer insights into which genes are most affected in the brain tissue of individuals with schizophrenia. For more information, please see the freely available pre-print describing the BrainSeq Phase II project (&lt;a href=&#39;https://doi.org/10.1101/426213&#39;&gt;Collado-Torres, Burke, Peterson, Shin, et al., 2018&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://amy-peterson.github.io/&#34;&gt;Amy Peterson&lt;/a&gt; extended the qSVA statistical framework from one to multiple brain regions as part of her &lt;a href=&#34;https://www.jhsph.edu/&#34;&gt;JHBSPH&lt;/a&gt; &lt;a href=&#34;https://www.jhsph.edu/academics/degree-programs/master-of-public-health/curriculum/capstone.html&#34;&gt;MPH Capstone&lt;/a&gt; project, which she carried out with &lt;a href=&#34;http://aejaffe.com/&#34;&gt;Andrew E. Jaffe&lt;/a&gt; and &lt;a href=&#34;http://lcolladotor.github.io/&#34;&gt;Leonardo Collado-Torres&lt;/a&gt; at the &lt;a href=&#34;https://www.libd.org/&#34;&gt;Lieber Institute for Brain Development&lt;/a&gt;. The &lt;code&gt;R&lt;/code&gt; and &lt;code&gt;bash&lt;/code&gt; code Amy Peterson wrote is available online via GitHub at &lt;a href=&#34;https://github.com/LieberInstitute/qsva_brain&#34;&gt;LieberInstitute/qsva_brain&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;A few days late, but here&amp;#39;s Amy &lt;a href=&#34;https://t.co/A0OSUVD9Nw&#34;&gt;https://t.co/A0OSUVD9Nw&lt;/a&gt; after successfully presenting her &lt;a href=&#34;https://twitter.com/JohnsHopkinsSPH?ref_src=twsrc%5Etfw&#34;&gt;@JohnsHopkinsSPH&lt;/a&gt; MPH capstone project. It was a great experience for me to mentor her along with &lt;a href=&#34;https://twitter.com/AndrewJaffe?ref_src=twsrc%5Etfw&#34;&gt;@andrewjaffe&lt;/a&gt; at &lt;a href=&#34;https://twitter.com/LieberInstitute?ref_src=twsrc%5Etfw&#34;&gt;@lieberinstitute&lt;/a&gt; I look forward to seeing where her career takes her 🙌🏾 ^^ &lt;a href=&#34;https://t.co/hbUiHQOVq3&#34;&gt;pic.twitter.com/hbUiHQOVq3&lt;/a&gt;&lt;/p&gt;&amp;mdash; 🇲🇽 Leonardo Collado-Torres (@lcolladotor) &lt;a href=&#34;https://twitter.com/lcolladotor/status/993683427131092993?ref_src=twsrc%5Etfw&#34;&gt;May 8, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;


&lt;/div&gt;
&lt;div id=&#34;acknowledgments&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Acknowledgments&lt;/h3&gt;
&lt;p&gt;This blog post was made possible thanks to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/BiocStyle&#34;&gt;BiocStyle&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Oles_2018&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/Bioconductor/BiocStyle&#39;&gt;Oleś, Morgan, and Huber, 2018&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Xie_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/rstudio/blogdown&#39;&gt;Xie, Hill, and Thomas, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;knitcitations&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Boettiger_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=knitcitations&#39;&gt;Boettiger, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=sessioninfo&#34;&gt;sessioninfo&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Csardi_2018&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=sessioninfo&#39;&gt;Csárdi, core, Wickham, Chang, et al., 2018&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;References&lt;/h3&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Boettiger_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Boettiger_2017&#34;&gt;[1]&lt;/a&gt;&lt;cite&gt;
C. Boettiger.
&lt;em&gt;knitcitations: Citations for ‘Knitr’ Markdown Files&lt;/em&gt;.
R package version 1.0.8.
2017.
URL: &lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;https://CRAN.R-project.org/package=knitcitations&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-colladotorres2018&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-colladotorres2018&#34;&gt;[2]&lt;/a&gt;&lt;cite&gt;
L. Collado-Torres, E. E. Burke, A. Peterson, J. H. Shin, et al.
“Regional heterogeneity in gene expression, regulation and coherence in hippocampus and dorsolateral prefrontal cortex across development and in schizophrenia”.
In: &lt;em&gt;bioRxiv&lt;/em&gt; (2018).
DOI: &lt;a href=&#34;https://doi.org/10.1101/426213&#34;&gt;10.1101/426213&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Csardi_2018&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Csardi_2018&#34;&gt;[3]&lt;/a&gt;&lt;cite&gt;
G. Csárdi, R. core, H. Wickham, W. Chang, et al.
&lt;em&gt;sessioninfo: R Session Information&lt;/em&gt;.
R package version 1.1.1.
2018.
URL: &lt;a href=&#34;https://CRAN.R-project.org/package=sessioninfo&#34;&gt;https://CRAN.R-project.org/package=sessioninfo&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Jaffe_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Jaffe_2017&#34;&gt;[4]&lt;/a&gt;&lt;cite&gt;
A. E. Jaffe, R. Tao, A. L. Norris, M. Kealhofer, et al.
“qSVA framework for RNA quality correction in differential expression analysis”.
In: &lt;em&gt;Proceedings of the National Academy of Sciences&lt;/em&gt; 114.27 (Jun. 2017), pp. 7130–7135.
DOI: &lt;a href=&#34;https://doi.org/10.1073/pnas.1617384114&#34;&gt;10.1073/pnas.1617384114&lt;/a&gt;.
URL: &lt;a href=&#34;https://doi.org/10.1073/pnas.1617384114&#34;&gt;https://doi.org/10.1073/pnas.1617384114&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Leek_2007&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Leek_2007&#34;&gt;[5]&lt;/a&gt;&lt;cite&gt;
J. T. Leek and J. D. Storey.
“Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis”.
In: &lt;em&gt;PLoS Genetics&lt;/em&gt; 3.9 (2007), p. e161.
DOI: &lt;a href=&#34;https://doi.org/10.1371/journal.pgen.0030161&#34;&gt;10.1371/journal.pgen.0030161&lt;/a&gt;.
URL: &lt;a href=&#34;https://doi.org/10.1371/journal.pgen.0030161&#34;&gt;https://doi.org/10.1371/journal.pgen.0030161&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Oles_2018&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Oles_2018&#34;&gt;[6]&lt;/a&gt;&lt;cite&gt;
A. Oleś, M. Morgan and W. Huber.
&lt;em&gt;BiocStyle: Standard styles for vignettes and other Bioconductor documents&lt;/em&gt;.
R package version 2.10.0.
2018.
URL: &lt;a href=&#34;https://github.com/Bioconductor/BiocStyle&#34;&gt;https://github.com/Bioconductor/BiocStyle&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Rasetti_2014&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Rasetti_2014&#34;&gt;[7]&lt;/a&gt;&lt;cite&gt;
R. Rasetti, V. S. Mattay, M. G. White, F. Sambataro, et al.
“Altered Hippocampal-Parahippocampal Function During Stimulus Encoding”.
In: &lt;em&gt;JAMA Psychiatry&lt;/em&gt; 71.3 (Mar. 2014), p. 236.
DOI: &lt;a href=&#34;https://doi.org/10.1001/jamapsychiatry.2013.3911&#34;&gt;10.1001/jamapsychiatry.2013.3911&lt;/a&gt;.
URL: &lt;a href=&#34;https://doi.org/10.1001/jamapsychiatry.2013.3911&#34;&gt;https://doi.org/10.1001/jamapsychiatry.2013.3911&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-bsc2015&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-bsc2015&#34;&gt;[8]&lt;/a&gt;&lt;cite&gt;
Schubert, O’Donnell, Quan, Wendland, et al.
“BrainSeq: Neurogenomics to Drive Novel Target Discovery for Neuropsychiatric Disorders”.
In: &lt;em&gt;Neuron&lt;/em&gt; (2015).
DOI: &lt;a href=&#34;https://doi.org/10.1016/j.neuron.2015.10.047&#34;&gt;10.1016/j.neuron.2015.10.047&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Xie_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Xie_2017&#34;&gt;[9]&lt;/a&gt;&lt;cite&gt;
Y. Xie, A. P. Hill and A. Thomas.
&lt;em&gt;blogdown: Creating Websites with R Markdown&lt;/em&gt;.
ISBN 978-0815363729.
Boca Raton, Florida: Chapman and Hall/CRC, 2017.
URL: &lt;a href=&#34;https://github.com/rstudio/blogdown&#34;&gt;https://github.com/rstudio/blogdown&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.5.1 (2018-07-02)
##  os       macOS Mojave 10.14.1        
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       America/New_York            
##  date     2018-12-11                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package       * version date       lib source                            
##  assertthat      0.2.0   2017-04-11 [1] CRAN (R 3.5.0)                    
##  backports       1.1.2   2017-12-13 [1] CRAN (R 3.5.0)                    
##  bibtex          0.4.2   2017-06-30 [1] CRAN (R 3.5.0)                    
##  BiocManager     1.30.4  2018-11-13 [1] CRAN (R 3.5.0)                    
##  BiocStyle     * 2.10.0  2018-10-30 [1] Bioconductor                      
##  blogdown        0.9     2018-10-23 [1] CRAN (R 3.5.0)                    
##  bookdown        0.8     2018-12-03 [1] CRAN (R 3.5.0)                    
##  cli             1.0.1   2018-09-25 [1] CRAN (R 3.5.0)                    
##  colorout      * 1.2-0   2018-05-03 [1] Github (jalvesaq/colorout@c42088d)
##  crayon          1.3.4   2017-09-16 [1] CRAN (R 3.5.0)                    
##  curl            3.2     2018-03-28 [1] CRAN (R 3.5.0)                    
##  digest          0.6.18  2018-10-10 [1] CRAN (R 3.5.0)                    
##  evaluate        0.12    2018-10-09 [1] CRAN (R 3.5.0)                    
##  htmltools       0.3.6   2017-04-28 [1] CRAN (R 3.5.0)                    
##  httr            1.3.1   2017-08-20 [1] CRAN (R 3.5.0)                    
##  jsonlite        1.5     2017-06-01 [1] CRAN (R 3.5.0)                    
##  knitcitations * 1.0.8   2017-07-04 [1] CRAN (R 3.5.0)                    
##  knitr           1.20    2018-02-20 [1] CRAN (R 3.5.0)                    
##  lubridate       1.7.4   2018-04-11 [1] CRAN (R 3.5.0)                    
##  magrittr        1.5     2014-11-22 [1] CRAN (R 3.5.0)                    
##  plyr            1.8.4   2016-06-08 [1] CRAN (R 3.5.0)                    
##  R6              2.3.0   2018-10-04 [1] CRAN (R 3.5.0)                    
##  Rcpp            1.0.0   2018-11-07 [1] CRAN (R 3.5.0)                    
##  RefManageR      1.2.0   2018-04-25 [1] CRAN (R 3.5.0)                    
##  rmarkdown       1.10    2018-06-11 [1] CRAN (R 3.5.0)                    
##  rprojroot       1.3-2   2018-01-03 [1] CRAN (R 3.5.0)                    
##  sessioninfo   * 1.1.1   2018-11-05 [1] CRAN (R 3.5.0)                    
##  stringi         1.2.4   2018-07-20 [1] CRAN (R 3.5.0)                    
##  stringr         1.3.1   2018-05-10 [1] CRAN (R 3.5.0)                    
##  withr           2.1.2   2018-03-15 [1] CRAN (R 3.5.0)                    
##  xfun            0.4     2018-10-23 [1] CRAN (R 3.5.0)                    
##  xml2            1.2.0   2018-01-24 [1] CRAN (R 3.5.0)                    
##  yaml            2.2.0   2018-07-25 [1] CRAN (R 3.5.0)                    
## 
## [1] /Library/Frameworks/R.framework/Versions/3.5devel/Resources/library&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&#34;footnotes&#34;&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id=&#34;fn1&#34;&gt;&lt;p&gt;As defined in &lt;a href=&#34;https://en.wikipedia.org/wiki/Confounding&#34;&gt;Wikipedia&lt;/a&gt;, confounding is: “In statistics, a confounder (also confounding variable, confounding factor or lurking variable) is a variable that influences both the dependent variable and independent variable causing a spurious association. Confounding is a causal concept, and as such, cannot be described in terms of correlations or associations.”&lt;a href=&#34;#fnref1&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn2&#34;&gt;&lt;p&gt;These are PCs computed on the genotype information from the individuals in this study. We use them to adjust for ethnicity in a more rigorous form than a categorical &lt;em&gt;race&lt;/em&gt; variable would be able to do.&lt;a href=&#34;#fnref2&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Quick overview on the new Bioconductor 3.8 release</title>
      <link>http://LieberInstitute.github.io/rstatsclub/2018/11/02/quick-overview-on-the-new-bioconductor-3-8-release/</link>
      <pubDate>Fri, 02 Nov 2018 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/2018/11/02/quick-overview-on-the-new-bioconductor-3-8-release/</guid>
      <description>


&lt;p&gt;Every six months the Bioconductor project releases its new version of packages. This gives developers a time window to try out new methods and test them rigorously before releasing them to the community at large. It also means that this is an exciting time 🎉. With every release there are dozens&lt;a href=&#34;#fn1&#34; class=&#34;footnoteRef&#34; id=&#34;fnref1&#34;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt; of new software packages. Bioconductor version 3.8 was &lt;a href=&#34;https://www.bioconductor.org/news/bioc_3_8_release/&#34;&gt;just released&lt;/a&gt; on Halloween: October 31st, 2018. Thus, this is the perfect time to browse through the package descriptions and find out what’s new that could be of use to your research.&lt;/p&gt;
&lt;p&gt;That’s exactly what our post today is about. We looked at the list of new packages as well as updates to find those that we think could be useful for us. That is, packages that we might want to explore further.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-11-02-quick-overview-on-the-new-bioconductor-3-8-release_files/Screen%20Shot%202018-11-02%20at%2010.48.18%20AM.png&#34; width=&#34;400&#34; /&gt;

&lt;/div&gt;
&lt;div id=&#34;affixcan&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;AffiXcan&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/AffiXcan&#34;&gt;AffiXcan&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Impute a GReX (Genetically Regulated Expression) for a set of genes in a sample of individuals, using a method based on the Total Binding Affinity (TBA). Statistical models to impute GReX can be trained with a training dataset where the real total expression values are known.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Looks interesting from the name but the description is too vague.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;biocpkgtools&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;BiocPkgTools&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/BiocPkgTools&#34;&gt;BiocPkgTools&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Bioconductor has a rich ecosystem of metadata around packages, usage, and build status. This package is a simple collection of functions to access that metadata from R. The goal is to expose metadata for data mining and value-added functionality such as package searching, text mining, and analytics on packages.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As Bioconductor developers, we think this package sounds useful. Maybe it can be used to see whether any of your packages are broken (errors, warnings) in BioC release or BioC devel.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;brainimager&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;brainImageR&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/brainImageR&#34;&gt;brainImageR&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;BrainImageR is a package that provides the user with information of where in the human brain their gene set corresponds to. This is provided both as a continuous variable and as an easily-interpretable image. BrainImageR has additional functionality of identifying approximately when in developmental time that a gene expression dataset corresponds to. Both the spatial gene set enrichment and the developmental time point prediction are assessed in comparison to the Allen Brain Atlas reference data.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Sounds interesting since we work with brain data ourselves. We are curious about where the data comes from!&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;buscorrect&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;BUScorrect&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/BUScorrect&#34;&gt;BUScorrect&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;High-throughput experimental data are accumulating exponentially in public databases. However, mining valid scientific discoveries from these abundant resources is hampered by technical artifacts and inherent biological heterogeneity. The former are usually termed “batch effects,” and the latter is often modelled by “subtypes.” The R package BUScorrect fits a Bayesian hierarchical model, the Batch-effects-correction-with-Unknown-Subtypes model (BUS), to correct batch effects in the presence of unknown subtypes. BUS is capable of (a) correcting batch effects explicitly, (b) grouping samples that share similar characteristics into subtypes, (c) identifying features that distinguish subtypes, and (d) enjoying a linear-order computation complexity.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Hm… maybe this can be used with data from &lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/recount&#34;&gt;recount&lt;/a&gt;&lt;/em&gt;. We also have to work sometimes with data from multiple labs.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;consensusde&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;consensusDE&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/consensusDE&#34;&gt;consensusDE&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This package allows users to perform DE analysis using multiple algorithms. It seeks consensus from multiple methods. Currently it supports “Voom”, “EdgeR” and “DESeq”, but can be easily extended. It uses RUV-seq (optional) to remove batch effects.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Depending on how flexible this new package is, it could be useful for saving time. If it’s not flexible, we won’t really use it.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;enhancedvolcano&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;EnhancedVolcano&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/EnhancedVolcano&#34;&gt;EnhancedVolcano&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Volcano plots represent a useful way to visualise the results of differential expression analyses. Here, we present a highly-configurable function that produces publication-ready volcano plots. EnhancedVolcano will attempt to fit as many transcript names in the plot window as possible, thus avoiding ‘clogging’ up the plot with labels that could not otherwise have been read.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Enhanced volcano plots? Cool! We use volcano plots all the time, so the name alone sold us. Plus, who doesn’t want &lt;em&gt;publication ready plots&lt;/em&gt;?&lt;a href=&#34;#fn2&#34; class=&#34;footnoteRef&#34; id=&#34;fnref2&#34;&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
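&lt;p&gt;Based on the package’s documentation, a minimal call might look like the sketch below. Here &lt;code&gt;res&lt;/code&gt; is our own hypothetical placeholder for a differential expression results table (for example, from DESeq2):&lt;/p&gt;

```r
# Sketch based on the EnhancedVolcano documentation; 'res' is a
# hypothetical differential expression results table with columns
# 'log2FoldChange' and 'pvalue'.
library(EnhancedVolcano)

EnhancedVolcano(res,
    lab = rownames(res),   # transcript/gene labels to fit into the plot
    x = 'log2FoldChange',  # column with the log2 fold changes
    y = 'pvalue',          # column with the p-values
    pCutoff = 1e-5,        # significance threshold line
    FCcutoff = 1           # fold change threshold line
)
```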
&lt;/div&gt;
&lt;div id=&#34;excluster&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;ExCluster&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/ExCluster&#34;&gt;ExCluster&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;ExCluster flattens Ensembl and GENCODE GTF files into GFF files, which are used to count reads per non-overlapping exon bin from BAM files. This read counting is done using the function featureCounts from the package Rsubread. Library sizes are normalized across all biological replicates, and ExCluster then compares two different conditions to detect significantly differentially spliced genes. This process requires at least two independent biological replicates per condition, and ExCluster accepts only exactly two conditions at a time. ExCluster ultimately produces false discovery rates (FDRs) per gene, which are used to detect significance. Exon log2 fold change (log2FC) means and variances may be plotted for each significantly differentially spliced gene, which helps scientists develop hypotheses and target differential splicing events for RT-qPCR validation in the wet lab.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Hm… this one could be useful for some future work related to &lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/recount&#34;&gt;recount&lt;/a&gt;&lt;/em&gt;. Especially the part about simplifying a GTF file.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;maser&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;maser&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/maser&#34;&gt;maser&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This package provides functionalities for analysis, annotation and visualization of alternative splicing events.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Visualization of splicing events is something that can be useful. But the description is too vague and will require more investigation.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;mirsm&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;miRSM&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/miRSM&#34;&gt;miRSM&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The package aims to identify miRNA sponge modules by integrating expression data and miRNA-target binding information. It provides several functions to study miRNA sponge modules, including popular methods for inferring gene modules (candidate miRNA sponge modules), and a function to identify miRNA sponge modules, as well as a function to conduct functional analysis of miRNA sponge modules.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;I’m still reading the description&lt;/em&gt;, hold on! We’ll look into this more!&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;outrider&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;OUTRIDER&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/OUTRIDER&#34;&gt;OUTRIDER&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Identification of aberrant gene expression in RNA-seq data. Read count expectations are modeled by an autoencoder to control for confounders in the data. Given these expectations, the RNA-seq read counts are assumed to follow a negative binomial distribution with a gene-specific dispersion. Outliers are then identified as read counts that significantly deviate from this distribution. Furthermore, OUTRIDER provides useful plotting functions to analyze and visualize the results.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We’d be interested in testing the performance of OUTRIDER in our already analyzed datasets.&lt;/p&gt;
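&lt;p&gt;From skimming the package materials, the basic workflow seems to go along these lines; &lt;code&gt;cts&lt;/code&gt; is a hypothetical matrix of raw RNA-seq counts (genes by samples) that we made up for illustration:&lt;/p&gt;

```r
# Sketch of the OUTRIDER workflow as described in its vignette;
# 'cts' is a hypothetical raw count matrix (genes x samples).
library(OUTRIDER)

ods = OutriderDataSet(countData = cts) # build the dataset object
ods = filterExpression(ods)            # drop genes with too few counts
ods = OUTRIDER(ods)                    # fit autoencoder + negative binomial
res = results(ods)                     # significant outlier gene/sample pairs
```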
&lt;/div&gt;
&lt;div id=&#34;primirtss&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;primirTSS&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/primirTSS&#34;&gt;primirTSS&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A fast, convenient tool to identify the TSSs of miRNAs by integrating the data of H3K4me3 and Pol II as well as combining the conservation level and sequence feature, provided within both command-line and graphical interfaces, which achieves a better performance than the previous non-cell-specific methods on miRNA TSSs.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We don’t have this kind of data (Pol II ChIP-seq) but it looks useful if you do have it.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;timeseriesexperiment&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;TimeSeriesExperiment&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/TimeSeriesExperiment&#34;&gt;TimeSeriesExperiment&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Visualization and analysis toolbox for short time course data which includes dimensionality reduction, clustering, two-sample differential expression testing and gene ranking techniques. The package also provides methods for retrieving enriched pathways.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We have time series data and could maybe try this package out for clustering and pathway analyses. We were not sure why it matters that the time course is &lt;em&gt;short&lt;/em&gt;: what does this really mean? Could it mean that the time course is complete (no dropouts), unlike a longitudinal time course project? From the vignette, it seems they mean a small number of time points; that is, the package provides statistical methods suited to that scenario.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;trna&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;tRNA&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/tRNA&#34;&gt;tRNA&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The tRNA package allows tRNA sequences and structures to be accessed and used for subsetting. In addition, it provides visualization tools to compare feature parameters of multiple tRNA sets and correlate them to additional data. The tRNA package uses GRanges objects as inputs, requiring only a few additional data columns.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Wow! tRNAs!? Let’s look into this!&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;tximeta&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;tximeta&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/tximeta&#34;&gt;tximeta&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Transcript quantification import from Salmon with automatic population of metadata and transcript ranges. Filtered, combined, or de novo transcriptomes can be linked to the appropriate sources with linkedTxomes and shared for reproducible analyses.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Just look at this great video tweet by Michael Love!&lt;/p&gt;
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;.&lt;a href=&#34;https://twitter.com/Bioconductor?ref_src=twsrc%5Etfw&#34;&gt;@Bioconductor&lt;/a&gt; 3.8 is released, which means so is tximeta! this idea came up more than 2 years ago, to auto-populate metadata for Salmon quant directories. the goal is no more guessing for the data you quantified earlier in a project, or from public archive. here&amp;#39;s a demo &lt;a href=&#34;https://t.co/6r1yoNcIyj&#34;&gt;pic.twitter.com/6r1yoNcIyj&lt;/a&gt;&lt;/p&gt;&amp;mdash; Michael Love (@mikelove) &lt;a href=&#34;https://twitter.com/mikelove/status/1057948391261511680?ref_src=twsrc%5Etfw&#34;&gt;November 1, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;
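&lt;p&gt;From the demo, the gist is that you only hand &lt;code&gt;tximeta&lt;/code&gt; a table of sample names and Salmon quantification paths, and it populates the transcriptome metadata for you. A rough sketch (the sample names, file paths, and conditions below are made up):&lt;/p&gt;

```r
# Sketch based on the tximeta vignette; sample names and quant.sf
# paths are hypothetical.
library(tximeta)

coldata = data.frame(
    names = c('sample1', 'sample2'),
    files = c('quants/sample1/quant.sf', 'quants/sample2/quant.sf'),
    condition = c('control', 'treated'),
    stringsAsFactors = FALSE
)

se = tximeta(coldata)     # SummarizedExperiment with ranges and metadata
gse = summarizeToGene(se) # optionally summarize to the gene level
```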


&lt;/div&gt;
&lt;div id=&#34;ularcirc&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Ularcirc&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/Ularcirc&#34;&gt;Ularcirc&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Ularcirc reads in STAR aligned splice junction files and provides visualisation and analysis tools for splicing analysis. Users can assess backsplice junctions and forward canonical junctions.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We are interested in exploring its visualization tools and in finding out whether they are specific to STAR output files or whether this package works with a wider set of aligners.&lt;/p&gt;
&lt;p&gt;We also appreciate the name of this package!&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;wrapping-up&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Wrapping up&lt;/h3&gt;
&lt;p&gt;We liked how many of the new software packages emphasized visualization! The package with the best name was &lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/COCOA&#34;&gt;COCOA&lt;/a&gt;&lt;/em&gt; hehe.&lt;/p&gt;
&lt;p&gt;We hope that you are as excited as we are about trying out the new &lt;a href=&#34;https://www.bioconductor.org/news/bioc_3_8_release/&#34;&gt;Bioconductor 3.8 packages&lt;/a&gt;! If we implement any of these packages into our analysis routine we &lt;em&gt;want&lt;/em&gt; to come back and write a blog post about them&lt;a href=&#34;#fn3&#34; class=&#34;footnoteRef&#34; id=&#34;fnref3&#34;&gt;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;acknowledgments&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Acknowledgments&lt;/h3&gt;
&lt;p&gt;This blog post was made possible thanks to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://bioconductor.org/packages/3.8/BiocStyle&#34;&gt;BiocStyle&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Oles_2018&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/Bioconductor/BiocStyle&#39;&gt;Oleś, Morgan, and Huber, 2018&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Xie_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/rstudio/blogdown&#39;&gt;Xie, Hill, and Thomas, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;knitcitations&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Boettiger_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=knitcitations&#39;&gt;Boettiger, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=sessioninfo&#34;&gt;sessioninfo&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Csardi_2018&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/r-lib/sessioninfo#readme&#39;&gt;Csárdi, core, Wickham, Chang, et al., 2018&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;and all the Bioconductor package developers and maintainers!&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;References&lt;/h3&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Boettiger_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Boettiger_2017&#34;&gt;[1]&lt;/a&gt;&lt;cite&gt; C. Boettiger. &lt;em&gt;knitcitations: Citations for ‘Knitr’ Markdown Files&lt;/em&gt;. R package version 1.0.8. 2017. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;https://CRAN.R-project.org/package=knitcitations&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Csardi_2018&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Csardi_2018&#34;&gt;[2]&lt;/a&gt;&lt;cite&gt; G. Csárdi, R. core, H. Wickham, W. Chang, et al. &lt;em&gt;sessioninfo: R Session Information&lt;/em&gt;. R package version 1.1.0.9000. 2018. URL: &lt;a href=&#34;https://github.com/r-lib/sessioninfo#readme&#34;&gt;https://github.com/r-lib/sessioninfo#readme&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Oles_2018&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Oles_2018&#34;&gt;[3]&lt;/a&gt;&lt;cite&gt; A. Oleś, M. Morgan and W. Huber. &lt;em&gt;BiocStyle: Standard styles for vignettes and other Bioconductor documents&lt;/em&gt;. R package version 2.10.0. 2018. URL: &lt;a href=&#34;https://github.com/Bioconductor/BiocStyle&#34;&gt;https://github.com/Bioconductor/BiocStyle&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Xie_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Xie_2017&#34;&gt;[4]&lt;/a&gt;&lt;cite&gt; Y. Xie, A. P. Hill and A. Thomas. &lt;em&gt;blogdown: Creating Websites with R Markdown&lt;/em&gt;. ISBN 978-0815363729. Boca Raton, Florida: Chapman and Hall/CRC, 2017. URL: &lt;a href=&#34;https://github.com/rstudio/blogdown&#34;&gt;https://github.com/rstudio/blogdown&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                                      
##  version  R version 3.5.1 Patched (2018-10-14 r75439)
##  os       macOS Mojave 10.14.1                       
##  system   x86_64, darwin15.6.0                       
##  ui       X11                                        
##  language (EN)                                       
##  collate  en_US.UTF-8                                
##  ctype    en_US.UTF-8                                
##  tz       America/New_York                           
##  date     2018-11-02                                 
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package       * version    date       lib source                            
##  assertthat      0.2.0      2017-04-11 [1] CRAN (R 3.5.0)                    
##  backports       1.1.2      2017-12-13 [1] CRAN (R 3.5.0)                    
##  bibtex          0.4.2      2017-06-30 [1] CRAN (R 3.5.0)                    
##  BiocManager     1.30.3     2018-10-10 [1] CRAN (R 3.5.0)                    
##  BiocStyle     * 2.10.0     2018-10-30 [1] Bioconductor                      
##  blogdown        0.9        2018-10-23 [1] CRAN (R 3.5.0)                    
##  bookdown        0.7        2018-02-18 [1] CRAN (R 3.5.0)                    
##  cli             1.0.1      2018-09-25 [1] CRAN (R 3.5.0)                    
##  colorout      * 1.2-0      2018-05-03 [1] Github (jalvesaq/colorout@c42088d)
##  crayon          1.3.4      2017-09-16 [1] CRAN (R 3.5.0)                    
##  digest          0.6.18     2018-10-10 [1] CRAN (R 3.5.0)                    
##  evaluate        0.12       2018-10-09 [1] CRAN (R 3.5.0)                    
##  htmltools       0.3.6      2017-04-28 [1] CRAN (R 3.5.0)                    
##  httr            1.3.1      2017-08-20 [1] CRAN (R 3.5.0)                    
##  jsonlite        1.5        2017-06-01 [1] CRAN (R 3.5.0)                    
##  knitcitations * 1.0.8      2017-07-04 [1] CRAN (R 3.5.0)                    
##  knitr           1.20       2018-02-20 [1] CRAN (R 3.5.0)                    
##  lubridate       1.7.4      2018-04-11 [1] CRAN (R 3.5.0)                    
##  magrittr        1.5        2014-11-22 [1] CRAN (R 3.5.0)                    
##  plyr            1.8.4      2016-06-08 [1] CRAN (R 3.5.0)                    
##  R6              2.3.0      2018-10-04 [1] CRAN (R 3.5.0)                    
##  Rcpp            0.12.19    2018-10-01 [1] CRAN (R 3.5.1)                    
##  RefManageR      1.2.0      2018-04-25 [1] CRAN (R 3.5.0)                    
##  rmarkdown       1.10       2018-06-11 [1] CRAN (R 3.5.0)                    
##  rprojroot       1.3-2      2018-01-03 [1] CRAN (R 3.5.0)                    
##  sessioninfo   * 1.1.0.9000 2018-10-02 [1] Github (r-lib/sessioninfo@4f91fad)
##  stringi         1.2.4      2018-07-20 [1] CRAN (R 3.5.0)                    
##  stringr         1.3.1      2018-05-10 [1] CRAN (R 3.5.0)                    
##  withr           2.1.2      2018-03-15 [1] CRAN (R 3.5.0)                    
##  xfun            0.4        2018-10-23 [1] CRAN (R 3.5.0)                    
##  xml2            1.2.0      2018-01-24 [1] CRAN (R 3.5.0)                    
##  yaml            2.2.0      2018-07-25 [1] CRAN (R 3.5.0)                    
## 
## [1] /Library/Frameworks/R.framework/Versions/3.5devel/Resources/library&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&#34;footnotes&#34;&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id=&#34;fn1&#34;&gt;&lt;p&gt;Soon it’ll be in the hundreds! Wow!&lt;a href=&#34;#fnref1&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn2&#34;&gt;&lt;p&gt;We’ll see if that’s true hehe.&lt;a href=&#34;#fnref2&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn3&#34;&gt;&lt;p&gt;Time permitting :P&lt;a href=&#34;#fnref3&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>“Demystifying Data Science” remote notes</title>
      <link>http://LieberInstitute.github.io/rstatsclub/2018/10/24/demystifying-data-science-remote-notes/</link>
      <pubDate>Wed, 24 Oct 2018 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/2018/10/24/demystifying-data-science-remote-notes/</guid>
      <description>


&lt;p&gt;To carry on our momentum from a few weeks ago from our &lt;a href=&#34;http://research.libd.org/rstatsclub/2018/07/13/libd-rstats-club-remote-user-2018-notes&#34;&gt;useR!2018 remote notes blog post&lt;/a&gt;, this time we will be summarizing the &lt;a href=&#34;https://www.thisismetis.com/demystifying-data-science&#34;&gt;Demystifying Data Science 2018&lt;/a&gt; conference, which you can register for free. We are just following &lt;a href=&#34;https://twitter.com/drob&#34;&gt;David Robinson’s&lt;/a&gt; advice to blog all the time!&lt;/p&gt;

&lt;div id=&#34;conference-overview&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Conference overview&lt;/h3&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-10-24-demystifying-data-science-remote-notes_files/2018-10-20%2018.40.37.jpg&#34; width=&#34;600&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;We got interested in this conference&lt;a href=&#34;#fn1&#34; class=&#34;footnoteRef&#34; id=&#34;fnref1&#34;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt; thanks to tweets like these, which highlight that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;data scientists are young!&lt;/li&gt;
&lt;li&gt;specialists are more in demand!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Hopefully you find these tweets interesting as well. We can find more about the conference on Twitter using the &lt;a href=&#34;https://twitter.com/search?q=%23DemystifyDS&amp;amp;src=typd&#34;&gt;DemystifyDS&lt;/a&gt; hashtag, which also covers previous conferences. We see that the official event account &lt;a href=&#34;https://twitter.com/thisismetis&#34;&gt;thisismetis&lt;/a&gt; really went all out with branded summary tweets! You can find recordings of all the talks, and there were many interesting titles. So we decided to spend two sessions on them and watched four full talks.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://youtu.be/NGMFq5MQcOI&#34;&gt;Navigating the Maze of the Data Science Job Hunt&lt;/a&gt; by &lt;a href=&#34;https://www.linkedin.com/in/markmeloon/&#34;&gt;Mark Meloon&lt;/a&gt;, Data Scientist, ServiceNow.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://youtu.be/M_dc-XzApGA&#34;&gt;How to Get a Foothold in the Field of Data Science&lt;/a&gt; by &lt;a href=&#34;https://www.linkedin.com/in/brohrer/&#34;&gt;Brandon Rohrer&lt;/a&gt;, Data Scientist, Facebook.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://youtu.be/KI-1GA5Zotc&#34;&gt;Data Visualization: How to Overcome Common Challenges&lt;/a&gt; by &lt;a href=&#34;https://www.linkedin.com/in/kate-strachnyi-data/&#34;&gt;Kate Strachnyi&lt;/a&gt;, Manager and Data Visualization Specialist, Deloitte.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://youtu.be/nF315csYx3U&#34;&gt;The Art &amp;amp; Science of Creating an Actionable Data Story&lt;/a&gt; by &lt;a href=&#34;https://www.linkedin.com/in/micoyuk/&#34;&gt;Mico Yuk&lt;/a&gt;, Chief Executive Officer, BI-Brainz Group | Author, Data Visualization for Dummies &amp;amp; More.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;talks-summaries&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Talks summaries&lt;/h3&gt;
&lt;div id=&#34;mark-meloon-markmeloon&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Mark Meloon &lt;a href=&#34;https://twitter.com/MarkMeloon&#34;&gt;&lt;code&gt;MarkMeloon&lt;/code&gt;&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;In the &lt;a href=&#34;https://youtu.be/NGMFq5MQcOI&#34;&gt;first talk&lt;/a&gt;, by &lt;a href=&#34;https://www.linkedin.com/in/markmeloon/&#34;&gt;Mark Meloon&lt;/a&gt;, we learned about the power of &lt;a href=&#34;https://www.linkedin.com/&#34;&gt;LinkedIn&lt;/a&gt; for networking and finding your next job. He suggested posting regularly on LinkedIn, as your feed will show up more on others’, allowing you to connect with more people. If you write about content shared by someone you especially admire or hope to work for, you are more likely to catch their attention. It’s best not to ask people directly for a job, but to contact them first to discuss their work or to ask for advice. He also suggested adding key data analysis techniques to your profile, and describing those techniques with specificity rather than using more vague terms.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;brandon-rohrer-_brohrer_&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Brandon Rohrer &lt;a href=&#34;https://twitter.com/_brohrer_&#34;&gt;&lt;code&gt;_brohrer_&lt;/code&gt;&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;In the &lt;a href=&#34;https://youtu.be/M_dc-XzApGA&#34;&gt;second talk&lt;/a&gt;, by &lt;a href=&#34;https://www.linkedin.com/in/brohrer/&#34;&gt;Brandon Rohrer&lt;/a&gt;, we learned about the different data science careers that are possible.&lt;/p&gt;
&lt;p&gt;The major fields are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data Analysis - statistics and interpretation&lt;/li&gt;
&lt;li&gt;Data Modeling - machine learning, prediction&lt;/li&gt;
&lt;li&gt;Data Engineering - automation, databases, programming&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The major roles/archetypes are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Generalist - decent at all three fields&lt;/li&gt;
&lt;li&gt;Detective - master of analysis&lt;/li&gt;
&lt;li&gt;Oracle - master of modeling&lt;/li&gt;
&lt;li&gt;Maker - master of engineering&lt;/li&gt;
&lt;li&gt;Unicorn - master of all!!!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;He ended by mentioning that job postings using the term “data science” often vary widely. He recommends ignoring the posted job titles, de-emphasizing the specific tools listed, and instead focusing on the &lt;em&gt;skills&lt;/em&gt; being asked for to get a real sense of the job and how you would perform in it.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;kate-strachnyi-storybydata&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Kate Strachnyi &lt;a href=&#34;https://twitter.com/StorybyData&#34;&gt;&lt;code&gt;StorybyData&lt;/code&gt;&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;In the &lt;a href=&#34;https://youtu.be/KI-1GA5Zotc&#34;&gt;third talk&lt;/a&gt; by &lt;a href=&#34;https://www.linkedin.com/in/kate-strachnyi-data/&#34;&gt;Kate Strachnyi&lt;/a&gt; we learned about how to overcome challenges in data visualization. She described data visualizations as “Information Maps” that should ideally be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Informative&lt;/li&gt;
&lt;li&gt;Efficient&lt;/li&gt;
&lt;li&gt;Appealing&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Common issues were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Wrong chart choice - some charts will be much more effective&lt;/li&gt;
&lt;li&gt;Improper use of color - use to tell a story in a useful way - not just to be decorative&lt;/li&gt;
&lt;li&gt;Information overload - don’t try to do too much at once - loses impact&lt;/li&gt;
&lt;li&gt;Clutter - leave out the nonessential&lt;/li&gt;
&lt;li&gt;Not speaking the same language - know your audience (jargon/lingo)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;She also noted that we should be careful about color schemes, and pointed out that there are websites for checking how your figures would appear to people with colorblindness.&lt;/p&gt;
&lt;p&gt;She mostly uses Tableau in her work and suggested that it makes a nice free option for data visualization.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;mico-yuk-micoyuk&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Mico Yuk &lt;a href=&#34;https://twitter.com/micoyuk&#34;&gt;&lt;code&gt;micoyuk&lt;/code&gt;&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;In the &lt;a href=&#34;https://youtu.be/nF315csYx3U&#34;&gt;fourth and last talk&lt;/a&gt; by &lt;a href=&#34;https://www.linkedin.com/in/micoyuk/&#34;&gt;Mico Yuk&lt;/a&gt; we learned about storyboards and about remembering that our data analyses are always trying to tell a story about the data. She pointed out that the human mind is wired visually: we retain about 80% of what we see, 20% of what we read, and 10% of what we hear. She suggested that we create &lt;a href=&#34;https://en.wikipedia.org/wiki/SMART_criteria&#34;&gt;SMART goals&lt;/a&gt; (she credits Peter Drucker) to make sure that our work is driven efficiently in the correct direction. She suggested that communicating our work in a SMART goal-based framework would concisely and clearly convey the purpose and results of that work.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;our-impressions&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Our impressions&lt;/h3&gt;
&lt;p&gt;Given how varied our reactions were, we thought it would be most useful to share our individual impressions. Without further ado, here they are.&lt;/p&gt;
&lt;div id=&#34;p1&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;P1&lt;/h4&gt;
&lt;p&gt;I found &lt;a href=&#34;https://www.linkedin.com/in/markmeloon/&#34;&gt;Mark Meloon&lt;/a&gt;’s talk very useful. I have actually started posting more regularly on my own LinkedIn account and it has indeed captured more attention from others. In fact, I have even received emails from companies interested in hiring someone with my expertise. &lt;a href=&#34;https://www.linkedin.com/in/brohrer/&#34;&gt;Brandon Rohrer&lt;/a&gt; clarified some trends that I had noticed about data science. I identify with the “Detective” role and I see that while I may aspire at times (unsuccessfully) to be a “Generalist - or someday a Unicorn”, my experience as a Detective is very worthwhile as well. I love data visualization and I loved &lt;a href=&#34;https://www.linkedin.com/in/kate-strachnyi-data/&#34;&gt;Kate Strachnyi&lt;/a&gt;’s talk. I found her tips to be very clear reminders for how to continue with my own visualizations. The talk by &lt;a href=&#34;https://www.linkedin.com/in/micoyuk/&#34;&gt;Mico Yuk&lt;/a&gt; was a good reminder to keep overall goals in mind as you work and to regularly take a step back and assess if your work is really proceeding in the direction and at the rate that you planned.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;p2&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;P2&lt;/h4&gt;
&lt;p&gt;&lt;a href=&#34;https://www.linkedin.com/in/markmeloon/&#34;&gt;Mark Meloon’s&lt;/a&gt; talk emphasized the use of LinkedIn for networking and job hunting. He interviews job candidates for his company so his viewpoint was a direct reflection of someone who uses the website to find and/or assess job applicants. I liked that he gave both good and bad examples of actual profiles and messages he’s seen on LinkedIn. He also noted that, to get a foot in the door of a job posting, you don’t need to directly know the hiring manager, but reaching out to anyone you know in the company, even if it’s a second- or third-level connection (i.e. friend of a friend), is better than nothing, as long as you do it right. I do wish he had spoken about other social media platforms, such as Twitter, and how they compare to LinkedIn for networking.&lt;/p&gt;
&lt;p&gt;I found the breakdown of skills and job types by &lt;a href=&#34;https://www.linkedin.com/in/brohrer/&#34;&gt;Brandon Rohrer&lt;/a&gt; to be really instructive. It made me reflect on my own interests and skills in a broken down way, and I think it will help to have this framework for both future job hunts and interviews. I particularly like that he emphasized it’s okay/normal to not be great at &lt;em&gt;everything&lt;/em&gt; related to data science - it’s a broad field - and that people with a narrower set of expertise are still needed and valuable for specific jobs. His talk also gave me some ideas of skills I may be able to work on and add to my portfolio to round out my skill-set. I would recommend this talk to anyone in the data science or analysis fields that is looking for clarity or definition in their current job or career path!&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;p3&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;P3&lt;/h4&gt;
&lt;p&gt;&lt;a href=&#34;https://www.linkedin.com/in/kate-strachnyi-data/&#34;&gt;Kate Strachnyi&lt;/a&gt;’s talk was a great reminder of the importance of keeping your audience in mind when presenting information and making sure that visualizations are not just accurate but also easily understood. Her list of common issues was a helpful summary of guidelines I’ve heard before, and I appreciated the examples she used. In particular, I think I often run into the challenge of “information overload” when I present informally to others – I need to remember that it’s not enough for the information to be there, it also needs to be arranged in a way that lets people understand it quickly.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.linkedin.com/in/micoyuk/&#34;&gt;Mico Yuk&lt;/a&gt;’s talk was probably more applicable to someone working in a corporate field rather than an academic one, but the main idea of framing data as a story and keeping the goal in mind was still relevant to me. Some of the suggestions, like asking the “right questions” of your user, could easily be reworked for research (even if the user is just me). I haven’t worked with a storyboard before, but it would be interesting to see if that approach could also apply to planning out analyses for a research paper – the goal might be the question we’re asking, the KPIs the metrics we’re using to answer that question, the trends the conclusions we can draw, and the actions the next direction of analysis. The translation from business to academic research probably needs some tweaking, but I might try this approach on a future project to help with organization and keeping the bigger story in mind.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;p4&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;P4&lt;/h4&gt;
&lt;p&gt;&lt;a href=&#34;https://www.linkedin.com/in/markmeloon/&#34;&gt;Mark Meloon&lt;/a&gt;’s talk reminded me that many use &lt;a href=&#34;https://www.linkedin.com/&#34;&gt;LinkedIn&lt;/a&gt; for networking, which hasn’t been that common in my experience in academia. This is something I would need to keep in mind for advising students in the future that are either unsure of staying in academia or want to go to industry. I do brush up my profile once in a while, and parts of Mark’s advice apply also to CVs (writing them and sending them via email): basically, be genuine and respectful of others.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.linkedin.com/in/brohrer/&#34;&gt;Brandon Rohrer&lt;/a&gt; verbalized distinctions between data science roles that I had either heard of before or had some intuition about, but hadn’t actually seen as clearly defined as Brandon laid them out. I was also quite curious about everyone’s reaction to his talk and how each of us labelled ourselves. For example, maybe &lt;em&gt;X&lt;/em&gt; thought &lt;em&gt;Z&lt;/em&gt; was a &lt;em&gt;unicorn&lt;/em&gt;, but &lt;em&gt;Z&lt;/em&gt; perceived themselves as a &lt;em&gt;beginner&lt;/em&gt;. In my case, I think it’s probably too ambitious to aim for the unicorn level. I’m simply aiming to get to (or am at) a level where I can understand most of the terms and conversations, and then go back and research a bit more if needed as preparation for a follow-up meeting. I guess that makes me a generalist.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.linkedin.com/in/kate-strachnyi-data/&#34;&gt;Kate Strachnyi&lt;/a&gt;’s key points cover topics that I’ve heard before and loosely follow. I think that her audience is different from mine, as she seems to create visualizations that are used in many company presentations. I’m frequently under pressure to get a simple version of a plot done where we can see the trend in the data, and I only work on polishing a few selected plots that get highlighted in a research paper. Still, I could probably spend a bit more time thinking about plot design and colors before I make the next one. For that, I would like to learn more about the &lt;code&gt;paletteer&lt;/code&gt; R package:&lt;/p&gt;
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;ICYMI, 🎨 With more palettes than a tweet could possibly contain…&lt;br&gt;&amp;quot;paletteer: Collection of most color palettes in a single R 📦&amp;quot; 👨‍🎨 &lt;a href=&#34;https://twitter.com/Emil_Hvitfeldt?ref_src=twsrc%5Etfw&#34;&gt;@Emil_Hvitfeldt&lt;/a&gt; &lt;a href=&#34;https://t.co/7kKSyohQN4&#34;&gt;https://t.co/7kKSyohQN4&lt;/a&gt; &lt;a href=&#34;https://twitter.com/hashtag/rstats?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#rstats&lt;/a&gt; &lt;a href=&#34;https://t.co/zibFhW03EU&#34;&gt;pic.twitter.com/zibFhW03EU&lt;/a&gt;&lt;/p&gt;&amp;mdash; Mara Averick (@dataandme) &lt;a href=&#34;https://twitter.com/dataandme/status/1021828886160654336?ref_src=twsrc%5Etfw&#34;&gt;July 24, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;
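&lt;p&gt;As a quick sketch of what that might look like (the palette name here is just an illustrative choice, and the exact &lt;code&gt;paletteer&lt;/code&gt; interface may differ across package versions):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Sketch only: discrete palettes are addressed as &amp;quot;package::palette&amp;quot;
library(&amp;#39;paletteer&amp;#39;)
paletteer_d(&amp;quot;RColorBrewer::Set1&amp;quot;)&lt;/code&gt;&lt;/pre&gt;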


&lt;p&gt;&lt;a href=&#34;https://www.linkedin.com/in/micoyuk/&#34;&gt;Mico Yuk&lt;/a&gt; talked about SMART goals. Hmm… I don’t remember what that stood for, so I clearly would need to re-watch her talk. After skimming through it again, I guess I can only say that it was hard for me to relate to her talk because I haven’t been on a project that involved all the planning steps she talked about. While it wasn’t for me, it might be useful to you, so give it a try!&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;wrapping-up&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Wrapping up&lt;/h3&gt;
&lt;p&gt;Thanks for getting this far. We are curious to hear what your own impressions were of these and other talks from &lt;a href=&#34;https://www.thisismetis.com/demystifying-data-science&#34;&gt;Demystifying Data Science 2018&lt;/a&gt;: they have &lt;a href=&#34;https://www.youtube.com/channel/UCpbU53RP134D9qy9GzKBAXA/videos&#34;&gt;28 recorded talks&lt;/a&gt; in total! We also hope that you enjoyed reading about our different perspectives.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;acknowledgments&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Acknowledgments&lt;/h3&gt;
&lt;p&gt;We are grateful to everyone who tweeted about the conference and shared their materials online! We are also happy that &lt;em&gt;Metis&lt;/em&gt; got interested in our summary blog post.&lt;/p&gt;
&lt;p&gt;This blog post was made possible thanks to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Xie_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/rstudio/blogdown&#39;&gt;Xie, Hill, and Thomas, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;knitcitations&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Boettiger_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=knitcitations&#39;&gt;Boettiger, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=sessioninfo&#34;&gt;sessioninfo&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Csardi_2018&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/r-lib/sessioninfo#readme&#39;&gt;Csárdi, core, Wickham, Chang, et al., 2018&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;References&lt;/h3&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Boettiger_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Boettiger_2017&#34;&gt;[1]&lt;/a&gt;&lt;cite&gt; C. Boettiger. &lt;em&gt;knitcitations: Citations for ‘Knitr’ Markdown Files&lt;/em&gt;. R package version 1.0.8. 2017. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;https://CRAN.R-project.org/package=knitcitations&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Csardi_2018&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Csardi_2018&#34;&gt;[2]&lt;/a&gt;&lt;cite&gt; G. Csárdi, R. core, H. Wickham, W. Chang, et al. &lt;em&gt;sessioninfo: R Session Information&lt;/em&gt;. R package version 1.1.0.9000. 2018. URL: &lt;a href=&#34;https://github.com/r-lib/sessioninfo#readme&#34;&gt;https://github.com/r-lib/sessioninfo#readme&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Xie_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Xie_2017&#34;&gt;[3]&lt;/a&gt;&lt;cite&gt; Y. Xie, A. P. Hill and A. Thomas. &lt;em&gt;blogdown: Creating Websites with R Markdown&lt;/em&gt;. ISBN 978-0815363729. Boca Raton, Florida: Chapman and Hall/CRC, 2017. URL: &lt;a href=&#34;https://github.com/rstudio/blogdown&#34;&gt;https://github.com/rstudio/blogdown&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                                      
##  version  R version 3.5.1 Patched (2018-10-14 r75439)
##  os       macOS High Sierra 10.13.6                  
##  system   x86_64, darwin15.6.0                       
##  ui       X11                                        
##  language (EN)                                       
##  collate  en_US.UTF-8                                
##  ctype    en_US.UTF-8                                
##  tz       America/New_York                           
##  date     2018-10-24                                 
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package       * version    date       lib source                            
##  assertthat      0.2.0      2017-04-11 [1] CRAN (R 3.5.0)                    
##  backports       1.1.2      2017-12-13 [1] CRAN (R 3.5.0)                    
##  bibtex          0.4.2      2017-06-30 [1] CRAN (R 3.5.0)                    
##  BiocStyle     * 2.8.2      2018-05-30 [1] Bioconductor                      
##  blogdown        0.8        2018-07-15 [1] CRAN (R 3.5.0)                    
##  bookdown        0.7        2018-02-18 [1] CRAN (R 3.5.0)                    
##  cli             1.0.1      2018-09-25 [1] CRAN (R 3.5.0)                    
##  colorout      * 1.2-0      2018-05-03 [1] Github (jalvesaq/colorout@c42088d)
##  crayon          1.3.4      2017-09-16 [1] CRAN (R 3.5.0)                    
##  digest          0.6.18     2018-10-10 [1] CRAN (R 3.5.0)                    
##  evaluate        0.12       2018-10-09 [1] CRAN (R 3.5.0)                    
##  htmltools       0.3.6      2017-04-28 [1] CRAN (R 3.5.0)                    
##  httr            1.3.1      2017-08-20 [1] CRAN (R 3.5.0)                    
##  jsonlite        1.5        2017-06-01 [1] CRAN (R 3.5.0)                    
##  knitcitations * 1.0.8      2017-07-04 [1] CRAN (R 3.5.0)                    
##  knitr           1.20       2018-02-20 [1] CRAN (R 3.5.0)                    
##  lubridate       1.7.4      2018-04-11 [1] CRAN (R 3.5.0)                    
##  magrittr        1.5        2014-11-22 [1] CRAN (R 3.5.0)                    
##  plyr            1.8.4      2016-06-08 [1] CRAN (R 3.5.0)                    
##  R6              2.3.0      2018-10-04 [1] CRAN (R 3.5.0)                    
##  Rcpp            0.12.19    2018-10-01 [1] CRAN (R 3.5.1)                    
##  RefManageR      1.2.0      2018-04-25 [1] CRAN (R 3.5.0)                    
##  rmarkdown       1.10       2018-06-11 [1] CRAN (R 3.5.0)                    
##  rprojroot       1.3-2      2018-01-03 [1] CRAN (R 3.5.0)                    
##  sessioninfo   * 1.1.0.9000 2018-10-02 [1] Github (r-lib/sessioninfo@4f91fad)
##  stringi         1.2.4      2018-07-20 [1] CRAN (R 3.5.0)                    
##  stringr         1.3.1      2018-05-10 [1] CRAN (R 3.5.0)                    
##  withr           2.1.2      2018-03-15 [1] CRAN (R 3.5.0)                    
##  xfun            0.3        2018-07-06 [1] CRAN (R 3.5.0)                    
##  xml2            1.2.0      2018-01-24 [1] CRAN (R 3.5.0)                    
##  yaml            2.2.0      2018-07-25 [1] CRAN (R 3.5.0)                    
## 
## [1] /Library/Frameworks/R.framework/Versions/3.5/Resources/library&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div class=&#34;footnotes&#34;&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id=&#34;fn1&#34;&gt;&lt;p&gt;This conference covered a large spectrum of data science topics, hence the picture for the post!&lt;a href=&#34;#fnref1&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Hacking our way through UpSetR</title>
      <link>http://LieberInstitute.github.io/rstatsclub/2018/07/27/hacking-our-way-through-upsetr/</link>
      <pubDate>Fri, 27 Jul 2018 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/2018/07/27/hacking-our-way-through-upsetr/</guid>
      <description>


&lt;p&gt;For our club meeting today we were going to summarize the &lt;a href=&#34;https://www.thisismetis.com/demystifying-data-science&#34;&gt;Demystifying Data Science&lt;/a&gt; conference, but we forgot that the videos had not been released yet.&lt;/p&gt;
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Oops, we&amp;#39;ll have to postpone our blog post. We didn&amp;#39;t read the fine print that talk recordings will be available sometime next week. Sorry about that!&lt;/p&gt;&amp;mdash; LIBD rstats club (@LIBDrstats) &lt;a href=&#34;https://twitter.com/LIBDrstats/status/1022862435869450240?ref_src=twsrc%5Etfw&#34;&gt;July 27, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;


&lt;p&gt;So we adjusted plans and decided to continue our work on the &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=UpSetR&#34;&gt;UpSetR&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Gehlenborg_2016&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;http://github.com/hms-dbmi/UpSetR&#39;&gt;Gehlenborg, 2016&lt;/a&gt;) package by &lt;a href=&#34;https://twitter.com/ngehlenborg&#34;&gt;Nils Gehlenborg&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Yesterday we discussed various options for visualizing large numbers of overlapping groups. We explored uses of the &lt;a href=&#34;https://twitter.com/hashtag/venneuler?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#venneuler&lt;/a&gt; package from Lee Wilkinson and Simon Urbanek and the &lt;a href=&#34;https://twitter.com/hashtag/UpSetR?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#UpSetR&lt;/a&gt; package from Jake Conway, Alexander Lex, and &lt;a href=&#34;https://twitter.com/ngehlenborg?ref_src=twsrc%5Etfw&#34;&gt;@ngehlenborg&lt;/a&gt;. &lt;a href=&#34;https://twitter.com/hashtag/rstats?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#rstats&lt;/a&gt; &lt;a href=&#34;https://t.co/k55YfihmiP&#34;&gt;pic.twitter.com/k55YfihmiP&lt;/a&gt;&lt;/p&gt;&amp;mdash; LIBD rstats club (@LIBDrstats) &lt;a href=&#34;https://twitter.com/LIBDrstats/status/1007992479159812101?ref_src=twsrc%5Etfw&#34;&gt;June 16, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;


&lt;div id=&#34;what-you-can-currently-do&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;What you can currently do&lt;/h3&gt;
&lt;p&gt;First, let’s install the version we used for this post:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;devtools::install_github(&amp;#39;hms-dbmi/UpSetR@fe2812c&amp;#39;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our ultimate goal is to submit a pull request that enables &lt;code&gt;UpSetR&lt;/code&gt; users to specify a color per row for the dots, instead of coloring the rows themselves. We had already identified an example that we could work with.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;#39;UpSetR&amp;#39;)
movies &amp;lt;- read.csv( system.file(&amp;quot;extdata&amp;quot;, &amp;quot;movies.csv&amp;quot;, package = &amp;quot;UpSetR&amp;quot;), 
                    header=T, sep=&amp;quot;;&amp;quot; )

require(ggplot2); require(plyr); require(gridExtra); require(grid);&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Loading required package: ggplot2&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Loading required package: plyr&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Loading required package: gridExtra&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Loading required package: grid&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;upset(movies, 
      sets = c(&amp;quot;Action&amp;quot;, &amp;quot;Comedy&amp;quot;, &amp;quot;Drama&amp;quot;), 
      order.by=&amp;quot;degree&amp;quot;, matrix.color=&amp;quot;blue&amp;quot;, point.size=5,
      sets.bar.color=c(&amp;quot;maroon&amp;quot;,&amp;quot;blue&amp;quot;,&amp;quot;orange&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-07-27-hacking-our-way-through-upsetr_files/figure-html/unnamed-chunk-2-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We also explored the &lt;a href=&#34;https://cran.rstudio.com/web/packages/UpSetR/vignettes/set.metadata.plots.html&#34;&gt;set metadata vignette&lt;/a&gt; that includes examples such as the following one.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(20180727)

## Create the metadata object first
sets &amp;lt;- names(movies[3:19])
avgRottenTomatoesScore &amp;lt;- round(runif(17, min = 0, max = 90))
metadata &amp;lt;- as.data.frame(cbind(sets, avgRottenTomatoesScore))
names(metadata) &amp;lt;- c(&amp;quot;sets&amp;quot;, &amp;quot;avgRottenTomatoesScore&amp;quot;)
metadata$avgRottenTomatoesScore &amp;lt;- as.numeric(as.character(metadata$avgRottenTomatoesScore))
Cities &amp;lt;- sample(c(&amp;quot;Boston&amp;quot;, &amp;quot;NYC&amp;quot;, &amp;quot;LA&amp;quot;), 17, replace = T)
metadata &amp;lt;- cbind(metadata, Cities)
metadata$Cities &amp;lt;- as.character(metadata$Cities)
metadata[which(metadata$sets %in% c(&amp;quot;Drama&amp;quot;, &amp;quot;Comedy&amp;quot;, &amp;quot;Action&amp;quot;, &amp;quot;Thriller&amp;quot;, 
    &amp;quot;Romance&amp;quot;)), ]&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##        sets avgRottenTomatoesScore Cities
## 1    Action                     68 Boston
## 4    Comedy                     40    NYC
## 7     Drama                     48     LA
## 13  Romance                     77 Boston
## 15 Thriller                     19    NYC&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;accepted &amp;lt;- round(runif(17, min = 0, max = 1))
metadata &amp;lt;- cbind(metadata, accepted)
metadata[which(metadata$sets %in% c(&amp;quot;Drama&amp;quot;, &amp;quot;Comedy&amp;quot;, &amp;quot;Action&amp;quot;, &amp;quot;Thriller&amp;quot;, 
    &amp;quot;Romance&amp;quot;)), ]&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##        sets avgRottenTomatoesScore Cities accepted
## 1    Action                     68 Boston        0
## 4    Comedy                     40    NYC        1
## 7     Drama                     48     LA        0
## 13  Romance                     77 Boston        1
## 15 Thriller                     19    NYC        0&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Now make the plot
upset(movies, set.metadata = list(data = metadata, plots = list(list(type = &amp;quot;hist&amp;quot;, 
    column = &amp;quot;avgRottenTomatoesScore&amp;quot;, assign = 20), list(type = &amp;quot;matrix_rows&amp;quot;, 
    column = &amp;quot;Cities&amp;quot;, colors = c(Boston = &amp;quot;green&amp;quot;, NYC = &amp;quot;navy&amp;quot;, LA = &amp;quot;purple&amp;quot;), 
    alpha = 0.5))))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-07-27-hacking-our-way-through-upsetr_files/figure-html/unnamed-chunk-3-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;hacking-our-way&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Hacking our way&lt;/h3&gt;
&lt;p&gt;Using the &lt;code&gt;metadata&lt;/code&gt; looked complicated to us, and hopefully it was not necessary for what we were trying to accomplish. That is, we really wanted to change the colors of the circles in each row, not the rows themselves. So we found the GitHub repo with &lt;a href=&#34;https://github.com/hms-dbmi/UpSetR&#34;&gt;the code&lt;/a&gt;, plugged a laptop into a TV, and started exploring as a group. We went down the rabbit hole to see how the &lt;code&gt;matrix.color&lt;/code&gt; argument got used. To actually hack our way through, we downloaded the latest version of the code using &lt;code&gt;git&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git clone git@github.com:hms-dbmi/UpSetR.git
cd UpSetR&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, we created the objects that match the default arguments of &lt;code&gt;upset()&lt;/code&gt; by finding and replacing commas with semicolons. Well, not all of the commas. Also, for arguments that specified a vector (mostly of 2 options), we chose the first one to match the default R behavior. This way we could execute them and have them in our session.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Default upset() arguments
nsets = 5; nintersects = 40; sets = NULL; keep.order = F; set.metadata = NULL; intersections = NULL;
matrix.color = &amp;quot;gray23&amp;quot;; main.bar.color = &amp;quot;gray23&amp;quot;; mainbar.y.label = &amp;quot;Intersection Size&amp;quot;; mainbar.y.max = NULL;
sets.bar.color = &amp;quot;gray23&amp;quot;; sets.x.label = &amp;quot;Set Size&amp;quot;; point.size = 2.2; line.size = 0.7;
mb.ratio = c(0.70,0.30); expression = NULL; att.pos = NULL; att.color = main.bar.color; order.by = &amp;#39;freq&amp;#39;;
decreasing = T; show.numbers = &amp;quot;yes&amp;quot;; number.angles = 0; group.by = &amp;quot;degree&amp;quot;;cutoff = NULL;
queries = NULL; query.legend = &amp;quot;none&amp;quot;; shade.color = &amp;quot;gray88&amp;quot;; shade.alpha = 0.25; matrix.dot.alpha =0.5;
empty.intersections = NULL; color.pal = 1; boxplot.summary = NULL; attribute.plots = NULL; scale.intersections = &amp;quot;identity&amp;quot;;
scale.sets = &amp;quot;identity&amp;quot;; text.scale = 1; set_size.angles = 0 ; set_size.show = FALSE &lt;/code&gt;&lt;/pre&gt;
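&lt;p&gt;As a side note, an alternative we didn’t try (just a sketch, and it needs care because some defaults like &lt;code&gt;att.color = main.bar.color&lt;/code&gt; are stored as unevaluated expressions) is to pull the defaults programmatically with &lt;code&gt;formals()&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Sketch: inspect upset()&amp;#39;s argument defaults without manual editing
defaults &amp;lt;- formals(UpSetR::upset)
names(defaults)        ## all the argument names
defaults$matrix.color  ## &amp;quot;gray23&amp;quot;&lt;/code&gt;&lt;/pre&gt;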
&lt;p&gt;Next, we did the same (commas to semicolons) for the inputs of the first example.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Initial inputs on the first example
movies &amp;lt;- read.csv( system.file(&amp;quot;extdata&amp;quot;, &amp;quot;movies.csv&amp;quot;, package = &amp;quot;UpSetR&amp;quot;), 
                    header=T, sep=&amp;quot;;&amp;quot; )

## comma -&amp;gt; semicolon
data = movies; sets = c(&amp;quot;Action&amp;quot;, &amp;quot;Comedy&amp;quot;, &amp;quot;Drama&amp;quot;); 
      order.by=&amp;quot;degree&amp;quot;; matrix.color=&amp;quot;blue&amp;quot;; point.size=5;
      sets.bar.color=c(&amp;quot;maroon&amp;quot;,&amp;quot;blue&amp;quot;,&amp;quot;orange&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we were ready to start modifying some of the internal &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=UpSetR&#34;&gt;UpSetR&lt;/a&gt;&lt;/em&gt; (&lt;a href=&#39;http://github.com/hms-dbmi/UpSetR&#39;&gt;Gehlenborg, 2016&lt;/a&gt;) code.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;hacking-internals&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Hacking internals&lt;/h3&gt;
&lt;p&gt;The function &lt;code&gt;upset()&lt;/code&gt; is pretty long and uses many un-exported functions from the package itself. In order to test things quickly, we added &lt;code&gt;UpSetR:::&lt;/code&gt; calls before the un-exported functions. Here’s our modified version, where we added a piece of code to modify the &lt;code&gt;Matrix_layout&lt;/code&gt; object and add some colors.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Piece of code we introduced
for(i in 1:3) {
      j &amp;lt;- which(Matrix_layout$y == i &amp;amp; Matrix_layout$value == 1)
      if(length(j) &amp;gt; 0) Matrix_layout$color[j] &amp;lt;- c(&amp;quot;maroon&amp;quot;,&amp;quot;blue&amp;quot;,&amp;quot;orange&amp;quot;)[i]
  }&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Ok, here’s the full modified &lt;code&gt;upset()&lt;/code&gt; function.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Modified internal upset() code

startend &amp;lt;- UpSetR:::FindStartEnd(data)
  first.col &amp;lt;- startend[1]
  last.col &amp;lt;- startend[2]

  if(color.pal == 1){
    palette &amp;lt;- c(&amp;quot;#1F77B4&amp;quot;, &amp;quot;#FF7F0E&amp;quot;, &amp;quot;#2CA02C&amp;quot;, &amp;quot;#D62728&amp;quot;, &amp;quot;#9467BD&amp;quot;, &amp;quot;#8C564B&amp;quot;, &amp;quot;#E377C2&amp;quot;,
                 &amp;quot;#7F7F7F&amp;quot;, &amp;quot;#BCBD22&amp;quot;, &amp;quot;#17BECF&amp;quot;)
  } else{
    palette &amp;lt;- c(&amp;quot;#E69F00&amp;quot;, &amp;quot;#56B4E9&amp;quot;, &amp;quot;#009E73&amp;quot;, &amp;quot;#F0E442&amp;quot;, &amp;quot;#0072B2&amp;quot;, &amp;quot;#D55E00&amp;quot;,
                 &amp;quot;#CC79A7&amp;quot;)
  }

  if(is.null(intersections) == F){
    Set_names &amp;lt;- unique((unlist(intersections)))
    Sets_to_remove &amp;lt;- UpSetR:::Remove(data, first.col, last.col, Set_names)
    New_data &amp;lt;- UpSetR:::Wanted(data, Sets_to_remove)
    Num_of_set &amp;lt;- UpSetR:::Number_of_sets(Set_names)
    if(keep.order == F){
      Set_names &amp;lt;- UpSetR:::order_sets(New_data, Set_names)
    }
    All_Freqs &amp;lt;- UpSetR:::specific_intersections(data, first.col, last.col, intersections, order.by, group.by, decreasing,
                                        cutoff, main.bar.color, Set_names)
  } else if(is.null(intersections) == T){
    Set_names &amp;lt;- sets
    if(is.null(Set_names) == T || length(Set_names) == 0 ){
      Set_names &amp;lt;- UpSetR:::FindMostFreq(data, first.col, last.col, nsets)
    }
    Sets_to_remove &amp;lt;- UpSetR:::Remove(data, first.col, last.col, Set_names)
    New_data &amp;lt;- UpSetR:::Wanted(data, Sets_to_remove)
    Num_of_set &amp;lt;- UpSetR:::Number_of_sets(Set_names)
    if(keep.order == F){
    Set_names &amp;lt;- UpSetR:::order_sets(New_data, Set_names)
    }
    All_Freqs &amp;lt;- UpSetR:::Counter(New_data, Num_of_set, first.col, Set_names, nintersects, main.bar.color,
                         order.by, group.by, cutoff, empty.intersections, decreasing)
  }
  Matrix_setup &amp;lt;- UpSetR:::Create_matrix(All_Freqs)
  labels &amp;lt;- UpSetR:::Make_labels(Matrix_setup)
  #Chose NA to represent NULL case as result of NA being inserted when at least one contained both x and y
  #i.e. if one custom plot had both x and y, and others had only x, the y&amp;#39;s for the other plots were NA
  #if I decided to make the NULL case (all x and no y, or vice versa), there would have been alot more if/else statements
  #NA can be indexed so that we still get the non NA y aesthetics on correct plot. NULL cant be indexed.
  att.x &amp;lt;- c(); att.y &amp;lt;- c();
  if(is.null(attribute.plots) == F){
    for(i in seq_along(attribute.plots$plots)){
      if(length(attribute.plots$plots[[i]]$x) != 0){
        att.x[i] &amp;lt;- attribute.plots$plots[[i]]$x
      }
      else if(length(attribute.plots$plots[[i]]$x) == 0){
        att.x[i] &amp;lt;- NA
      }
      if(length(attribute.plots$plots[[i]]$y) != 0){
        att.y[i] &amp;lt;- attribute.plots$plots[[i]]$y
      }
      else if(length(attribute.plots$plots[[i]]$y) == 0){
        att.y[i] &amp;lt;- NA
      }
    }
  }

  BoxPlots &amp;lt;- NULL
  if(is.null(boxplot.summary) == F){
    BoxData &amp;lt;- UpSetR:::IntersectionBoxPlot(All_Freqs, New_data, first.col, Set_names)
    BoxPlots &amp;lt;- list()
    for(i in seq_along(boxplot.summary)){
      BoxPlots[[i]] &amp;lt;- UpSetR:::BoxPlotsPlot(BoxData, boxplot.summary[i], att.color)
    }
  }

  customAttDat &amp;lt;- NULL
  customQBar &amp;lt;- NULL
  Intersection &amp;lt;- NULL
  Element &amp;lt;- NULL
  legend &amp;lt;- NULL
  EBar_data &amp;lt;- NULL
  if(is.null(queries) == F){
    custom.queries &amp;lt;- UpSetR:::SeperateQueries(queries, 2, palette)
    customDat &amp;lt;- UpSetR:::customQueries(New_data, custom.queries, Set_names)
    legend &amp;lt;- UpSetR:::GuideGenerator(queries, palette)
    legend &amp;lt;- UpSetR:::Make_legend(legend)
    if(is.null(att.x) == F &amp;amp;&amp;amp; is.null(customDat) == F){
      customAttDat &amp;lt;- UpSetR:::CustomAttData(customDat, Set_names)
    }
    customQBar &amp;lt;- UpSetR:::customQueriesBar(customDat, Set_names, All_Freqs, custom.queries)
  }
  if(is.null(queries) == F){
    Intersection &amp;lt;- UpSetR:::SeperateQueries(queries, 1, palette)
    Matrix_col &amp;lt;- UpSetR:::intersects(QuerieInterData, Intersection, New_data, first.col, Num_of_set,
                             All_Freqs, expression, Set_names, palette)
    Element &amp;lt;- UpSetR:::SeperateQueries(queries, 1, palette)
    EBar_data &amp;lt;-UpSetR:::ElemBarDat(Element, New_data, first.col, expression, Set_names,palette, All_Freqs)
  } else{
    Matrix_col &amp;lt;- NULL
  }
  
  Matrix_layout &amp;lt;- UpSetR:::Create_layout(Matrix_setup, matrix.color, Matrix_col, matrix.dot.alpha)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As a little pause in &lt;code&gt;upset()&lt;/code&gt;, let’s check what &lt;code&gt;Matrix_layout&lt;/code&gt; actually looks like.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;Matrix_layout&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    y x value  color alpha Intersection
## 1  1 1     1   blue   1.0         1yes
## 2  2 1     1   blue   1.0         1yes
## 3  3 1     1   blue   1.0         1yes
## 4  1 2     0 gray83   0.5          4No
## 5  2 2     1   blue   1.0         2yes
## 6  3 2     1   blue   1.0         2yes
## 7  1 3     1   blue   1.0         3yes
## 8  2 3     0 gray83   0.5          8No
## 9  3 3     1   blue   1.0         3yes
## 10 1 4     1   blue   1.0         4yes
## 11 2 4     1   blue   1.0         4yes
## 12 3 4     0 gray83   0.5         12No
## 13 1 5     0 gray83   0.5         13No
## 14 2 5     0 gray83   0.5         14No
## 15 3 5     1   blue   1.0         5yes
## 16 1 6     0 gray83   0.5         16No
## 17 2 6     1   blue   1.0         6yes
## 18 3 6     0 gray83   0.5         18No
## 19 1 7     1   blue   1.0         7yes
## 20 2 7     0 gray83   0.5         20No
## 21 3 7     0 gray83   0.5         21No&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We figured out that we had to change the colors of only the rows with &lt;code&gt;value = 1&lt;/code&gt;, and that &lt;code&gt;y&lt;/code&gt; was the row grouping variable.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## our modification
  for(i in 1:3) {
      j &amp;lt;- which(Matrix_layout$y == i &amp;amp; Matrix_layout$value == 1)
      if(length(j) &amp;gt; 0) Matrix_layout$color[j] &amp;lt;- c(&amp;quot;maroon&amp;quot;,&amp;quot;blue&amp;quot;,&amp;quot;orange&amp;quot;)[i]
  }&lt;/code&gt;&lt;/pre&gt;
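&lt;p&gt;For the eventual pull request, this could generalize into a small helper that takes one color per row. The following is only a sketch with our own hypothetical name, not an existing &lt;code&gt;UpSetR&lt;/code&gt; function:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Hypothetical helper: recolor the filled dots of each matrix row
matrix_row_colors &amp;lt;- function(Matrix_layout, colors) {
    for (i in seq_along(colors)) {
        j &amp;lt;- which(Matrix_layout$y == i &amp;amp; Matrix_layout$value == 1)
        if (length(j) &amp;gt; 0) Matrix_layout$color[j] &amp;lt;- colors[i]
    }
    Matrix_layout
}
## equivalent to our manual edit:
## Matrix_layout &amp;lt;- matrix_row_colors(Matrix_layout, c(&amp;quot;maroon&amp;quot;, &amp;quot;blue&amp;quot;, &amp;quot;orange&amp;quot;))&lt;/code&gt;&lt;/pre&gt;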
&lt;p&gt;Here’s our modified &lt;code&gt;Matrix_layout&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;Matrix_layout&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    y x value  color alpha Intersection
## 1  1 1     1 maroon   1.0         1yes
## 2  2 1     1   blue   1.0         1yes
## 3  3 1     1 orange   1.0         1yes
## 4  1 2     0 gray83   0.5          4No
## 5  2 2     1   blue   1.0         2yes
## 6  3 2     1 orange   1.0         2yes
## 7  1 3     1 maroon   1.0         3yes
## 8  2 3     0 gray83   0.5          8No
## 9  3 3     1 orange   1.0         3yes
## 10 1 4     1 maroon   1.0         4yes
## 11 2 4     1   blue   1.0         4yes
## 12 3 4     0 gray83   0.5         12No
## 13 1 5     0 gray83   0.5         13No
## 14 2 5     0 gray83   0.5         14No
## 15 3 5     1 orange   1.0         5yes
## 16 1 6     0 gray83   0.5         16No
## 17 2 6     1   blue   1.0         6yes
## 18 3 6     0 gray83   0.5         18No
## 19 1 7     1 maroon   1.0         7yes
## 20 2 7     0 gray83   0.5         20No
## 21 3 7     0 gray83   0.5         21No&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Ok, let’s continue with the rest of &lt;code&gt;upset()&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## continuing with upset()
  
  Set_sizes &amp;lt;- UpSetR:::FindSetFreqs(New_data, first.col, Num_of_set, Set_names, keep.order)
  Bar_Q &amp;lt;- NULL
  if(is.null(queries) == F){
    Bar_Q &amp;lt;- UpSetR:::intersects(QuerieInterBar, Intersection, New_data, first.col, Num_of_set, All_Freqs, expression, Set_names, palette)
  }
  QInter_att_data &amp;lt;- NULL
  QElem_att_data &amp;lt;- NULL
  if((is.null(queries) == F) &amp;amp; (is.null(att.x) == F)){
    QInter_att_data &amp;lt;- UpSetR:::intersects(QuerieInterAtt, Intersection, New_data, first.col, Num_of_set, att.x, att.y,
                                  expression, Set_names, palette)
    QElem_att_data &amp;lt;- UpSetR:::elements(QuerieElemAtt, Element, New_data, first.col, expression, Set_names, att.x, att.y,
                               palette)
  }
  AllQueryData &amp;lt;- UpSetR:::combineQueriesData(QInter_att_data, QElem_att_data, customAttDat, att.x, att.y)

  ShadingData &amp;lt;- NULL

  if(is.null(set.metadata) == F){
    ShadingData &amp;lt;- get_shade_groups(set.metadata, Set_names, Matrix_layout, shade.alpha)
    output &amp;lt;- Make_set_metadata_plot(set.metadata, Set_names)
    set.metadata.plots &amp;lt;- output[[1]]
    set.metadata &amp;lt;- output[[2]]

    if(is.null(ShadingData) == FALSE){
    shade.alpha &amp;lt;- unique(ShadingData$alpha)
    }
  } else {
    set.metadata.plots &amp;lt;- NULL
  }
  if(is.null(ShadingData) == TRUE){
  ShadingData &amp;lt;- UpSetR:::MakeShading(Matrix_layout, shade.color)
  }
  Main_bar &amp;lt;- suppressMessages(UpSetR:::Make_main_bar(All_Freqs, Bar_Q, show.numbers, mb.ratio, customQBar, number.angles, EBar_data, mainbar.y.label,
                            mainbar.y.max, scale.intersections, text.scale, attribute.plots))
  Matrix &amp;lt;- UpSetR:::Make_matrix_plot(Matrix_layout, Set_sizes, All_Freqs, point.size, line.size,
                             text.scale, labels, ShadingData, shade.alpha)
  Sizes &amp;lt;- UpSetR:::Make_size_plot(Set_sizes, sets.bar.color, mb.ratio, sets.x.label, scale.sets, text.scale, set_size.angles,set_size.show)

  # Make_base_plot(Main_bar, Matrix, Sizes, labels, mb.ratio, att.x, att.y, New_data,
  #                expression, att.pos, first.col, att.color, AllQueryData, attribute.plots,
  #                legend, query.legend, BoxPlots, Set_names, set.metadata, set.metadata.plots)

  structure(class = &amp;quot;upset&amp;quot;,
    .Data=list(
      Main_bar = Main_bar,
      Matrix = Matrix,
      Sizes = Sizes,
      labels = labels,
      mb.ratio = mb.ratio,
      att.x = att.x,
      att.y = att.y,
      New_data = New_data,
      expression = expression,
      att.pos = att.pos,
      first.col = first.col,
      att.color = att.color,
      AllQueryData = AllQueryData,
      attribute.plots = attribute.plots,
      legend = legend,
      query.legend = query.legend,
      BoxPlots = BoxPlots,
      Set_names = Set_names,
      set.metadata = set.metadata,
      set.metadata.plots = set.metadata.plots)
  )&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-07-27-hacking-our-way-through-upsetr_files/figure-html/unnamed-chunk-11-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;line-colors&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Line colors&lt;/h3&gt;
&lt;p&gt;Ok, that’s great, but now we have a problem with the lines: their color is no longer black. So we went deeper down the rabbit hole and found that the internal &lt;code&gt;Make_matrix_plot()&lt;/code&gt; function is where the lines are drawn. We made some edits but got a plot where the lines were on top of the circles, as shown in this screenshot.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-07-27-hacking-our-way-through-upsetr_files/Screen%20Shot%202018-07-27%20at%2012.17.58%20PM.png&#34; width=&#34;500&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;Our club session ran out of time, so we decided to continue our project another day and ask for help on Twitter. And yay, we got help super fast!&lt;/p&gt;
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Thank you and &lt;a href=&#34;https://twitter.com/thatdnaguy?ref_src=twsrc%5Etfw&#34;&gt;@thatdnaguy&lt;/a&gt;, that did it! &lt;a href=&#34;https://t.co/tzQvhKFXgR&#34;&gt;pic.twitter.com/tzQvhKFXgR&lt;/a&gt;&lt;/p&gt;&amp;mdash; LIBD rstats club (@LIBDrstats) &lt;a href=&#34;https://twitter.com/LIBDrstats/status/1022903971416088577?ref_src=twsrc%5Etfw&#34;&gt;July 27, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;


&lt;p&gt;So here’s our modified version of &lt;code&gt;Make_matrix_plot()&lt;/code&gt; that keeps the lines black.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;Make_matrix_plot &amp;lt;- function(Mat_data,Set_size_data, Main_bar_data, point_size, line_size, text_scale, labels,
                             shading_data, shade_alpha){

  if(length(text_scale) == 1){
    name_size_scale &amp;lt;- text_scale
  }
  if(length(text_scale) &amp;gt; 1 &amp;amp;&amp;amp; length(text_scale) &amp;lt;= 6){
    name_size_scale &amp;lt;- text_scale[5]
  }
  
  Mat_data$line_col &amp;lt;- &amp;#39;black&amp;#39;

  Matrix_plot &amp;lt;- (ggplot()
                  + theme(panel.background = element_rect(fill = &amp;quot;white&amp;quot;),
                          plot.margin=unit(c(-0.2,0.5,0.5,0.5), &amp;quot;lines&amp;quot;),
                          axis.text.x = element_blank(),
                          axis.ticks.x = element_blank(),
                          axis.ticks.y = element_blank(),
                          axis.text.y = element_text(colour = &amp;quot;gray0&amp;quot;,
                                                     size = 7*name_size_scale, hjust = 0.4),
                          panel.grid.major = element_blank(),
                          panel.grid.minor = element_blank())
                  + xlab(NULL) + ylab(&amp;quot;   &amp;quot;)
                  + scale_y_continuous(breaks = c(1:nrow(Set_size_data)),
                                       limits = c(0.5,(nrow(Set_size_data) +0.5)),
                                       labels = labels, expand = c(0,0))
                  + scale_x_continuous(limits = c(0,(nrow(Main_bar_data)+1 )), expand = c(0,0))
                  + geom_rect(data = shading_data, aes_string(xmin = &amp;quot;min&amp;quot;, xmax = &amp;quot;max&amp;quot;,
                                                              ymin = &amp;quot;y_min&amp;quot;, ymax = &amp;quot;y_max&amp;quot;),
                              fill = shading_data$shade_color, alpha = shade_alpha)
                  + geom_line(data= Mat_data, aes_string(group = &amp;quot;Intersection&amp;quot;, x=&amp;quot;x&amp;quot;, y=&amp;quot;y&amp;quot;,
                                                         colour = &amp;quot;line_col&amp;quot;), size = line_size)
                 + geom_point(data= Mat_data, aes_string(x= &amp;quot;x&amp;quot;, y= &amp;quot;y&amp;quot;), colour = Mat_data$color,
                     size= point_size, alpha = Mat_data$alpha, shape=16)
                  + scale_color_identity())
  Matrix_plot &amp;lt;- ggplot_gtable(ggplot_build(Matrix_plot))
  return(Matrix_plot)
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using that modified version, we can then run the code again (note that this time we are not prefixing &lt;code&gt;Make_matrix_plot&lt;/code&gt; with &lt;code&gt;UpSetR:::&lt;/code&gt;) and get the plot we wanted.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;Matrix &amp;lt;- Make_matrix_plot(Matrix_layout, Set_sizes, All_Freqs, point.size, line.size,
                             text.scale, labels, ShadingData, shade.alpha)
  Sizes &amp;lt;- UpSetR:::Make_size_plot(Set_sizes, sets.bar.color, mb.ratio, sets.x.label, scale.sets, text.scale, set_size.angles,set_size.show)

  # Make_base_plot(Main_bar, Matrix, Sizes, labels, mb.ratio, att.x, att.y, New_data,
  #                expression, att.pos, first.col, att.color, AllQueryData, attribute.plots,
  #                legend, query.legend, BoxPlots, Set_names, set.metadata, set.metadata.plots)

  structure(class = &amp;quot;upset&amp;quot;,
    .Data=list(
      Main_bar = Main_bar,
      Matrix = Matrix,
      Sizes = Sizes,
      labels = labels,
      mb.ratio = mb.ratio,
      att.x = att.x,
      att.y = att.y,
      New_data = New_data,
      expression = expression,
      att.pos = att.pos,
      first.col = first.col,
      att.color = att.color,
      AllQueryData = AllQueryData,
      attribute.plots = attribute.plots,
      legend = legend,
      query.legend = query.legend,
      BoxPlots = BoxPlots,
      Set_names = Set_names,
      set.metadata = set.metadata,
      set.metadata.plots = set.metadata.plots)
  )&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-07-27-hacking-our-way-through-upsetr_files/figure-html/unnamed-chunk-13-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We have quite a bit more to do in order to complete our pull request. We are also curious whether you would have used a different approach to hack your way through &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=UpSetR&#34;&gt;UpSetR&lt;/a&gt;&lt;/em&gt; (&lt;a href=&#39;http://github.com/hms-dbmi/UpSetR&#39;&gt;Gehlenborg, 2016&lt;/a&gt;). For example, maybe some functions from &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=devtools&#34;&gt;devtools&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Wickham_2018&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=devtools&#39;&gt;Wickham, Hester, and Chang, 2018&lt;/a&gt;) would have enabled us to do this equally fast without having to introduce &lt;code&gt;UpSetR:::&lt;/code&gt; calls.&lt;/p&gt;
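&lt;p&gt;One such alternative (a sketch we have not actually tried here) would be to patch the internal function in place with &lt;code&gt;utils::assignInNamespace()&lt;/code&gt;, so that the exported &lt;code&gt;upset()&lt;/code&gt; picks up the edited copy without us copying its whole body:&lt;/p&gt;

```r
## Sketch (untested here): replace UpSetR's internal Make_matrix_plot() inside
## the package namespace so that plain upset() calls use the patched version.
## requireNamespace() keeps this safe to source when UpSetR is not installed.
if (requireNamespace("UpSetR", quietly = TRUE)) {
  my_matrix_plot <- UpSetR:::Make_matrix_plot  # start from the internal original
  ## ... edit the body of my_matrix_plot here, e.g. forcing the line color to
  ## "black" as in our modified version above ...
  utils::assignInNamespace("Make_matrix_plot", my_matrix_plot, ns = "UpSetR")
}
```

&lt;p&gt;The trade-off is that &lt;code&gt;assignInNamespace()&lt;/code&gt; modifies the loaded package for the rest of the session, so it is best kept to interactive hacking sessions like this one.&lt;/p&gt;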
&lt;p&gt;Thanks for reading!&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;acknowledgments&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Acknowledgments&lt;/h3&gt;
&lt;p&gt;This blog post was made possible thanks to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;http://bioconductor.org/packages/BiocStyle&#34;&gt;BiocStyle&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Oles_2018&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/Bioconductor/BiocStyle&#39;&gt;Oleś, Morgan, and Huber, 2018&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Xie_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/rstudio/blogdown&#39;&gt;Xie, Hill, and Thomas, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=devtools&#34;&gt;devtools&lt;/a&gt;&lt;/em&gt; (&lt;a href=&#39;https://CRAN.R-project.org/package=devtools&#39;&gt;Wickham, Hester, and Chang, 2018&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;knitcitations&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Boettiger_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=knitcitations&#39;&gt;Boettiger, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;References&lt;/h3&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Boettiger_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Boettiger_2017&#34;&gt;[1]&lt;/a&gt;&lt;cite&gt; C. Boettiger. &lt;em&gt;knitcitations: Citations for ‘Knitr’ Markdown Files&lt;/em&gt;. R package version 1.0.8. 2017. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;https://CRAN.R-project.org/package=knitcitations&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Gehlenborg_2016&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Gehlenborg_2016&#34;&gt;[2]&lt;/a&gt;&lt;cite&gt; N. Gehlenborg. &lt;em&gt;UpSetR: A More Scalable Alternative to Venn and Euler Diagrams for Visualizing Intersecting Sets&lt;/em&gt;. R package version 1.4.0. 2016. URL: &lt;a href=&#34;http://github.com/hms-dbmi/UpSetR&#34;&gt;http://github.com/hms-dbmi/UpSetR&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Oles_2018&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Oles_2018&#34;&gt;[3]&lt;/a&gt;&lt;cite&gt; A. Oleś, M. Morgan and W. Huber. &lt;em&gt;BiocStyle: Standard styles for vignettes and other Bioconductor documents&lt;/em&gt;. R package version 2.8.2. 2018. URL: &lt;a href=&#34;https://github.com/Bioconductor/BiocStyle&#34;&gt;https://github.com/Bioconductor/BiocStyle&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Wickham_2018&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Wickham_2018&#34;&gt;[4]&lt;/a&gt;&lt;cite&gt; H. Wickham, J. Hester and W. Chang. &lt;em&gt;devtools: Tools to Make Developing R Packages Easier&lt;/em&gt;. R package version 1.13.6. 2018. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=devtools&#34;&gt;https://CRAN.R-project.org/package=devtools&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Xie_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Xie_2017&#34;&gt;[5]&lt;/a&gt;&lt;cite&gt; Y. Xie, A. P. Hill and A. Thomas. &lt;em&gt;blogdown: Creating Websites with R Markdown&lt;/em&gt;. ISBN 978-0815363729. Boca Raton, Florida: Chapman and Hall/CRC, 2017. URL: &lt;a href=&#34;https://github.com/rstudio/blogdown&#34;&gt;https://github.com/rstudio/blogdown&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;## Session info ----------------------------------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  setting  value                       
##  version  R version 3.5.1 (2018-07-02)
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  tz       America/New_York            
##  date     2018-07-27&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Packages --------------------------------------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  package       * version date       source                            
##  assertthat      0.2.0   2017-04-11 cran (@0.2.0)                     
##  backports       1.1.2   2017-12-13 cran (@1.1.2)                     
##  base          * 3.5.1   2018-07-05 local                             
##  bibtex          0.4.2   2017-06-30 CRAN (R 3.5.0)                    
##  bindr           0.1.1   2018-03-13 cran (@0.1.1)                     
##  bindrcpp        0.2.2   2018-03-29 cran (@0.2.2)                     
##  BiocStyle     * 2.8.2   2018-05-30 Bioconductor                      
##  blogdown        0.8     2018-07-15 CRAN (R 3.5.0)                    
##  bookdown        0.7     2018-02-18 CRAN (R 3.5.0)                    
##  colorout      * 1.2-0   2018-05-03 Github (jalvesaq/colorout@c42088d)
##  colorspace      1.3-2   2016-12-14 cran (@1.3-2)                     
##  compiler        3.5.1   2018-07-05 local                             
##  crayon          1.3.4   2017-09-16 cran (@1.3.4)                     
##  datasets      * 3.5.1   2018-07-05 local                             
##  devtools      * 1.13.6  2018-06-27 cran (@1.13.6)                    
##  digest          0.6.15  2018-01-28 CRAN (R 3.5.0)                    
##  dplyr           0.7.6   2018-06-29 CRAN (R 3.5.1)                    
##  evaluate        0.11    2018-07-17 CRAN (R 3.5.0)                    
##  ggplot2       * 3.0.0   2018-07-03 CRAN (R 3.5.0)                    
##  glue            1.3.0   2018-07-17 CRAN (R 3.5.0)                    
##  graphics      * 3.5.1   2018-07-05 local                             
##  grDevices     * 3.5.1   2018-07-05 local                             
##  grid          * 3.5.1   2018-07-05 local                             
##  gridExtra     * 2.3     2017-09-09 CRAN (R 3.5.0)                    
##  gtable          0.2.0   2016-02-26 CRAN (R 3.5.0)                    
##  htmltools       0.3.6   2017-04-28 cran (@0.3.6)                     
##  httr            1.3.1   2017-08-20 CRAN (R 3.5.0)                    
##  jsonlite        1.5     2017-06-01 CRAN (R 3.5.0)                    
##  knitcitations * 1.0.8   2017-07-04 CRAN (R 3.5.0)                    
##  knitr           1.20    2018-02-20 cran (@1.20)                      
##  labeling        0.3     2014-08-23 cran (@0.3)                       
##  lazyeval        0.2.1   2017-10-29 CRAN (R 3.5.0)                    
##  lubridate       1.7.4   2018-04-11 CRAN (R 3.5.0)                    
##  magrittr        1.5     2014-11-22 cran (@1.5)                       
##  memoise         1.1.0   2017-04-21 CRAN (R 3.5.0)                    
##  methods       * 3.5.1   2018-07-05 local                             
##  munsell         0.5.0   2018-06-12 CRAN (R 3.5.0)                    
##  pillar          1.3.0   2018-07-14 CRAN (R 3.5.0)                    
##  pkgconfig       2.0.1   2017-03-21 cran (@2.0.1)                     
##  plyr          * 1.8.4   2016-06-08 cran (@1.8.4)                     
##  purrr           0.2.5   2018-05-29 cran (@0.2.5)                     
##  R6              2.2.2   2017-06-17 CRAN (R 3.5.0)                    
##  Rcpp            0.12.18 2018-07-23 CRAN (R 3.5.1)                    
##  RefManageR      1.2.0   2018-04-25 CRAN (R 3.5.0)                    
##  rlang           0.2.1   2018-05-30 cran (@0.2.1)                     
##  rmarkdown       1.10    2018-06-11 CRAN (R 3.5.0)                    
##  rprojroot       1.3-2   2018-01-03 cran (@1.3-2)                     
##  scales          0.5.0   2017-08-24 cran (@0.5.0)                     
##  stats         * 3.5.1   2018-07-05 local                             
##  stringi         1.2.4   2018-07-20 CRAN (R 3.5.0)                    
##  stringr         1.3.1   2018-05-10 CRAN (R 3.5.0)                    
##  tibble          1.4.2   2018-01-22 cran (@1.4.2)                     
##  tidyselect      0.2.4   2018-02-26 cran (@0.2.4)                     
##  tools           3.5.1   2018-07-05 local                             
##  UpSetR        * 1.4.0   2018-07-27 Github (hms-dbmi/UpSetR@fe2812c)  
##  utils         * 3.5.1   2018-07-05 local                             
##  withr           2.1.2   2018-03-15 CRAN (R 3.5.0)                    
##  xfun            0.3     2018-07-06 CRAN (R 3.5.0)                    
##  xml2            1.2.0   2018-01-24 CRAN (R 3.5.0)                    
##  yaml            2.1.19  2018-05-01 CRAN (R 3.5.0)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>LIBD rstats club remote useR!2018 notes</title>
      <link>http://LieberInstitute.github.io/rstatsclub/2018/07/13/libd-rstats-club-remote-user-2018-notes/</link>
      <pubDate>Fri, 13 Jul 2018 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/2018/07/13/libd-rstats-club-remote-user-2018-notes/</guid>
      <description>


&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-07-13-libd-rstats-club-remote-user-2018-notes_files/Screen%20Shot%202018-07-13%20at%2011.51.37%20AM.png&#34; width=&#34;600&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;For our July 13th 2018 LIBD rstats club meeting we decided to check out as much of the &lt;a href=&#34;https://user2018.r-project.org/&#34;&gt;useR!2018&lt;/a&gt; conference as we could. Here’s what we were able to figure out about it in about an hour. Hopefully our quick notes will help other &lt;a href=&#34;https://twitter.com/search?q=%23rstats&#34;&gt;rstats&lt;/a&gt; enthusiasts, users and developers get a glimpse of the conference, although there are bound to be more videos and materials about it coming out in the following days.&lt;/p&gt;
&lt;div id=&#34;main-links&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Main links:&lt;/h3&gt;
&lt;p&gt;First of all, you can search the full Twitter history for tweets related to the conference by checking the &lt;a href=&#34;https://twitter.com/search?q=%23user2018&#34;&gt;user2018&lt;/a&gt; hashtag.&lt;/p&gt;
&lt;p&gt;Next, check out the videos of the talks. There are more videos than we can watch right now, but we hope to come back later and catch more of them.&lt;/p&gt;
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;All of the &lt;a href=&#34;https://twitter.com/hashtag/useR2018?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#useR2018&lt;/a&gt; presentations (unless specifically requested not), including tutorials are being recorded. These will be available at some point after the meeting, we think at this channel &lt;a href=&#34;https://t.co/lq6E2XnXP9&#34;&gt;https://t.co/lq6E2XnXP9&lt;/a&gt;&lt;br&gt;&lt;br&gt;Live streaming is a challenge, hope to attempt one keynote.&lt;/p&gt;&amp;mdash; useR!2018 (@useR2018_conf) &lt;a href=&#34;https://twitter.com/useR2018_conf/status/1016816371533938688?ref_src=twsrc%5Etfw&#34;&gt;July 10, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;


&lt;/div&gt;
&lt;div id=&#34;talks&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Talks&lt;/h3&gt;
&lt;p&gt;From checking Twitter, we can say that there were lots of great talks and tutorials. Here are some of the main ones we found during our hour.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://twitter.com/rdpeng&#34;&gt;Roger Peng&lt;/a&gt; talking about &lt;em&gt;Teaching R to New Users&lt;/em&gt; got lots of attention. Here are some tweets about it:&lt;/p&gt;
&lt;p&gt;&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;.&lt;a href=&#34;https://twitter.com/rdpeng?ref_src=twsrc%5Etfw&#34;&gt;@rdpeng&lt;/a&gt; doing a better job of describing the &lt;a href=&#34;https://twitter.com/hashtag/tidyverse?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#tidyverse&lt;/a&gt; design philosophy than I ever have! &lt;a href=&#34;https://t.co/o3KunXe6qq&#34;&gt;https://t.co/o3KunXe6qq&lt;/a&gt;&lt;/p&gt;&amp;mdash; Hadley Wickham (@hadleywickham) &lt;a href=&#34;https://twitter.com/hadleywickham/status/1017553911782076416?ref_src=twsrc%5Etfw&#34;&gt;July 12, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;

 &lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Last day of &lt;a href=&#34;https://twitter.com/hashtag/useR2018?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#useR2018&lt;/a&gt; kicking off with &lt;a href=&#34;https://twitter.com/rdpeng?ref_src=twsrc%5Etfw&#34;&gt;@rdpeng&lt;/a&gt; &amp;quot;Teaching R to new users&amp;quot; &lt;a href=&#34;https://t.co/yIJfiU7s8I&#34;&gt;pic.twitter.com/yIJfiU7s8I&lt;/a&gt;&lt;/p&gt;&amp;mdash; Luke Zappia (@_lazappi_) &lt;a href=&#34;https://twitter.com/_lazappi_/status/1017559295355846657?ref_src=twsrc%5Etfw&#34;&gt;July 13, 2018&lt;/a&gt;&lt;/blockquote&gt;
 &lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;
 
 &lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Here is the narrative from my &lt;a href=&#34;https://twitter.com/hashtag/useR2018?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#useR2018&lt;/a&gt; keynote &lt;a href=&#34;https://t.co/SbrlShNaDL&#34;&gt;https://t.co/SbrlShNaDL&lt;/a&gt;&lt;/p&gt;&amp;mdash; Roger D. Peng (@rdpeng) &lt;a href=&#34;https://twitter.com/rdpeng/status/1017608009789259778?ref_src=twsrc%5Etfw&#34;&gt;July 13, 2018&lt;/a&gt;&lt;/blockquote&gt;
 &lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;
 
 &lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Roger Peng’s &lt;a href=&#34;https://twitter.com/hashtag/useR2018?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#useR2018&lt;/a&gt; keynote this morning resonates with me, as another long time user/developer/instructor. Useful, opinionated take on where we are now in &lt;a href=&#34;https://twitter.com/hashtag/rstats?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#rstats&lt;/a&gt; and how we got here. &lt;a href=&#34;https://twitter.com/rdpeng?ref_src=twsrc%5Etfw&#34;&gt;@rdpeng&lt;/a&gt; &lt;a href=&#34;https://t.co/bOLSoaFupd&#34;&gt;https://t.co/bOLSoaFupd&lt;/a&gt; &lt;a href=&#34;https://t.co/ejc9yFYGVA&#34;&gt;pic.twitter.com/ejc9yFYGVA&lt;/a&gt;&lt;/p&gt;&amp;mdash; Jenny Bryan (@JennyBryan) &lt;a href=&#34;https://twitter.com/JennyBryan/status/1017579549435912194?ref_src=twsrc%5Etfw&#34;&gt;July 13, 2018&lt;/a&gt;&lt;/blockquote&gt;
 &lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;
 
&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://twitter.com/JennyBryan&#34;&gt;Jenny Bryan&lt;/a&gt; talked about &lt;em&gt;Code Smells and Feels&lt;/em&gt; which was one of the major highlights. We wish we could have been there. Here are some tweets about it:&lt;/p&gt;
&lt;p&gt;&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Code Smells and Feels&lt;br&gt;^^ my keynote talk at &lt;a href=&#34;https://twitter.com/hashtag/useR2018?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#useR2018&lt;/a&gt;&lt;br&gt;Materials at: &lt;a href=&#34;https://t.co/e7dZRMZuSL&#34;&gt;https://t.co/e7dZRMZuSL&lt;/a&gt;&lt;br&gt;It was a great honour to speak and the Brisbane crew upheld the fine tradition of fun and informative useR! meetings 🎉 &lt;a href=&#34;https://t.co/2XkJ64NgsM&#34;&gt;pic.twitter.com/2XkJ64NgsM&lt;/a&gt;&lt;/p&gt;&amp;mdash; Jenny Bryan (@JennyBryan) &lt;a href=&#34;https://twitter.com/JennyBryan/status/1017697356479729665?ref_src=twsrc%5Etfw&#34;&gt;July 13, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;

&lt;/p&gt;
&lt;p&gt;Check out her presentation materials on &lt;a href=&#34;https://github.com/jennybc/code-smells-and-feels&#34;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The talk centered on the idea of writing good code. Using senses such as smell and feel as an extended metaphor, Bryan explains that a sense for code is developed through experience. Taking a very supportive tone, she provides pro-tips for writing efficient and effective code, such as writing simple conditions and functions instead of relying on complex ones, and “Tip #1: Do not comment and uncomment sections of code to alter behavior.”&lt;/p&gt;
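&lt;p&gt;As a tiny illustration of that tip (our own sketch, not code from the talk): instead of commenting lines in and out to switch behavior, you can expose the choice as a function argument validated with &lt;code&gt;match.arg()&lt;/code&gt;:&lt;/p&gt;

```r
## Instead of toggling behavior by commenting/uncommenting hist() vs boxplot(),
## make the choice an explicit argument; match.arg() validates it and uses the
## first option as the default.
plot_data <- function(dat, type = c("histogram", "boxplot")) {
  type <- match.arg(type)
  if (type == "histogram") hist(dat) else boxplot(dat)
}

plot_data(rnorm(100))                    # histogram by default
plot_data(rnorm(100), type = "boxplot")  # switch without editing the function
```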
&lt;p&gt;&lt;a href=&#34;https://twitter.com/thomasp85&#34;&gt;Thomas Lin Pedersen&lt;/a&gt; talked about the &lt;code&gt;gganimate&lt;/code&gt; package, and his talk fittingly seems to have included gifs.&lt;/p&gt;
&lt;p&gt;&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;First keynote for the second day of &lt;a href=&#34;https://twitter.com/hashtag/useR2018?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#useR2018&lt;/a&gt;. &lt;a href=&#34;https://twitter.com/thomasp85?ref_src=twsrc%5Etfw&#34;&gt;@thomasp85&lt;/a&gt; &amp;quot;The Grammar of Animation&amp;quot; &lt;a href=&#34;https://twitter.com/hashtag/sketchnotes?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#sketchnotes&lt;/a&gt; &lt;a href=&#34;https://t.co/tvNjvbr4ag&#34;&gt;pic.twitter.com/tvNjvbr4ag&lt;/a&gt;&lt;/p&gt;&amp;mdash; Luke Zappia (@_lazappi_) &lt;a href=&#34;https://twitter.com/_lazappi_/status/1017198068360347648?ref_src=twsrc%5Etfw&#34;&gt;July 12, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;

 &lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;😬 I said I wasn&amp;#39;t gonna gif it, but I also don&amp;#39;t want you to miss it…&lt;br&gt;&amp;quot;The Grammar of Animation&amp;quot; 👨‍🎨 &lt;a href=&#34;https://twitter.com/thomasp85?ref_src=twsrc%5Etfw&#34;&gt;@thomasp85&lt;/a&gt; &lt;a href=&#34;https://t.co/t2HYRTtHwO&#34;&gt;https://t.co/t2HYRTtHwO&lt;/a&gt; &lt;a href=&#34;https://twitter.com/hashtag/rstats?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#rstats&lt;/a&gt; &lt;a href=&#34;https://twitter.com/hashtag/useR2018?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#useR2018&lt;/a&gt; &lt;a href=&#34;https://twitter.com/hashtag/gganimate?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#gganimate&lt;/a&gt; &lt;a href=&#34;https://t.co/YOyuNn5p1g&#34;&gt;pic.twitter.com/YOyuNn5p1g&lt;/a&gt;&lt;/p&gt;&amp;mdash; Mara Averick (@dataandme) &lt;a href=&#34;https://twitter.com/dataandme/status/1017393683379949568?ref_src=twsrc%5Etfw&#34;&gt;July 12, 2018&lt;/a&gt;&lt;/blockquote&gt;
 &lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;
 
&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://twitter.com/StephdeSilva&#34;&gt;Steph de Silva&lt;/a&gt; started the useR!2018 keynotes with her &lt;em&gt;Beyond syntax, towards culture&lt;/em&gt; talk, which covered different R communities and how we all interact.&lt;/p&gt;
&lt;p&gt;&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Kicking off the &lt;a href=&#34;https://twitter.com/hashtag/useR2018?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#useR2018&lt;/a&gt; talks with @StephdeSilva&amp;#39;s keynote &amp;quot;Beyond syntax, towards culture&amp;quot; &lt;a href=&#34;https://twitter.com/hashtag/sketchnotes?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#sketchnotes&lt;/a&gt; &lt;a href=&#34;https://t.co/vgBsfOIFJU&#34;&gt;pic.twitter.com/vgBsfOIFJU&lt;/a&gt;&lt;/p&gt;&amp;mdash; Luke Zappia (@_lazappi_) &lt;a href=&#34;https://twitter.com/_lazappi_/status/1016904394103668736?ref_src=twsrc%5Etfw&#34;&gt;July 11, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;

 &lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Late to the party, I was a little busy: my slides for my talk &lt;a href=&#34;https://twitter.com/hashtag/useR2018?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#useR2018&lt;/a&gt;&lt;a href=&#34;https://t.co/OzqcSTEx2v&#34;&gt;https://t.co/OzqcSTEx2v&lt;/a&gt; &lt;a href=&#34;https://t.co/RNnCm6K2ym&#34;&gt;pic.twitter.com/RNnCm6K2ym&lt;/a&gt;&lt;/p&gt;&amp;mdash; Steph Stammel (@StephStammel) &lt;a href=&#34;https://twitter.com/StephStammel/status/1017757332254515201?ref_src=twsrc%5Etfw&#34;&gt;July 13, 2018&lt;/a&gt;&lt;/blockquote&gt;
 &lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;
 
&lt;/p&gt;
&lt;p&gt;The slides and video for the &lt;code&gt;workflowr&lt;/code&gt; talk by &lt;a href=&#34;https://twitter.com/jdblischak&#34;&gt;John Blischak&lt;/a&gt; are already online too, and the talk got a big thumbs up from &lt;a href=&#34;https://twitter.com/PeteHaitch&#34;&gt;Peter Hickey&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Here are the slides for my &lt;a href=&#34;https://twitter.com/hashtag/useR2018?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#useR2018&lt;/a&gt; presentation on my &lt;a href=&#34;https://twitter.com/hashtag/rstats?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#rstats&lt;/a&gt; package &lt;a href=&#34;https://twitter.com/hashtag/workflowr?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#workflowr&lt;/a&gt; for reproducible research &lt;a href=&#34;https://t.co/O2yZ7RemN6&#34;&gt;https://t.co/O2yZ7RemN6&lt;/a&gt;&lt;/p&gt;&amp;mdash; John Blischak (@jdblischak) &lt;a href=&#34;https://twitter.com/jdblischak/status/1016912611529510913?ref_src=twsrc%5Etfw&#34;&gt;July 11, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;

 &lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;I feel like &lt;a href=&#34;https://twitter.com/jdblischak?ref_src=twsrc%5Etfw&#34;&gt;@jdblischak&lt;/a&gt; has read my mind with workflowr. It&amp;#39;s like my current workflow but, like, actually good and reproducible! Will be especially great for collaborative and consulting type projects &lt;a href=&#34;https://twitter.com/hashtag/useR2018?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#useR2018&lt;/a&gt; &lt;a href=&#34;https://t.co/tZtqyH3sc2&#34;&gt;https://t.co/tZtqyH3sc2&lt;/a&gt;&lt;/p&gt;&amp;mdash; Peter Hickey (@PeteHaitch) &lt;a href=&#34;https://twitter.com/PeteHaitch/status/1016920426792943616?ref_src=twsrc%5Etfw&#34;&gt;July 11, 2018&lt;/a&gt;&lt;/blockquote&gt;
 &lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;
 
&lt;/p&gt;
&lt;p&gt;If you are starting out with the tidyverse, this tutorial about &lt;em&gt;Wrangling with the Tidyverse&lt;/em&gt; by &lt;a href=&#34;https://twitter.com/drsimonj&#34;&gt;Simon Jackson&lt;/a&gt; seems interesting!&lt;/p&gt;
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Great to meet everyone today who attended my &lt;a href=&#34;https://twitter.com/hashtag/useR2018?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#useR2018&lt;/a&gt; &lt;a href=&#34;https://twitter.com/hashtag/rstats?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#rstats&lt;/a&gt; tutorial, &amp;quot;Wrangling with the Tidyverse!&amp;quot;&lt;br&gt;&lt;br&gt;Missed out or forgot anything? Get all the material at &lt;a href=&#34;https://t.co/YfYZBlkwMs&#34;&gt;https://t.co/YfYZBlkwMs&lt;/a&gt;&lt;br&gt;&lt;br&gt;Special thanks to &lt;a href=&#34;https://twitter.com/Rhydwyn?ref_src=twsrc%5Etfw&#34;&gt;@Rhydwyn&lt;/a&gt; &lt;a href=&#34;https://twitter.com/orchid00?ref_src=twsrc%5Etfw&#34;&gt;@orchid00&lt;/a&gt; and &lt;a href=&#34;https://twitter.com/SayaniGupta5?ref_src=twsrc%5Etfw&#34;&gt;@SayaniGupta5&lt;/a&gt; for their support too &lt;a href=&#34;https://t.co/5MnXLSQxq9&#34;&gt;pic.twitter.com/5MnXLSQxq9&lt;/a&gt;&lt;/p&gt;&amp;mdash; Simon Jackson (@drsimonj) &lt;a href=&#34;https://twitter.com/drsimonj/status/1016592786865287169?ref_src=twsrc%5Etfw&#34;&gt;July 10, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;


&lt;p&gt;Did anyone else think about the Diablo game with the &lt;code&gt;deckard&lt;/code&gt; package? This new package by &lt;a href=&#34;https://twitter.com/VergeLabsAI&#34;&gt;Verge Labs&lt;/a&gt; could be very useful when working with large datasets.&lt;/p&gt;
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Introducing deckard for large scale visualisations in &lt;a href=&#34;https://twitter.com/hashtag/rstats?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#rstats&lt;/a&gt;! If you want to hear more about it come catch us present this Thursday at 4:30 at &lt;a href=&#34;https://twitter.com/hashtag/user2018?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#user2018&lt;/a&gt;. &lt;a href=&#34;https://t.co/sIcd3ztqVl&#34;&gt;https://t.co/sIcd3ztqVl&lt;/a&gt; &lt;a href=&#34;https://t.co/ggu0N7JMWH&#34;&gt;pic.twitter.com/ggu0N7JMWH&lt;/a&gt;&lt;/p&gt;&amp;mdash; Verge Labs (@VergeLabsAI) &lt;a href=&#34;https://twitter.com/VergeLabsAI/status/1016793618051301376?ref_src=twsrc%5Etfw&#34;&gt;July 10, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;


&lt;p&gt;&lt;a href=&#34;https://twitter.com/jimhester_&#34;&gt;Jim Hester&lt;/a&gt;’s talk about the &lt;code&gt;glue&lt;/code&gt; package was highly recommended by Jenny Bryan. And more likely than not, you are using R packages that Jim has contributed to in one way or another.&lt;/p&gt;
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Slides for my talk at &lt;a href=&#34;https://t.co/7JO8G1nQup&#34;&gt;https://t.co/7JO8G1nQup&lt;/a&gt;&lt;/p&gt;&amp;mdash; Jim Hester (@jimhester_) &lt;a href=&#34;https://twitter.com/jimhester_/status/1017226381082558464?ref_src=twsrc%5Etfw&#34;&gt;July 12, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;
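For those who have not tried it, here is a minimal sketch of what `glue` does (assuming the `glue` package is installed; the variable names are illustrative):

```r
library(glue)

# Expressions inside {} are evaluated in the calling environment
# and spliced into the string.
pkg <- "glue"
result <- glue("The {pkg} package evaluates 2 + 2 = {2 + 2} inline")
print(result)
```

`glue()` is often a more readable alternative to `paste0()` when a string is built from many pieces.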


&lt;p&gt;&lt;a href=&#34;https://twitter.com/tslumley&#34;&gt;Thomas Lumley&lt;/a&gt; talked about &lt;code&gt;fasteR&lt;/code&gt;: ways to speed up R code; check the video of his talk at &lt;a href=&#34;https://www.youtube.com/watch?v=P2MDIzflp9k&#34;&gt;YouTube&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;https://imgs.xkcd.com/comics/is_it_worth_the_time.png&#34; width=&#34;600&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;Major takeaways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you repeat a task frequently, it is worth taking the time to optimize it for speed. (See xkcd cartoon!)&lt;/li&gt;
&lt;li&gt;Tools are available to measure how “efficient” your code is, in time and/or memory: &lt;code&gt;microbenchmark()&lt;/code&gt; (from the &lt;em&gt;microbenchmark&lt;/em&gt; package), &lt;code&gt;Rprof()&lt;/code&gt;, and &lt;code&gt;system.time()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Common reasons your code may be slower than necessary, and what to do about them:
&lt;ul&gt;
&lt;li&gt;Dataframes are slower than matrices, data.tables, tbls, and lists&lt;/li&gt;
&lt;li&gt;Vectorize your code whenever possible&lt;/li&gt;
&lt;li&gt;Preallocate objects at their final size, rather than “growing” them.&lt;/li&gt;
&lt;li&gt;Linear algebra / matrix algebra functions can be much faster than alternatives because they are coded in C. E.g. for a large matrix, &lt;code&gt;crossprod(scale(x))&lt;/code&gt; &lt;em&gt;if you know there is no missing data or NAs&lt;/em&gt; is many times faster than &lt;code&gt;cor(x)&lt;/code&gt;. If you know the linear algebra, use matrix operations when possible.&lt;/li&gt;
&lt;li&gt;Packages exist for modeling large data. Example: &lt;code&gt;biglm&lt;/code&gt; for linear models.&lt;/li&gt;
&lt;li&gt;Thomas Lumley is a Rosalind Franklin fan :)&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
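The preallocation and vectorization points above can be sketched with base R alone; the functions below are illustrative, and the exact timings will vary by machine:

```r
n <- 2e4

# Growing an object inside a loop: R copies the vector on each append.
grow <- function(n) {
  x <- numeric(0)
  for (i in seq_len(n)) x <- c(x, i^2)
  x
}

# Preallocating the result avoids the repeated copies.
prealloc <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) x[i] <- i^2
  x
}

# Vectorized version: no explicit loop at all.
vec <- function(n) seq_len(n)^2

# All three return identical results...
stopifnot(identical(grow(n), prealloc(n)), identical(prealloc(n), vec(n)))

# ...but system.time() typically shows the growing version is slowest
# and the vectorized version is fastest.
system.time(grow(n))
system.time(prealloc(n))
system.time(vec(n))
```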
&lt;/div&gt;
&lt;div id=&#34;miscellaneous&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Miscellaneous&lt;/h3&gt;
&lt;p&gt;An awesome hex wall was assembled from the hex stickers of packages represented at useR!2018. Check it out!&lt;/p&gt;
&lt;p&gt;&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;The &lt;a href=&#34;https://twitter.com/hashtag/useR2018?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#useR2018&lt;/a&gt; &lt;a href=&#34;https://twitter.com/hashtag/hexwall?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#hexwall&lt;/a&gt; has been revealed! Read about how it was created in &lt;a href=&#34;https://twitter.com/hashtag/rstats?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#rstats&lt;/a&gt; on this blog post: &lt;a href=&#34;https://t.co/krYYOQ3N84&#34;&gt;https://t.co/krYYOQ3N84&lt;/a&gt;&lt;br&gt;&lt;br&gt;A huge thanks to everyone who has submitted stickers and provided feedback. I hope you enjoy the end result as much as I have had creating it! 🎉 &lt;a href=&#34;https://t.co/GnG9m2cZme&#34;&gt;pic.twitter.com/GnG9m2cZme&lt;/a&gt;&lt;/p&gt;&amp;mdash; Mitchell O&amp;#39;Hara-Wild (@mitchoharawild) &lt;a href=&#34;https://twitter.com/mitchoharawild/status/1016867974597074944?ref_src=twsrc%5Etfw&#34;&gt;July 11, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;

 &lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Controversial choice of wearing a &lt;a href=&#34;https://twitter.com/hashtag/python?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#python&lt;/a&gt; in front of an &lt;a href=&#34;https://twitter.com/hashtag/hexwall?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#hexwall&lt;/a&gt; of &lt;a href=&#34;https://twitter.com/hashtag/R?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#R&lt;/a&gt; &lt;a href=&#34;https://twitter.com/hashtag/packages?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#packages&lt;/a&gt; but hey, we&amp;#39;re all friends! &lt;a href=&#34;https://twitter.com/hashtag/useR2018?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#useR2018&lt;/a&gt; &lt;a href=&#34;https://t.co/rSG1EsH4Wi&#34;&gt;pic.twitter.com/rSG1EsH4Wi&lt;/a&gt;&lt;/p&gt;&amp;mdash; Anna Quaglieri (@annaquagli) &lt;a href=&#34;https://twitter.com/annaquagli/status/1016920889101709312?ref_src=twsrc%5Etfw&#34;&gt;July 11, 2018&lt;/a&gt;&lt;/blockquote&gt;
 &lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;
 
&lt;/p&gt;
&lt;p&gt;It’s awesome to see the &lt;a href=&#34;https://twitter.com/RLadiesGlobal&#34;&gt;RLadies&lt;/a&gt; community thriving! A few of us didn’t know about &lt;a href=&#34;https://r-posts.com/introducing-r-ladies-remote-chapter/&#34;&gt;RLadies Remote&lt;/a&gt; which everyone can join.&lt;/p&gt;
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Here are amazing &lt;a href=&#34;https://twitter.com/hashtag/RLadies?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#RLadies&lt;/a&gt; from around the world having dinner after an excellent day at &lt;a href=&#34;https://twitter.com/useR2018_conf?ref_src=twsrc%5Etfw&#34;&gt;@useR2018_conf&lt;/a&gt; &lt;a href=&#34;https://twitter.com/RLadiesGlobal?ref_src=twsrc%5Etfw&#34;&gt;@RLadiesGlobal&lt;/a&gt; &lt;a href=&#34;https://twitter.com/RLadiesBrisbane?ref_src=twsrc%5Etfw&#34;&gt;@RLadiesBrisbane&lt;/a&gt; &lt;a href=&#34;https://twitter.com/RLadiesSydney?ref_src=twsrc%5Etfw&#34;&gt;@RLadiesSydney&lt;/a&gt; &lt;a href=&#34;https://twitter.com/RLadiesIstanbul?ref_src=twsrc%5Etfw&#34;&gt;@RLadiesIstanbul&lt;/a&gt; &lt;a href=&#34;https://twitter.com/RLadiesIzmir?ref_src=twsrc%5Etfw&#34;&gt;@RLadiesIzmir&lt;/a&gt; &lt;a href=&#34;https://twitter.com/RLadiesAKL?ref_src=twsrc%5Etfw&#34;&gt;@RLadiesAKL&lt;/a&gt; &lt;a href=&#34;https://twitter.com/RLadiesDC?ref_src=twsrc%5Etfw&#34;&gt;@RLadiesDC&lt;/a&gt; &lt;a href=&#34;https://twitter.com/RLadiesRemote?ref_src=twsrc%5Etfw&#34;&gt;@RLadiesRemote&lt;/a&gt; &lt;a href=&#34;https://twitter.com/RLadiesMVD?ref_src=twsrc%5Etfw&#34;&gt;@RLadiesMVD&lt;/a&gt; &lt;a href=&#34;https://twitter.com/hashtag/rladies?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#rladies&lt;/a&gt; &lt;a href=&#34;https://twitter.com/hashtag/useR2018?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#useR2018&lt;/a&gt; &lt;a href=&#34;https://twitter.com/hashtag/rstat?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#rstat&lt;/a&gt; &lt;a href=&#34;https://t.co/aZoSuAU0Gi&#34;&gt;pic.twitter.com/aZoSuAU0Gi&lt;/a&gt;&lt;/p&gt;&amp;mdash; R-Ladies Melbourne Inc (@RLadiesMelb) &lt;a href=&#34;https://twitter.com/RLadiesMelb/status/1016604921154584576?ref_src=twsrc%5Etfw&#34;&gt;July 10, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;


&lt;p&gt;These are some awesome socks!&lt;/p&gt;
&lt;p&gt;And who doesn’t love this image of &lt;a href=&#34;https://twitter.com/hadleywickham&#34;&gt;Hadley Wickham&lt;/a&gt; being mobbed by deer? He even meme-fied it himself in this tweet.&lt;/p&gt;
&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;People often ask why dplyr &amp;amp; tibble don&amp;#39;t support row names. I&amp;#39;ve (finally) written up my reasons at &lt;a href=&#34;https://t.co/UmZjaSk7UX&#34;&gt;https://t.co/UmZjaSk7UX&lt;/a&gt; &lt;a href=&#34;https://twitter.com/hashtag/rstats?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#rstats&lt;/a&gt; (photo credit: &lt;a href=&#34;https://twitter.com/hspter?ref_src=twsrc%5Etfw&#34;&gt;@hspter&lt;/a&gt;) &lt;a href=&#34;https://t.co/IVbaVmKhYp&#34;&gt;pic.twitter.com/IVbaVmKhYp&lt;/a&gt;&lt;/p&gt;&amp;mdash; Hadley Wickham (@hadleywickham) &lt;a href=&#34;https://twitter.com/hadleywickham/status/1017562721456275456?ref_src=twsrc%5Etfw&#34;&gt;July 13, 2018&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;


&lt;p&gt;useR!2019 is already lined up, &lt;a href=&#34;http://www.user2019.fr/&#34;&gt;check it out&lt;/a&gt;. It’ll be in Toulouse, France! Follow them on Twitter at &lt;a href=&#34;https://twitter.com/UseR2019_Conf&#34; class=&#34;uri&#34;&gt;https://twitter.com/UseR2019_Conf&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;acknowledgments&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Acknowledgments&lt;/h3&gt;
&lt;p&gt;We are grateful to everyone who tweeted about the conference and shared their materials online!&lt;/p&gt;
&lt;p&gt;This blog post was made possible thanks to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Xie_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/rstudio/blogdown&#39;&gt;Xie, Hill, and Thomas, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=devtools&#34;&gt;devtools&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Wickham_2018&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=devtools&#39;&gt;Wickham, Hester, and Chang, 2018&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;knitcitations&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Boettiger_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=knitcitations&#39;&gt;Boettiger, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;References&lt;/h3&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Boettiger_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Boettiger_2017&#34;&gt;[1]&lt;/a&gt;&lt;cite&gt; C. Boettiger. &lt;em&gt;knitcitations: Citations for ‘Knitr’ Markdown Files&lt;/em&gt;. R package version 1.0.8. 2017. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;https://CRAN.R-project.org/package=knitcitations&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Wickham_2018&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Wickham_2018&#34;&gt;[2]&lt;/a&gt;&lt;cite&gt; H. Wickham, J. Hester and W. Chang. &lt;em&gt;devtools: Tools to Make Developing R Packages Easier&lt;/em&gt;. R package version 1.13.6. 2018. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=devtools&#34;&gt;https://CRAN.R-project.org/package=devtools&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Xie_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Xie_2017&#34;&gt;[3]&lt;/a&gt;&lt;cite&gt; Y. Xie, A. P. Hill and A. Thomas. &lt;em&gt;blogdown: Creating Websites with R Markdown&lt;/em&gt;. ISBN 978-0815363729. Boca Raton, Florida: Chapman and Hall/CRC, 2017. URL: &lt;a href=&#34;https://github.com/rstudio/blogdown&#34;&gt;https://github.com/rstudio/blogdown&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;## Session info ----------------------------------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  setting  value                       
##  version  R version 3.5.1 (2018-07-02)
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  tz       America/New_York            
##  date     2018-07-13&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Packages --------------------------------------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  package       * version date       source                            
##  backports       1.1.2   2017-12-13 cran (@1.1.2)                     
##  base          * 3.5.1   2018-07-05 local                             
##  bibtex          0.4.2   2017-06-30 CRAN (R 3.5.0)                    
##  BiocStyle     * 2.8.2   2018-05-30 Bioconductor                      
##  blogdown        0.7     2018-07-07 CRAN (R 3.5.0)                    
##  bookdown        0.7     2018-02-18 CRAN (R 3.5.0)                    
##  colorout      * 1.2-0   2018-05-03 Github (jalvesaq/colorout@c42088d)
##  compiler        3.5.1   2018-07-05 local                             
##  datasets      * 3.5.1   2018-07-05 local                             
##  devtools      * 1.13.6  2018-06-27 cran (@1.13.6)                    
##  digest          0.6.15  2018-01-28 CRAN (R 3.5.0)                    
##  evaluate        0.10.1  2017-06-24 cran (@0.10.1)                    
##  graphics      * 3.5.1   2018-07-05 local                             
##  grDevices     * 3.5.1   2018-07-05 local                             
##  htmltools       0.3.6   2017-04-28 cran (@0.3.6)                     
##  httr            1.3.1   2017-08-20 CRAN (R 3.5.0)                    
##  jsonlite        1.5     2017-06-01 CRAN (R 3.5.0)                    
##  knitcitations * 1.0.8   2017-07-04 CRAN (R 3.5.0)                    
##  knitr           1.20    2018-02-20 cran (@1.20)                      
##  lubridate       1.7.4   2018-04-11 CRAN (R 3.5.0)                    
##  magrittr        1.5     2014-11-22 cran (@1.5)                       
##  memoise         1.1.0   2017-04-21 CRAN (R 3.5.0)                    
##  methods       * 3.5.1   2018-07-05 local                             
##  plyr            1.8.4   2016-06-08 cran (@1.8.4)                     
##  R6              2.2.2   2017-06-17 CRAN (R 3.5.0)                    
##  Rcpp            0.12.17 2018-05-18 cran (@0.12.17)                   
##  RefManageR      1.2.0   2018-04-25 CRAN (R 3.5.0)                    
##  rmarkdown       1.10    2018-06-11 CRAN (R 3.5.0)                    
##  rprojroot       1.3-2   2018-01-03 cran (@1.3-2)                     
##  stats         * 3.5.1   2018-07-05 local                             
##  stringi         1.2.3   2018-06-12 CRAN (R 3.5.0)                    
##  stringr         1.3.1   2018-05-10 CRAN (R 3.5.0)                    
##  tools           3.5.1   2018-07-05 local                             
##  utils         * 3.5.1   2018-07-05 local                             
##  withr           2.1.2   2018-03-15 CRAN (R 3.5.0)                    
##  xfun            0.3     2018-07-06 CRAN (R 3.5.0)                    
##  xml2            1.2.0   2018-01-24 CRAN (R 3.5.0)                    
##  yaml            2.1.19  2018-05-01 CRAN (R 3.5.0)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>git to know git: an 8 minute introduction</title>
      <link>http://LieberInstitute.github.io/rstatsclub/2018/04/17/git-to-know-git/</link>
      <pubDate>Tue, 17 Apr 2018 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/2018/04/17/git-to-know-git/</guid>
      <description>


&lt;p&gt;By &lt;a href=&#34;http://amy-peterson.github.io&#34;&gt;Amy Peterson&lt;/a&gt;&lt;/p&gt;
&lt;div id=&#34;using-git&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Using Git&lt;/h3&gt;
&lt;p&gt;Git is a version control system that allows you to track changes made to files while working on a project, either independently or in collaboration with others. It provides a way to save the many components of a project in progress: not only the source code, but also the figures and data that the code produces. The importance of understanding and using Git lies in its ability to maintain an organized record of a project, also referred to as a &lt;strong&gt;repository&lt;/strong&gt; or &lt;strong&gt;repo&lt;/strong&gt;, as it evolves. While setting up and learning to use Git may seem intimidating, the majority of the work is in the initial setup.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;github&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;GitHub&lt;/h3&gt;
&lt;p&gt;GitHub is one of the hosting services that provides an interface for using Git, and can be thought of as Dropbox for version control projects. GitHub is one of the ways to store &lt;strong&gt;repositories&lt;/strong&gt; using Git, and is an easy way to routinely back-up your work as you make progress on a project. It is also helpful for tracking changes, demonstrating who contributed to which projects, when they contributed, and what their contributions were.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;why-i-use-git&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Why I Use Git&lt;/h3&gt;
&lt;p&gt;When I started as an Intern at the &lt;a href=&#34;http://www.libd.org&#34;&gt;LIBD&lt;/a&gt;, I noticed how frequently GitHub was used. As I familiarized myself with some of the projects I would be working on, it became clear how much easier it was to use a system that could document project changes made throughout time in a way that was widely accessible to contributors. Using GitHub also made it easier to re-visit certain scripts or documents to determine what changes were made, when, and why they were needed. Having a detailed history of various project components is an easy way to ensure that contributors have information organized in the same way.&lt;/p&gt;
&lt;p&gt;Beyond working on projects with collaborators, using GitHub is equally rewarding for individual projects. If, for example, you work on a project on one computer at work and need those changes on a different computer at home, GitHub is a quick and easy way to keep the project synchronized so you are always working with the latest version.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;terms&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Terms&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;commit&lt;/strong&gt;: saves changes, either adding a new file to GitHub, or updating the existing version of that file&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;issue&lt;/strong&gt;: option on GitHub that creates a list of action items for a repository, similar to a to-do list; tasks can be assigned to particular contributors; you can also comment on issues and reference them in a commit message by including # followed by the issue number&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;push&lt;/strong&gt;: sends the commits made locally to the repository on GitHub&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;pull&lt;/strong&gt;: downloads modified or newly added files, so the local directory matches the current repository on GitHub&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;public-v.-private-repositories&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Public v. Private Repositories&lt;/h2&gt;
&lt;p&gt;Repositories can be public or private. Public repositories are readable by everyone, but permissions are still required to make edits by pushing commits. Private repositories are inaccessible and unreadable without permission, and the repository owner controls who can read, edit, or hold admin access.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;features&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Features&lt;/h2&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-04-17-git-to-know-git_files/git%20features.png&#34; height=&#34;60&#34; /&gt; &lt;strong&gt;Watch&lt;/strong&gt;: Provides a way to receive notifications regarding all updates on a particular repository of interest.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Star&lt;/strong&gt;: Marks a specific repository of interest, making it easier to refer back to it later. Differs from &lt;strong&gt;watch&lt;/strong&gt; in that you do not receive notifications for repository updates.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fork&lt;/strong&gt;: Creates a copy of a repository under your own GitHub account. The fork exists separately from the original repo, and reflects the repository as it was at the time you forked it.&lt;/p&gt;
&lt;div id=&#34;initial-set-up&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Initial Set-Up&lt;/h3&gt;
&lt;p&gt;Make an account on &lt;a href=&#34;https://github.com/join&#34;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;mac&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Mac&lt;/h2&gt;
&lt;p&gt;On newer Macs this should already be set up, but checking is easy!&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Open the Terminal application
which git # shows the path to Git if it is installed
git --version # lists the currently installed Git version

## If Git is not installed, this prompts macOS to install the
## Xcode Command Line Tools, which include Git
xcode-select --install&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;windows&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Windows&lt;/h2&gt;
&lt;p&gt;Install &lt;a href=&#34;https://gitforwindows.org&#34;&gt;Git for Windows&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;after-git-installation&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;After Git Installation&lt;/h2&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Open Terminal (Mac) or Git Bash (Windows)
## Enter the name and email associated with your GitHub account
git config --global user.name &amp;quot;Amy Peterson&amp;quot;
git config --global user.email &amp;quot;amy.peterson@jhu.edu&amp;quot;
git config --global --list # Lists global configuration options &lt;/code&gt;&lt;/pre&gt;
&lt;div id=&#34;setting-up-a-repository&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Setting Up a Repository&lt;/h3&gt;
&lt;p&gt;Identify a repository you want to contribute to, or create your own! Repositories can be created from the front page using “Start a project”, or from your profile page by clicking “Repositories” and then the green “New” button.&lt;/p&gt;
&lt;p&gt;Next, take the following steps:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Open Terminal (Mac) or Git Bash (Windows)
# Change directories so you are in the directory where you want to set up the repository
pwd # prints the name of the current directory
cd ~/Desktop # changes the current directory to Desktop
ls # lists folders you can cd into
# On the repository page on GitHub, click the green &amp;quot;Clone or download&amp;quot; button
git clone git@GitHub.com:SampleLink.git # Paste link from GitHub to download the repository locally&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;saving-your-work&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Saving Your Work&lt;/h3&gt;
&lt;p&gt;The process of updating GitHub is as follows: &lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-04-17-git-to-know-git_files/git%20process.png&#34; height=&#34;80&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Open Terminal (Mac) or Git Bash (Windows)
git add File1.R # stages the file, here File1.R, to be committed
git commit -m &amp;quot;Example message&amp;quot; # records the staged files with the message in quotes
git push # sends your commits to GitHub

# Once updates are pushed, other repository members need to do the following
git pull # updates local directory to reflect the changes made to GitHub&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-04-17-git-to-know-git_files/git%20status.png&#34; height=&#34;50&#34; /&gt; Useful at any time throughout the process of updating a repository, &lt;code&gt;git status&lt;/code&gt; provides information regarding how your local directory differs from the repository on GitHub, and separates those differences into which files have had changes made, and which files are entirely new. In the example below, &lt;strong&gt;File1.R&lt;/strong&gt; and &lt;strong&gt;File2.pdf&lt;/strong&gt; have been modified from what exists on GitHub, while &lt;strong&gt;File3.R&lt;/strong&gt; and &lt;strong&gt;File4.pdf&lt;/strong&gt; are &lt;code&gt;untracked&lt;/code&gt;, or entirely new to the repository.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-04-17-git-to-know-git_files/git%20status%20result.png&#34; width=&#34;500&#34; /&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;committing-folders&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Committing Folders&lt;/h2&gt;
&lt;p&gt;Folders associated with a project can also be committed to a repository on GitHub. Folders that are currently untracked will be listed in response to &lt;code&gt;git status&lt;/code&gt;, and committing a folder to a repository will simultaneously commit all of its contents. This is particularly useful and efficient when creating a repository for an existing project.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;making-multiple-commits&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Making Multiple Commits&lt;/h2&gt;
&lt;p&gt;Multiple commits can be made to group files before pushing them to GitHub. Each set of files you have added using &lt;code&gt;git add&lt;/code&gt; will be grouped together as a single commit once you type &lt;code&gt;git commit&lt;/code&gt; and enter the commit message you want associated with the files. Then, once all the commits you are ready to make are finished, use &lt;code&gt;git push&lt;/code&gt; to save the commits to GitHub.&lt;/p&gt;
&lt;div id=&#34;starting-a-repository-for-an-existing-project&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Starting a Repository for an Existing Project&lt;/h3&gt;
&lt;p&gt;There are only a few differences when setting up a repository for an existing project, compared to the steps previously described.&lt;/p&gt;
&lt;p&gt;Most importantly, after setting up a new repository on GitHub, the next screen will list a number of options. If you are setting up a repository for an existing project and hoping to commit locally saved files, you will first need to &lt;code&gt;cd&lt;/code&gt; into the locally existing project folder. Then use the instructions that appear on the GitHub website under the header “create a new repository on the command line”. In the screenshot from the example below, the repository I created is called “test”.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-04-17-git-to-know-git_files/Existing%20Project.png&#34; width=&#34;500&#34; /&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;git-ignore&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Git Ignore&lt;/h2&gt;
&lt;p&gt;Git ignore files are important for both new and old project repositories. A &lt;code&gt;.gitignore&lt;/code&gt; file is a plain text file that specifies which files or file types Git should ignore, meaning they will not appear in the list &lt;code&gt;git status&lt;/code&gt; provides of local files that are not currently saved to GitHub. This is especially important when creating a repository for an existing project, since there will be some existing local files that you will not want to include in the repository, for example, large files that are unnecessary to keep on the repository long-term. With a new project repository, you do not need to start with an extensive &lt;code&gt;.gitignore&lt;/code&gt; file; you can edit it as the project evolves, since it will become clearer over time which file types you do not want to include.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Open Terminal (Mac) or Git Bash (Windows)
touch .gitignore # Creates git ignore file
# Open the file to edit, then commit the file to your GitHub repository&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;An example of a git ignore file is below. As demonstrated, an asterisk can be used to designate entire file types to ignore. For example, adding &lt;code&gt;*.zip&lt;/code&gt; would ignore any zip files that are saved locally when using &lt;code&gt;git status&lt;/code&gt; to determine the differences between local files and the repository on GitHub. &lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-04-17-git-to-know-git_files/git%20ignore.png&#34; width=&#34;350&#34; /&gt;&lt;/p&gt;
&lt;div id=&#34;summary&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;The general steps for saving files from your local directory to GitHub are&lt;/p&gt;
&lt;p&gt;&lt;code&gt;git add -&amp;gt; git commit -&amp;gt; git push&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Use &lt;code&gt;git pull&lt;/code&gt; to download changes from GitHub so that your local directory matches the repository.&lt;/p&gt;
&lt;p&gt;This post was written as a brief introduction to Git and GitHub for individuals interested in incorporating Git into their work; it is by no means comprehensive. For more detailed information on Git and GitHub, &lt;a href=&#34;http://happygitwithr.com&#34;&gt;Happy Git and GitHub for the useR&lt;/a&gt; is a great resource.&lt;/p&gt;
&lt;p&gt;Hopefully this post helped familiarize you with some of the basic concepts behind Git and GitHub. Feel free to leave questions or share your story in the comments!&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Introduction to Scraping and Wrangling Tables from Research Articles</title>
      <link>http://LieberInstitute.github.io/rstatsclub/2018/03/19/introduction-to-scraping-and-wranging-tables-from-research-articles/</link>
      <pubDate>Mon, 19 Mar 2018 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/2018/03/19/introduction-to-scraping-and-wranging-tables-from-research-articles/</guid>
      <description>


&lt;p&gt;By &lt;a href=&#34;https://www.libd.org/team/Stephen-Semick/&#34;&gt;Steve Semick&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;What do you do when you want to use results from the literature to anchor your own analysis? When these results are in the form of an easily accessible table, such as a .csv or .xlsx file, it is simple enough to download them and incorporate them into your research. Often, however, published findings are not so easy to handle. Today, we’ll go through a practical scenario: scraping an HTML table from a &lt;a href=&#34;https://www.nature.com/articles/ng.2802/&#34;&gt;Nature Genetics article&lt;/a&gt; into R and wrangling the data into a useful format. This is what the online table we want to scrape looks like:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-19-introduction-to-scraping-and-wranging-scientific-research-articles_files/blogpost01_table2_ng_screenshot.png&#34; width=&#34;800&#34; /&gt;

&lt;/div&gt;
&lt;div id=&#34;example-1a-scraping-a-html-table-from-a-webpage&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Example 1A: Scraping an HTML table from a webpage&lt;/h3&gt;
&lt;p&gt;Sometimes a table is online as part of a research article but can’t be easily coerced into a useful format. You can’t copy and paste the table into Excel and it’s not stored elsewhere. In these situations you can use the handy R package &lt;code&gt;rvest&lt;/code&gt; to scrape it into R from the webpage.&lt;/p&gt;
&lt;p&gt;First load the &lt;code&gt;rvest&lt;/code&gt; package to scrape the table.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;rvest&amp;quot;)
library(&amp;quot;knitr&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, get the url of the webpage where the table is stored.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;url &amp;lt;- &amp;quot;https://www.nature.com/articles/ng.2802/tables/2&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, the trickiest part of the process: we need to find where the table lives on this webpage. An excellent guide on how to do this can be found in &lt;a href=&#34;http://blog.corynissen.com/2015/01/using-rvest-to-scrape-html-table.html&#34;&gt;Cory Nissen’s blogpost&lt;/a&gt; – this is also where I learned about using &lt;code&gt;rvest&lt;/code&gt; to scrape HTML tables. Once you have copied the table location, the rest is easy!&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table_path &amp;lt;- &amp;#39;//*[@id=&amp;quot;content&amp;quot;]/div/div/figure/div[1]/div/div[1]/table&amp;#39;
nature_genetics_table2 &amp;lt;- url %&amp;gt;%
  read_html() %&amp;gt;%
  html_nodes(xpath = table_path) %&amp;gt;%
  html_table(fill = TRUE)
nature_genetics_table2 &amp;lt;- nature_genetics_table2[[1]]&lt;/code&gt;&lt;/pre&gt;
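&lt;p&gt;If hunting down the exact XPath proves difficult, it can also help to grab every table node on the page and inspect the candidates by position. This is just a sketch (the variable name &lt;code&gt;all_tables&lt;/code&gt; is mine), assuming the table we want is among those returned:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Grab every table node on the page as a list of data.frames
all_tables &amp;lt;- url %&amp;gt;%
  read_html() %&amp;gt;%
  html_nodes(&amp;quot;table&amp;quot;) %&amp;gt;%
  html_table(fill = TRUE)
length(all_tables) ## how many tables did we find?&lt;/code&gt;&lt;/pre&gt;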
&lt;p&gt;This is what the first few lines of our scraped product look like:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;kable(nature_genetics_table2[1:4,])&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;SNPa&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Chr.&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Positionb&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Closest genec&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Major/minor alleles&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;MAFd&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Stage 1&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Stage 1&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Stage 2&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Stage 2&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Overall&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Overall&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Overall&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;SNPa&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Chr.&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Positionb&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Closest genec&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Major/minor alleles&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;MAFd&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;OR (95% CI)e&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Meta P value&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;OR (95% CI)e&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Meta P value&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;OR (95% CI)e&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Meta P value&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;I2 (%), P valuef&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Known GWAS-defined associated genes&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Known GWAS-defined associated genes&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Known GWAS-defined associated genes&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Known GWAS-defined associated genes&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Known GWAS-defined associated genes&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Known GWAS-defined associated genes&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Known GWAS-defined associated genes&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Known GWAS-defined associated genes&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Known GWAS-defined associated genes&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Known GWAS-defined associated genes&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Known GWAS-defined associated genes&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Known GWAS-defined associated genes&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Known GWAS-defined associated genes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;rs6656401&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;207692049&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;CR1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;G/A&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;0.197&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.17 (1.12–1.22)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;7.7 × 10−15&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.21 (1.14–1.28)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;7.9 × 10−11&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.18 (1.14–1.22)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;5.7 × 10−24&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;0, 7.8 × 10−1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;rs6733839&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;2&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;127892810&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;BIN1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;C/T&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;0.409&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.21 (1.17–1.25)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.7 × 10−26&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.24 (1.18–1.29)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;3.4 × 10−19&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.22 (1.18–1.25)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;6.9 × 10−44&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;28, 6.1 × 10−2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;While this table has the information we want, it is clearly still a mess. Which brings us to…&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;example-1b-making-messy-data-useful&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Example 1B: Making messy data useful&lt;/h3&gt;
&lt;p&gt;Fortunately, the table has the right number of columns, but there are lines of text breaking it up, each stretching across an entire row. One of these is shown above and there are two others (not shown), so getting these rows into a better data format will be our first task.&lt;/p&gt;
&lt;div id=&#34;cleaning-up-the-rows&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Cleaning up the rows&lt;/h4&gt;
&lt;p&gt;We could look at the table and see these lines are on rows 2, 12, and 18. But let’s do this instead using some R code. The trick here is to note that all the elements of these rows contain the exact same text.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;v &amp;lt;- which(apply(nature_genetics_table2, 1, function(x) length(unique(unlist(x)))) == 1)&lt;/code&gt;&lt;/pre&gt;
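&lt;p&gt;To see why this trick works, consider a toy two-row example (the data here are made up for illustration): only the row whose cells all repeat the same text has exactly one unique value.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;toy &amp;lt;- data.frame(a = c(&amp;quot;rs123&amp;quot;, &amp;quot;some description&amp;quot;),
                 b = c(&amp;quot;1.17&amp;quot;, &amp;quot;some description&amp;quot;))
apply(toy, 1, function(x) length(unique(unlist(x))))
## [1] 2 1&lt;/code&gt;&lt;/pre&gt;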
&lt;p&gt;Great, now let’s get rid of these rows but retain the information. We are going to do this in three steps. First, we will split the data.frame into a list based on the location of these descriptions (rows 2, 12, and 18). Then, we will clean this list up by keeping only the list elements with data. We will move the text taking up entire rows to an additional column titled “Description”. Lastly, we will concatenate this cleaned list back into a data.frame.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;nature_genetics_table2_list &amp;lt;- split(nature_genetics_table2, cumsum(1:nrow(nature_genetics_table2) %in% v))
nature_genetics_table2_list &amp;lt;- lapply(nature_genetics_table2_list[2:4], function(y) {
  y$Description &amp;lt;- unique(as.character(y[1, ]))
  y[-1, ]
})

nature_genetics_table2_clean &amp;lt;- do.call(&amp;quot;rbind&amp;quot;, nature_genetics_table2_list)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now let’s look at our data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;kable(nature_genetics_table2_clean[1:3,])&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;SNPa&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Chr.&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Positionb&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Closest genec&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Major/minor alleles&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;MAFd&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Stage 1&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Stage 1&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Stage 2&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Stage 2&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Overall&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Overall&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Overall&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;1.3&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;rs6656401&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;207692049&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;CR1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;G/A&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;0.197&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.17 (1.12–1.22)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;7.7 × 10−15&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.21 (1.14–1.28)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;7.9 × 10−11&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.18 (1.14–1.22)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;5.7 × 10−24&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;0, 7.8 × 10−1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Known GWAS-defined associated genes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;1.4&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;rs6733839&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;2&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;127892810&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;BIN1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;C/T&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;0.409&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.21 (1.17–1.25)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.7 × 10−26&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.24 (1.18–1.29)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;3.4 × 10−19&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.22 (1.18–1.25)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;6.9 × 10−44&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;28, 6.1 × 10−2&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Known GWAS-defined associated genes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;1.5&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;rs10948363&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;6&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;47487762&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;CD2AP&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;A/G&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;0.266&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.10 (1.07–1.14)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;3.1 × 10−8&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.09 (1.04–1.15)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;4.1 × 10−4&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.10 (1.07–1.13)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;5.2 × 10−11&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;0, 9 × 10−1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Known GWAS-defined associated genes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;div id=&#34;fixing-column-names&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Fixing column names&lt;/h4&gt;
&lt;p&gt;It’s getting better but is still messy. Let’s clean up those column names. This part we will do by hand.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;colnames(nature_genetics_table2_clean) &amp;lt;- c(&amp;quot;SNP&amp;quot;, &amp;quot;Chr&amp;quot;, &amp;quot;Position&amp;quot;, &amp;quot;Closest gene&amp;quot;, &amp;quot;Major/minor alleles&amp;quot;, &amp;quot;MAF&amp;quot;, &amp;quot;Stage1_OR&amp;quot;, &amp;quot;Stage1_MetaP&amp;quot;, &amp;quot;Stage2_OR&amp;quot;, &amp;quot;Stage2_MetaP&amp;quot;, &amp;quot;Overall_OR&amp;quot;, &amp;quot;Overall_MetaP&amp;quot;, &amp;quot;I2_Percent/P&amp;quot;, &amp;quot;Description&amp;quot;)
colnames(nature_genetics_table2_clean)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] &amp;quot;SNP&amp;quot;                 &amp;quot;Chr&amp;quot;                 &amp;quot;Position&amp;quot;           
##  [4] &amp;quot;Closest gene&amp;quot;        &amp;quot;Major/minor alleles&amp;quot; &amp;quot;MAF&amp;quot;                
##  [7] &amp;quot;Stage1_OR&amp;quot;           &amp;quot;Stage1_MetaP&amp;quot;        &amp;quot;Stage2_OR&amp;quot;          
## [10] &amp;quot;Stage2_MetaP&amp;quot;        &amp;quot;Overall_OR&amp;quot;          &amp;quot;Overall_MetaP&amp;quot;      
## [13] &amp;quot;I2_Percent/P&amp;quot;        &amp;quot;Description&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;making-a-character-variable-into-a-numeric-variable&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Making a character variable into a numeric variable&lt;/h4&gt;
&lt;p&gt;It’s coming along. Next, we need to make the numbers into, well, numbers. A numeric class will be more useful in R than a character class of data. To do this, try using the &lt;code&gt;as.numeric&lt;/code&gt; function as shown.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;as.numeric(nature_genetics_table2_clean$Stage1_MetaP)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: NAs introduced by coercion&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It doesn’t work because there are weird symbols that R doesn’t understand. Get rid of them with the &lt;code&gt;gsub&lt;/code&gt; command and replace them with an E (scientific notation).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;nature_genetics_table2_clean$Stage1_MetaP = gsub(&amp;quot; × 10&amp;quot;,&amp;quot;E&amp;quot;,nature_genetics_table2_clean$Stage1_MetaP)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now try converting to a numeric.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;as.numeric(nature_genetics_table2_clean$Stage1_MetaP)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: NAs introduced by coercion&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It still doesn’t work! Take a second, closer look at the data. Can you discern why the code failed?&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;nature_genetics_table2_clean$Stage1_MetaP&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] &amp;quot;7.7E-15&amp;quot; &amp;quot;1.7E-26&amp;quot; &amp;quot;3.1E-8&amp;quot;  &amp;quot;8.8E-10&amp;quot; &amp;quot;9.6E-17&amp;quot; &amp;quot;2.8E-11&amp;quot; &amp;quot;6.5E-16&amp;quot;
##  [8] &amp;quot;1.7E-9&amp;quot;  &amp;quot;5.1E-8&amp;quot;  &amp;quot;1.6E-8&amp;quot;  &amp;quot;3.3E-9&amp;quot;  &amp;quot;5.0E-11&amp;quot; &amp;quot;1.5E-7&amp;quot;  &amp;quot;4.6E-8&amp;quot; 
## [15] &amp;quot;9.6E-5&amp;quot;  &amp;quot;2.5E-6&amp;quot;  &amp;quot;1.3E-5&amp;quot;  &amp;quot;7.4E-6&amp;quot;  &amp;quot;6.7E-6&amp;quot;  &amp;quot;1.0E-5&amp;quot;  &amp;quot;1.6E-6&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Yep, that’s right: the &lt;code&gt;−&lt;/code&gt; symbol is not in fact the same as a minus symbol &lt;code&gt;-&lt;/code&gt;! We need to replace it. We’ll use the fact that this symbol always appears immediately after a capital E to our advantage.&lt;/p&gt;
&lt;p&gt;Split the string on the E using &lt;code&gt;strsplit&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;s = strsplit(nature_genetics_table2_clean$Stage1_MetaP, &amp;quot;E&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Get the first and second half of each string.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;firstStr &amp;lt;- lapply(s, `[[`, 1)
secondStr &amp;lt;- lapply(s, `[[`, 2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Knock off that first character, which is the symbol we don’t want, and slap a minus sign back on.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;secondStr &amp;lt;- lapply(secondStr, function(x) paste0(&amp;quot;E-&amp;quot;, substring(x, 2)))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, stitch the two parts of the string back together now that the minus sign has been corrected and convert it to numeric.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mapply(function(firstStr, secondStr) {
  as.numeric(paste0(firstStr, secondStr))
}, firstStr, secondStr)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] 7.7e-15 1.7e-26 3.1e-08 8.8e-10 9.6e-17 2.8e-11 6.5e-16 1.7e-09
##  [9] 5.1e-08 1.6e-08 3.3e-09 5.0e-11 1.5e-07 4.6e-08 9.6e-05 2.5e-06
## [17] 1.3e-05 7.4e-06 6.7e-06 1.0e-05 1.6e-06&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It works! Make sure to replace the column in the data.frame.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;nature_genetics_table2_clean$Stage1_MetaP &amp;lt;- mapply(function(firstStr, secondStr) {
  as.numeric(paste0(firstStr, secondStr))
}, firstStr, secondStr)&lt;/code&gt;&lt;/pre&gt;
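&lt;p&gt;As an aside, once you know the culprit, the whole conversion can also be done in one step with a second &lt;code&gt;gsub&lt;/code&gt; that swaps the stray symbol for a regular minus sign (shown here on a single value for illustration):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;as.numeric(gsub(&amp;quot;−&amp;quot;, &amp;quot;-&amp;quot;, gsub(&amp;quot; × 10&amp;quot;, &amp;quot;E&amp;quot;, &amp;quot;7.7 × 10−15&amp;quot;)))
## [1] 7.7e-15&lt;/code&gt;&lt;/pre&gt;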
&lt;p&gt;See how it appears in the table now as a numeric? Try wrangling the rest of these columns into a useful format on your own and let me know how it goes.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;kable(nature_genetics_table2_clean[1:3,])&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;SNP&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Chr&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Position&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Closest gene&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Major/minor alleles&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;MAF&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Stage1_OR&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;Stage1_MetaP&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Stage2_OR&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Stage2_MetaP&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Overall_OR&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Overall_MetaP&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;I2_Percent/P&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;1.3&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;rs6656401&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;207692049&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;CR1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;G/A&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;0.197&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.17 (1.12–1.22)&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.21 (1.14–1.28)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;7.9 × 10−11&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.18 (1.14–1.22)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;5.7 × 10−24&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;0, 7.8 × 10−1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Known GWAS-defined associated genes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;1.4&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;rs6733839&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;2&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;127892810&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;BIN1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;C/T&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;0.409&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.21 (1.17–1.25)&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.24 (1.18–1.29)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;3.4 × 10−19&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.22 (1.18–1.25)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;6.9 × 10−44&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;28, 6.1 × 10−2&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Known GWAS-defined associated genes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;1.5&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;rs10948363&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;6&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;47487762&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;CD2AP&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;A/G&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;0.266&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.10 (1.07–1.14)&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.09 (1.04–1.15)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;4.1 × 10−4&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;1.10 (1.07–1.13)&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;5.2 × 10−11&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;0, 9 × 10−1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Known GWAS-defined associated genes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
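&lt;p&gt;To get you started on that exercise, here is one way to bundle the steps above into a small reusable helper. This is just a sketch: the function name &lt;code&gt;fix_meta_p&lt;/code&gt; is mine, and it assumes, as in this table, that every exponent is negative.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Convert a character column like &amp;quot;7.9 × 10−11&amp;quot; into numeric values
fix_meta_p &amp;lt;- function(p) {
  p &amp;lt;- gsub(&amp;quot; × 10&amp;quot;, &amp;quot;E&amp;quot;, p)
  s &amp;lt;- strsplit(p, &amp;quot;E&amp;quot;)
  ## Drop the stray symbol after the E and restore a real minus sign
  sapply(s, function(x) as.numeric(paste0(x[[1]], &amp;quot;E-&amp;quot;, substring(x[[2]], 2))))
}
nature_genetics_table2_clean$Stage2_MetaP &amp;lt;- fix_meta_p(nature_genetics_table2_clean$Stage2_MetaP)&lt;/code&gt;&lt;/pre&gt;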
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusions&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Conclusions&lt;/h3&gt;
&lt;p&gt;Today we went through getting data directly into R from an HTML table (a table on a webpage) and demonstrated how to get the data into a useful format. There are a couple of advantages to doing this work in R instead of using Excel – if that’s even an option. First, R is more reproducible. Second, once you have code for wrangling one table, wrangling the next one will be much faster. At the end of the day though, it is always easiest when the table is shared as a .csv or Excel file; something to keep in mind when preparing your own papers.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;## Session info ----------------------------------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  setting  value                       
##  version  R version 3.4.4 (2018-03-15)
##  system   x86_64, mingw32             
##  ui       RTerm                       
##  language (EN)                        
##  collate  English_United States.1252  
##  tz       America/New_York            
##  date     2018-04-20&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Packages --------------------------------------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  package   * version date       source                           
##  backports   1.1.2   2017-12-13 CRAN (R 3.4.3)                   
##  base      * 3.4.4   2018-03-15 local                            
##  blogdown    0.5.12  2018-03-23 Github (rstudio/blogdown@21f14af)
##  bookdown    0.7     2018-02-18 CRAN (R 3.4.3)                   
##  compiler    3.4.4   2018-03-15 local                            
##  curl        3.1     2017-12-12 CRAN (R 3.4.3)                   
##  datasets  * 3.4.4   2018-03-15 local                            
##  devtools  * 1.13.5  2018-02-18 CRAN (R 3.4.3)                   
##  digest      0.6.15  2018-01-28 CRAN (R 3.4.3)                   
##  evaluate    0.10.1  2017-06-24 CRAN (R 3.4.3)                   
##  graphics  * 3.4.4   2018-03-15 local                            
##  grDevices * 3.4.4   2018-03-15 local                            
##  highr       0.6     2016-05-09 CRAN (R 3.4.3)                   
##  htmltools   0.3.6   2017-04-28 CRAN (R 3.4.3)                   
##  httr        1.3.1   2017-08-20 CRAN (R 3.4.3)                   
##  knitr     * 1.20    2018-02-20 CRAN (R 3.4.3)                   
##  magrittr    1.5     2014-11-22 CRAN (R 3.4.3)                   
##  memoise     1.1.0   2017-04-21 CRAN (R 3.4.3)                   
##  methods   * 3.4.4   2018-03-15 local                            
##  R6          2.2.2   2017-06-17 CRAN (R 3.4.3)                   
##  Rcpp        0.12.16 2018-03-13 CRAN (R 3.4.4)                   
##  rmarkdown   1.9     2018-03-01 CRAN (R 3.4.3)                   
##  rprojroot   1.3-2   2018-01-03 CRAN (R 3.4.3)                   
##  rvest     * 0.3.2   2016-06-17 CRAN (R 3.4.3)                   
##  selectr     0.3-2   2018-03-05 CRAN (R 3.4.3)                   
##  stats     * 3.4.4   2018-03-15 local                            
##  stringi     1.1.7   2018-03-12 CRAN (R 3.4.4)                   
##  stringr     1.3.0   2018-02-19 CRAN (R 3.4.3)                   
##  tools       3.4.4   2018-03-15 local                            
##  utils     * 3.4.4   2018-03-15 local                            
##  withr       2.1.2   2018-03-15 CRAN (R 3.4.4)                   
##  xfun        0.1     2018-01-22 CRAN (R 3.4.3)                   
##  xml2      * 1.2.0   2018-01-24 CRAN (R 3.4.3)                   
##  yaml        2.1.18  2018-03-08 CRAN (R 3.4.3)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Edit your bashrc file for a nicer terminal experience</title>
      <link>http://LieberInstitute.github.io/rstatsclub/2018/03/11/edit-your-bashrc-file-for-a-nicer-terminal-experience/</link>
      <pubDate>Sun, 11 Mar 2018 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/2018/03/11/edit-your-bashrc-file-for-a-nicer-terminal-experience/</guid>
      <description>


&lt;p&gt;By &lt;a href=&#34;http://lcolladotor.github.io&#34;&gt;L. Collado-Torres&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you are working at LIBD or with large data, it’s very likely that it won’t fit in your laptop and that you’ll be using the terminal to interact with a high performance computing cluster (like &lt;a href=&#34;http://www.jhpce.jhu.edu/&#34;&gt;JHPCE&lt;/a&gt;) or server. Some small edits to your bash configuration file can make your terminal experience much more enjoyable and hopefully boost your productivity. The edits described below work for any OS. On Windows, I’m assuming that you are using &lt;code&gt;git bash&lt;/code&gt; or a similar terminal program.&lt;/p&gt;
&lt;p&gt;The way we can control our terminal appearance and some behavior is through the &lt;code&gt;.bashrc&lt;/code&gt; file. That file typically gets read once when loading a new terminal window and that is where we can save some shortcuts we like to use, alter the colors of our terminal, change the behavior of the up and down arrow keys, etc.&lt;/p&gt;
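&lt;p&gt;Note that after editing &lt;code&gt;~/.bashrc&lt;/code&gt; you don’t have to open a new terminal window to try out your changes; you can re-read the file in your current session:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;## Re-read the configuration file in the current session
source ~/.bashrc&lt;/code&gt;&lt;/pre&gt;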
&lt;div id=&#34;bashrc-file&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;&lt;code&gt;.bashrc&lt;/code&gt; file&lt;/h3&gt;
&lt;p&gt;First, we need to learn where to locate this file. On all OS (Mac, Windows, Linux) machines/servers, the &lt;code&gt;.bashrc&lt;/code&gt; file typically lives at &lt;code&gt;~/.bashrc&lt;/code&gt;. For my Mac, for example, that is &lt;code&gt;/home/lcollado/.bashrc&lt;/code&gt;. For my Windows machine, that’s &lt;code&gt;/c/Users/Leonardo/.bashrc&lt;/code&gt;. Now, the dot before the file name makes it a &lt;em&gt;hidden file&lt;/em&gt;. &lt;a href=&#34;http://lmgtfy.com/?q=show+hidden+files&#34;&gt;A quick search can help you&lt;/a&gt; find the options for your computer that let you see these hidden files. From a terminal window, I typically use this bash command to show all the files, including hidden ones (that’s the &lt;code&gt;a&lt;/code&gt; option).&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;## List files in human readable format including hidden files
ls -lha&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;initial-files&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Initial files&lt;/h3&gt;
&lt;p&gt;You might have a &lt;code&gt;~/.bashrc&lt;/code&gt; and a &lt;code&gt;~/.bash_profile&lt;/code&gt; already. If not, let’s create simple ones. You can use the &lt;code&gt;touch&lt;/code&gt; bash command to make a new file (&lt;code&gt;touch ~/.bashrc&lt;/code&gt;), or you could use this R code:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;file.create(&amp;#39;~/.bashrc&amp;#39;)
file.create(&amp;#39;~/.bash_profile&amp;#39;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next open them with your text editor (say Notepad++, TextMate 2, RStudio, among others) and paste the following contents.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;~/.bash_profile&lt;/code&gt; contents&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# In my Mac one I also have this:
if [ -f ~/.profile ]; then
        . ~/.profile
fi&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Minimal &lt;code&gt;~/.bashrc&lt;/code&gt; contents&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;control-your-bash-history&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Control your bash history&lt;/h3&gt;
&lt;p&gt;Let’s start adding features to our terminal experience by editing the &lt;code&gt;~/.bashrc&lt;/code&gt; file. I typically include comments &lt;code&gt;#&lt;/code&gt; describing what the code is doing and where I learned how to do &lt;span class=&#34;math inline&#34;&gt;\(X\)&lt;/span&gt;. The first part is controlling your bash history. I want to have a longer history than what is included by default, with duplicates deleted.&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;# http://www.biostat.jhsph.edu/~afisher/ComputingClub/webfiles/KasperHansenPres/IntermediateUnix.pdf
# https://unix.stackexchange.com/questions/48713/how-can-i-remove-duplicates-in-my-bash-history-preserving-order
export HISTCONTROL=ignoreboth:erasedups
export HISTSIZE=10000
shopt -s histappend
shopt -s cmdhist&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;change-the-up-and-down-arrows&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Change the up and down arrows&lt;/h3&gt;
&lt;p&gt;The next change will save you a lot of time! Plus it goes nicely with the bash history changes we just made. Normally, the up and down arrow let you select previous commands from your bash history (up) or select one of your latest commands (down, after having used up). The following changes make it so that the up arrow searches only commands that start with exactly the letters you had already typed.&lt;/p&gt;
&lt;p&gt;Let’s say that you just requested a compute node with &lt;code&gt;qrsh&lt;/code&gt; and you have an empty line.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-11-edit-your-bashrc-file-for-a-nicer-terminal-experience_files/Screen Shot 2018-03-11 at 7.38.34 PM.png&#34; alt=&#34;qrsh&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;If you use the &lt;code&gt;up&lt;/code&gt; arrow, you can navigate your command history. So far, this is the same as the default &lt;code&gt;up&lt;/code&gt; arrow behavior.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-11-edit-your-bashrc-file-for-a-nicer-terminal-experience_files/Screen Shot 2018-03-11 at 7.38.48 PM.png&#34; alt=&#34;empty up&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;Let’s say that I want to change directory to one of my recent projects, so I type &lt;code&gt;cd /&lt;/code&gt; in the terminal window (without hitting enter).&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-11-edit-your-bashrc-file-for-a-nicer-terminal-experience_files/Screen Shot 2018-03-11 at 7.39.16 PM.png&#34; alt=&#34;empty cd&#34; width=&#34;300&#34;/&gt;&lt;/p&gt;
&lt;p&gt;Next I use the &lt;code&gt;up&lt;/code&gt; arrow, and it only finds commands that start with &lt;code&gt;cd /&lt;/code&gt;, including this long one.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-11-edit-your-bashrc-file-for-a-nicer-terminal-experience_files/Screen Shot 2018-03-11 at 7.39.52 PM.png&#34; alt=&#34;cd and up&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;Did you like this? Well, add the following code to your &lt;code&gt;~/.bashrc&lt;/code&gt; file:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;# Auto-complete command from history
# http://lindesk.com/2009/04/customize-terminal-configuration-setting-bash-cli-power-user/
export INPUTRC=~/.inputrc&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;where &lt;code&gt;~/.inputrc&lt;/code&gt; file has the following contents:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;# Up/down arrow keys
&amp;quot;\e[B&amp;quot;: history-search-forward
&amp;quot;\e[A&amp;quot;: history-search-backward

$include /etc/inputrc&lt;/code&gt;&lt;/pre&gt;
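&lt;p&gt;If you prefer to create the &lt;code&gt;~/.inputrc&lt;/code&gt; file from the command line, here is one sketch: it writes the contents to a temporary file first, so you can review it before copying it over your real &lt;code&gt;~/.inputrc&lt;/code&gt;.&lt;/p&gt;

```shell
# Write the inputrc contents to a temporary file for review; copy it to
# ~/.inputrc once you are happy with it. printf '%s\n' keeps the
# backslashes literal, which is what readline expects.
inputrc=$(mktemp)
printf '%s\n' \
    '"\e[B": history-search-forward' \
    '"\e[A": history-search-backward' \
    '' \
    '$include /etc/inputrc' | tee "$inputrc"
```

&lt;p&gt;Then &lt;code&gt;cp&lt;/code&gt; the temporary file to &lt;code&gt;~/.inputrc&lt;/code&gt;.&lt;/p&gt;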
&lt;p&gt;As an added benefit, the up and down arrows will also have this improved behavior when you run &lt;code&gt;R&lt;/code&gt; inside a terminal, although it’s limited to your current R session’s history. You could likely configure your &lt;code&gt;.Rprofile&lt;/code&gt; to load your previous R history.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;interactive-deleting-of-files&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Interactive deleting of files&lt;/h3&gt;
&lt;p&gt;In a terminal you normally delete files with &lt;code&gt;rm&lt;/code&gt;, but you can make an alias (shortcut) so that deleting files with &lt;code&gt;rmi&lt;/code&gt; asks you to confirm each deletion. This is useful when you use a pattern to find the files you are trying to delete and want to make sure the pattern didn’t catch other files you want to keep.&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;# http://superuser.com/questions/384769/alias-rm-rm-i-considered-harmful
alias rmi=&amp;#39;rm -i&amp;#39;&lt;/code&gt;&lt;/pre&gt;
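&lt;p&gt;Here is a quick demo you can paste in a terminal to see the confirmation prompt in action (the temporary file name is just an example):&lt;/p&gt;

```shell
# rmi expands to rm -i, which asks before each removal; answering "n"
# leaves the file alone. Here we feed "n" through a pipe instead of
# typing it.
tmpdir=$(mktemp -d)
touch "$tmpdir/keep.txt"
echo n | rm -i "$tmpdir/keep.txt"
if test -f "$tmpdir/keep.txt"; then
    survived=yes
else
    survived=no
fi
echo "survived: $survived"
rm -r "$tmpdir"
```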
&lt;/div&gt;
&lt;div id=&#34;change-the-command-prompt&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Change the command prompt&lt;/h3&gt;
&lt;p&gt;You can also control the command prompt, that is, the parts shown before you start typing in your terminal. I like keeping it short, so mine only shows the current directory’s name instead of the full path, plus the time (hh:mm) on a 24-hour clock. This is sometimes useful if I run some commands and later want a quick idea of whether any of them took a while to run (especially if I was not looking at the terminal).&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;# Change command prompt
# http://www.cyberciti.biz/tips/howto-linux-unix-bash-shell-setup-prompt.html
# http://www.cyberciti.biz/faq/bash-shell-change-the-color-of-my-shell-prompt-under-linux-or-unix/
# https://bbs.archlinux.org/viewtopic.php?id=48910
# previous in enigma2: &amp;quot;[\u@\h \W]\$ &amp;quot;
# previously in mac: &amp;quot;\h:\W \u\$ &amp;quot;
export PS1=&amp;quot;\[\e[0;33m\]\A \W \$ \[\e[m\]&amp;quot;&lt;/code&gt;&lt;/pre&gt;
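&lt;p&gt;For reference, here is what each piece of that prompt string does (written with single quotes here so the backslash escapes survive quoting; the &lt;code&gt;\[ \]&lt;/code&gt; pairs mark non-printing characters so bash can compute the prompt width correctly):&lt;/p&gt;

```shell
# Same prompt as above, annotated:
#   \[\e[0;33m\]  start yellow text
#   \A            current time as HH:MM (24-hour clock)
#   \W            basename of the current working directory
#   \$            "$" for a regular user, "#" for root
#   \[\e[m\]      reset the colors back to normal
export PS1='\[\e[0;33m\]\A \W \$ \[\e[m\]'
```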
&lt;/div&gt;
&lt;div id=&#34;colors&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Colors&lt;/h3&gt;
&lt;p&gt;You can change the colors of your terminal. For example, you might want directories shown in blue and/or bold font while executable files are shown in red. This goes hand in hand with the &lt;code&gt;ls --color=auto&lt;/code&gt; shortcut to make sure that the colors are used (Mac: you might need &lt;code&gt;brew install coreutils&lt;/code&gt; as described &lt;a href=&#34;https://superuser.com/questions/183876/how-do-i-get-ls-color-auto-to-work-on-mac-os-x&#34;&gt;in this blog post&lt;/a&gt;). The following lines of my &lt;code&gt;~/.bashrc&lt;/code&gt; file preserve some history of the colors and options I used to have.&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;# colors
# http://norbauer.com/notebooks/code/notes/ls-colors-and-terminal-app
# used BSD pattern ExGxFxDxBxEgEdxbxgxhxd on http://geoff.greer.fm/lscolors/
# that tool does not specify the colors, which I did by looking manually at
# http://blog.twistedcode.org/2008/04/lscolors-explained.html
# and the norbauer.com site previously mentioned
alias ls=&amp;quot;ls --color=auto&amp;quot;
#export LS_COLORS=&amp;quot;di=1;34;40:ln=1;36;40:so=1;35;40:pi=1;93;40:ex=1;31;40:bd=1;34;46:cd=1;34;43:su=0;41:sg=0;46:tw=0;47:ow=0;43&amp;quot;
## After switching to RStudio:
# https://askubuntu.com/questions/466198/how-do-i-change-the-color-for-directories-with-ls-in-the-console
export LS_COLORS=&amp;quot;di=0;32:ln=0;36:so=0;35:pi=0;93:ex=0;31:bd=0;34;46:cd=0;34;43:su=0;41:sg=0;46:tw=0;47:ow=0;43:fi=0;33&amp;quot;&lt;/code&gt;&lt;/pre&gt;
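&lt;p&gt;In case you want to build your own value, &lt;code&gt;LS_COLORS&lt;/code&gt; is a colon-separated list of &lt;code&gt;type=style;color&lt;/code&gt; pairs. Here is a trimmed-down sketch with a few entries from the value above, decoded:&lt;/p&gt;

```shell
# A minimal LS_COLORS with the main entries decoded:
#   di=0;32   directories: green
#   ln=0;36   symbolic links: cyan
#   ex=0;31   executable files: red
#   fi=0;33   regular files: yellow
# (the leading 0 means normal weight; 1 would make the text bold)
export LS_COLORS="di=0;32:ln=0;36:ex=0;31:fi=0;33"
```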
&lt;p&gt;Mac extra lines:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;# Uncomment below for Mac and comment the two previous commands
#export CLICOLOR=1
#export LSCOLORS=&amp;quot;ExGxFxDxBxEgEdxbxgxhxd&amp;quot;
## Actually from https://superuser.com/questions/183876/how-do-i-get-ls-color-auto-to-work-on-mac-os-x
# brew install coreutils
# then change the alias to use gls instead of ls
# that way I can use the same config file =)

alias ls=&amp;quot;gls --color=auto&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I use the same &lt;code&gt;LS_COLORS&lt;/code&gt; now on my Mac too, but you don’t need to.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;UPDATE&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;We got this note from Mark Miller, admin of &lt;a href=&#34;http://www.jhpce.jhu.edu/&#34;&gt;JHPCE&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;One quick note on your page. You mention setting colors for the ls output, which is great. One thing we (and others) have found is that, for a directory on a Lustre filesystem (/dcl01 or /dcl02), using “ls --color=auto” or “ls -al” on a directory with lots (thousands+) of files in it can be super slow. With these options, the ls command needs to iterate through each file in the directory, and query the lustre server for each and every file to retrieve information about the file in order to determine what color to display. So, if you’re regularly using directories on Lustre that have lots of files in them, and your “ls” command is taking too long, we recommend using “ls --color=none”. &lt;a href=&#34;https://wikis.nyu.edu/display/NYUHPC/Lustre+FAQ&#34; class=&#34;uri&#34;&gt;https://wikis.nyu.edu/display/NYUHPC/Lustre+FAQ&lt;/a&gt; &lt;a href=&#34;https://groups.google.com/forum/#!topic/lustre-discuss-list/3afjd4j2Q-g&#34; class=&#34;uri&#34;&gt;https://groups.google.com/forum/#!topic/lustre-discuss-list/3afjd4j2Q-g&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/div&gt;
&lt;div id=&#34;shortcuts-for-main-project-directories&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Shortcuts for main project directories&lt;/h3&gt;
&lt;p&gt;We’ve seen several aliases (shortcuts) already, such as the one for &lt;code&gt;ls --color=auto&lt;/code&gt;, which is the one I use the most. But I also use aliases for changing to the root directories that I use &lt;em&gt;the&lt;/em&gt; most.&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;alias labold=&amp;quot;cd /dcl01/lieber/ajaffe/lab&amp;quot;
alias lab=&amp;quot;cd /dcl01/ajaffe/data/lab&amp;quot;&lt;/code&gt;&lt;/pre&gt;
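&lt;p&gt;A small variant of those shortcuts is to use a function that warns you when the path has moved (the directory below is the same example as above; substitute your own):&lt;/p&gt;

```shell
# Like the "lab" alias, but complains instead of failing silently when
# the directory no longer exists.
lab() {
    dir=/dcl01/ajaffe/data/lab
    if [ -d "$dir" ]; then
        cd "$dir"
    else
        echo "not found: $dir"
    fi
}
```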
&lt;p&gt;Actually, we were supposed to just use the new disk here, and I should probably have chosen better names to differentiate the two.&lt;/p&gt;
&lt;p&gt;The next terminal window you open after editing the &lt;code&gt;~/.bashrc&lt;/code&gt; file will have all your new features enabled.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://media.giphy.com/media/10UeedrT5MIfPG/giphy.gif&#34;/&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://giphy.com/reactions/search/yay&#34;&gt;Source&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;extra&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Extra&lt;/h3&gt;
&lt;p&gt;Sometimes you might need to export other environment variables, such as &lt;code&gt;RMATE_PORT&lt;/code&gt; described in the &lt;code&gt;rmate&lt;/code&gt; setup &lt;a href=&#34;http://research.libd.org/rstatsclub/2018/03/11/textmate-setup-mac-only/&#34;&gt;post&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;repeat-this-process&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Repeat this process&lt;/h3&gt;
&lt;p&gt;You can/should repeat this process for other &lt;code&gt;~/.bashrc&lt;/code&gt; files you interact with. In my case, that would be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;~/.bashrc&lt;/code&gt; in my Mac laptop&lt;/li&gt;
&lt;li&gt;&lt;code&gt;~/.bashrc&lt;/code&gt; at my JHPCE home&lt;/li&gt;
&lt;li&gt;&lt;code&gt;~/.bashrc&lt;/code&gt; in my Windows laptop&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;acknowledgments&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Acknowledgments&lt;/h3&gt;
&lt;p&gt;This blog post was made possible thanks to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;http://bioconductor.org/packages/BiocStyle&#34;&gt;BiocStyle&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Oles_2018&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/Bioconductor/BiocStyle&#39;&gt;Oleś, Morgan, and Huber, 2018&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Xie_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/rstudio/blogdown&#39;&gt;Xie, Hill, and Thomas, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=devtools&#34;&gt;devtools&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Wickham_2018&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=devtools&#39;&gt;Wickham, Hester, and Chang, 2018&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;knitcitations&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Boettiger_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=knitcitations&#39;&gt;Boettiger, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;References&lt;/h3&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Boettiger_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Boettiger_2017&#34;&gt;[1]&lt;/a&gt;&lt;cite&gt; C. Boettiger. &lt;em&gt;knitcitations: Citations for ‘Knitr’ Markdown Files&lt;/em&gt;. R package version 1.0.8. 2017. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;https://CRAN.R-project.org/package=knitcitations&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Oles_2018&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Oles_2018&#34;&gt;[2]&lt;/a&gt;&lt;cite&gt; A. Oleś, M. Morgan and W. Huber. &lt;em&gt;BiocStyle: Standard styles for vignettes and other Bioconductor documents&lt;/em&gt;. R package version 2.8.2. 2018. URL: &lt;a href=&#34;https://github.com/Bioconductor/BiocStyle&#34;&gt;https://github.com/Bioconductor/BiocStyle&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Wickham_2018&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Wickham_2018&#34;&gt;[3]&lt;/a&gt;&lt;cite&gt; H. Wickham, J. Hester and W. Chang. &lt;em&gt;devtools: Tools to Make Developing R Packages Easier&lt;/em&gt;. R package version 1.13.6. 2018. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=devtools&#34;&gt;https://CRAN.R-project.org/package=devtools&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Xie_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Xie_2017&#34;&gt;[4]&lt;/a&gt;&lt;cite&gt; Y. Xie, A. P. Hill and A. Thomas. &lt;em&gt;blogdown: Creating Websites with R Markdown&lt;/em&gt;. ISBN 978-0815363729. Boca Raton, Florida: Chapman and Hall/CRC, 2017. URL: &lt;a href=&#34;https://github.com/rstudio/blogdown&#34;&gt;https://github.com/rstudio/blogdown&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;## Session info ----------------------------------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  setting  value                                      
##  version  R version 3.5.1 Patched (2018-10-14 r75439)
##  system   x86_64, darwin15.6.0                       
##  ui       X11                                        
##  language (EN)                                       
##  collate  en_US.UTF-8                                
##  tz       America/New_York                           
##  date     2018-10-26&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Packages --------------------------------------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  package       * version date       source                            
##  backports       1.1.2   2017-12-13 cran (@1.1.2)                     
##  base          * 3.5.1   2018-10-15 local                             
##  bibtex          0.4.2   2017-06-30 CRAN (R 3.5.0)                    
##  BiocStyle     * 2.8.2   2018-05-30 Bioconductor                      
##  blogdown        0.8     2018-07-15 CRAN (R 3.5.0)                    
##  bookdown        0.7     2018-02-18 CRAN (R 3.5.0)                    
##  colorout      * 1.2-0   2018-05-03 Github (jalvesaq/colorout@c42088d)
##  compiler        3.5.1   2018-10-15 local                             
##  datasets      * 3.5.1   2018-10-15 local                             
##  devtools      * 1.13.6  2018-06-27 cran (@1.13.6)                    
##  digest          0.6.18  2018-10-10 CRAN (R 3.5.0)                    
##  evaluate        0.12    2018-10-09 CRAN (R 3.5.0)                    
##  graphics      * 3.5.1   2018-10-15 local                             
##  grDevices     * 3.5.1   2018-10-15 local                             
##  htmltools       0.3.6   2017-04-28 cran (@0.3.6)                     
##  httr            1.3.1   2017-08-20 CRAN (R 3.5.0)                    
##  jsonlite        1.5     2017-06-01 CRAN (R 3.5.0)                    
##  knitcitations * 1.0.8   2017-07-04 CRAN (R 3.5.0)                    
##  knitr           1.20    2018-02-20 cran (@1.20)                      
##  lubridate       1.7.4   2018-04-11 CRAN (R 3.5.0)                    
##  magrittr        1.5     2014-11-22 cran (@1.5)                       
##  memoise         1.1.0   2017-04-21 CRAN (R 3.5.0)                    
##  methods       * 3.5.1   2018-10-15 local                             
##  plyr            1.8.4   2016-06-08 cran (@1.8.4)                     
##  R6              2.3.0   2018-10-04 CRAN (R 3.5.0)                    
##  Rcpp            0.12.19 2018-10-01 CRAN (R 3.5.1)                    
##  RefManageR      1.2.0   2018-04-25 CRAN (R 3.5.0)                    
##  rmarkdown       1.10    2018-06-11 CRAN (R 3.5.0)                    
##  rprojroot       1.3-2   2018-01-03 cran (@1.3-2)                     
##  stats         * 3.5.1   2018-10-15 local                             
##  stringi         1.2.4   2018-07-20 CRAN (R 3.5.0)                    
##  stringr         1.3.1   2018-05-10 CRAN (R 3.5.0)                    
##  tools           3.5.1   2018-10-15 local                             
##  utils         * 3.5.1   2018-10-15 local                             
##  withr           2.1.2   2018-03-15 CRAN (R 3.5.0)                    
##  xfun            0.3     2018-07-06 CRAN (R 3.5.0)                    
##  xml2            1.2.0   2018-01-24 CRAN (R 3.5.0)                    
##  yaml            2.2.0   2018-07-25 CRAN (R 3.5.0)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Textmate setup (Mac only)</title>
      <link>http://LieberInstitute.github.io/rstatsclub/2018/03/11/textmate-setup-mac-only/</link>
      <pubDate>Sun, 11 Mar 2018 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/2018/03/11/textmate-setup-mac-only/</guid>
      <description>


&lt;p&gt;By &lt;a href=&#34;http://lcolladotor.github.io&#34;&gt;L. Collado-Torres&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For the past 6-7 years I have been using &lt;em&gt;TextMate 2&lt;/em&gt; as my text editor, which I’ve found useful for R code, bash, Markdown, etc. You could also look into &lt;em&gt;Sublime Text&lt;/em&gt; or use &lt;em&gt;RStudio&lt;/em&gt; (post about this setup coming soon).&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-11-textmate-setup-mac-only_files/Screen Shot 2018-03-11 at 1.37.28 PM.png&#34; alt=&#34;textmate look&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;Sometimes students are interested in this setup, which is what I’ll document here. Though I want to highlight that you can get a very similar setup using other tools. Note that this setup only works for Mac computers.&lt;/p&gt;
&lt;div id=&#34;setup&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Setup&lt;/h3&gt;
&lt;p&gt;First, install &lt;a href=&#34;http://macromates.com/download&#34;&gt;TextMate 2&lt;/a&gt; for free. &lt;em&gt;TextMate&lt;/em&gt; allows users to contribute &lt;em&gt;bundles&lt;/em&gt; which are a set of tools that enhance the editor. For example, if you want to quickly insert an R code chunk in a &lt;code&gt;.Rmd&lt;/code&gt; file you can add a &lt;em&gt;command&lt;/em&gt; for it inside a bundle. You can also use a bundle to get the editor to recognize R code inside an R markdown code chunk. Probably the easiest way to get jump-started is to copy my exact setup.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;change-some-preferences&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Change some preferences&lt;/h3&gt;
&lt;p&gt;So next, go to the &lt;em&gt;preferences&lt;/em&gt; menu&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-11-textmate-setup-mac-only_files/Screen Shot 2018-03-11 at 1.41.05 PM.png&#34; alt=&#34;preferences&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;and under bundle, choose the R bundle as shown below.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-11-textmate-setup-mac-only_files/Screen Shot 2018-03-11 at 1.40.54 PM.png&#34; alt=&#34;bundle&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;As you can see, it hasn’t been updated in a while. I have made a few edits myself here and there which I’ll describe in the next section.&lt;/p&gt;
&lt;p&gt;You should also enable running &lt;em&gt;TextMate&lt;/em&gt; from the terminal.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-11-textmate-setup-mac-only_files/Screen Shot 2018-03-11 at 2.15.06 PM.png&#34; alt=&#34;enable terminal&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;Finally, here are my main file preferences: I want my files to be &lt;code&gt;.R&lt;/code&gt; files by default and to use Linux line endings to avoid issues later on.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-11-textmate-setup-mac-only_files/Screen Shot 2018-03-11 at 2.39.09 PM.png&#34; alt=&#34;main prefs&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;adding-bundles-from-git&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Adding bundles from git&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;TextMate&lt;/em&gt; allows you to install bundles by adding the bundle files in a specific folder. The bundle files are most likely in a GitHub repository, so you just need to clone (download) them to where &lt;em&gt;TextMate&lt;/em&gt; expects them to be.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/lcolladotor/r.tmbundle&#34; class=&#34;uri&#34;&gt;https://github.com/lcolladotor/r.tmbundle&lt;/a&gt; for R and sending code to be evaluated in an iTerm2 terminal (setup explained later)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/noniq/Merge-Markers.tmbundle&#34; class=&#34;uri&#34;&gt;https://github.com/noniq/Merge-Markers.tmbundle&lt;/a&gt; for git merging&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/elia/base16-themes.tmbundle&#34; class=&#34;uri&#34;&gt;https://github.com/elia/base16-themes.tmbundle&lt;/a&gt; for theme colors&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/lcolladotor/criticmarkup.tmbundle&#34; class=&#34;uri&#34;&gt;https://github.com/lcolladotor/criticmarkup.tmbundle&lt;/a&gt; for CriticMarkup in Markdown files&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/lcolladotor/knitr.tmbundle&#34; class=&#34;uri&#34;&gt;https://github.com/lcolladotor/knitr.tmbundle&lt;/a&gt; for &lt;code&gt;knitr::knit()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/lcolladotor/markdown-redcarpet.tmbundle&#34; class=&#34;uri&#34;&gt;https://github.com/lcolladotor/markdown-redcarpet.tmbundle&lt;/a&gt; for basically running &lt;code&gt;rmarkdown::render()&lt;/code&gt; on the document at hand and previewing it live (if it’s an html doc). It also makes it so that R code inside code chunks will be recognized as such, enabling all the R code shortcuts.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;## Go to main bundle directory
cd ~/Library/Application\ Support/TextMate/

## Download Leonardo&amp;#39;s bundles (he uses the leo branch)
## For R, sending code to iTerm2
git clone https://github.com/lcolladotor/r.tmbundle.git

## For merging
git clone https://github.com/noniq/Merge-Markers.tmbundle.git

## For more color themes
git clone https://github.com/elia/base16-themes.tmbundle.git

## For commenting Markdown files
git clone https://github.com/lcolladotor/criticmarkup.tmbundle.git

## For knitr::knit()
git clone https://github.com/lcolladotor/knitr.tmbundle.git

## For rmarkdown::render()
git clone https://github.com/lcolladotor/markdown-redcarpet.tmbundle.git&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you can see, these bundles help adapt &lt;em&gt;TextMate 2&lt;/em&gt; for working with R files of different flavors. But it’s not beginner friendly, hence the upcoming blog post about using RStudio.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;some-feature-highlights&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Some feature highlights&lt;/h3&gt;
&lt;p&gt;One of the features that I really like from &lt;em&gt;TextMate&lt;/em&gt; is searching/replacing (you can use regex) across all the files and sub-directories of a given directory. It’s very useful when trying to fix a common typo across different files or finding all the places where a function/argument was used. The latter is handy when you are looking at someone else’s code. It’s basically like searching inside a GitHub repository: for example, &lt;a href=&#34;https://github.com/rstudio/blogdown/search?utf8=%E2%9C%93&amp;amp;q=baseurl&amp;amp;type=&#34;&gt;search &lt;code&gt;baseurl&lt;/code&gt;&lt;/a&gt; in all of &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; for finding the code that reads it from a config file, which I did for this &lt;a href=&#34;https://github.com/rstudio/blogdown/pull/275&#34;&gt;PR&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-11-textmate-setup-mac-only_files/Screen Shot 2018-03-11 at 2.29.33 PM.png&#34; alt=&#34;search in dir&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;You can also comment all the lines of code you have selected fairly easily using the &lt;code&gt;Source&lt;/code&gt; bundle.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-11-textmate-setup-mac-only_files/Screen Shot 2018-03-11 at 2.41.31 PM.png&#34; alt=&#34;comment all lines&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;I’ve also used the &lt;code&gt;Text&lt;/code&gt;, &lt;code&gt;LaTeX&lt;/code&gt; and &lt;code&gt;Gist&lt;/code&gt; bundles, though not as frequently. Also, &lt;em&gt;TextMate&lt;/em&gt; automatically spell checks for you and knows to ignore R markdown code chunks.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;evaluating-code-in-r-console-or-iterm2&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Evaluating code in R console or iTerm2&lt;/h3&gt;
&lt;p&gt;If you download and install the &lt;a href=&#34;https://www.iterm2.com/&#34;&gt;iTerm2&lt;/a&gt; terminal, you can configure &lt;em&gt;TextMate&lt;/em&gt; to evaluate the code in that terminal. The particular code I have for doing this is in the &lt;code&gt;leo&lt;/code&gt; branch of the following repo: &lt;a href=&#34;https://github.com/lcolladotor/r.tmbundle/commits/leo&#34; class=&#34;uri&#34;&gt;https://github.com/lcolladotor/r.tmbundle/commits/leo&lt;/a&gt;. In total I use 3 different keyboard shortcuts depending on where I want to evaluate the code:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;in an R console window;&lt;/li&gt;
&lt;li&gt;in an R console window after automatically running &lt;code&gt;setwd()&lt;/code&gt; to the directory that contains the files I’m looking at;&lt;/li&gt;
&lt;li&gt;to the &lt;em&gt;iTerm2&lt;/em&gt; terminal, which is useful when working with a computing cluster.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;getwd() ## run in terminal with cmd + enter shortcut

getwd() ## run in R console using backtick (`) shortcut

getwd() ## run in R console using cmd + R, runs setwd() first&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-11-textmate-setup-mac-only_files/Screen Shot 2018-03-11 at 2.13.32 PM.png&#34; alt=&#34;evaluating code&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;running-rmarkdownrender&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Running &lt;code&gt;rmarkdown::render()&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;If I’m working with an R Markdown file (&lt;code&gt;.Rmd&lt;/code&gt; extension), I frequently use the &lt;code&gt;alt (option) + t&lt;/code&gt; shortcut for running &lt;code&gt;rmarkdown::render()&lt;/code&gt; and viewing the output file.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-11-textmate-setup-mac-only_files/Screen Shot 2018-03-11 at 2.19.30 PM.png&#34; alt=&#34;knit with rmarkdown&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;For example, if I’m working with the &lt;code&gt;recount-brain/index.Rmd&lt;/code&gt; file (you can get it &lt;a href=&#34;https://github.com/LieberInstitute/recount-brain/blob/master/index.Rmd&#34;&gt;here&lt;/a&gt;), my setup allows me to run all the R shortcuts. That’s because it recognizes the R code chunk syntax and uses the &lt;code&gt;source.r&lt;/code&gt; &lt;em&gt;scope&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-11-textmate-setup-mac-only_files/Screen Shot 2018-03-11 at 2.23.57 PM.png&#34; alt=&#34;rmd scope&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;Anyway, after using &lt;code&gt;alt (option) + t&lt;/code&gt; &lt;em&gt;TextMate&lt;/em&gt; shows me the final html.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-11-textmate-setup-mac-only_files/Screen Shot 2018-03-11 at 2.21.37 PM.png&#34; alt=&#34;rendered html&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;A lot of the bundles in TextMate are from the days when we ran &lt;code&gt;Sweave()&lt;/code&gt;, so they work well with &lt;code&gt;.Rnw&lt;/code&gt; files and the like. I did modify one of them to use &lt;code&gt;knitr::knit()&lt;/code&gt; instead of &lt;code&gt;Sweave()&lt;/code&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;mate-and-rmate&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;&lt;code&gt;mate&lt;/code&gt; and &lt;code&gt;rmate&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;If you enable the terminal preferences you can now use the &lt;code&gt;mate&lt;/code&gt; command in any directory on your laptop. &lt;em&gt;TextMate&lt;/em&gt; will open and show you all the tabs of files you had last opened in that same directory. This behavior is also a part of the &lt;code&gt;.Rproj&lt;/code&gt; files with RStudio.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-11-textmate-setup-mac-only_files/Screen Shot 2018-03-11 at 2.46.51 PM.png&#34; alt=&#34;run mate&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;The command I &lt;em&gt;really&lt;/em&gt; like is &lt;code&gt;rmate&lt;/code&gt; because it enables me to remotely open a file from the cluster in &lt;em&gt;TextMate&lt;/em&gt;, which combined with the &lt;em&gt;evaluate in iTerm2&lt;/em&gt; command makes it easy to work with files on the cluster. Basically, I power up an &lt;em&gt;iTerm2&lt;/em&gt; terminal, log into the cluster, navigate to the directory that contains the files I’m working with, and then open them remotely with &lt;code&gt;rmate&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-11-textmate-setup-mac-only_files/Screen Shot 2018-03-11 at 2.52.21 PM.png&#34; alt=&#34;rmate&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;Setting up &lt;code&gt;rmate&lt;/code&gt; takes a bit of work but it’s definitely worth it.&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;In the cluster, install &lt;code&gt;rmate&lt;/code&gt; following the instructions at &lt;a href=&#34;https://github.com/textmate/rmate&#34; class=&#34;uri&#34;&gt;https://github.com/textmate/rmate&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Find a port that works for doing the forwarding. The default one will likely be taken already by another user. Find more about this &lt;a href=&#34;https://www.ssh.com/ssh/tunneling/example&#34;&gt;here&lt;/a&gt;. There they use &lt;code&gt;ssh -R 8080:localhost:80 public.example.com&lt;/code&gt; for testing. Sadly, I don’t know of a quick and easy way to find a port for you to use :/&lt;/li&gt;
&lt;li&gt;Edit your cluster’s &lt;code&gt;~/.bashrc&lt;/code&gt; file with the port information. Mine includes these lines where &lt;code&gt;someSecretPortNumber&lt;/code&gt; is replaced by my port number.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;## rmate port
# https://github.com/textmate/rmate
export RMATE_PORT=&amp;quot;someSecretPortNumber&amp;quot;&lt;/code&gt;&lt;/pre&gt;
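&lt;p&gt;For step 2, one rough way to check whether a candidate port number already seems taken on the login node is the sketch below. It assumes &lt;code&gt;netstat&lt;/code&gt; is available, and the port number shown is just an example (it is &lt;code&gt;rmate&lt;/code&gt;’s default, which is exactly the one most likely to be taken by another user):&lt;/p&gt;

```shell
# Prints whether anything already seems to be listening on a given port.
# The grep pattern matches the local-address column of netstat output on
# both Linux (":PORT ") and BSD/Mac (".PORT ").
check_port() {
    if netstat -an | grep LISTEN | grep -q "[.:]${1} "; then
        echo "port ${1} is already in use, try another"
    else
        echo "port ${1} looks free"
    fi
}
check_port 52698   # rmate's default port; replace with your own pick
```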
&lt;ol start=&#34;4&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Edit your laptop’s &lt;code&gt;~/.ssh/config&lt;/code&gt; file so you don’t have to specify the port every time you &lt;code&gt;ssh&lt;/code&gt; into the &lt;code&gt;JHPCE&lt;/code&gt; cluster:&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;## Will work later as (aka, less typing):
## ssh j
## ssh cluster
Host j cluster
    User yourUsernameGoesHere
    Hostname jhpce01.jhsph.edu
    RemoteForward someSecretPortNumber localhost:someSecretPortNumber
    ForwardX11 yes
    ForwardX11Trusted yes&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;5&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Edit your cluster’s &lt;code&gt;~/.ssh/config&lt;/code&gt; file so the port gets forwarded also when you access a compute node with &lt;code&gt;qrsh&lt;/code&gt;. All of JHPCE’s compute nodes are named &lt;code&gt;compute&lt;/code&gt;something, so we can take advantage of that in the config file.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;# For rmate
Host compute*
    RemoteForward someSecretPortNumber localhost:someSecretPortNumber&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once you do these steps, &lt;code&gt;rmate&lt;/code&gt; should work on a new terminal window.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://media.giphy.com/media/3oeSAz6FqXCKuNFX6o/giphy.gif&#34;/&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://giphy.com/reactions/featured/good-luck&#34;&gt;Source&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;textmate-variables&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;&lt;em&gt;TextMate&lt;/em&gt; variables&lt;/h3&gt;
&lt;p&gt;I don’t remember right now if I manually edited the &lt;em&gt;TextMate&lt;/em&gt; variables, but just in case, here’s the info.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-11-textmate-setup-mac-only_files/Screen Shot 2018-03-11 at 2.34.38 PM.png&#34; alt=&#34;textmate vars&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;acknowledgments&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Acknowledgments&lt;/h3&gt;
&lt;p&gt;This blog post was made possible thanks to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;http://bioconductor.org/packages/BiocStyle&#34;&gt;BiocStyle&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Oles_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/Bioconductor/BiocStyle&#39;&gt;Oleś, Morgan, and Huber, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Xie_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/rstudio/blogdown&#39;&gt;Xie, Hill, and Thomas, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=devtools&#34;&gt;devtools&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Wickham_2018&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=devtools&#39;&gt;Wickham, Hester, and Chang, 2018&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;knitcitations&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Boettiger_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=knitcitations&#39;&gt;Boettiger, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;References&lt;/h3&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Boettiger_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Boettiger_2017&#34;&gt;[1]&lt;/a&gt;&lt;cite&gt; C. Boettiger. &lt;em&gt;knitcitations: Citations for ‘Knitr’ Markdown Files&lt;/em&gt;. R package version 1.0.8. 2017. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;https://CRAN.R-project.org/package=knitcitations&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Oles_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Oles_2017&#34;&gt;[2]&lt;/a&gt;&lt;cite&gt; A. Oleś, M. Morgan and W. Huber. &lt;em&gt;BiocStyle: Standard styles for vignettes and other Bioconductor documents&lt;/em&gt;. R package version 2.6.1. 2017. URL: &lt;a href=&#34;https://github.com/Bioconductor/BiocStyle&#34;&gt;https://github.com/Bioconductor/BiocStyle&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Wickham_2018&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Wickham_2018&#34;&gt;[3]&lt;/a&gt;&lt;cite&gt; H. Wickham, J. Hester and W. Chang. &lt;em&gt;devtools: Tools to Make Developing R Packages Easier&lt;/em&gt;. R package version 1.13.5. 2018. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=devtools&#34;&gt;https://CRAN.R-project.org/package=devtools&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Xie_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Xie_2017&#34;&gt;[4]&lt;/a&gt;&lt;cite&gt; Y. Xie, A. P. Hill and A. Thomas. &lt;em&gt;blogdown: Creating Websites with R Markdown&lt;/em&gt;. ISBN 978-0815363729. Boca Raton, Florida: Chapman and Hall/CRC, 2017. URL: &lt;a href=&#34;https://github.com/rstudio/blogdown&#34;&gt;https://github.com/rstudio/blogdown&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;## Session info ----------------------------------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  setting  value                       
##  version  R version 3.4.3 (2017-11-30)
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  tz       America/New_York            
##  date     2018-03-11&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Packages --------------------------------------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  package       * version date       source                               
##  backports       1.1.2   2017-12-13 CRAN (R 3.4.3)                       
##  base          * 3.4.3   2017-12-07 local                                
##  bibtex          0.4.2   2017-06-30 CRAN (R 3.4.1)                       
##  BiocStyle     * 2.6.1   2017-11-30 Bioconductor                         
##  blogdown        0.5.10  2018-03-10 Github (lcolladotor/blogdown@471b086)
##  bookdown        0.7     2018-02-18 cran (@0.7)                          
##  colorout      * 1.2-0   2018-02-19 Github (jalvesaq/colorout@2f01173)   
##  compiler        3.4.3   2017-12-07 local                                
##  datasets      * 3.4.3   2017-12-07 local                                
##  devtools      * 1.13.5  2018-02-18 CRAN (R 3.4.3)                       
##  digest          0.6.15  2018-01-28 CRAN (R 3.4.3)                       
##  evaluate        0.10.1  2017-06-24 CRAN (R 3.4.1)                       
##  graphics      * 3.4.3   2017-12-07 local                                
##  grDevices     * 3.4.3   2017-12-07 local                                
##  htmltools       0.3.6   2017-04-28 CRAN (R 3.4.0)                       
##  httr            1.3.1   2017-08-20 CRAN (R 3.4.1)                       
##  jsonlite        1.5     2017-06-01 CRAN (R 3.4.0)                       
##  knitcitations * 1.0.8   2017-07-04 CRAN (R 3.4.1)                       
##  knitr           1.20    2018-02-20 cran (@1.20)                         
##  lubridate       1.7.3   2018-02-27 CRAN (R 3.4.3)                       
##  magrittr        1.5     2014-11-22 CRAN (R 3.4.0)                       
##  memoise         1.1.0   2017-04-21 CRAN (R 3.4.0)                       
##  methods       * 3.4.3   2017-12-07 local                                
##  plyr            1.8.4   2016-06-08 CRAN (R 3.4.0)                       
##  R6              2.2.2   2017-06-17 CRAN (R 3.4.0)                       
##  Rcpp            0.12.15 2018-01-20 CRAN (R 3.4.3)                       
##  RefManageR      0.14.20 2017-08-17 CRAN (R 3.4.1)                       
##  rmarkdown       1.9     2018-03-01 cran (@1.9)                          
##  rprojroot       1.3-2   2018-01-03 CRAN (R 3.4.3)                       
##  stats         * 3.4.3   2017-12-07 local                                
##  stringi         1.1.6   2017-11-17 CRAN (R 3.4.2)                       
##  stringr         1.3.0   2018-02-19 cran (@1.3.0)                        
##  tools           3.4.3   2017-12-07 local                                
##  utils         * 3.4.3   2017-12-07 local                                
##  withr           2.1.1   2017-12-19 CRAN (R 3.4.3)                       
##  xfun            0.1     2018-01-22 CRAN (R 3.4.3)                       
##  xml2            1.2.0   2018-01-24 CRAN (R 3.4.3)                       
##  yaml            2.1.18  2018-03-08 cran (@2.1.18)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Contributing to the LIBD rstats club</title>
      <link>http://LieberInstitute.github.io/rstatsclub/2018/03/09/contributing-to-the-libd-rstats-club/</link>
      <pubDate>Fri, 09 Mar 2018 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/2018/03/09/contributing-to-the-libd-rstats-club/</guid>
      <description>


&lt;p&gt;In this blog post &lt;a href=&#34;https://www.libd.org/team/Leonardo-Collado-Torres/&#34;&gt;Leonardo Collado-Torres&lt;/a&gt; explains how to contribute posts to the &lt;strong&gt;LIBD rstats club&lt;/strong&gt;. While some parameters are specific to this blog, you could also use this for creating your own community blog.&lt;/p&gt;
&lt;div id=&#34;install-necessary-tools&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Install necessary tools&lt;/h3&gt;
&lt;p&gt;We first need to get the appropriate tools installed in our computer.&lt;/p&gt;
&lt;div id=&#34;r&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;1. R&lt;/h4&gt;
&lt;p&gt;Let’s start by installing the latest version of &lt;a href=&#34;https://cran.r-project.org/&#34;&gt;R&lt;/a&gt;. At the time of writing this post, that would be R 3.4.3, but in reality this should work with any R 3.4.x version. It might even work with earlier versions, but it’d be a bummer to find out that we have an R version problem later on.&lt;/p&gt;
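If you are not sure which version you already have, you can ask R itself; this is plain base R and works the same on any platform:

```r
## Print the full version string of the R installation you are running
R.version.string

## Compare against a minimum version; TRUE means you are at 3.4.0 or newer
getRversion() >= "3.4.0"
```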
&lt;/div&gt;
&lt;div id=&#34;rstudio&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;2. RStudio&lt;/h4&gt;
&lt;p&gt;Once we have an up-to-date R version, let’s install &lt;a href=&#34;https://www.rstudio.com/products/rstudio/download/&#34;&gt;RStudio&lt;/a&gt;. By using the latest version we will have access to &lt;em&gt;RStudio addin&lt;/em&gt; menus, which will make our life much easier. Since we will be using RStudio quite a bit, it’s best to work from your laptop/computer rather than from a server or cluster where you might not have the flexibility to install/update RStudio and R.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;blogdown&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;3. &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt;&lt;/h4&gt;
&lt;p&gt;We next need to install the main package that we will be using for creating our blog posts, that is &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Xie_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/rstudio/blogdown&#39;&gt;Xie, Hill, and Thomas, 2017&lt;/a&gt;). At the time of writing this post, the version that we need to use is only available in the development branch. So we need to install &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; via &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=devtools&#34;&gt;devtools&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Wickham_2018&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=devtools&#39;&gt;Wickham, Hester, and Chang, 2018&lt;/a&gt;).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## If needed install devtools first:
install.packages(&amp;#39;devtools&amp;#39;)

## Install development version
devtools::install_github(&amp;#39;rstudio/blogdown&amp;#39;)

## If you are reading this in the future, you might only need
install.packages(&amp;#39;blogdown&amp;#39;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At some point we might need two extra packages that &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; uses. It will show you a message when that happens, so you can install them when you need them, or you can go ahead and get them now anyway.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;install.packages(c(&amp;#39;later&amp;#39;, &amp;#39;processx&amp;#39;))&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;hugo&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;4. &lt;code&gt;hugo&lt;/code&gt;&lt;/h4&gt;
&lt;p&gt;We are almost there! &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; uses &lt;a href=&#34;https://gohugo.io/&#34;&gt;hugo&lt;/a&gt;, which they claim is &lt;em&gt;the world’s fastest framework for building websites&lt;/em&gt;. &lt;code&gt;hugo&lt;/code&gt; is a bit complicated, but it makes maintaining a complex website such as a blog super easy. Basically, we will be working in the &lt;code&gt;rstatsclubsource&lt;/code&gt; directory and &lt;code&gt;hugo&lt;/code&gt; will create the final version we can share around in the directory &lt;code&gt;rstatsclubsource/public&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Ok, let’s go ahead and install it with&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;blogdown::install_hugo()&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;git&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;5. &lt;code&gt;git&lt;/code&gt;&lt;/h4&gt;
&lt;p&gt;To get the files and interact with other &lt;strong&gt;LIBD rstats club&lt;/strong&gt; members, you will need to use &lt;code&gt;git&lt;/code&gt;, which you can install following &lt;a href=&#34;http://happygitwithr.com/install-git.html&#34;&gt;these instructions&lt;/a&gt; from the &lt;a href=&#34;http://happygitwithr.com/&#34;&gt;happygitwithr&lt;/a&gt; website. A note for Windows users: &lt;a href=&#34;https://gitforwindows.org/&#34;&gt;get git for windows&lt;/a&gt; because it includes &lt;code&gt;git bash&lt;/code&gt; and will enable you to run some commands that we will need later on. For more advanced Windows users: when installing &lt;code&gt;git bash&lt;/code&gt;, choose the &lt;em&gt;use git and optional Unix tools from the Windows Command Prompt&lt;/em&gt; option, as described &lt;a href=&#34;https://github.com/rstudio/rstudio/issues/2224#issuecomment-368260312&#34;&gt;in detail here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Also get a &lt;a href=&#34;https://github.com/&#34;&gt;GitHub&lt;/a&gt; account if you don’t have one already. Optionally &lt;a href=&#34;https://help.github.com/articles/connecting-to-github-with-ssh/&#34;&gt;set up ssh keys&lt;/a&gt; for password-less login, though it’s not needed.&lt;/p&gt;
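If you do go the ssh-key route, the key generation itself is just two commands. Here’s a minimal sketch; the email and the key location are placeholders (in practice you would use &lt;code&gt;~/.ssh/id_ed25519&lt;/code&gt; and your own address), and GitHub’s linked guide covers pasting the key into your account settings:

```shell
# Generate an SSH key pair (placeholder email and path for illustration;
# normally you would write the key to ~/.ssh/id_ed25519).
KEYFILE="$(mktemp -d)/id_ed25519"
ssh-keygen -t ed25519 -C "you@example.com" -f "$KEYFILE" -N "" -q

# Print the public half; this is what you paste into
# GitHub -> Settings -> SSH and GPG keys -> New SSH key.
cat "${KEYFILE}.pub"
```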
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;access-blog-source-files&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Access blog source files&lt;/h3&gt;
&lt;p&gt;The file structure of our blog involves a total of 3 GitHub repositories that are related to each other as shown below. However, you will only need to interact with one of them.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;rstatsclubsource
    ## From https://github.com/LieberInstitute/rstatsclubsource
    content/
        post/
        ...
    static/
        post/
        img/
        ...
    themes/
        hugo-academic/
        ## From https://github.com/gcushen/hugo-academic
        ## linked as a git submodule
    public/
        ## https://github.com/LieberInstitute/rstatsclub
        ## Hosts the files that make up the website
        ## http://LieberInstitute.github.io/rstatsclub/&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The main directory for the blog is &lt;code&gt;rstatsclubsource&lt;/code&gt; and you can access it at &lt;a href=&#34;https://github.com/LieberInstitute/rstatsclubsource&#34;&gt;github.com/LieberInstitute/rstatsclubsource&lt;/a&gt; if you are a member of the &lt;strong&gt;LIBD rstats club&lt;/strong&gt; and have logged in to your GitHub account. This directory contains many files that &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; understands and that you don’t really need to change. The main sub-directory that you will be interacting with is &lt;code&gt;rstatsclubsource/content/post&lt;/code&gt; where your new post files will get saved by &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt;. If you insert images, &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; will automatically create the directory &lt;code&gt;rstatsclubsource/static/post/2018-03-09_postTitle_files&lt;/code&gt; (&lt;a href=&#34;https://github.com/LieberInstitute/rstatsclubsource/tree/master/static/post/2018-03-06-test-post-for-checking-website_files&#34;&gt;example&lt;/a&gt;), but don’t worry about it.&lt;/p&gt;
&lt;p&gt;Ok, let’s get the files. Open your terminal or &lt;code&gt;git bash&lt;/code&gt; (Windows) and navigate to the directory where you will store your copy of the source files.&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;## Works in Mac and Windows (with git bash)
cd ~/Desktop

## Windows example command if you have more than 1 disk
cd /f/Desktop&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next clone &lt;a href=&#34;https://github.com/LieberInstitute/rstatsclubsource&#34;&gt;github.com/LieberInstitute/rstatsclubsource&lt;/a&gt;. Here we use https since that doesn’t require extra setup, but cloning by ssh also works.&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;git clone https://github.com/LieberInstitute/rstatsclubsource.git&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-09-contributing-to-the-libd-rstats-club_files/Screen Shot 2018-03-08 at 8.47.18 PM.png&#34; alt=&#34;clone source repo&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;After a successful clone, we should have created the directory &lt;code&gt;~/Desktop/rstatsclubsource&lt;/code&gt;. Let’s change to this new directory.&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;cd rstatsclubsource&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our cloning process also created a placeholder for our hugo theme (&lt;a href=&#34;https://github.com/gcushen/hugo-academic&#34;&gt;hugo-academic&lt;/a&gt;), but its contents are empty.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ ls -lh themes/hugo-academic/
total 0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s fill it up by running the following git command.&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;git submodule update --init --recursive&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-09-contributing-to-the-libd-rstats-club_files/Screen Shot 2018-03-08 at 9.05.56 PM.png&#34; alt=&#34;git submodule&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;If we now check the contents of the &lt;code&gt;themes/hugo-academic&lt;/code&gt; directory, we should see several files (here’s a screenshot from Windows using &lt;code&gt;git bash&lt;/code&gt;).&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;ls -lh themes/hugo-academic&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-09-contributing-to-the-libd-rstats-club_files/theme files.PNG&#34; alt=&#34;windows theme files&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;preview-website&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Preview website&lt;/h3&gt;
&lt;p&gt;Our setup is now complete! We can now start writing blog posts using &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt;. Open the &lt;code&gt;~/Desktop/rstatsclubsource/rstatsclubsource.Rproj&lt;/code&gt; file, which should launch a new RStudio window automatically. Find the &lt;em&gt;Addins&lt;/em&gt; menu at the top of your window, and select the &lt;code&gt;Serve Site&lt;/code&gt; option in the drop-down menu.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-09-contributing-to-the-libd-rstats-club_files/Screen Shot 2018-03-08 at 9.35.34 PM.png&#34; alt=&#34;serve site&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;This addin will create a local version of the website you can preview in the RStudio Viewer pane.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-09-contributing-to-the-libd-rstats-club_files/Screen Shot 2018-03-08 at 9.19.02 PM.png&#34; alt=&#34;RStudio viewer pane&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;If you click on the &lt;em&gt;show in new window&lt;/em&gt; symbol as shown below&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-09-contributing-to-the-libd-rstats-club_files/Screen%20Shot%202018-03-08%20at%209.20.28%20PM.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;you then can preview the website in your main browser:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-09-contributing-to-the-libd-rstats-club_files/Screen Shot 2018-03-08 at 9.21.44 PM.png&#34; alt=&#34;preview browser&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;This will be super useful when you are writing a new blog post because any changes you make will automatically refresh the local preview version after a second or two (only after you save the file you are editing). Sometimes you might have to manually refresh your browser (like when you make too many updates in a row). The preview website works as long as you have the &lt;em&gt;Serve Site&lt;/em&gt; addin running in the background.&lt;/p&gt;
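For reference, the addin is a thin wrapper around regular &lt;em&gt;blogdown&lt;/em&gt; functions, so you can also start and stop the preview from the R console (function names as of current &lt;em&gt;blogdown&lt;/em&gt; versions):

```r
## Start the local preview; equivalent to the "Serve Site" addin.
## It keeps rebuilding the site in the background as you save files.
blogdown::serve_site()

## Stop the background preview server when you are done.
blogdown::stop_server()
```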
&lt;div id=&#34;all-our-files&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;All our files&lt;/h4&gt;
&lt;p&gt;By running &lt;em&gt;Serve Site&lt;/em&gt;, &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; automatically populated our &lt;code&gt;rstatsclubsource/public&lt;/code&gt; directory. So our full list of files looks somewhat like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## Found out about tree from
## https://stackoverflow.com/questions/3455625/linux-command-to-print-directory-structure-in-the-form-of-a-tree
## Install on a Mac with Homebrew:
## brew install tree
$ tree -d rstatsclubsource
.
├── archetypes
├── content
│   ├── home
│   └── post
├── data
│   ├── fonts
│   └── themes
├── layouts
│   └── partials
├── public
│   ├── 2018
│   │   └── 03
│   │       ├── 06
│   │       │   └── test-post-for-checking-website
│   │       └── 09
│   │           └── welcome-to-the-libd-rstats-club
│   ├── categories
│   │   ├── page
│   │   │   └── 1
│   │   ├── rstats
│   │   │   └── page
│   │   │       └── 1
│   │   └── setup
│   │       └── page
│   │           └── 1
│   ├── home
│   ├── img
│   ├── js
│   ├── post
│   │   ├── 2018-03-06-test-post-for-checking-website_files
│   │   │   └── figure-html
│   │   └── page
│   │       └── 1
│   ├── publication_types
│   │   └── page
│   │       └── 1
│   └── tags
│       ├── blog
│       │   └── page
│       │       └── 1
│       └── page
│           └── 1
├── static
│   ├── img
│   └── post
│       └── 2018-03-06-test-post-for-checking-website_files
│           └── figure-html
└── themes
    └── hugo-academic
        ├── archetypes
        ├── data
        ├── exampleSite
        ├── i18n
        ├── images
        ├── layouts
        └── static&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Some of the structure looks redundant, right? Well, that’s because &lt;code&gt;hugo&lt;/code&gt; and &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; try to keep everything super organized and re-use some of the file structure for the final version of the website (the one inside &lt;code&gt;rstatsclubsource/public&lt;/code&gt;).&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;write-a-blog-post&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Write a blog post&lt;/h3&gt;
&lt;p&gt;We now have a working full preview version of the website on our computer. It’s finally time to write a blog post. The good thing is that we don’t need to do all the setup steps again!&lt;/p&gt;
&lt;div id=&#34;update-if-necessary&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Update if necessary&lt;/h4&gt;
&lt;p&gt;If you paused midway for a few days, it’s likely that your files are not the latest ones. So please &lt;code&gt;git pull&lt;/code&gt; the latest files from &lt;a href=&#34;https://github.com/LieberInstitute/rstatsclubsource&#34;&gt;github.com/LieberInstitute/rstatsclubsource&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;cd ~/Desktop/rstatsclubsource
git pull&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;start-a-blog-post&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Start a blog post&lt;/h4&gt;
&lt;p&gt;To start a new blog post, use the &lt;em&gt;New Post&lt;/em&gt; &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; addin.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-09-contributing-to-the-libd-rstats-club_files/Screen Shot 2018-03-08 at 9.18.21 PM.png&#34; alt=&#34;new post&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;This will open up a window where you can specify the blog post title. The title will automatically fill out the &lt;em&gt;slug&lt;/em&gt;, which is used in the final URL of the post. It also lets you choose a date, which is when the blog post will be made publicly visible.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-09-contributing-to-the-libd-rstats-club_files/Screen Shot 2018-03-08 at 6.41.49 PM.png&#34; alt=&#34;New post addin&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;author-and-extension&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Author and extension&lt;/h4&gt;
&lt;p&gt;Let’s start by entering the author information (leave it blank if you want to be anonymous) and selecting the post format. We highly recommend you use the &lt;code&gt;.Rmd&lt;/code&gt; format for your blog posts, to the point that you should just make it your default option. To do so, edit or create an &lt;code&gt;.Rprofile&lt;/code&gt; file in your home directory, that is &lt;code&gt;~/.Rprofile&lt;/code&gt;, and set these options (adjusting the author information to your preference).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# https://bookdown.org/yihui/blogdown/global-options.html
options(blogdown.author = &amp;#39;L. Collado-Torres&amp;#39;)
options(blogdown.ext = &amp;#39;.Rmd&amp;#39;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The next time you write a blog post, &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; will know what options you prefer.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-09-contributing-to-the-libd-rstats-club_files/Screen Shot 2018-03-08 at 6.42.04 PM.png&#34; alt=&#34;Author and extension&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;use-a-blog-template&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Use a blog template&lt;/h4&gt;
&lt;p&gt;Next, let’s choose the blog archetype (template) for posts, that is &lt;code&gt;post&lt;/code&gt; under &lt;code&gt;Archetype&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-09-contributing-to-the-libd-rstats-club_files/Screen%20Shot%202018-03-08%20at%206.42.23%20PM.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;The reasons why a blog template is useful are described in full detail &lt;a href=&#34;http://lcolladotor.github.io/2018/03/08/blogdown-archetype-template/#.WqH1WJPwY0o&#34;&gt;in this blog post by L. Collado-Torres&lt;/a&gt;. The quick version is that it will pre-populate your new blog post with helpful information and reminders on how to do &lt;span class=&#34;math inline&#34;&gt;\(X\)&lt;/span&gt; or &lt;span class=&#34;math inline&#34;&gt;\(Y\)&lt;/span&gt;. For example, how to &lt;a href=&#34;http://lcolladotor.github.io/2018/03/07/blogdown-insert-image-addin/#.WqH1bJPwY0o&#34;&gt;insert external images&lt;/a&gt;, how to control the figure height and width for images generated by R, how to cite packages, and how to include the R reproducibility information by default.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;use-appropriate-post-categories&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Use appropriate post categories&lt;/h4&gt;
&lt;p&gt;Most of our blog posts will likely be about R. For those blog posts, select the &lt;code&gt;rstats&lt;/code&gt; category as shown below.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-09-contributing-to-the-libd-rstats-club_files/Screen%20Shot%202018-03-08%20at%206.42.33%20PM.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;If your blog post covers a topic that might be useful to new LIBD members use the &lt;code&gt;setup&lt;/code&gt; category. If it involves using the JHPCE cluster, use &lt;code&gt;jhpce&lt;/code&gt; as a category, etc. The &lt;em&gt;New Post&lt;/em&gt; &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; addin lets you see which categories and tags have been used before to classify posts. In general, we should aim to have a handful of categories while having as many tags as necessary. These will be useful for filtering our blog posts.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;hit-done-well-check-before&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Hit done! (well, check before)&lt;/h4&gt;
&lt;p&gt;We are now basically set to start our new blog post. Just double check:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the categories,&lt;/li&gt;
&lt;li&gt;that the format is &lt;code&gt;.Rmd&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;that the archetype is &lt;code&gt;post&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;post title,&lt;/li&gt;
&lt;li&gt;post date,&lt;/li&gt;
&lt;li&gt;post author.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/post/2018-03-09-contributing-to-the-libd-rstats-club_files/Screen Shot 2018-03-08 at 6.42.53 PM.png&#34; alt=&#34;&#34; width=&#34;500&#34;/&gt;&lt;/p&gt;
&lt;p&gt;Aaaaand hit &lt;strong&gt;done&lt;/strong&gt;!&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;edit-your-blog-post&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Edit your blog post&lt;/h3&gt;
&lt;p&gt;You can now write your blog post using the &lt;a href=&#34;https://rmarkdown.rstudio.com/lesson-2.html&#34;&gt;R Markdown syntax&lt;/a&gt;. Note that you won’t be using the &lt;code&gt;knit&lt;/code&gt; button from RStudio since the &lt;em&gt;Serve Site&lt;/em&gt; &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; addin should be doing all the work for you and updating your local website preview.&lt;/p&gt;
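As a minimal sketch of that syntax (the chunk label and chunk options below are just illustrative knitr defaults, not anything required by this blog):

````markdown
## A section heading

Some prose with `inline code`, **bold text**, and a
[link](https://rmarkdown.rstudio.com/).

```{r example-plot, fig.width = 8, fig.height = 6}
## An R chunk: fig.width and fig.height are knitr options that
## control the size of the figure embedded in the post.
plot(1:10)
```
````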
&lt;p&gt;When writing your blog post, keep in mind our &lt;a href=&#34;http://research.libd.org/rstatsclub/#coc&#34;&gt;Code of Conduct&lt;/a&gt; and remember that our blog posts have to comply with the confidentiality agreement we signed at LIBD.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;share-your-blog-post&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Share your blog post&lt;/h3&gt;
&lt;p&gt;Once you are happy with the final version of your blog post and have used the spell check from RStudio, add your blog post files to &lt;a href=&#34;https://github.com/LieberInstitute/rstatsclubsource&#34;&gt;github.com/LieberInstitute/rstatsclubsource&lt;/a&gt;. First, remember to &lt;code&gt;git pull&lt;/code&gt; in case there are new files in the repository that you don’t have. Next, check which files you modified by running &lt;code&gt;git status&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;cd ~/Desktop/rstatsclubsource
git status&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Typically, you should see that you have untracked files under &lt;code&gt;rstatsclubsource/content/post&lt;/code&gt; and maybe &lt;code&gt;rstatsclubsource/static/post/&lt;/code&gt;. If so, add them to the GitHub repository with the following sequence of commands.&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;git add content/post/*
git add static/post/*
git commit -m &amp;quot;Short description of your new blog post&amp;quot;
git push&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You could also do this with a git GUI such as the git tools in RStudio, &lt;a href=&#34;https://www.sourcetreeapp.com/&#34;&gt;SourceTree&lt;/a&gt; (works for both Mac and Windows), the &lt;code&gt;gitk&lt;/code&gt; command, etc.&lt;/p&gt;
&lt;p&gt;After a quick review we will update the files at &lt;a href=&#34;https://github.com/LieberInstitute/rstatsclub&#34;&gt;github.com/LieberInstitute/rstatsclub&lt;/a&gt; and deploy the changes to GitHub.&lt;/p&gt;
&lt;p&gt;Good luck with your first of many &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; posts!!!&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://media.giphy.com/media/3ornk7TgUdhjhTYgta/giphy.gif&#34;/&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://giphy.com/reactions/featured/good-luck&#34;&gt;Source&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;details&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Details&lt;/h3&gt;
&lt;p&gt;We are keeping both &lt;a href=&#34;https://github.com/LieberInstitute/rstatsclubsource&#34;&gt;github.com/LieberInstitute/rstatsclubsource&lt;/a&gt; and &lt;a href=&#34;https://github.com/LieberInstitute/rstatsclub&#34;&gt;github.com/LieberInstitute/rstatsclub&lt;/a&gt; as private repositories to enable contributors to write posts anonymously.&lt;/p&gt;
&lt;p&gt;The default license for our blog posts is (CC) BY-NC-SA 4.0, which you can read more about &lt;a href=&#34;https://creativecommons.org/licenses/by-nc-sa/4.0/&#34;&gt;here&lt;/a&gt;. If you write a blog post under a different license, please say so explicitly. Also, please attribute the sources of any material you use.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;further-reading&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Further reading&lt;/h3&gt;
&lt;p&gt;If you want to learn even more, check the &lt;a href=&#34;https://bookdown.org/yihui/blogdown/&#34;&gt;blogdown book&lt;/a&gt; and the &lt;a href=&#34;https://github.com/gcushen/hugo-academic&#34;&gt;hugo-academic theme&lt;/a&gt;. Maybe you can help with &lt;a href=&#34;https://github.com/gcushen/hugo-academic/issues/467&#34;&gt;this currently unresolved issue&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you want to check a public version equivalent to the one described here, check:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/lcolladotor/hugoblog&#34;&gt;github.com/lcolladotor/hugoblog&lt;/a&gt; which is the equivalent of &lt;a href=&#34;https://github.com/LieberInstitute/rstatsclubsource&#34;&gt;github.com/LieberInstitute/rstatsclubsource&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/lcolladotor/lcolladotor.github.com&#34;&gt;github.com/lcolladotor/lcolladotor.github.com&lt;/a&gt; which is the equivalent of &lt;a href=&#34;https://github.com/LieberInstitute/rstatsclub&#34;&gt;github.com/LieberInstitute/rstatsclub&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;hugoblog
    ## From https://github.com/lcolladotor/hugoblog
    content/
        post/
        ...
    static/
        post/
        img/
        ...
    themes/
        hugo-academic/
        ## From https://github.com/gcushen/hugo-academic
        ## linked as a git submodule
    public/
        ## https://github.com/lcolladotor/lcolladotor.github.com
        ## Hosts the files that make up the website
        ## http://lcolladotor.github.io/&lt;/code&gt;&lt;/pre&gt;
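Since the theme is linked as a git submodule, a plain git clone leaves themes/hugo-academic/ empty. Below is a self-contained sketch of the two common ways to populate it, using throwaway local stand-in repositories instead of the real GitHub ones so it runs without network access:

```shell
# Demonstrate submodule cloning with throwaway local repos (no network needed).
# `protocol.file.allow=always` is required by recent git for local-path submodules.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Stand-in for the theme repository:
git init -q theme
git -C theme -c user.email=a@b.c -c user.name=a commit -q --allow-empty -m "theme"

# Stand-in for the site source, with the theme linked as a submodule:
git init -q site
git -C site -c protocol.file.allow=always \
    submodule --quiet add "$tmp/theme" themes/hugo-academic
git -C site -c user.email=a@b.c -c user.name=a commit -q -m "add theme submodule"

# Option 1: clone recursively so the submodule comes along.
git -c protocol.file.allow=always clone -q --recursive "$tmp/site" clone1

# Option 2: after a plain clone, fetch the submodule explicitly.
git -c protocol.file.allow=always clone -q "$tmp/site" clone2
git -C clone2 -c protocol.file.allow=always submodule --quiet update --init
```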
&lt;/div&gt;
&lt;div id=&#34;acknowledgments&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Acknowledgments&lt;/h3&gt;
&lt;p&gt;This blog post was made possible thanks to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;http://bioconductor.org/packages/BiocStyle&#34;&gt;BiocStyle&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Oles_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/Bioconductor/BiocStyle&#39;&gt;Oleś, Morgan, and Huber, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; (&lt;a href=&#39;https://github.com/rstudio/blogdown&#39;&gt;Xie, Hill, and Thomas, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=devtools&#34;&gt;devtools&lt;/a&gt;&lt;/em&gt; (&lt;a href=&#39;https://CRAN.R-project.org/package=devtools&#39;&gt;Wickham, Hester, and Chang, 2018&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;knitcitations&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Boettiger_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=knitcitations&#39;&gt;Boettiger, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;References&lt;/h3&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Boettiger_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Boettiger_2017&#34;&gt;[1]&lt;/a&gt;&lt;cite&gt; C. Boettiger. &lt;em&gt;knitcitations: Citations for ‘Knitr’ Markdown Files&lt;/em&gt;. R package version 1.0.8. 2017. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;https://CRAN.R-project.org/package=knitcitations&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Oles_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Oles_2017&#34;&gt;[2]&lt;/a&gt;&lt;cite&gt; A. Oleś, M. Morgan and W. Huber. &lt;em&gt;BiocStyle: Standard styles for vignettes and other Bioconductor documents&lt;/em&gt;. R package version 2.6.1. 2017. URL: &lt;a href=&#34;https://github.com/Bioconductor/BiocStyle&#34;&gt;https://github.com/Bioconductor/BiocStyle&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Wickham_2018&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Wickham_2018&#34;&gt;[3]&lt;/a&gt;&lt;cite&gt; H. Wickham, J. Hester and W. Chang. &lt;em&gt;devtools: Tools to Make Developing R Packages Easier&lt;/em&gt;. R package version 1.13.5. 2018. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=devtools&#34;&gt;https://CRAN.R-project.org/package=devtools&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Xie_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Xie_2017&#34;&gt;[4]&lt;/a&gt;&lt;cite&gt; Y. Xie, A. P. Hill and A. Thomas. &lt;em&gt;blogdown: Creating Websites with R Markdown&lt;/em&gt;. ISBN 978-0815363729. Boca Raton, Florida: Chapman and Hall/CRC, 2017. URL: &lt;a href=&#34;https://github.com/rstudio/blogdown&#34;&gt;https://github.com/rstudio/blogdown&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;## Session info ----------------------------------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  setting  value                       
##  version  R version 3.4.3 (2017-11-30)
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  tz       America/New_York            
##  date     2018-03-10&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Packages --------------------------------------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  package       * version date       source                            
##  backports       1.1.2   2017-12-13 CRAN (R 3.4.3)                    
##  base          * 3.4.3   2017-12-07 local                             
##  bibtex          0.4.2   2017-06-30 CRAN (R 3.4.1)                    
##  BiocStyle     * 2.6.1   2017-11-30 Bioconductor                      
##  blogdown        0.5.9   2018-03-08 Github (rstudio/blogdown@dc1f41c) 
##  bookdown        0.7     2018-02-18 cran (@0.7)                       
##  colorout      * 1.2-0   2018-02-19 Github (jalvesaq/colorout@2f01173)
##  compiler        3.4.3   2017-12-07 local                             
##  datasets      * 3.4.3   2017-12-07 local                             
##  devtools      * 1.13.5  2018-02-18 CRAN (R 3.4.3)                    
##  digest          0.6.15  2018-01-28 CRAN (R 3.4.3)                    
##  evaluate        0.10.1  2017-06-24 CRAN (R 3.4.1)                    
##  graphics      * 3.4.3   2017-12-07 local                             
##  grDevices     * 3.4.3   2017-12-07 local                             
##  htmltools       0.3.6   2017-04-28 CRAN (R 3.4.0)                    
##  httr            1.3.1   2017-08-20 CRAN (R 3.4.1)                    
##  jsonlite        1.5     2017-06-01 CRAN (R 3.4.0)                    
##  knitcitations * 1.0.8   2017-07-04 CRAN (R 3.4.1)                    
##  knitr           1.20    2018-02-20 cran (@1.20)                      
##  lubridate       1.7.3   2018-02-27 CRAN (R 3.4.3)                    
##  magrittr        1.5     2014-11-22 CRAN (R 3.4.0)                    
##  memoise         1.1.0   2017-04-21 CRAN (R 3.4.0)                    
##  methods       * 3.4.3   2017-12-07 local                             
##  plyr            1.8.4   2016-06-08 CRAN (R 3.4.0)                    
##  R6              2.2.2   2017-06-17 CRAN (R 3.4.0)                    
##  Rcpp            0.12.15 2018-01-20 CRAN (R 3.4.3)                    
##  RefManageR      0.14.20 2017-08-17 CRAN (R 3.4.1)                    
##  rmarkdown       1.9     2018-03-01 cran (@1.9)                       
##  rprojroot       1.3-2   2018-01-03 CRAN (R 3.4.3)                    
##  stats         * 3.4.3   2017-12-07 local                             
##  stringi         1.1.6   2017-11-17 CRAN (R 3.4.2)                    
##  stringr         1.3.0   2018-02-19 cran (@1.3.0)                     
##  tools           3.4.3   2017-12-07 local                             
##  utils         * 3.4.3   2017-12-07 local                             
##  withr           2.1.1   2017-12-19 CRAN (R 3.4.3)                    
##  xfun            0.1     2018-01-22 CRAN (R 3.4.3)                    
##  xml2            1.2.0   2018-01-24 CRAN (R 3.4.3)                    
##  yaml            2.1.17  2018-02-27 cran (@2.1.17)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Welcome to the LIBD rstats club!</title>
      <link>http://LieberInstitute.github.io/rstatsclub/2018/03/09/welcome-to-the-libd-rstats-club/</link>
      <pubDate>Fri, 09 Mar 2018 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/2018/03/09/welcome-to-the-libd-rstats-club/</guid>
      <description>


&lt;p&gt;Welcome to the &lt;strong&gt;LIBD rstats club&lt;/strong&gt;!&lt;/p&gt;
&lt;p&gt;A few months ago &lt;a href=&#34;https://www.libd.org/team/Carrie-Wright/&#34;&gt;Carrie Wright&lt;/a&gt; and &lt;a href=&#34;https://www.libd.org/team/Leonardo-Collado-Torres/&#34;&gt;Leonardo Collado-Torres&lt;/a&gt; started an &lt;em&gt;R + Journal&lt;/em&gt; club where we have been meeting every other week to talk about &lt;a href=&#34;https://cran.r-project.org/&#34;&gt;R&lt;/a&gt; packages and discuss journal articles in our field. Some examples of what we covered are &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=tidyverse&#34;&gt;tidyverse&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Wickham_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=tidyverse&#39;&gt;Wickham, 2017&lt;/a&gt;), &lt;em&gt;&lt;a href=&#34;http://bioconductor.org/packages/BiocStyle&#34;&gt;BiocStyle&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Oles_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/Bioconductor/BiocStyle&#39;&gt;Oleś, Morgan, and Huber, 2017&lt;/a&gt;) and &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=rmarkdown&#34;&gt;rmarkdown&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Allaire_2018&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=rmarkdown&#39;&gt;Allaire, Xie, McPherson, Luraschi, et al., 2018&lt;/a&gt;) presentations. We are now taking the R portion of the club to the next level.&lt;/p&gt;
&lt;p&gt;As we say in the about section, we are researchers at &lt;a href=&#34;https://www.libd.org/&#34;&gt;LIBD&lt;/a&gt; who frequently use &lt;a href=&#34;https://cran.r-project.org/&#34;&gt;R&lt;/a&gt; and other tools. The &lt;a href=&#34;https://twitter.com/search?q=%23rstats&#34;&gt;R community&lt;/a&gt; has been &lt;a href=&#34;https://stackoverflow.blog/2017/10/10/impressive-growth-r/&#34;&gt;growing a lot in recent years&lt;/a&gt; and it’s not easy to keep up with all the recent developments. We also work in a dynamic environment where new people join LIBD frequently as students, post-docs and staff.&lt;/p&gt;
&lt;p&gt;In the &lt;strong&gt;LIBD rstats club&lt;/strong&gt; we plan to write blog posts about R packages we are interested in or are just learning how to use, &lt;em&gt;how to do&lt;/em&gt; guides, and occasionally our &lt;a href=&#34;http://github.com/LieberInstitute&#34;&gt;own open-source software&lt;/a&gt;. For our &lt;em&gt;how to do&lt;/em&gt; guides, the idea is that if two people ask us how to do &lt;span class=&#34;math inline&#34;&gt;\(X\)&lt;/span&gt;, then it’s probably a good time to write a blog post. This is similar to &lt;a href=&#34;https://twitter.com/drob&#34;&gt;David Robinson&lt;/a&gt;’s advice:&lt;/p&gt;
&lt;blockquote class=&#34;twitter-tweet&#34; data-lang=&#34;en&#34;&gt;
&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;
When you’ve written the same code 3 times, write a function&lt;br&gt;&lt;br&gt;When you’ve given the same in-person advice 3 times, write a blog post
&lt;/p&gt;
— David Robinson (&lt;span class=&#34;citation&#34;&gt;@drob&lt;/span&gt;) &lt;a href=&#34;https://twitter.com/drob/status/928447584712253440?ref_src=twsrc%5Etfw&#34;&gt;November 9, 2017&lt;/a&gt;
&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;
&lt;p&gt;We are doing this to help each other at &lt;a href=&#34;https://www.libd.org/&#34;&gt;LIBD&lt;/a&gt;, to help future colleagues and students, and to help the community at large. We also hope to receive feedback from the community (in compliance with our &lt;a href=&#34;http://LieberInstitute.github.io/rstatsclub/#coc&#34;&gt;code of conduct&lt;/a&gt;) that will benefit everyone reading the posts. Maybe you know of an alternative way to do the same task we describe.&lt;/p&gt;
&lt;p&gt;Thank you for tuning in! Follow us via &lt;a href=&#34;http://feeds.feedburner.com/LIBDrstatsclub&#34;&gt;RSS&lt;/a&gt; (allows email subscriptions) or &lt;a href=&#34;https://twitter.com/LIBDrstats&#34;&gt;Twitter&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Founding members: &lt;a href=&#34;https://www.libd.org/team/Leonardo-Collado-Torres/&#34;&gt;Leonardo Collado-Torres&lt;/a&gt;, &lt;a href=&#34;https://www.libd.org/team/Carrie-Wright/&#34;&gt;Carrie Wright&lt;/a&gt; and &lt;a href=&#34;https://www.libd.org/team/emily-e-burke/&#34;&gt;Emily E. Burke&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;PS: This is not a course or boot camp site for getting started with R; other resources are available for that.&lt;/p&gt;
&lt;div id=&#34;details&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Details&lt;/h3&gt;
&lt;p&gt;Anyone at LIBD is welcome to participate and contribute blog posts: we will add you to the &lt;a href=&#34;http://LieberInstitute.github.io/rstatsclub/#members&#34;&gt;members&lt;/a&gt; section. Just remember that your blog posts have to comply with the confidentiality agreement we signed. We also welcome anonymous posts, though signing them &lt;em&gt;can&lt;/em&gt; be &lt;a href=&#34;http://varianceexplained.org/r/start-blog/&#34;&gt;useful for exposure and CV building&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;acknowledgments&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Acknowledgments&lt;/h3&gt;
&lt;p&gt;This blog post was made possible thanks to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;http://bioconductor.org/packages/BiocStyle&#34;&gt;BiocStyle&lt;/a&gt;&lt;/em&gt; (&lt;a href=&#39;https://github.com/Bioconductor/BiocStyle&#39;&gt;Oleś, Morgan, and Huber, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Xie_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/rstudio/blogdown&#39;&gt;Xie, Hill, and Thomas, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=devtools&#34;&gt;devtools&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Wickham_2018&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=devtools&#39;&gt;Wickham, Hester, and Chang, 2018&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;knitcitations&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Boettiger_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=knitcitations&#39;&gt;Boettiger, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;References&lt;/h3&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Allaire_2018&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Allaire_2018&#34;&gt;[1]&lt;/a&gt;&lt;cite&gt; J. Allaire, Y. Xie, J. McPherson, J. Luraschi, et al. &lt;em&gt;rmarkdown: Dynamic Documents for R&lt;/em&gt;. R package version 1.9. 2018. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=rmarkdown&#34;&gt;https://CRAN.R-project.org/package=rmarkdown&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Boettiger_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Boettiger_2017&#34;&gt;[2]&lt;/a&gt;&lt;cite&gt; C. Boettiger. &lt;em&gt;knitcitations: Citations for ‘Knitr’ Markdown Files&lt;/em&gt;. R package version 1.0.8. 2017. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;https://CRAN.R-project.org/package=knitcitations&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Oles_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Oles_2017&#34;&gt;[3]&lt;/a&gt;&lt;cite&gt; A. Oleś, M. Morgan and W. Huber. &lt;em&gt;BiocStyle: Standard styles for vignettes and other Bioconductor documents&lt;/em&gt;. R package version 2.6.1. 2017. URL: &lt;a href=&#34;https://github.com/Bioconductor/BiocStyle&#34;&gt;https://github.com/Bioconductor/BiocStyle&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Wickham_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Wickham_2017&#34;&gt;[4]&lt;/a&gt;&lt;cite&gt; H. Wickham. &lt;em&gt;tidyverse: Easily Install and Load the ‘Tidyverse’&lt;/em&gt;. R package version 1.2.1. 2017. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=tidyverse&#34;&gt;https://CRAN.R-project.org/package=tidyverse&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Wickham_2018&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Wickham_2018&#34;&gt;[5]&lt;/a&gt;&lt;cite&gt; H. Wickham, J. Hester and W. Chang. &lt;em&gt;devtools: Tools to Make Developing R Packages Easier&lt;/em&gt;. R package version 1.13.5. 2018. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=devtools&#34;&gt;https://CRAN.R-project.org/package=devtools&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Xie_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Xie_2017&#34;&gt;[6]&lt;/a&gt;&lt;cite&gt; Y. Xie, A. P. Hill and A. Thomas. &lt;em&gt;blogdown: Creating Websites with R Markdown&lt;/em&gt;. ISBN 978-0815363729. Boca Raton, Florida: Chapman and Hall/CRC, 2017. URL: &lt;a href=&#34;https://github.com/rstudio/blogdown&#34;&gt;https://github.com/rstudio/blogdown&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;## Session info ----------------------------------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  setting  value                       
##  version  R version 3.4.3 (2017-11-30)
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  tz       America/New_York            
##  date     2018-03-08&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Packages --------------------------------------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  package       * version date       source                            
##  backports       1.1.2   2017-12-13 CRAN (R 3.4.3)                    
##  base          * 3.4.3   2017-12-07 local                             
##  bibtex          0.4.2   2017-06-30 CRAN (R 3.4.1)                    
##  BiocStyle     * 2.6.1   2017-11-30 Bioconductor                      
##  blogdown        0.5.9   2018-03-08 Github (rstudio/blogdown@dc1f41c) 
##  bookdown        0.7     2018-02-18 cran (@0.7)                       
##  colorout      * 1.2-0   2018-02-19 Github (jalvesaq/colorout@2f01173)
##  compiler        3.4.3   2017-12-07 local                             
##  datasets      * 3.4.3   2017-12-07 local                             
##  devtools      * 1.13.5  2018-02-18 CRAN (R 3.4.3)                    
##  digest          0.6.15  2018-01-28 CRAN (R 3.4.3)                    
##  evaluate        0.10.1  2017-06-24 CRAN (R 3.4.1)                    
##  graphics      * 3.4.3   2017-12-07 local                             
##  grDevices     * 3.4.3   2017-12-07 local                             
##  htmltools       0.3.6   2017-04-28 CRAN (R 3.4.0)                    
##  httr            1.3.1   2017-08-20 CRAN (R 3.4.1)                    
##  jsonlite        1.5     2017-06-01 CRAN (R 3.4.0)                    
##  knitcitations * 1.0.8   2017-07-04 CRAN (R 3.4.1)                    
##  knitr           1.20    2018-02-20 cran (@1.20)                      
##  lubridate       1.7.3   2018-02-27 CRAN (R 3.4.3)                    
##  magrittr        1.5     2014-11-22 CRAN (R 3.4.0)                    
##  memoise         1.1.0   2017-04-21 CRAN (R 3.4.0)                    
##  methods       * 3.4.3   2017-12-07 local                             
##  plyr            1.8.4   2016-06-08 CRAN (R 3.4.0)                    
##  R6              2.2.2   2017-06-17 CRAN (R 3.4.0)                    
##  Rcpp            0.12.15 2018-01-20 CRAN (R 3.4.3)                    
##  RefManageR      0.14.20 2017-08-17 CRAN (R 3.4.1)                    
##  rmarkdown       1.9     2018-03-01 cran (@1.9)                       
##  rprojroot       1.3-2   2018-01-03 CRAN (R 3.4.3)                    
##  stats         * 3.4.3   2017-12-07 local                             
##  stringi         1.1.6   2017-11-17 CRAN (R 3.4.2)                    
##  stringr         1.3.0   2018-02-19 cran (@1.3.0)                     
##  tools           3.4.3   2017-12-07 local                             
##  utils         * 3.4.3   2017-12-07 local                             
##  withr           2.1.1   2017-12-19 CRAN (R 3.4.3)                    
##  xfun            0.1     2018-01-22 CRAN (R 3.4.3)                    
##  xml2            1.2.0   2018-01-24 CRAN (R 3.4.3)                    
##  yaml            2.1.17  2018-02-27 cran (@2.1.17)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Test post for checking website</title>
      <link>http://LieberInstitute.github.io/rstatsclub/2018/03/06/test-post-for-checking-website/</link>
      <pubDate>Tue, 06 Mar 2018 00:00:00 +0000</pubDate>
      <guid>http://LieberInstitute.github.io/rstatsclub/2018/03/06/test-post-for-checking-website/</guid>
      <description>


&lt;p&gt;This is a test post for checking the formatting of the website. You can basically ignore the rest. It’s showing the contents of the &lt;em&gt;post.md&lt;/em&gt; archetype (blog post template).&lt;/p&gt;
&lt;p&gt;Useful links for editing the theme:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://sourcethemes.com/academic/docs/get-started/&#34; class=&#34;uri&#34;&gt;https://sourcethemes.com/academic/docs/get-started/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://sourcethemes.com/academic/docs/customization/&#34; class=&#34;uri&#34;&gt;https://sourcethemes.com/academic/docs/customization/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/gcushen/hugo-academic/tree/master/data/fonts&#34; class=&#34;uri&#34;&gt;https://github.com/gcushen/hugo-academic/tree/master/data/fonts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/gcushen/hugo-academic/tree/master/data/themes&#34; class=&#34;uri&#34;&gt;https://github.com/gcushen/hugo-academic/tree/master/data/themes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.libd.org/&#34; class=&#34;uri&#34;&gt;https://www.libd.org/&lt;/a&gt; (for getting colors under inspect mode)&lt;/li&gt;
&lt;/ul&gt;
&lt;div id=&#34;post-content&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Post content&lt;/h3&gt;
&lt;p&gt;This is the typical place to start editing, since the bibliography chunk is hidden. Make sure that you selected &lt;code&gt;R Markdown (.Rmd)&lt;/code&gt; as the &lt;em&gt;format&lt;/em&gt; option of the post when using the &lt;code&gt;New Post&lt;/code&gt; &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; addin.&lt;/p&gt;
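As a sketch, the same choice can be made from the R console instead of the addin. This assumes a recent blogdown where new_post() accepts an ext argument; the title below is a placeholder:

```r
## Create a new post as R Markdown; ext = ".Rmd" corresponds to selecting
## "R Markdown (.Rmd)" as the format in the New Post addin.
## (Argument name per recent blogdown versions.)
blogdown::new_post("My new post", ext = ".Rmd")
```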
&lt;/div&gt;
&lt;div id=&#34;r-image&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;R image&lt;/h3&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## This image will be saved in the /post/*_files/ directory
## Use echo = FALSE if you want to hide the code for making the plot
plot(1:10, 10:1)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://LieberInstitute.github.io/rstatsclub/rstatsclub/post/2018-03-06-test-post-for-checking-website_files/figure-html/plot-1.png&#34; width=&#34;576&#34; /&gt;&lt;/p&gt;
&lt;p&gt;If you prefer not to use the &lt;code&gt;fig.width&lt;/code&gt; and &lt;code&gt;fig.height&lt;/code&gt; options in every plot chunk, edit the YAML (the part at the top of the post) with:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;output:
  blogdown::html_page:
    toc: no
    fig_width: 5
    fig_height: 5&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;custom-image&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Custom image&lt;/h3&gt;
&lt;p&gt;Use the &lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; &lt;em&gt;Insert Image&lt;/em&gt; addin to add an external image. You need version 0.5.7 or newer to have access to this addin. At the time of writing this post, that version is only available via GitHub. That is, install it with:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;devtools::install_github(&amp;#39;rstudio/blogdown&amp;#39;)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;acknowledgments&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Acknowledgments&lt;/h3&gt;
&lt;p&gt;This blog post was made possible thanks to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;http://bioconductor.org/packages/BiocStyle&#34;&gt;BiocStyle&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Oles_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/Bioconductor/BiocStyle&#39;&gt;Oles, Morgan, and Huber, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=blogdown&#34;&gt;blogdown&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Xie_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://github.com/rstudio/blogdown&#39;&gt;Xie, Hill, and Thomas, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=devtools&#34;&gt;devtools&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Wickham_2018&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=devtools&#39;&gt;Wickham, Hester, and Chang, 2018&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;knitcitations&lt;/a&gt;&lt;/em&gt; &lt;a id=&#39;cite-Boettiger_2017&#39;&gt;&lt;/a&gt;(&lt;a href=&#39;https://CRAN.R-project.org/package=knitcitations&#39;&gt;Boettiger, 2017&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;References&lt;/h3&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Boettiger_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Boettiger_2017&#34;&gt;[1]&lt;/a&gt;&lt;cite&gt; C. Boettiger. &lt;em&gt;knitcitations: Citations for ‘Knitr’ Markdown Files&lt;/em&gt;. R package version 1.0.8. 2017. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=knitcitations&#34;&gt;https://CRAN.R-project.org/package=knitcitations&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Oles_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Oles_2017&#34;&gt;[2]&lt;/a&gt;&lt;cite&gt; A. Oles, M. Morgan and W. Huber. &lt;em&gt;BiocStyle: Standard styles for vignettes and other Bioconductor documents&lt;/em&gt;. R package version 2.6.1. 2017. URL: &lt;a href=&#34;https://github.com/Bioconductor/BiocStyle&#34;&gt;https://github.com/Bioconductor/BiocStyle&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Wickham_2018&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Wickham_2018&#34;&gt;[3]&lt;/a&gt;&lt;cite&gt; H. Wickham, J. Hester and W. Chang. &lt;em&gt;devtools: Tools to Make Developing R Packages Easier&lt;/em&gt;. R package version 1.13.5. 2018. URL: &lt;a href=&#34;https://CRAN.R-project.org/package=devtools&#34;&gt;https://CRAN.R-project.org/package=devtools&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a id=&#39;bib-Xie_2017&#39;&gt;&lt;/a&gt;&lt;a href=&#34;#cite-Xie_2017&#34;&gt;[4]&lt;/a&gt;&lt;cite&gt; Y. Xie, A. P. Hill and A. Thomas. &lt;em&gt;blogdown: Creating Websites with R Markdown&lt;/em&gt;. ISBN 978-0815363729. Boca Raton, Florida: Chapman and Hall/CRC, 2017. URL: &lt;a href=&#34;https://github.com/rstudio/blogdown&#34;&gt;https://github.com/rstudio/blogdown&lt;/a&gt;.&lt;/cite&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Reproducibility&lt;/h3&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Reproducibility info
options(width = 120)
session_info()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Session info ----------------------------------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  setting  value                       
##  version  R version 3.4.2 (2017-09-28)
##  system   x86_64, mingw32             
##  ui       RTerm                       
##  language (EN)                        
##  collate  English_United States.1252  
##  tz       America/New_York            
##  date     2018-03-07&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Packages --------------------------------------------------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  package       * version date       source                               
##  backports       1.1.2   2017-12-13 CRAN (R 3.4.3)                       
##  base          * 3.4.2   2017-09-28 local                                
##  bibtex          0.4.2   2017-06-30 CRAN (R 3.4.2)                       
##  BiocStyle     * 2.6.1   2017-11-30 Bioconductor                         
##  blogdown        0.5.7   2018-03-07 Github (lcolladotor/blogdown@7b8761b)
##  bookdown        0.7     2018-02-18 CRAN (R 3.4.3)                       
##  compiler        3.4.2   2017-09-28 local                                
##  datasets      * 3.4.2   2017-09-28 local                                
##  devtools      * 1.13.5  2018-02-18 CRAN (R 3.4.3)                       
##  digest          0.6.15  2018-01-28 CRAN (R 3.4.3)                       
##  evaluate        0.10.1  2017-06-24 CRAN (R 3.4.2)                       
##  graphics      * 3.4.2   2017-09-28 local                                
##  grDevices     * 3.4.2   2017-09-28 local                                
##  htmltools       0.3.6   2017-04-28 CRAN (R 3.4.2)                       
##  httr            1.3.1   2017-08-20 CRAN (R 3.4.2)                       
##  jsonlite        1.5     2017-06-01 CRAN (R 3.4.2)                       
##  knitcitations * 1.0.8   2017-07-04 CRAN (R 3.4.2)                       
##  knitr           1.20    2018-02-20 CRAN (R 3.4.3)                       
##  lubridate       1.7.2   2018-02-06 CRAN (R 3.4.3)                       
##  magrittr        1.5     2014-11-22 CRAN (R 3.4.2)                       
##  memoise         1.1.0   2017-04-21 CRAN (R 3.4.2)                       
##  methods       * 3.4.2   2017-09-28 local                                
##  plyr            1.8.4   2016-06-08 CRAN (R 3.4.2)                       
##  R6              2.2.2   2017-06-17 CRAN (R 3.4.2)                       
##  Rcpp            0.12.15 2018-01-20 CRAN (R 3.4.3)                       
##  RefManageR      0.14.20 2017-08-17 CRAN (R 3.4.2)                       
##  rmarkdown       1.9     2018-03-01 CRAN (R 3.4.3)                       
##  rprojroot       1.3-2   2018-01-03 CRAN (R 3.4.3)                       
##  stats         * 3.4.2   2017-09-28 local                                
##  stringi         1.1.6   2017-11-17 CRAN (R 3.4.2)                       
##  stringr         1.3.0   2018-02-19 CRAN (R 3.4.3)                       
##  tools           3.4.2   2017-09-28 local                                
##  utils         * 3.4.2   2017-09-28 local                                
##  withr           2.1.1   2017-12-19 CRAN (R 3.4.3)                       
##  xfun            0.1     2018-01-22 CRAN (R 3.4.3)                       
##  xml2            1.2.0   2018-01-24 CRAN (R 3.4.3)                       
##  yaml            2.1.17  2018-02-27 CRAN (R 3.4.3)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
</description>
    </item>
    
  </channel>
</rss>
