<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>Inundata</title>
	
	<link>http://inundata.org</link>
	<description>Just another WordPress site</description>
	<lastBuildDate>Mon, 07 May 2012 22:20:13 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/Inundata" /><feedburner:info uri="inundata" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>Imposter week</title>
		<link>http://feedproxy.google.com/~r/Inundata/~3/vFwjQA64iUE/</link>
		<comments>http://inundata.org/2012/04/30/imposter-week/#comments</comments>
		<pubDate>Mon, 30 Apr 2012 15:04:57 +0000</pubDate>
		<dc:creator>Karthik Ram</dc:creator>
				<category><![CDATA[academia]]></category>
		<category><![CDATA[ecology]]></category>

		<guid isPermaLink="false">http://inundata.org/?p=356</guid>
		<description><![CDATA[I&#8217;ll freely admit that even as a postdoc I suffer from quite a bit of impostor syndrome, more so than when I was a grad student. Although this feeling is widespread among academics, it is not impossible to beat. Looks like everyone has decided to speak out about it this week on the academic blogosphere. [...]]]></description>
			<content:encoded><![CDATA[
<p>I&#8217;ll freely admit that even as a postdoc I suffer from quite a bit of impostor syndrome, more so than when I was a grad student. Although this feeling is widespread among academics, it is not impossible to beat. It seems everyone on the academic blogosphere has decided to speak out about it this week. It started last week with a great post by fellow blogger and tweep <a href="http://contemplativemammoth.wordpress.com/2012/04/25/how-i-cured-my-imposter-syndrome/">Jacqueline Gill</a> on how she overcame her impostor syndrome. There is also a really comprehensive post (with a bucketload of links) at <a href="http://contemplativemammoth.wordpress.com/2012/04/25/how-i-cured-my-imposter-syndrome/">Neurotic Physiology</a>.<br />
If you&#8217;re a postdoc reading this, this post (<a href="http://thetightropeblog.wordpress.com/2012/04/19/some-days-i-just-want-to-crawl-under-my-desk-and-cry-16/">Some days, I just want to crawl under my desk and cry</a>) best describes how I feel some days.  </p>
<p>PS: It gets better.</p>

]]></content:encoded>
			<wfw:commentRss>http://inundata.org/2012/04/30/imposter-week/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://inundata.org/2012/04/30/imposter-week/</feedburner:origLink></item>
		<item>
		<title>An intro to R</title>
		<link>http://feedproxy.google.com/~r/Inundata/~3/KWbodubSYOQ/</link>
		<comments>http://inundata.org/2012/04/05/an-intro-to-r/#comments</comments>
		<pubDate>Thu, 05 Apr 2012 21:05:06 +0000</pubDate>
		<dc:creator>Karthik Ram</dc:creator>
				<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://inundata.org/?p=339</guid>
		<description><![CDATA[A few weeks back I gave a talk at the local Berkeley R meetup group. The idea was to help people not make the same mistakes I made when I first started out learning R. It was the first time I made an entire presentation with Deck.js and I generated the syntax highlighted R code [...]]]></description>
			<content:encoded><![CDATA[
<p>A few weeks back I gave a talk at the local <a href="http://www.meetup.com/r-enthusiasts/">Berkeley R meetup group</a>. The idea was to help people avoid the mistakes I made when I first started learning R. It was also the first time I made an entire presentation with <a href="http://imakewebthings.com/deck.js/" title="Deck JS">Deck.js</a>, and I generated the syntax-highlighted R code using knitr. <a href="http://inundata.org/R_talks/meetup/">Check it out.</a></p>
<p><a href="http://inundata.org/R_talks/meetup/"><img src="http://inundata.org/wp-content/uploads/2012/04/intro_to_R2.png" alt="" title="An intro to R talk made with Deck.JS" width="399" height="185" border="1" class="alignnone size-full wp-image-343" /></a></p>

]]></content:encoded>
			<wfw:commentRss>http://inundata.org/2012/04/05/an-intro-to-r/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://inundata.org/2012/04/05/an-intro-to-r/</feedburner:origLink></item>
		<item>
		<title>Two incredibly useful functions to throw into your .rprofile</title>
		<link>http://feedproxy.google.com/~r/Inundata/~3/qZ8saYf8A5A/</link>
		<comments>http://inundata.org/2012/02/07/two-incredibly-useful-functions-to-throw-into-your-rprofile/#comments</comments>
		<pubDate>Wed, 08 Feb 2012 02:04:25 +0000</pubDate>
		<dc:creator>Karthik Ram</dc:creator>
				<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://inundata.org/?p=314</guid>
		<description><![CDATA[I&#8217;ve neglected this blog for quite some time but I&#8217;m getting around to finishing up a bunch of draft posts. But here is a quick one: Listing objects in your global environment A simple ls() doesn&#8217;t really tell you enough useful information at a glance. Most often I just want to know what I named [...]]]></description>
			<content:encoded><![CDATA[
<p>I&#8217;ve neglected this blog for quite some time, but I&#8217;m getting around to finishing up a bunch of draft posts. Here is a quick one:</p>
<p><strong>Listing objects in your global environment</strong><br />
A plain <code>ls()</code> doesn&#8217;t tell you much at a glance. Most often I just want to know what I named certain data frames or functions. This handy little function, called as <code>lsa()</code>, lists each object in your global environment along with its class:</p>
<p>
<pre class="brush: r; title: ; notranslate">
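lsa &lt;- function() {
  # Names of all objects in the global environment
  obj_names &lt;- ls(envir = .GlobalEnv)
  # First class of each object; get() looks in .GlobalEnv explicitly so a
  # local variable (e.g. the argument x) can't shadow a global of that name
  obj_class &lt;- vapply(obj_names,
                      function(x) class(get(x, envir = .GlobalEnv))[1],
                      character(1))
  foo &lt;- data.frame(class = obj_class, object = obj_names,
                    stringsAsFactors = FALSE)
  # Drop row names (unrowname() is not part of base R)
  rownames(foo) &lt;- NULL
  foo
}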
</pre>
</p>
<p><strong>Listing all functions in a certain package</strong><br />
Call this one as <code>lsp()</code>, for example <code>lsp(stats)</code>. The <code>pattern</code> argument (a regular expression passed on to <code>ls()</code>) lets you quickly find the right function if you only vaguely remember its name.</p>
<p>
<pre class="brush: r; title: ; notranslate">
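lsp &lt;- function(package, all.names = FALSE, pattern) {
  # Accept a bare name, lsp(stats), or a string, lsp(&quot;stats&quot;)
  package &lt;- substitute(package)
  if (!is.character(package)) package &lt;- deparse(package)
  # The package must already be attached (e.g. via library())
  ls(
      pos = paste(&quot;package&quot;, package, sep = &quot;:&quot;),
      all.names = all.names,
      pattern = pattern
  )
}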
</pre>
</p>
<p>Be sure to put them both in a separate environment (i.e., not the global one) so they don&#8217;t get accidentally removed when you clear your variables.</p>
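<p>Here is a minimal sketch of that idea. It uses a hypothetical helper called <code>hello()</code> as a stand-in for <code>lsa()</code> and <code>lsp()</code>, and the environment name <code>my_helpers</code> is likewise just an illustrative choice. Because <code>attach()</code> puts the helpers on the search path rather than in the global environment, <code>rm(list = ls())</code> leaves them alone:</p>
<pre class="brush: r; title: ; notranslate">
# Create a separate environment and define the helpers inside it
my_helpers <- new.env()
local({
  # Hypothetical stand-in for lsa() / lsp()
  hello <- function() "still here"
}, envir = my_helpers)

# attach() places a copy of the environment on the search path
attach(my_helpers, name = "my_helpers")
rm(my_helpers)

# Clearing the global environment does not touch the search path...
rm(list = ls())
# ...so the helper is still found
hello()
</pre>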

]]></content:encoded>
			<wfw:commentRss>http://inundata.org/2012/02/07/two-incredibly-useful-functions-to-throw-into-your-rprofile/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://inundata.org/2012/02/07/two-incredibly-useful-functions-to-throw-into-your-rprofile/</feedburner:origLink></item>
		<item>
		<title>Customizing your .rprofile</title>
		<link>http://feedproxy.google.com/~r/Inundata/~3/S_XzPjrIlzg/</link>
		<comments>http://inundata.org/2011/09/29/customizing-your-rprofile/#comments</comments>
		<pubDate>Fri, 30 Sep 2011 03:17:43 +0000</pubDate>
		<dc:creator>Karthik Ram</dc:creator>
				<category><![CDATA[code-snippet]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://inundata.org/?p=278</guid>
		<description><![CDATA[I searched around to see if there was a blog post somewhere describing how to customize one&#8217;s .rprofile but was surprised to find just one outdated post. So here is a quick intro on the topic. If you are a power R user, you already know what it does. For those of you who don&#8217;t, [...]]]></description>
			<content:encoded><![CDATA[
<p>I searched around to see if there was a blog post somewhere describing how to customize one&#8217;s <span style="color:#3300FF">.Rprofile</span> but was surprised to find just one outdated post. So here is a quick intro on the topic. If you are a power R user, you already know what it does. For those of you who don&#8217;t, it is just a text file called <span style="color:#3300FF">.Rprofile</span> (note the capital R, which matters on case-sensitive file systems such as Linux) that sits in your home directory (not sure where that is? Instructions to find it on a <a href="http://cran.r-project.org/bin/windows/rw-FAQ.html#What-are-HOME-and-working-directories_003f">PC</a> or a <a href="http://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html#The-R-Console">Mac</a>), and R executes all of the commands in it at startup.  </p>
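<p>If you&#8217;re not sure which file R is actually reading, the snippet below lists the candidate locations and whether each one exists. It is a sketch based on the search order documented in <code>?Startup</code>: the file named by the <code>R_PROFILE_USER</code> environment variable if that is set, otherwise <code>.Rprofile</code> in the current working directory and then in your home directory.</p>
<pre class="brush: r; title: ; notranslate">
# Candidate user profiles, in the order R checks them (see ?Startup);
# if R_PROFILE_USER is set, R uses that file instead of the other two
candidate_profiles <- c(
  Sys.getenv("R_PROFILE_USER"),
  file.path(getwd(), ".Rprofile"),
  file.path(path.expand("~"), ".Rprofile")
)
# Drop the empty string returned when R_PROFILE_USER is unset
candidate_profiles <- candidate_profiles[nzchar(candidate_profiles)]

# Show each candidate path and whether the file exists
profile_check <- data.frame(path = candidate_profiles,
                            exists = file.exists(candidate_profiles))
print(profile_check)
</pre>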
<ol>
<li><strong>Load frequently used packages</strong>
<p>These days I never run R without ggplot2 or plyr, so I load both here (although I hope that someday both packages will be absorbed into base R).</p>
<pre class="brush: r; title: ; notranslate">
library(ggplot2)
library(plyr)
</pre>
</li>
<li><strong>Create aliases for frequently used functions</strong>
<pre class="brush: r; title: ; notranslate">
# Short aliases, e.g. s(obj) instead of summary(obj)
s &lt;- base::summary
h &lt;- utils::head
n &lt;- base::names
</pre>
</li>
<li><strong>Set your preferred repository</strong>
<p>Hate the menu that asks you to choose a repository when installing a package? Just hardcode it.</p>
<pre class="brush: r; title: ; notranslate">
# Start from the current repository settings
current_repo &lt;- getOption(&quot;repos&quot;)
# Change this to your closest CRAN mirror
current_repo[&quot;CRAN&quot;] &lt;- &quot;http://cran.us.r-project.org&quot;
options(repos = current_repo)
</pre>
</li>
<li><strong>Create a new environment so you don&#8217;t lose your custom startup functions</strong>
<p>I always start a new script with <code>rm(list=ls())</code> to clear out everything. Unfortunately, this also removes all the cool new functions from your .Rprofile. Get around that by creating a new environment, putting your functions there, and attaching it to the search path.</p>
<pre class="brush: r; title: ; notranslate">
custom_env &lt;- new.env()
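# If you don't want to clutter this file, keep your functions in a separate
# file (assumed here to live in your home directory)
sys.source(path.expand(&quot;~/.my_custom_functions.r&quot;), envir = custom_env)
# attach() puts a copy of custom_env on the search path, out of reach of rm(list=ls())
attach(custom_env)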
</pre>
</li>
</ol>
<p>You can also set a range of other options, but these are a good start.</p>
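<p>For example, a few other commonly tweaked options are shown below. The particular values are only illustrative choices (assumptions, not settings from this post); see <code>?options</code> for the full list.</p>
<pre class="brush: r; title: ; notranslate">
options(
  digits = 4,                # significant digits used when printing numbers
  scipen = 10,               # penalty that makes R prefer fixed over scientific notation
  width = 100,               # console width (in characters) used when printing
  show.signif.stars = FALSE  # hide the significance stars in model summaries
)
</pre>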
<p><strong>Update:</strong> As <a href="https://twitter.com/jasonpriem/status/119615870958964736">Jason Priem</a> astutely points out, these tricks can impede reproducibility of your work (especially if you fail to load the appropriate libraries &#038; functions in your final script). While these are valuable time savers during the development phase, you certainly want to be more thorough before sharing your code.</p>
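<p>One possible way to soften that problem, offered as a sketch rather than part of the setup above, is to wrap these conveniences in <code>if (interactive())</code>. Non-interactive runs, such as a script run with <code>Rscript</code>, then never pick up the shortcuts or the attached packages, so a final script that forgets a <code>library()</code> call fails loudly instead of silently relying on your profile. The package list and aliases simply mirror the ones above:</p>
<pre class="brush: r; title: ; notranslate">
# Only set up conveniences when a person is at the console; batch runs
# (e.g. Rscript) see a plain session and must load what they need
if (interactive()) {
  for (pkg in c("ggplot2", "plyr")) {
    # Attach each package only if it is installed, so startup never errors
    if (requireNamespace(pkg, quietly = TRUE)) {
      suppressPackageStartupMessages(library(pkg, character.only = TRUE))
    }
  }
  rm(pkg)
  s <- base::summary
  h <- utils::head
}
</pre>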

]]></content:encoded>
			<wfw:commentRss>http://inundata.org/2011/09/29/customizing-your-rprofile/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://inundata.org/2011/09/29/customizing-your-rprofile/</feedburner:origLink></item>
		<item>
		<title>HPC for biological research</title>
		<link>http://feedproxy.google.com/~r/Inundata/~3/A3m11GpzkRE/</link>
		<comments>http://inundata.org/2011/08/28/hpc-for-biological-research/#comments</comments>
		<pubDate>Sun, 28 Aug 2011 23:46:54 +0000</pubDate>
		<dc:creator>Karthik Ram</dc:creator>
				<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://inundata.org/?p=139</guid>
		<description><![CDATA[In early May I had the opportunity to attend a workshop on using high performance computing in R hosted at Nimbios. I&#8217;ve been meaning to write a summary of the meeting ever since but got sidetracked by various other projects. Since a collaborator recently asked for meeting notes I finally took the time to write [...]]]></description>
			<content:encoded><![CDATA[
<p>In early May I had the opportunity to attend a workshop on using <a href="http://nimbios.org/tutorials/TT_hpc2011_talks">high-performance computing in R</a> hosted at <a title="The National Institute for Mathematical and Biological Synthesis" href="http://Nimbios.org">Nimbios</a>. I&#8217;ve been meaning to write a summary of the meeting ever since but got sidetracked by various other projects. Since a collaborator recently asked for meeting notes, I finally took the time to write this post.</p>
<p>The meeting was jointly organized by folks from Nimbios and the <a href="http://rdav.nics.tennessee.edu/">remote data analysis and visualization group (rDAV)</a>. The idea behind the workshop was to introduce biologists dealing with big-data problems to a variety of analytical (mostly just R) and visualization tools (R and a few other open-source tools). The presentations were either technical (HPC resources, tools, demos) or application oriented.</p>
<p>Of the technical talks (<a href="http://nimbios.org/tutorials/talks/hpc_intro">HPC intro</a>, <a href="http://nimbios.org/tutorials/talks/hpc_lab">utilities</a>), the one I found most valuable was by Pragnesh Patel from rDAV, who did an excellent job outlining the ins and outs of running R on a cluster. Slides from his talk are available <a href="http://nimbios.org/tutorials/talks/Pragnesh-talk.pdf">here</a>. A more recent summary of his useR! 2011 presentation is available <a href="http://csgillespie.wordpress.com/2011/08/16/high-performance-computing/">here</a>.</p>
<p>On the application side, there were a couple of talks from Nimbios scientists: one by <a href="http://www.tiem.utk.edu/~mikeg/">Michael Gilchrist</a> on evolutionary bioinformatics (<a href="http://nimbios.org/tutorials/talks/rdav-talk.pdf">pdf of slides</a>) and the other by Nimbios postdoc <a href="https://sites.google.com/site/williamgodsoesresearchpage/">William Godsoe</a> on using HPC to build species distribution models [cite]10.1093/sysbio/syq005[/cite].</p>
<p>In addition to R, we also discussed other open-source tools for visualizing large datasets.</p>
<ul>
<li><a href="http://nimbios.org/tutorials/talks/vis.pdf">Scientific visualization using VisIt</a> [pdf] <em>- A tutorial for using <a href="https://wci.llnl.gov/codes/visit/home.html">VisIt</a>. There is also a <a href="http://visitusers.org/index.php?title=Main_Page">wiki</a>.</em></li>
<li><a href="http://www.paraview.org/">Paraview</a> <em>- Another visualization tool. Both tools can take advantage of HPC resources.</em></li>
<li><a href="http://nimbios.org/tutorials/talks/R_Graphics.R">A tutorial on R visualization</a> &#8211; <em>This wasn&#8217;t HPC specific. Mostly just examples on how to use ggplot2.</em></li>
</ul>
<p>Although I wrote a <a href="http://inundata.org/2011/03/30/r-ec2-rstudio-server/">detailed post</a> on how to use Amazon&#8217;s EC2 cloud for HPC, this workshop convinced me to use resources that NSF already provides. <a href="https://www.teragrid.org/">TeraGrid</a> is a portal that provides access to numerous cluster resources funded by NSF or one of its partners. Using their <a href="https://www.xsede.org/">XSEDE</a> portal (which has replaced POPS), academics can request an allocation of computing time. For new and exploratory projects, there are &#8216;starter grants&#8217; that provide a rather generous allocation within a fairly short time. Larger allocations involve a review process; if the work you are requesting time for is already funded, the review is likely to move faster, since the project has already been favorably reviewed. Amazon&#8217;s computing cluster is still a useful service, but there is no need to spend grant money elsewhere when NSF already provides these resources. As more scientists use and acknowledge TeraGrid&#8217;s resources in their publications, they will provide the incentive and justification for organizations like rDAV to continue seeking funding, especially in today&#8217;s budgetary climate.</p>

]]></content:encoded>
			<wfw:commentRss>http://inundata.org/2011/08/28/hpc-for-biological-research/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		<feedburner:origLink>http://inundata.org/2011/08/28/hpc-for-biological-research/</feedburner:origLink></item>
		<item>
		<title>DataCite 2011, recap</title>
		<link>http://feedproxy.google.com/~r/Inundata/~3/cdARELKKwBY/</link>
		<comments>http://inundata.org/2011/08/25/datacite-2011-recap/#comments</comments>
		<pubDate>Fri, 26 Aug 2011 03:39:00 +0000</pubDate>
		<dc:creator>Karthik Ram</dc:creator>
				<category><![CDATA[open-science]]></category>

		<guid isPermaLink="false">http://inundata.org/?p=212</guid>
		<description><![CDATA[I had a fantastic time at the DataCite summer 2011 meeting: Data and the Scholarly Record: The Changing Landscape [full schedule] that happened right here in Berkeley. In addition to great talks, I was pretty stoked to interact with a diverse group of people (practicing scientists/data researchers to publishers/repository managers) and also connect with twitter folks IRL. [...]]]></description>
			<content:encoded><![CDATA[
<p>I had a fantastic time at the <a href="http://datacite.org/">DataCite</a> summer 2011 meeting, <strong>Data and the Scholarly Record: The Changing Landscape</strong> <a href="http://datacite.org/node/30">[full schedule]</a>, held right here in Berkeley. In addition to great talks, I was pretty stoked to interact with a diverse group of people (from practicing scientists and data researchers to publishers and repository managers) and to connect with Twitter folks IRL. I would be remiss if I didn&#8217;t thank Trisha Cruse and Perry Willett for making it possible for me to attend. Here is my (incomplete) summary of the conference while everything is still fresh in my head.</p>
<p><strong>Talks</strong></p>
<ol>
<li>John Wilbanks (Creative Commons) presented a great keynote.<br />
<blockquote><p>Knowledge is a public good and increases in value as the number of people possessing it increase.</p></blockquote>
<p>
<a href="http://www.slideshare.net/wilbanks/datacite-wilbanks">slides</a></li>
<li><a href="http://www.researchremix.org/wordpress/">Heather Piwowar</a> spoke on <a href="http://www.slideshare.net/hpiwowar/data-citation-from-the-perspective-of-tracking-data-reuse">data citation challenges</a><br />
<blockquote><p>As more datasets become citable, fewer papers will be cited as proxy which is the current practice.</p></blockquote>
<p>Related: <a href="http://researchremix.wordpress.com/2011/05/19/nature-letter/">Data archiving is a good investment</a> [cite]10.1038/473285a[/cite]</li>
<li><a href="http://jasonpriem.com/">Jason Priem</a> on <a href="https://docs.google.com/present/view?id=0ASyDkfrsAcUjZGRmZzc4N2NfMjgwZjU3aDVuZ3E&amp;hl=en_US">altmetrics</a><br />
<blockquote><p>Today&#8217;s journals are still the best scholarly communication system possible using 17th century technology!</p></blockquote>
<p>
Related: <a href="http://altmetrics.org/manifesto/">Altmetrics Manifesto</a> 
</li>
</ol>
<p><strong>Journals</strong></p>
<p><a href="http://www.gigasciencejournal.com/">GigaScience</a> [<a href="http://twitter.com/#!/GigaScience">@</a>] - <em>a new open-access journal that aims to revolutionize data dissemination, organization, understanding, and use. The journal will also publish large datasets alongside papers (also independently?) with citable DOIs.</em></p>
<p><a href="http://www.opennetworkbiology.com/">Open Network Biology</a> - <em>Another new journal that links papers to datasets</em></p>
<p><strong>Repositories</strong></p>
<p><a href="http://www.pangaea.de/">PANGAEA</a> &#8211; repository for earth sciences and environmental data</p>
<p><strong>Miscellaneous</strong></p>
<p><a href="http://www.nanopub.org/">Nanopub</a></p>
<p><a href="http://schema.datacite.org/meta/kernel-2.1/doc/DataCite-MetadataKernel_v2.1.pdf">DataCite Metadata schema for citing research data</a> [pdf]</p>
<p><strong>Other summaries of the meeting</strong></p>
<p><a href="http://synthesis.williamgunn.org/about/">William Gunn&#8217;s</a> <a href="http://storify.com/mrgunn/links-to-datacite-presentations-and-sites-mentione?awesm=sfy.co_FgI">storify summary</a></p>
<p><a href="http://www.carlboettiger.info/">Carl&#8217;s</a> summary &#8211; <a href="http://www.carlboettiger.info/archives/2381">day 1</a>, <a href="http://www.carlboettiger.info/archives/2385">day 2 </a></p>
<p><a href="http://twitter.com/#!/search/datacite">Tweets from the meeting</a></p>

]]></content:encoded>
			<wfw:commentRss>http://inundata.org/2011/08/25/datacite-2011-recap/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		<feedburner:origLink>http://inundata.org/2011/08/25/datacite-2011-recap/</feedburner:origLink></item>
		<item>
		<title>Climate datasets in R</title>
		<link>http://feedproxy.google.com/~r/Inundata/~3/6IVXkEhZyWA/</link>
		<comments>http://inundata.org/2011/08/10/climate-datasets-in-r/#comments</comments>
		<pubDate>Thu, 11 Aug 2011 04:59:10 +0000</pubDate>
		<dc:creator>Karthik Ram</dc:creator>
				<category><![CDATA[ecology]]></category>
		<category><![CDATA[open-science]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://inundata.org/?p=176</guid>
		<description><![CDATA[As an ecologist working on climate change questions, I&#8217;ve always found it rather tedious to acquire and process climate data, especially when dealing with large spatiotemporal scales. Although many agencies provide free access to climate data, there is often some overhead (typically one to two days) before the data are made available for download via [...]]]></description>
			<content:encoded><![CDATA[
<p>As an ecologist working on climate change questions, I&#8217;ve always found it rather tedious to acquire and process climate data, especially when dealing with large spatiotemporal scales. Although many agencies provide free access to climate data, there is often some overhead (typically one to two days) before the data are made available for download via ftp. Next, one has to process such data to match the structure of the biological information. Some of these data come in one of many binary formats, which require additional processing. While individual scientists and labs have workflows to complete these disparate steps, those workflows are rarely included as part of a publication, which leaves out critical data provenance. Even when peer-reviewed articles include one-off scripts (and associated data), missing provenance information makes it difficult to reproduce the results [cite]10.1038/nm1107-1276b[/cite]. Workflow repositories are needed to address the larger issue. In the meantime, one way to address the problem would be to encapsulate the above-mentioned steps (data acquisition, format conversion and interpolation) in the code that is already included in supplementary materials.</p>
<p>On that note, I&#8217;m pretty excited by the announcement of a new R package called <a title="RNCEP home page" href="https://sites.google.com/site/michaelukemp/rncep">RNCEP</a> in the current issue of <a href="http://www.methodsinecologyandevolution.org/view/0/index.html">Methods in Ecology and Evolution</a> [cite]10.1111/j.2041-210X.2011.00138.x[/cite]. The package provides an interface to atmospheric data from the <a href="http://www.ncep.noaa.gov/">National Centers for Environmental Prediction</a> and <a href="http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis2.html">NCEP/DOE</a>. By encapsulating all the steps from data acquisition and format conversion to interpolation and aggregation <em>within R</em>, the package provides a way to document an entire workflow as part of an article supplement. As more data repositories open up APIs, similar packages will go a long way towards promoting open science.</p>
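<p>To make the idea of encapsulating these steps concrete, here is a small base-R sketch of the last two (interpolation and aggregation). It does not use RNCEP&#8217;s API: the 6-hourly temperature series is made-up toy data standing in for a downloaded reanalysis product, and the observation times and the <code>interp_to_obs()</code> helper are hypothetical.</p>
<pre class="brush: r; title: ; notranslate">
# Made-up 6-hourly air temperatures (deg C) standing in for downloaded,
# already-converted reanalysis data (the acquisition and conversion steps)
reanalysis <- data.frame(
  time = as.POSIXct("2011-06-01 00:00", tz = "UTC") + 6 * 3600 * (0:8),
  temp = c(12.1, 15.4, 19.8, 16.2, 11.9, 14.8, 20.3, 17.0, 12.4)
)

# Hypothetical times at which biological observations were made
obs_times <- as.POSIXct(c("2011-06-01 09:00", "2011-06-02 15:30"), tz = "UTC")

# Interpolation: linear interpolation in time onto the observation times
interp_to_obs <- function(series, times) {
  approx(x = as.numeric(series$time), y = series$temp,
         xout = as.numeric(times))$y
}
obs_temp <- interp_to_obs(reanalysis, obs_times)

# Aggregation: daily (UTC) means to match daily biological records
daily_mean <- tapply(reanalysis$temp, as.Date(reanalysis$time, tz = "UTC"), mean)

print(obs_temp)
print(daily_mean)
</pre>
<p>Because every step lives in a script, the same file can go into an article supplement and document exactly how the climate covariates were derived.</p>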

]]></content:encoded>
			<wfw:commentRss>http://inundata.org/2011/08/10/climate-datasets-in-r/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		<feedburner:origLink>http://inundata.org/2011/08/10/climate-datasets-in-r/</feedburner:origLink></item>
		<item>
		<title>A roundup of academic workflow tools</title>
		<link>http://feedproxy.google.com/~r/Inundata/~3/n9npHzCGVz4/</link>
		<comments>http://inundata.org/2011/06/13/tools-for-a-research-workflow/#comments</comments>
		<pubDate>Mon, 13 Jun 2011 21:54:20 +0000</pubDate>
		<dc:creator>Karthik Ram</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://inundata.org/?p=142</guid>
		<description><![CDATA[Lately I&#8217;ve come to rely on a whole bunch of &#8220;2.0&#8221; tools that I now find indispensable. I&#8217;ve tried and given up on many products (e.g. Papers and its iOS app) but below is a list of tools that I find myself using several times each day. I&#8217;ve chosen to highlight a few that don&#8217;t [...]]]></description>
			<content:encoded><![CDATA[
<p>Lately I&#8217;ve come to rely on a whole bunch of &#8220;2.0&#8221; tools that I now find indispensable. I&#8217;ve tried and given up on many products (e.g. Papers and its iOS app), but below is a list of tools that I find myself using several times each day. I&#8217;ve chosen to highlight a few that don&#8217;t typically appear on most lists.</p>
<p><a title="Pinboard" href="http://pinboard.in"><strong>Pinboard</strong></a> &#8211; Bookmarks are an essential part of any research/teaching effort. I frequently bookmark pages for field work supplies (to buy later), grant announcements (that I plan to apply for in a future round), tutorials (frequently R-related, plus useful StackOverflow answers), workshops (especially ones with useful handouts), syllabi (ideas for future lecture topics), teaching material, and even papers to read later. Compared with the alternatives, Pinboard is a no-nonsense (yet feature-rich) bookmarking tool. Its killer feature, in my opinion, is <em><a href="http://pinboard.in/upgrade/">archival bookmarking</a></em> ($25 annual fee). The service saves a cache (including images, videos, and overall formatting) of any page you bookmark. This is great for links that change, expire, or disappear long before you actually need to refer to the content.</p>
<p>Pinboard also allows you to automatically archive items from Google Reader (any articles you star are saved automatically), Twitter (your own tweets or anyone else&#8217;s, with the option to bookmark only the links), and Instapaper (see below).</p>
<p><a title="Instapaper" href="http://instapaper.com"><strong>Instapaper</strong></a> &#8211; One can only read on a computer for so long each day, and carrying around dead-tree copies of articles gets old real fast. Although I have yet to hop on the iPad bandwagon, I&#8217;ve found Instapaper to be a great tool for sending dense review articles to my Kindle to read later, away from internet distractions (the Kindle&#8217;s painfully slow browser is a blessing).</p>
<p><strong><a href="http://ietherpad.com/">iEtherPad</a></strong> &#8211; Collaborate on notes in <span style="text-decoration: underline;"><em>real time</em></span> (either publicly or privately with a group of collaborators), with authorship colors and automatic versioning. iEtherPad also lets you create a subdomain for your research group.</p>
<p><strong><a href="http://www.mendeley.com/">Mendeley groups</a></strong> &#8211; The killer feature of Mendeley, in my opinion, is the ability for small groups of collaborators to share PDFs and cite directly from them. Changes made by any collaborator are automatically pushed to everyone else!</p>
<p><strong><a title="Dropbox" href="http://db.tt/EmKx1dS">Dropbox</a></strong> (and its alternatives): Although <a title="Dropbox" href="http://db.tt/EmKx1dS">Dropbox</a> appears to be the de facto file sharing platform for most researchers, there are other great alternatives, such as <a title="Spider Oak is another file sharing service" href="https://spideroak.com/download/referral/846ceca9207b0e9479c6aeb61eecd7c1">SpiderOak</a> (way <a href="http://www.virturoff.com/spideoak-sync-backup-service-vs-drop-the-box/">more secure</a> than Dropbox) and <a href="https://www.sugarsync.com/">SugarSync</a> (more free space to begin with), if you end up running out of (free) space. Dropbox, however, does increase your space with referrals (more for academic users). This <a href="http://chronicle.com/blogs/profhacker/dropbox-edu/33911">ProfHacker post</a> explains all the details.</p>
<p>Useful dropbox hacks:</p>
<ol>
<li><a href="http://dropdav.com/">DropDAV</a> &#8211; <em>A webDAV interface for Dropbox</em></li>
<li><a href="http://www.jotform.com/dropbox/">Dropbox Forms</a> &#8211; <em>Receive assignments from students and data sheets from your research assistants/collaborators straight into your dropbox</em></li>
<li>Add a <a href="http://wiki.dropbox.com/DropboxAddons/Dropbox%20Dashboard%20Widget">widget</a> for quick drag-and-drop (OS X users only)</li>
</ol>
<p>Did I miss something really useful? Add it to the comments below!</p>

]]></content:encoded>
			<wfw:commentRss>http://inundata.org/2011/06/13/tools-for-a-research-workflow/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://inundata.org/2011/06/13/tools-for-a-research-workflow/</feedburner:origLink></item>
		<item>
		<title>R + EC2 + RStudio Server</title>
		<link>http://feedproxy.google.com/~r/Inundata/~3/sgLuaoZu3oQ/</link>
		<comments>http://inundata.org/2011/03/30/r-ec2-rstudio-server/#comments</comments>
		<pubDate>Wed, 30 Mar 2011 18:13:32 +0000</pubDate>
		<dc:creator>Karthik Ram</dc:creator>
				<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://inundata.org/?p=89</guid>
		<description><![CDATA[I’ve been battling memory limits in R for over two years. Although R has numerous resources for high-performance computing, I still couldn&#8217;t get around hardware limitations. Things really got out of control last summer when I started analyzing data on how climate change influences population synchrony across large spatiotemporal gradients. My datasets were simply too [...]]]></description>
			<content:encoded><![CDATA[
<p>I’ve been battling memory limits in R for over two years. Although R has numerous resources for <a href="http://cran.r-project.org/web/views/HighPerformanceComputing.html">high-performance computing</a>, I still couldn&#8217;t get around hardware limitations. Things really got out of control last summer when I started analyzing data on how climate change influences population synchrony across large spatiotemporal gradients. My datasets were simply too many and too large, and neither code finessing nor heavy use of <a href="http://had.co.nz/">Hadley’s</a> <a href="http://had.co.nz/plyr/">split-apply-combine approach</a> helped much.</p>
<p>Initially I was turned off by the learning curve associated with setting up R on <a href="http://aws.amazon.com/ec2/">EC2</a>, but eventually I set up my own Ubuntu box with R, all of my packages and customizations, and saved it as a 64-bit AMI capable of running High-Memory Quadruple Extra Large instances. <span id="more-89"></span> This setup has worked really well for me over the last few months.</p>
<p>With the recent release of <a href="http://www.rstudio.org/">RStudio</a> and RStudio <a href="http://www.rstudio.org/download/server">Server</a>, I&#8217;ve been toying with the idea of running it on an EBS-backed instance. Inspired by <a href="http://www.cerebralmastication.com/">JD&#8217;s</a> <a href="http://twitter.com/cmastication/status/50030256513499136">tweet</a>, I got around to setting mine up this weekend. Here is a quick walkthrough.</p>
<p>Assuming you&#8217;ve launched and used EC2 services before, start out by launching a newer version of Ubuntu (I&#8217;m running 10.04 Lucid) and <a title="Instructions for installing R on Ubuntu" href="http://help.nceas.ucsb.edu/Installing_R_on_Ubuntu">install the current release of R</a> (2.13).</p>
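<p>Since the whole point of this setup is to get past memory limits, it is worth confirming from R that the build you just installed is 64-bit: a 32-bit build cannot address more than 4 GB no matter how much RAM the instance has. Here is a minimal sketch; <code>is_64bit_r()</code> is just a hypothetical helper name, not part of any package.</p>

```r
# Minimal sketch: check that the installed R is a 64-bit build.
# is_64bit_r() is a hypothetical helper name used only for illustration.
is_64bit_r <- function() {
  # .Machine$sizeof.pointer is 8 on 64-bit builds and 4 on 32-bit builds
  .Machine$sizeof.pointer == 8
}

cat("R version:   ", R.version.string, "\n")
cat("Architecture:", R.version$arch, "\n")  # e.g. "x86_64" on a 64-bit Ubuntu AMI

if (!is_64bit_r()) {
  warning("This R build is 32-bit and cannot use most of a high-memory instance's RAM.")
}
```

<p>If the warning fires, you most likely installed the 32-bit R packages or launched a 32-bit AMI.</p>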
<p>Next, install RStudio Server by following the instructions <a href="http://www.rstudio.org/download/server">here</a> (be sure to follow the 64-bit instructions).</p>
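<p>RStudio Server logs you in with an ordinary Linux user account, so once it is installed, create a new user like so (replace <code>username</code> with a name of your choice; <code>adduser</code> will prompt you for a password):</p>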
<pre class="brush: plain; title: ; notranslate">sudo adduser username</pre>
<p>At this point, edit your <a href="https://console.aws.amazon.com/ec2/home?region=us-east-1#s=SecurityGroups">EC2 security group</a> to allow inbound TCP traffic on port 8787, the port RStudio Server listens on by default.</p>
<p>If the instructions so far seem complicated, or if you&#8217;d rather not start from scratch, you can follow the instructions <a href="http://community.mis.temple.edu/stevenljohnson/2011/03/12/rstudio-and-amazon-web-services-ec2/">here</a> to launch an existing AMI that already has compatible versions of R and RStudio Server installed, and take it from there.</p>
<p>Next, open RStudio in your browser by pointing it at your instance&#8217;s public DNS name on port 8787, like so:</p>
<pre class="brush: plain; title: ; notranslate">http://ec2-75-102-193-170.compute-1.amazonaws.com:8787</pre>
<p>(Replace the hostname above with your instance&#8217;s current public DNS name, which you can find on the EC2 Dashboard.)</p>
<p>Log in with the username and password you set earlier. If everything worked, you should see something like this:</p>
<p><a href="http://inundata.org/wp-content/uploads/2011/03/2011-03-30-10.00.19-am.png"><img class="aligncenter size-full wp-image-93" title="Rstudio Server" src="http://inundata.org/wp-content/uploads/2011/03/2011-03-30-10.00.19-am.png" alt="" width="732" height="492" /></a></p>
<p>Next, install all the packages you need. If you require Java-backed packages such as <a title="Automated model selection and model-averaging, requires rJava" href="http://cran.r-project.org/web/packages/glmulti/index.html">glmulti</a> (which depends on rJava), set up Java from the terminal first.</p>
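<p>If you rebuild instances often, a small helper that installs only the packages that are missing saves time. This is a minimal sketch, not the workflow described above: <code>install_missing()</code> is a hypothetical name, and the <code>cloud.r-project.org</code> mirror is an assumed default you can swap for any CRAN mirror.</p>

```r
# Minimal sketch: install only the packages that are not yet installed.
# install_missing() is a hypothetical helper; the mirror URL is an assumption.
install_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  # names of requested packages absent from every library on .libPaths()
  missing <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing) > 0) {
    install.packages(missing, repos = repos)
  }
  invisible(missing)  # return the names that needed installing
}

# Example (needs network access; the package list is a placeholder):
# install_missing(c("plyr", "glmulti"))
```

<p>For rJava-based packages such as glmulti, you would typically install a JDK from the terminal first and then run <code>sudo R CMD javareconf</code> so that R picks up the Java configuration before calling the helper.</p>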
<p>After that, you can easily save this customized instance as your own AMI through the GUI menus by following the instructions <a title="Turn this running AMI into your own EBS backed instance" href="http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/Tutorial_CreateImage.html">here</a>. Voilà! From now on, whenever you need to run a high-memory instance of R, just launch a new instance, choose your image under My AMIs, and once it is running, connect to it in the browser using its current public DNS name. Brilliant!</p>

]]></content:encoded>
			<wfw:commentRss>http://inundata.org/2011/03/30/r-ec2-rstudio-server/feed/</wfw:commentRss>
		<slash:comments>21</slash:comments>
		<feedburner:origLink>http://inundata.org/2011/03/30/r-ec2-rstudio-server/</feedburner:origLink></item>
		<item>
		<title>Staying up to date on R packages</title>
		<link>http://feedproxy.google.com/~r/Inundata/~3/33WcfwfegYs/</link>
		<comments>http://inundata.org/2011/03/17/staying-up-to-date-on-r-packages/#comments</comments>
		<pubDate>Thu, 17 Mar 2011 21:18:16 +0000</pubDate>
		<dc:creator>Karthik Ram</dc:creator>
				<category><![CDATA[code-snippet]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://inundata.org/?p=76</guid>
		<description><![CDATA[Unless you regularly use particular R packages, it becomes difficult to stay on top of updates and bug fixes. Updates usually also include significant improvements in performance. I wrote this short snippet of code which I run about once a month to keep up on updates. This short bit of code will give you a [...]]]></description>
			<content:encoded><![CDATA[
<p>Unless you use particular R packages regularly, it becomes difficult to stay on top of their updates and bug fixes, and updates often include significant performance improvements too. I wrote this short snippet, which I run about once a month to keep up with updates.<span id="more-76"></span> It lists every installed package whose version differs from the one in your repository (CRAN by default), so you can decide which ones to update:</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;">installed<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">installed.<span style="">packages</span></span><span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span>
available <span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">available.<span style="">packages</span></span><span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span>
ia <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">merge</span><span style="color: #080;">&#40;</span>installed, available, <span style="color: #0000FF; font-weight: bold;">by</span><span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Package&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#91;</span>,<span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Package&quot;</span>, <span style="color: #ff0000;">&quot;Version.x&quot;</span>, <span style="color: #ff0000;">&quot;Version.y&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#93;</span>
updates<span style="color: #080;">&lt;-</span>ia<span style="color: #080;">&#91;</span><span style="color: #0000FF; font-weight: bold;">as.<span style="">character</span></span><span style="color: #080;">&#40;</span>ia$Version.<span style="">x</span><span style="color: #080;">&#41;</span> <span style="color: #080;">!=</span> <span style="color: #0000FF; font-weight: bold;">as.<span style="">character</span></span><span style="color: #080;">&#40;</span>ia$Version.<span style="">y</span><span style="color: #080;">&#41;</span>,<span style="color: #080;">&#93;</span>
updates</pre></div></div>
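<p>One caveat: the snippet above compares version strings with <code>!=</code>, so it also lists packages where your installed copy is <em>newer</em> than the repository&#8217;s (a development build, say). Below is a minimal sketch of a version-aware variant; <code>find_updates()</code> is a hypothetical name. Base R&#8217;s <code>old.packages()</code> performs a similar older-than-repository check if you prefer a built-in.</p>

```r
# Minimal sketch: list only packages whose installed version is older than
# the repository's. find_updates() is a hypothetical helper name.
find_updates <- function(installed = installed.packages(),
                         available = available.packages()) {
  # drop = FALSE keeps a matrix even when only one package is present
  ia <- merge(installed[, c("Package", "Version"), drop = FALSE],
              available[, c("Package", "Version"), drop = FALSE],
              by = "Package")
  # package_version() compares numerically, so "1.10.0" counts as newer than "1.9.0"
  older <- package_version(as.character(ia$Version.x)) <
           package_version(as.character(ia$Version.y))
  ia[older, c("Package", "Version.x", "Version.y")]
}

# Usage (needs network access to query the repository):
# find_updates()
```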

<p>If you would like to install every available update:</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">update.<span style="">packages</span></span><span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span></pre></div></div>
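<p>Run interactively, <code>update.packages()</code> asks for confirmation before updating each package. For an unattended monthly run, a minimal sketch is to wrap it with <code>ask = FALSE</code>. The wrapper name <code>update_all()</code> and the mirror URL are assumptions; <code>checkBuilt = TRUE</code> additionally treats packages built under an older R version as out of date, which helps after upgrading R itself.</p>

```r
# Minimal sketch: install every available update without per-package prompts.
# update_all() is a hypothetical wrapper; the mirror URL is an assumption.
update_all <- function(repos = "https://cloud.r-project.org") {
  # ask = FALSE skips the "Update (y/N/c)?" prompt for each package;
  # checkBuilt = TRUE also reinstalls packages built under an older R version
  update.packages(ask = FALSE, checkBuilt = TRUE, repos = repos)
}

# Run it (needs network access and write permission to your library):
# update_all()
```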

<p>If you would like to keep up with new packages as they are released, I highly recommend following <a title="New CRAN packages" href="http://twitter.com/CRANberriesfeed">@CRANberries</a>.</p>

]]></content:encoded>
			<wfw:commentRss>http://inundata.org/2011/03/17/staying-up-to-date-on-r-packages/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://inundata.org/2011/03/17/staying-up-to-date-on-r-packages/</feedburner:origLink></item>
	</channel>
</rss>

