<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" gd:etag="W/&quot;A0ABRHozfip7ImA9WhVTEEQ.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035</id><updated>2012-02-24T10:09:15.486-06:00</updated><category term="Policy" /><category term="Stata" /><category term="1000 genomes" /><category term="Recommended Reading" /><category term="Twitter" /><category term="SQL" /><category term="Visualization" /><category term="Statistics" /><category term="Machine Learning" /><category term="Noteworthy blogs" /><category term="Pathways" /><category term="ggplot2" /><category term="Imputation" /><category term="Perl" /><category term="Sequencing" /><category term="Search" /><category term="Tutorials" /><category term="Announcements" /><category term="PubMed" /><category term="Productivity" /><category term="RSS" /><category term="GWAS" /><category term="Journal club" /><category term="Linux" /><category term="Clustering" /><category term="Software" /><category term="Writing" /><category term="Web Apps" /><category term="Ethics" /><category term="PLINK" /><category term="News" /><category term="Bioinformatics" /><category term="R" /><title>Getting Genetics Done</title><subtitle type="html">Software, tips, &amp;amp; productivity hacks for getting things done in genetics research</subtitle><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/" /><link rel="next" type="application/atom+xml" 
href="http://www.blogger.com/feeds/6232819486261696035/posts/default?start-index=26&amp;max-results=25&amp;redirect=false&amp;v=2" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>301</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/GettingGeneticsDone" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="gettinggeneticsdone" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><link rel="license" type="text/html" href="http://creativecommons.org/licenses/by-sa/3.0/" /><feedburner:emailServiceId xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">GettingGeneticsDone</feedburner:emailServiceId><feedburner:feedburnerHostname xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://feedburner.google.com</feedburner:feedburnerHostname><entry gd:etag="W/&quot;DEMNSHo8fyp7ImA9WhVTEEQ.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-1880125310682424368</id><published>2012-02-24T09:10:00.000-06:00</published><updated>2012-02-24T09:14:59.477-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-24T09:14:59.477-06:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="R" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>I'm Hiring!</title><content type="html">I direct the &lt;a 
href="http://bioinformatics.virginia.edu/" target="_blank"&gt;Bioinformatics Core&lt;/a&gt; at the University of Virginia, and I'm hiring. Visit &lt;a href="http://jobs.virginia.edu/applicants/Central?quickFind=66485" target="_blank"&gt;this link on the UVA Jobs website&lt;/a&gt; for more information. Here's the description:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
The University of Virginia Bioinformatics Core is seeking a full-time position as a bioinformatics analyst. The analyst will work with other core staff on grant-funded and chargeback-based projects to manage and analyze large-scale datasets produced by next-generation sequencing. The analyst will identify opportunities and implement solutions for managing, visualizing, analyzing, and interpreting genomic data, including studies of gene expression (RNA-seq and microarrays), pathway analysis, protein-DNA binding (e.g. ChIP-seq), DNA methylation, and DNA variation, using Affymetrix, Illumina, Nimblegen, Agilent, Roche 454, Ion Torrent, and other high-throughput platforms in both human and model organisms. The analyst will work closely with the core director to assist in experimental design and provide expert consultation, technical, and scientific support for UVA investigators, and assist in outreach and training activities. The analyst will organize large-scale sequence data sets, manipulate and format data with perl, python, or other scripting languages, use established software to assess quality and analyze data, schedule and run jobs on a high-performance computing cluster, use Unix or a scripting language to extract meaningful results from output, use software or genome browsers for visualization, and use established databases and techniques for annotating genetic variants and results from expression/DNA-binding experiments. The successful candidate will have a demonstrated ability to translate biological questions into technical designs, and to identify, prioritize, and execute bioinformatics tasks to meet project goals and deadlines. An M.S. in Bioinformatics, Genomics, Biostatistics, or a related field is required for this position. &amp;nbsp;&lt;/blockquote&gt;
&lt;a href="http://jobs.virginia.edu/applicants/Central?quickFind=66485" target="_blank"&gt;I'm Hiring - Bioinformatics Analyst in the UVA Bioinformatics Core&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-1880125310682424368?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/Ud9fosVxXLQ" height="1" width="1"/&gt;</content><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/1880125310682424368?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/1880125310682424368?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2012/02/im-hiring.html" title="I'm Hiring!" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author></entry><entry gd:etag="W/&quot;A0MNQ3czfip7ImA9WhRaFEQ.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-4291567172235685541</id><published>2012-02-17T11:24:00.003-06:00</published><updated>2012-02-17T11:24:52.986-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-17T11:24:52.986-06:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="PubMed" /><title>Your Publications (with PMCID) as a PubMed Query</title><content type="html">&lt;div&gt;
I'm updating my CV and biosketch for a few grant applications, and for some time now, NIH has required you to include the PubMed Central ID for each article you publish that arose from NIH support. I only have a dozen or so papers indexed in PubMed, but I still wanted a way to do this automatically. If you have scores of publications, looking up all the PMCIDs could easily become a hassle.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
First, create an account at &lt;a href="http://www.ncbi.nlm.nih.gov/sites/myncbi/" target="_blank"&gt;My NCBI&lt;/a&gt;. Under your bibliography, click "Manage My Bibliography." Then click "Add citation," and in the new window that opens, select "Citation from PubMed" and click the "Go To PubMed" button.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
The trick here is constructing a PubMed query that returns only your publications. There are lots of Stephen D. Turners out there, so I had to get creative. This query construction tip comes from my colleague here at UVA, &lt;a href="http://cphg.virginia.edu/?page_id=556" target="_blank"&gt;Aaron Mackey&lt;/a&gt;:&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;blockquote class="tr_bq"&gt;
For many people, simple PubMed author searches suffice, e.g. "Pearson WR"[Author]. For some, such name-based searches get it mostly right, but may include a few spurious false hits. For these cases, it's easy enough to exclude those false hits explicitly (e.g. "Mackey AJ"[Author] NOT 9850730[PMID] NOT 10730495[PMID] gets rid of the two AJ Mackey publications that are not, in fact, mine). For others, simple author searches do not suffice at all, but usually adding an institution and/or departmental affiliation does narrow the results sufficiently (e.g. for Jeff Smith, Biochemistry: "Smith JS"[au] AND "University of Virginia"[Affiliation] AND "Biochemistry"[Affiliation] identifies the 16 articles for which Jeff Smith is the senior author; Jeff could also add a few collaborative publications by adding those PubMed IDs to the search, i.e. adding "OR 17482543[PMID]" to the end of his query).&lt;/blockquote&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
When I did this for myself, I searched by author, AND (any of my institutional affiliations separated by ORs), but NOT (any of the PMIDs that were not mine, separated by ORs). Apparently there was once another Stephen D. Turner at UVA in the Department of Urology. Here are the &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed?term=%22Turner%20SD%22%5BAu%5D%20AND%20(%22James%20Madison%22%5BAffiliation%5D%20OR%20Vanderbilt%5BAffiliation%5D%20OR%20Hawaii%5BAffiliation%5D%20OR%20%22University%20of%20Virginia%22%5BAffiliation%5D)%20NOT%20(11514333%5BPMID%5D%20OR%2011058553%5BPMID%5D)" target="_blank"&gt;results&lt;/a&gt; returned by my query:&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;div class="p1"&gt;
&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;"Turner SD"[Au] AND ("James Madison"[Affiliation] OR Vanderbilt[Affiliation] OR Hawaii[Affiliation] OR "University of Virginia"[Affiliation]) NOT (11514333[PMID] OR 11058553[PMID])&lt;/span&gt;&lt;/div&gt;
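&lt;div class="p1"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="p1"&gt;
If you'd rather script the query than type it by hand, here is a minimal Python sketch that assembles the same author / affiliation / excluded-PMID pattern and percent-encodes it into a PubMed search URL. The helper names (tag, build_query) are hypothetical and exist only for illustration, and the base URL is simply the one used in the results link above, so treat it as an assumption rather than a documented API:&lt;/div&gt;

```python
from urllib.parse import quote

# Assumption: same base URL as the results link in this post.
PUBMED_BASE = "http://www.ncbi.nlm.nih.gov/pubmed?term="


def tag(value, field):
    """Wrap multi-word values in double quotes and append a PubMed field tag."""
    text = '"' + value + '"' if " " in value else value
    return text + "[" + field + "]"


def build_query(author, affiliations, excluded_pmids):
    """Hypothetical helper: author AND (any affiliation) NOT (any excluded PMID)."""
    query = tag(author, "Au")
    if affiliations:
        joined = " OR ".join(tag(a, "Affiliation") for a in affiliations)
        query += " AND (" + joined + ")"
    if excluded_pmids:
        joined = " OR ".join(tag(str(p), "PMID") for p in excluded_pmids)
        query += " NOT (" + joined + ")"
    return query


query = build_query(
    "Turner SD",
    ["James Madison", "Vanderbilt", "Hawaii", "University of Virginia"],
    [11514333, 11058553],
)

# Percent-encode quotes, spaces, and brackets; leave parentheses readable.
url = PUBMED_BASE + quote(query, safe="()")

print(query)
print(url)
```

&lt;div class="p1"&gt;
Swap in your own name, affiliations, and the PMIDs you want to exclude; the printed query is the text you would paste into the PubMed search box.&lt;/div&gt;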
&lt;div class="p1"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="p1"&gt;
The final step is clicking the "Send to" link at the top right, and sending the results of your query to My Bibliography.&lt;/div&gt;
&lt;div class="p1"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-wMs939ZXUvA/Tz543QuBOsI/AAAAAAAAnQo/g5TbXGbgSRk/s1600/myncbi.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-wMs939ZXUvA/Tz543QuBOsI/AAAAAAAAnQo/g5TbXGbgSRk/s1600/myncbi.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class="p1"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="p1"&gt;
Now, when you are back at My NCBI, you should see a list of all your publications, complete with both the PMID and PMCID, ready to go in your biosketch.&lt;/div&gt;
&lt;div class="p1"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-x8KUPoAjqcM/Tz6LrUgyKzI/AAAAAAAAnQw/_kEpta7PME8/s1600/mybib.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="310" src="http://2.bp.blogspot.com/-x8KUPoAjqcM/Tz6LrUgyKzI/AAAAAAAAnQw/_kEpta7PME8/s400/mybib.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class="p1"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="p1"&gt;
You can then export this bibliography as text, or simply copy/paste. Finally, you have the option of making your bibliography public (&lt;a href="http://www.ncbi.nlm.nih.gov/myncbi/browse/collection/40221542/?sort=date&amp;amp;direction=ascending" target="_blank"&gt;example&lt;/a&gt;).&lt;/div&gt;
&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/4291567172235685541/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2012/02/your-publications-with-pmcid-as-pubmed.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/4291567172235685541?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/4291567172235685541?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2012/02/your-publications-with-pmcid-as-pubmed.html" title="Your Publications (with PMCID) as a PubMed Query" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-wMs939ZXUvA/Tz543QuBOsI/AAAAAAAAnQo/g5TbXGbgSRk/s72-c/myncbi.png" height="72" width="72" /><thr:total>2</thr:total></entry><entry 
gd:etag="W/&quot;CkYHRns6eip7ImA9WhRbF04.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-7203047735418516532</id><published>2012-02-08T13:41:00.002-06:00</published><updated>2012-02-08T13:42:17.512-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-08T13:42:17.512-06:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>Webinar: Genomic Networks - Resolving Biomarkers from a Cloud of Data</title><content type="html">&lt;div&gt;
Kevin White from the University of Chicago will be giving a special guest lecture at NCI next week on systems biology approaches to mine genomics data for biomarkers and therapeutic targets. The lecture will be &lt;a href="http://videocast.nih.gov/summary.asp?live=10995" target="_blank"&gt;available online as a videocast&lt;/a&gt;.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;b&gt;Title&lt;/b&gt;: Genomic Networks in Development and Cancer: Resolving Biomarkers and Therapeutic Targets from a Cloud of Data&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;b&gt;Speaker&lt;/b&gt;: Kevin White, University of Chicago&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;b&gt;When&lt;/b&gt;: Tuesday February 14, 2012, 1:00pm EST&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;b&gt;Summary&lt;/b&gt;:&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Systems level approaches to construct abstract molecular networks can lead to predictions about genetic and biochemical functions in cells, organisms and in disease states. I will show examples of this approach from work in my laboratory. In one example we used an integrated experimental and computational approach to construct a large scale functional network in Drosophila melanogaster built around key transcription factors involved in the process of embryonic segmentation. Our network model is based on a combination of gene expression, transcription factor DNA binding site mapping, automated literature mining and protein-protein interaction mapping. We provide a strategy for reducing the dimensionality of the massive networks that result from such integrated whole genome analyses.&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Using results from one factor in particular, we demonstrated that our approach can rapidly translate a finding in a model organism to the development of a therapeutic target in kidney cancer. In another example, we built a large scale network based on gene expression and genome-wide ChIP results for 40 transcription factors, including two dozen Nuclear Receptor (NR) class proteins. Using this NR network we identified novel prognostic signatures for breast cancer survival and recurrence, as well as new therapeutic leads.&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Finally, if time permits I will talk about how we are mining The Cancer Genome Atlas along with data from the Chicago Cancer Genomes Project using the Bionimbus Cloud in order to identify new tumor suppressors and panels of genetic markers capable of classifying cancer subtypes that correspond to patient outcome.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;a href="http://videocast.nih.gov/summary.asp?live=10995" target="_blank"&gt;NIH Videocast:&amp;nbsp;Genomic Networks in Development and Cancer: Resolving Biomarkers and Therapeutic Targets from a Cloud of Data&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/7203047735418516532/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2012/02/webinar-genomic-networks-resolving.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7203047735418516532?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7203047735418516532?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2012/02/webinar-genomic-networks-resolving.html" title="Webinar: Genomic Networks - Resolving Biomarkers from a Cloud of Data" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>0</thr:total></entry><entry gd:etag="W/&quot;CkYEQH04fip7ImA9WhRbF0w.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-7854899309726992677</id><published>2012-02-08T08:08:00.000-06:00</published><updated>2012-02-08T08:08:21.336-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-08T08:08:21.336-06:00</app:edited><category 
scheme="http://www.blogger.com/atom/ns#" term="ggplot2" /><category scheme="http://www.blogger.com/atom/ns#" term="Visualization" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>Hadley Wickham: ggplot2 Webinar (Today!)</title><content type="html">&lt;br /&gt;
&lt;b&gt;Title:&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;A Backstage Tour of ggplot2 with Hadley Wickham&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;Date:&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;Wednesday, February 8, 2012&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;Time:&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;11:00AM - 12:00PM Pacific&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;Presenter:&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;Hadley Wickham, Professor of Statistics, Rice University&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.revolutionanalytics.com/news-events/free-webinars/2012/ggplot2-with-hadley-wickham/" target="_blank"&gt;Register here&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
I used &lt;a href="http://gettinggeneticsdone.blogspot.com/search/label/ggplot2" target="_blank"&gt;ggplot2&lt;/a&gt; extensively a few years ago, but reverted to base graphics when ggplot2 was too slow for a project I was working on. ggplot2 and plyr have improved a lot since then, and I'm starting to pick ggplot2 back up. This webinar will give an overview of ggplot2, preview some of its forthcoming features, and discuss its internals, its development over the last few years, and how that development is becoming easier.&lt;br /&gt;
&lt;br /&gt;
I received an email yesterday saying that the registration list is over 1,000 names long, so it's a good idea to sign in to the webinar early to make sure you get a spot. Hit the link below to register and you'll get a link to the webinar.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.revolutionanalytics.com/news-events/free-webinars/2012/ggplot2-with-hadley-wickham/" target="_blank"&gt;A Backstage Tour of ggplot2 with Hadley Wickham&lt;/a&gt;&lt;br /&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/7854899309726992677/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2012/02/hadley-wickham-ggplot2-webinar-today.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7854899309726992677?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7854899309726992677?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2012/02/hadley-wickham-ggplot2-webinar-today.html" title="Hadley Wickham: ggplot2 Webinar (Today!)" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>0</thr:total></entry><entry 
gd:etag="W/&quot;Ak4FQXY8fSp7ImA9WhRUEEs.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-6536223172088465316</id><published>2012-01-20T08:15:00.001-06:00</published><updated>2012-01-20T08:15:10.875-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-20T08:15:10.875-06:00</app:edited><title>Joint Techs Netcast: Enhancing Infrastructure Support for Data Intensive Science</title><content type="html">The winter Joint Techs meeting is next week in Baton Rouge. I'm not going, but I plan on participating via a &lt;a href="http://events.internet2.edu/2012/jt-loni/agenda.cfm?go=netcast" target="_blank"&gt;netcast&lt;/a&gt; to see what's going on.&amp;nbsp;Jim Bottum, Clemson's CIO, is moderating an entire day devoted to the topic Enhancing Infrastructure Support for Data Intensive Science. Of particular interest to me are the talks from 9:30-11am Tuesday January 24 from researchers and those supporting climatology, genomics, and the XSEDE projects. The afternoon of January 24 has some talks from academic and government labs who've&amp;nbsp;successfully deployed methods to enhance their infrastructure support for data intensive science. Check out the full agenda for the day &lt;a href="http://events.internet2.edu/2012/jt-loni/agenda.cfm?types=&amp;amp;details=&amp;amp;timespan=2012-01-24" target="_blank"&gt;here&lt;/a&gt;. These sessions sound particularly relevant for those researching and supporting large-scale genomics and bioinformatics projects.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://events.internet2.edu/2012/jt-loni/agenda.cfm?go=netcast" target="_blank"&gt;Joint Techs Meeting Netcast&lt;/a&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/6536223172088465316/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2012/01/joint-techs-netcast-enhancing.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/6536223172088465316?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/6536223172088465316?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2012/01/joint-techs-netcast-enhancing.html" title="Joint Techs Netcast: Enhancing Infrastructure Support for Data Intensive Science" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>0</thr:total></entry><entry 
gd:etag="W/&quot;DkUBR3k-eip7ImA9WhRVGE8.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-4779486207616584590</id><published>2012-01-17T12:17:00.000-06:00</published><updated>2012-01-17T12:17:36.752-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-17T12:17:36.752-06:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Software" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>Annotating limma Results with Gene Names for Affy Microarrays</title><content type="html">Lately I've often been using the &lt;a href="http://bioconductor.org/packages/release/bioc/html/limma.html" target="_blank"&gt;limma&lt;/a&gt; package to analyze microarray data. When I read in Affy CEL files using ReadAffy(), the resulting ExpressionSet won't contain any featureData annotation. Consequently, when I run topTable to get a list of differentially expressed genes, there's no annotation information other than the Affymetrix probeset IDs or transcript cluster IDs. There are other ways of annotating these results (INNER JOIN to a MySQL database, &lt;a href="http://www.bioconductor.org/packages/2.2/bioc/html/biomaRt.html" target="_blank"&gt;biomaRt&lt;/a&gt;, etc.), but I would like to have the output from topTable already annotated with gene information. Ideally, I could annotate each probeset ID with a gene symbol, gene name, and Ensembl ID, and have that Ensembl ID hyperlink out to the Ensembl genome browser. With some &lt;a href="https://stat.ethz.ch/pipermail/bioconductor/2011-February/037866.html" target="_blank"&gt;help from Gordon Smyth&lt;/a&gt; on the Bioconductor mailing list, I found that annotating the ExpressionSet object results in the output from topTable also being annotated.&lt;br /&gt;
&lt;br /&gt;
The results from topTable are pretty uninformative without annotation:&lt;br /&gt;
&lt;script src="https://gist.github.com/1627892.js?file=noanno.txt"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
After annotation:&lt;br /&gt;
&lt;script src="https://gist.github.com/1627896.js?file=afteranno.txt"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
You can generate an HTML file with clickable links to the Ensembl Genome Browser for each gene:&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-gzRPpFeET40/TxW6NqSiDcI/AAAAAAAAnE4/C8ixO4MXx7o/s1600/Picture.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="157" src="http://4.bp.blogspot.com/-gzRPpFeET40/TxW6NqSiDcI/AAAAAAAAnE4/C8ixO4MXx7o/s320/Picture.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-_HE6mHwMggw/TxW6P7e1HzI/AAAAAAAAnFA/1eqL66KET3U/s1600/Picture+1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://1.bp.blogspot.com/-_HE6mHwMggw/TxW6P7e1HzI/AAAAAAAAnFA/1eqL66KET3U/s320/Picture+1.png" width="262" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
Here's the R code to do it:&lt;br /&gt;
&lt;script src="https://gist.github.com/1627927.js?file=annotatelimma.r"&gt;
&lt;/script&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/4779486207616584590/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2012/01/annotating-limma-results-with-gene.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/4779486207616584590?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/4779486207616584590?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2012/01/annotating-limma-results-with-gene.html" title="Annotating limma Results with Gene Names for Affy Microarrays" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-gzRPpFeET40/TxW6NqSiDcI/AAAAAAAAnE4/C8ixO4MXx7o/s72-c/Picture.png" height="72" width="72" /><thr:total>2</thr:total></entry><entry 
gd:etag="W/&quot;DEIFQ3w5eSp7ImA9WhRWF0o.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-3801894636132263702</id><published>2012-01-05T09:13:00.000-06:00</published><updated>2012-01-05T09:15:12.221-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-05T09:15:12.221-06:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Productivity" /><category scheme="http://www.blogger.com/atom/ns#" term="Tutorials" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>New Year's Resolution: Learn How to Code</title><content type="html">Farhad Manjoo at Slate has a good article on &lt;a href="http://www.slate.com/articles/technology/technology/2012/01/learn_to_program_make_a_free_weekly_coding_lesson_your_new_year_s_resolution_.single.html" target="_blank"&gt;why you need to learn how to program&lt;/a&gt;. Chances are, if you're reading this post here you're already fairly adept at some form of programming. But if you're not, you should give it some serious thought.&lt;br /&gt;
&lt;br /&gt;
Gina Trapani, former editor of tech blog Lifehacker, is quoted in the article:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
“Learning to code demystifies tech in a way that empowers and enlightens. When you start coding you realize that every digital tool you have ever used involved lines of code just like the ones you're writing, and that if you want to make an existing app better, you can do just that with the same foreach and if-then statements every coder has ever used.”&lt;/blockquote&gt;
Farhad makes the point that programming is important even in traditionally non-computational fields: if you were a travel agent in the 90's and knew how to code, not only would you have been able to see the approaching inevitable collapse of your profession, but perhaps you would have been able to get in early on the dot-com travel industry boom.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://gettinggeneticsdone.blogspot.com/2011/02/get-all-your-questions-answered.html" target="_blank"&gt;Q&amp;amp;A sites for biologists&lt;/a&gt; are littered with questions from researchers asking for non-technical, code-free ways of doing a particular analysis. Your friendly bioinformatics or computational biology neighbor can often point to a resource or design a solution that can get you 90% of the way, but usually won't grok the biological problem as truly as you do. By learning even the smallest bit of programming, you can at least be equipped with the knowledge of what is programmatically possible, and collaborations with your bioinformatician can be more fruitful. As every field of biological research becomes more computational in nature, learning how to code is becoming more important than ever.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Where to start&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Getting started really isn't that difficult. Grab a good text editor like Notepad++ for Windows, TextMate or MacVim for Mac, or vim for Linux/Unix. What language should you start with? This can be a subject of intense debate, but in reality, it doesn't matter - just pick something that's relevant to what you're doing. If you know Perl or Java, you can pick up the basics of Ruby or C++ in a weekend. I started with Perl (using the &lt;a href="http://www.amazon.com/o/ASIN/1449303587/ref=nosim/gettgenedone-20" target="_blank"&gt;Llama book&lt;/a&gt;), but for scientific computing and basic scripting/automation, I would recommend learning Python instead. While Perl lets you get away with sloppy coding and terse shortcuts under the motto "there's more than one way to do it," Python forces you to keep your code tidy, and follows the philosophy that there's probably one best way to do something, and that's the way you should do it. Python has a huge following in the scientific community - chances are you'll find plenty of useful functionality in the &lt;a href="http://biopython.org/" target="_blank"&gt;Biopython&lt;/a&gt; and &lt;a href="http://www.scipy.org/" target="_blank"&gt;SciPy&lt;/a&gt; modules. I learned Python in an afternoon by watching videos and doing exercises in&amp;nbsp;&lt;a href="http://code.google.com/edu/languages/google-python-class/" target="_blank"&gt;Google's Python Class&lt;/a&gt;, and the free book &lt;a href="http://www.diveintopython.net/" target="_blank"&gt;Dive Into Python&lt;/a&gt; is a great reference. If you're on Windows, you can get Python from ActiveState; if you're on Mac or Linux, you already have Python.&lt;br /&gt;
&lt;br /&gt;
The Slate article also points to &lt;a href="http://codeyear.com/" target="_blank"&gt;Code Year&lt;/a&gt; - a site that will send you interactive coding projects once a week throughout 2012, starting January 9. Code Year is from the creators of &lt;a href="http://www.codecademy.com/" target="_blank"&gt;Codecademy&lt;/a&gt; - a site with a series of fun, interactive JavaScript tutorials. Lifehacker has a 5-part &lt;a href="http://lifehacker.com/5744113/learn-to-code-the-full-beginners-guide" target="_blank"&gt;"Night School" series on the basics of programming&lt;/a&gt;. Once you have some basic programming chops, take a look at Stanford's free &lt;a href="http://jan2012.ml-class.org/" target="_blank"&gt;machine learning&lt;/a&gt;,&amp;nbsp;&lt;a href="https://www.ai-class.com/" target="_blank"&gt;artificial intelligence&lt;/a&gt;, and &lt;a href="http://www.nlp-class.org/" target="_blank"&gt;Natural Language Processing&lt;/a&gt;&amp;nbsp;classes to hone your scientific computing skills. Need a challenge? Try the &lt;a href="http://www.pythonchallenge.com/" target="_blank"&gt;Python Challenge&lt;/a&gt; for fun puzzles to sharpen your Python skills, or check out &lt;a href="http://projecteuler.net/" target="_blank"&gt;Project Euler&lt;/a&gt; if you want to tackle more math-oriented programming challenges in any language. The point is - there is no lack of free resources to help you get started or get better at programming.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.slate.com/articles/technology/technology/2012/01/learn_to_program_make_a_free_weekly_coding_lesson_your_new_year_s_resolution_.single.html" target="_blank"&gt;Slate - You Need to Learn How to Program&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-3801894636132263702?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/pwJJv7G46eA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/3801894636132263702/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2012/01/new-years-resolution-learn-how-to-code.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3801894636132263702?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3801894636132263702?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2012/01/new-years-resolution-learn-how-to-code.html" title="New Year's Resolution: Learn How to Code" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>1</thr:total></entry><entry 
gd:etag="W/&quot;Ak8GR3k8fip7ImA9WhRXE00.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-4441580376952065191</id><published>2011-12-15T15:37:00.000-06:00</published><updated>2011-12-19T09:33:46.776-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-12-19T09:33:46.776-06:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>Query a MySQL Database from R using RMySQL</title><content type="html">I use this all the time, and the setup is dead simple. Follow &lt;a href="https://gist.github.com/1482991"&gt;the code&lt;/a&gt; below to load the RMySQL package, connect to a database (here the UCSC genome browser's public MySQL instance), set up a function to make querying easier, and query the database to return results as a data frame.&lt;br /&gt;
&lt;br /&gt;
&lt;script src="https://gist.github.com/1482991.js?file=rmysql.r"&gt;
&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-4441580376952065191?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/TYbbdLxe6gI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/4441580376952065191/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/12/query-mysql-database-from-r-using.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/4441580376952065191?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/4441580376952065191?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/12/query-mysql-database-from-r-using.html" title="Query a MySQL Database from R using RMySQL" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>2</thr:total></entry><entry gd:etag="W/&quot;CEUDRns8fCp7ImA9WhRQGUU.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-3250033445526668928</id><published>2011-12-15T14:51:00.000-06:00</published><updated>2011-12-15T14:51:17.574-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-12-15T14:51:17.574-06:00</app:edited><category 
scheme="http://www.blogger.com/atom/ns#" term="Writing" /><category scheme="http://www.blogger.com/atom/ns#" term="Announcements" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>Galaxy Project Group on CiteULike and Mendeley</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://wiki.g2.bx.psu.edu/CiteULike" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-V4mwYdS6Ihw/Tupdfg5TZBI/AAAAAAAAmCI/aR2Kg62Pg1s/s1600/Picture+3.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;br /&gt;&lt;/div&gt;
The &lt;a href="http://usegalaxy.org/"&gt;Galaxy Project&lt;/a&gt; started using CiteULike to organize papers that are about, use, or reference Galaxy. The &lt;a href="http://www.citeulike.org/group/16008"&gt;Galaxy CiteULike group&lt;/a&gt; is open to any CUL user, and once you join, you can add papers to the group, assign tags, and rate papers. &lt;br /&gt;
&lt;br /&gt;
While not a CUL user, I'm a big fan of &lt;a href="http://www.mendeley.com/"&gt;Mendeley&lt;/a&gt; for managing references and PDFs and for creating bibliographies (&lt;a href="http://gettinggeneticsdone.blogspot.com/2011/02/results-from-reference-management-poll.html"&gt;and so are many of you&lt;/a&gt;). I'm happy to hear that the Galaxy folks also set up a &lt;a href="http://www.mendeley.com/groups/1710745/galaxy-project/"&gt;Galaxy Mendeley Group&lt;/a&gt;, also open to the public for anyone to join.&amp;nbsp; If you join the Galaxy public Mendeley group, all of the group's references will show up in your Mendeley library (and these won't count against your personal quota).&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-My2xF2i0CgI/TupbZSwRs0I/AAAAAAAAmBo/XNC5p4aB0GI/s1600/galaxymendeley.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="395" src="http://2.bp.blogspot.com/-My2xF2i0CgI/TupbZSwRs0I/AAAAAAAAmBo/XNC5p4aB0GI/s400/galaxymendeley.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
Just one important thing to note: The Mendeley group is a &lt;i&gt;mirror&lt;/i&gt; of the CiteULike group, so if you want to add more publications to the Galaxy Group, &lt;i&gt;add them on CiteULike,&lt;/i&gt; not Mendeley (it doesn't work the other way around - papers added to Mendeley won't make it to the CUL group).&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://wiki.g2.bx.psu.edu/CiteULike"&gt;Galaxy Project Group on CiteULike and Mendeley&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-3250033445526668928?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/4wJkF3fJhJk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/3250033445526668928/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/12/galaxy-project-group-on-citeulike-and.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3250033445526668928?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3250033445526668928?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/12/galaxy-project-group-on-citeulike-and.html" title="Galaxy Project Group on CiteULike and Mendeley" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-V4mwYdS6Ihw/Tupdfg5TZBI/AAAAAAAAmCI/aR2Kg62Pg1s/s72-c/Picture+3.png" height="72" width="72" /><thr:total>0</thr:total></entry><entry 
gd:etag="W/&quot;D0UDQno_fSp7ImA9WhRQE0k.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-2348199117673614095</id><published>2011-12-08T05:48:00.001-06:00</published><updated>2011-12-08T05:54:33.445-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-12-08T05:54:33.445-06:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Sequencing" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>RNA-Seq &amp; ChiP-Seq Data Analysis Course at EBI</title><content type="html">I just got this announcement from EMBL-EBI about an RNA-seq/ChIP-seq analysis hands-on course. Find the full details, schedule, and speaker list &lt;a href="http://www.ebi.ac.uk/training/handson/course_120502_RNA.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Title&lt;/i&gt;: &lt;a href="http://www.ebi.ac.uk/training/handson/course_120502_RNA.html"&gt;Advanced RNA-Seq and Chip-Seq Data Analysis Course&lt;/a&gt;&lt;br /&gt; &lt;i&gt;Date&lt;/i&gt;: May 1-4 2012&lt;br /&gt; &lt;i&gt;Venue&lt;/i&gt;: EMBL-EBI, Hinxton, Nr Cambridge, CB10 1SD, UK&lt;br /&gt; &lt;i&gt;Registration Closing Date&lt;/i&gt;: March 6 2012 (12:00 midday GMT)&lt;br /&gt;&lt;br /&gt;This course is aimed at advanced PhD students and post-doctoral researchers who are applying or planning to apply high throughput sequencing technologies and bioinformatics methods in their research. The aim of this course is to familiarize the participants with advanced data analysis methodologies and provide hands-on training on the latest analytical approaches. 
&lt;br /&gt;&lt;br /&gt; Lectures will give insight into how biological knowledge can be generated from RNA-seq and ChIP-seq experiments and illustrate different ways of analyzing such data. Practicals will consist of computer exercises that will enable the participants to apply statistical methods to the analysis of RNA-seq and ChIP-seq data under the guidance of the lecturers and teaching assistants. Familiarity with the technology and biological use cases of high throughput sequencing is required, as is some experience with R/Bioconductor. &lt;br /&gt;&lt;br /&gt; The course covers data analysis of RNA-Seq and ChIP-Seq experiments.&lt;br /&gt; Topics will include: alignment, data handling and visualisation, region identification, differential expression, data quality assessment and statistical analysis, using R/Bioconductor.&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-2348199117673614095?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/A9sC66SmlSc" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/2348199117673614095/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/12/rna-seq-chip-seq-data-analysis-course.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/2348199117673614095?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/2348199117673614095?v=2" /><link rel="alternate" type="text/html" 
href="http://gettinggeneticsdone.blogspot.com/2011/12/rna-seq-chip-seq-data-analysis-course.html" title="RNA-Seq &amp; ChiP-Seq Data Analysis Course at EBI" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>1</thr:total></entry><entry gd:etag="W/&quot;CUUCSHczfip7ImA9WhRQEk0.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-1205612774126081401</id><published>2011-12-06T14:11:00.001-06:00</published><updated>2011-12-06T14:27:49.986-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-12-06T14:27:49.986-06:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Sequencing" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>An example RNA-Seq Quality Control and Analysis Workflow</title><content type="html">I found the slides below on the &lt;a href="http://jura.wi.mit.edu/bio/education/"&gt;education page&lt;/a&gt; from Bioinformatics &amp;amp; Research Computing at the Whitehead Institute. The first set (&lt;a href="http://jura.wi.mit.edu/bio/education/hot_topics/QC_HTP/QC_HTP.pdf"&gt;PDF&lt;/a&gt;) gives an overview of the methods and software available for quality assessment of microarray and RNA-seq experiments using the &lt;a href="http://hannonlab.cshl.edu/fastx_toolkit/"&gt;FastX toolkit&lt;/a&gt; and &lt;a href="http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/"&gt;FastQC&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;iframe height="450" src="http://docs.google.com/viewer?url=http%3A%2F%2Fjura.wi.mit.edu%2Fbio%2Feducation%2Fhot_topics%2FQC_HTP%2FQC_HTP.pdf&amp;amp;embedded=true" style="border: none;" width="500"&gt;&lt;/iframe&gt;&lt;br /&gt;
&lt;br /&gt;
The second set (&lt;a href="http://jura.wi.mit.edu/bio/education/hot_topics/RNAseq/RNAseqDE_Dec2011.pdf"&gt;PDF&lt;/a&gt;)&amp;nbsp; gives an example RNA-seq workflow using &lt;a href="http://tophat.cbcb.umd.edu/"&gt;TopHat&lt;/a&gt;, &lt;a href="http://samtools.sourceforge.net/"&gt;SAMtools&lt;/a&gt;, &lt;a href="http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html"&gt;Python/HTseq&lt;/a&gt;, and &lt;a href="http://www-huber.embl.de/users/anders/DESeq/"&gt;R/DEseq&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;iframe height="450" src="http://docs.google.com/viewer?url=http%3A%2F%2Fjura.wi.mit.edu%2Fbio%2Feducation%2Fhot_topics%2FRNAseq%2FRNAseqDE_Dec2011.pdf&amp;amp;embedded=true" style="border: none;" width="500"&gt;&lt;/iframe&gt;&lt;br /&gt;
&lt;br /&gt;
If you're doing any RNA-seq work these are both really nice resources to help you get a command-line based analysis workflow up and running (if you're not using &lt;a href="http://gettinggeneticsdone.blogspot.com/2011/11/guide-to-rna-seq-analysis-in-galaxy.html"&gt;Galaxy for RNA-seq&lt;/a&gt;).&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-1205612774126081401?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/8fhFT_mTsiU" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/1205612774126081401/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/12/example-rna-seq-quality-control-and.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/1205612774126081401?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/1205612774126081401?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/12/example-rna-seq-quality-control-and.html" title="An example RNA-Seq Quality Control and Analysis Workflow" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>0</thr:total></entry><entry 
gd:etag="W/&quot;AkAGQ3k5eyp7ImA9WhRQEU0.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-5444651446771477730</id><published>2011-12-05T12:06:00.001-06:00</published><updated>2011-12-05T12:12:02.723-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-12-05T12:12:02.723-06:00</app:edited><title>Webinar: Applications of Next-Generation Sequencing in Clinical Care</title><content type="html">I just got an email from Illumina about a webinar that looks interesting this Wednesday at 9am PST (noon EST) on clinical applications of next-gen sequencing. &lt;br /&gt;
&lt;br /&gt;
Date: Wednesday, December 7, 2011&lt;br /&gt;Time: 9:00 AM (PST)&lt;br /&gt;Speaker: Rick Dewey, MD, Stanford Center for Inherited Cardiovascular Disease&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Next-generation sequencing (NGS) presents both challenges and opportunities for clinical care. Dr. Dewey will share examples from his experience at Stanford, successful and otherwise, in which NGS has been applied to cases of familial cardiomyopathy, and other inherited conditions. Bring your questions for a Q&amp;amp;A session. In this webinar, Dr. Dewey will discuss approaches to: Data storage and management; Error identification and reduction; Disease risk encoded in the reference sequence; and Variant validation.&lt;br /&gt;
&lt;br /&gt;
The webinar will be recorded and available to you afterwards if you register.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://mkt.illumina.com/Webinar_Landingpage_IGS_112811.html"&gt;Registration - Applications of Next-Generation Sequencing in Clinical Care&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-5444651446771477730?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/DIwC6C1xGEY" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/5444651446771477730/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/12/webinar-applications-of-next-generation.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/5444651446771477730?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/5444651446771477730?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/12/webinar-applications-of-next-generation.html" title="Webinar: Applications of Next-Generation Sequencing in Clinical Care" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>0</thr:total></entry><entry 
gd:etag="W/&quot;AkUBQHs_cCp7ImA9WhRSFk8.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-4352481818485473354</id><published>2011-11-18T08:50:00.001-06:00</published><updated>2011-11-18T08:57:31.548-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-11-18T08:57:31.548-06:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>BioMart Gene ID Converter</title><content type="html">&lt;a href="http://www.biomart.org/"&gt;BioMart&lt;/a&gt; recently got a facelift. I'm not sure if this was always available in the old BioMart, but there's now a link to a &lt;a href="http://central.biomart.org/converter/#!/ID_converter/gene_ensembl_config_2"&gt;gene ID converter&lt;/a&gt; that worked pretty well for me for converting S. cerevisiae gene IDs to standard gene names. It looks like the tool will convert nearly any ID you could imagine. Looks like it will also map Affy probe IDs to gene, transcript, or protein IDs and names.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://central.biomart.org/converter/#!/ID_converter/gene_ensembl_config_2" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="230" src="http://4.bp.blogspot.com/-SdvoN6SqDoY/TsZxs_LlFPI/AAAAAAAAl94/4_lKYy8u1G0/s400/2011-11-18_095157.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://central.biomart.org/converter/#!/ID_converter/gene_ensembl_config_2"&gt;BioMart Gene ID Converter&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-4352481818485473354?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/yYd6CzJZsNA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/4352481818485473354/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/11/biomart-gene-id-converter.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/4352481818485473354?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/4352481818485473354?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/11/biomart-gene-id-converter.html" title="BioMart Gene ID Converter" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-SdvoN6SqDoY/TsZxs_LlFPI/AAAAAAAAl94/4_lKYy8u1G0/s72-c/2011-11-18_095157.png" height="72" width="72" /><thr:total>0</thr:total></entry><entry 
gd:etag="W/&quot;CkIMQHc-fSp7ImA9WhRSFUs.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-2113732268572589058</id><published>2011-11-17T13:29:00.001-06:00</published><updated>2011-11-17T14:09:41.955-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-11-17T14:09:41.955-06:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="R" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>GEO2R: Web App to Analyze Gene Expression in GEO Datasets Using R</title><content type="html">&lt;a href="http://www.ncbi.nlm.nih.gov/geo/"&gt;Gene Expression Omnibus&lt;/a&gt; is NCBI's repository for publicly available gene expression data, with thousands of datasets covering over 600,000 samples with array or sequencing data. You can download data from GEO using FTP, or download and load the data directly into R using the &lt;a href="http://www.bioconductor.org/packages/1.8/bioc/html/GEOquery.html"&gt;GEOquery&lt;/a&gt; Bioconductor package written (and &lt;a href="http://www.bioconductor.org/packages/1.8/bioc/vignettes/GEOquery/inst/doc/GEOquery.pdf"&gt;well documented&lt;/a&gt;) by &lt;a href="https://twitter.com/#%21/seandavis12"&gt;Sean Davis&lt;/a&gt;, and analyze the data using the &lt;a href="http://www.bioconductor.org/packages/release/bioc/html/limma.html"&gt;limma package&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.ncbi.nlm.nih.gov/geo/info/geo2r.html"&gt;GEO2R&lt;/a&gt; is a very nice web-based tool to do this graphically and automatically. Enter the GEO series number in the search box (or use &lt;a href="http://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE7442"&gt;this one&lt;/a&gt; for an example). Start by creating groups (e.g. control vs treatment, early vs late time points in a time course, etc), then select samples to add to that group.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-AUPHkYFCYzA/TsVj9lORyNI/AAAAAAAAl9k/EXgKM90XNKU/s1600/2011-11-17_144235.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-AUPHkYFCYzA/TsVj9lORyNI/AAAAAAAAl9k/EXgKM90XNKU/s1600/2011-11-17_144235.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
Scroll down to the bottom and click Top 250 to run an analysis in limma (the &lt;a href="http://www.bioconductor.org/packages/2.9/bioc/vignettes/limma/inst/doc/usersguide.pdf"&gt;users guide&lt;/a&gt; documents this well). GEO2R will automatically fetch the data, group your samples, create your design matrix for your differential expression analysis, run the analysis, and annotate the results. A big complaint with point-and-click GUI and web based applications is the lack of reproducibility. GEO2R obviates this problem by giving you all the R code it generated to run the analysis. Click the R script tab to see the R code it generated, and save it for later.&lt;br /&gt;
&lt;br /&gt;
&lt;script src="https://gist.github.com/1374252.js?file=demo_geo2r.r"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
The options tab allows you to adjust the multiple testing correction method, and the value distribution tab lets you take a look at the distribution of gene expression values among the samples that you assigned to your groups.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-GtAWj2M2H84/TsVmL7YJguI/AAAAAAAAl9s/7dCa4Zr0Nug/s1600/Rplot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-GtAWj2M2H84/TsVmL7YJguI/AAAAAAAAl9s/7dCa4Zr0Nug/s1600/Rplot.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
There are no built-in quality assessment tools in GEO2R, but you can always take the R code it generated and do your own QA/QC. It's also important to verify what values it's pulling from each array into the data matrix. In this example, epithelial cells at various time points were compared to a reference cell line, and the log base 2 fold change was calculated. This was used in the data matrix rather than the actual expression values.&lt;br /&gt;
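&lt;br /&gt;
One quick way to run that kind of check is sketched below. The matrix ex is a simulated stand-in for the data matrix (in practice it would come from exprs() on the downloaded series), and the rule of thumb used here (log ratios take negative values and center near zero, while intensities are positive) is an assumption for illustration, not something GEO2R reports.&lt;br /&gt;
&lt;pre&gt;
set.seed(42)
# Simulated stand-in: 1000 probes x 6 samples of log2 ratios (hypothetical sample names)
ex &amp;lt;- matrix(rnorm(6000, mean = 0, sd = 1), nrow = 1000,
             dimnames = list(NULL, paste0("GSM", 1:6)))

# Per-sample quantiles show the scale of the values at a glance
qs &amp;lt;- apply(ex, 2, quantile, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
print(round(qs, 2))

# Crude flag: negative values plus a median near zero suggest log ratios
looks_like_ratio &amp;lt;- all(c(any(ex &amp;lt; 0, na.rm = TRUE),
                          abs(median(ex, na.rm = TRUE)) &amp;lt; 1))
looks_like_ratio

# Side-by-side boxplots make outlying samples easy to spot
boxplot(ex, las = 2, ylab = "value in data matrix")
&lt;/pre&gt;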
&lt;br /&gt;
GEO2R is a very nice tool to quickly run an analysis on data in GEO. Now, if we could only see something similar for the European repository, ArrayExpress.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.ncbi.nlm.nih.gov/geo/geo2r/"&gt;GEO2R: Web App to Analyze Gene Expression in GEO Datasets Using R&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-2113732268572589058?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/X0ariGAC_Wo" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/2113732268572589058/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/11/geo2r-web-app-to-analyze-gene.html#comment-form" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/2113732268572589058?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/2113732268572589058?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/11/geo2r-web-app-to-analyze-gene.html" title="GEO2R: Web App to Analyze Gene Expression in GEO Datasets Using R" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-AUPHkYFCYzA/TsVj9lORyNI/AAAAAAAAl9k/EXgKM90XNKU/s72-c/2011-11-17_144235.png" height="72" width="72" /><thr:total>3</thr:total></entry><entry 
gd:etag="W/&quot;C0YFR3gyeCp7ImA9WhRTEUs.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-3184286931372991887</id><published>2011-11-01T10:24:00.001-05:00</published><updated>2011-11-01T10:25:16.690-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-11-01T10:25:16.690-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Tutorials" /><category scheme="http://www.blogger.com/atom/ns#" term="Recommended Reading" /><category scheme="http://www.blogger.com/atom/ns#" term="Sequencing" /><title>Guide to RNA-seq Analysis in Galaxy</title><content type="html">&lt;a href="http://bx.mathcs.emory.edu/"&gt;James Taylor&lt;/a&gt; came to UVA last week and gave an excellent talk on how &lt;a href="http://usegalaxy.org/"&gt;Galaxy&lt;/a&gt; enables transparent and reproducible research in genomics. I'm gearing up to take on several projects that involve next-generation sequencing, and I'm considering &lt;a href="http://getgalaxy.org/"&gt;installing my own&lt;/a&gt; Galaxy framework on a local cluster or &lt;a href="http://wiki.g2.bx.psu.edu/Admin/Cloud"&gt;on the cloud&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
If you've used Galaxy in the past you're probably aware that it allows you to share data, workflows, and histories with other users. New to me was the &lt;a href="http://main.g2.bx.psu.edu/page/list_published"&gt;pages section&lt;/a&gt;, where an entire analysis is packaged on a single page, and vetting is crowdsourced to other Galaxy users in the form of comments and voting.&lt;br /&gt;
&lt;br /&gt;
I recently found a page published by Galaxy user Jeremy that serves as a &lt;a href="http://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise"&gt;guide to RNA-seq analysis using Galaxy&lt;/a&gt;. If you've never done RNA-seq before it's a great place to start. The guide has all the data you need to get started on an experiment where you'll use TopHat/Bowtie to align reads to a reference genome, and Cufflinks to assemble transcripts and quantify differential gene expression, alternative splicing, etc. The dataset is small, so all the analyses start and finish quickly, allowing you to finish the tutorial in just a few hours. The author was kind enough to include links to relevant sections of the TopHat and Cufflinks documentation where it's needed in the tutorial. Hit the link below to get started.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise"&gt;Galaxy Pages: RNA-seq Analysis Exercise&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-3184286931372991887?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/NzLbBkNsgn0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/3184286931372991887/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/11/guide-to-rna-seq-analysis-in-galaxy.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3184286931372991887?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3184286931372991887?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/11/guide-to-rna-seq-analysis-in-galaxy.html" title="Guide to RNA-seq Analysis in Galaxy" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>2</thr:total></entry><entry gd:etag="W/&quot;DkcDQXY6fSp7ImA9WhdaF04.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-7000425127932039812</id><published>2011-10-27T11:47:00.001-05:00</published><updated>2011-10-27T11:47:50.815-05:00</updated><app:edited 
xmlns:app="http://www.w3.org/2007/app">2011-10-27T11:47:50.815-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Clustering" /><category scheme="http://www.blogger.com/atom/ns#" term="Visualization" /><category scheme="http://www.blogger.com/atom/ns#" term="GWAS" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>A New Dimension to Principal Components Analysis</title><content type="html">&lt;br /&gt;
&lt;br /&gt;
&lt;div class="MsoNormal"&gt;
In general, the standard practice for correcting for population stratification in genetic studies is to use principal components analysis (PCA) to categorize samples along different&amp;nbsp;&lt;i&gt;ethnic axes&lt;/i&gt;.&amp;nbsp;&amp;nbsp;&lt;a href="http://genepath.med.harvard.edu/~reich/Price%20et%20al.pdf"&gt;Price et al.&lt;/a&gt;&amp;nbsp;published on this in 2006, and since then PCA plots have been a common component of many published GWAS.&amp;nbsp; One key advantage to using PCA for ethnicity is that each sample is given coordinates in a multidimensional space corresponding to the varying components of their ethnic ancestry.&amp;nbsp; Using either full GWAS data or a set of ancestry informative markers (AIMs), PCA can be easily conducted using available software packages like&amp;nbsp;&lt;a href="http://genepath.med.harvard.edu/~reich/Software.htm"&gt;EIGENSOFT&lt;/a&gt;&amp;nbsp;or&amp;nbsp;&lt;a href="http://gump.qimr.edu.au/gcta/"&gt;GCTA&lt;/a&gt;.&amp;nbsp;HapMap samples are sometimes included in the PCA to provide a frame of reference for the ethnic groups. &amp;nbsp;&lt;/div&gt;
&lt;div class="MsoNormal"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="MsoNormal"&gt;
Once computed, each sample will have values that correspond to a position in the new coordinate system that effectively clusters samples together by ethnic similarity. &amp;nbsp;The results of this analysis are usually plotted/visualized to identify ethnic outliers or to simply examine the structure of the data. &amp;nbsp;A common problem however is that it may take more than the first two principal components to identify groups. &amp;nbsp;&lt;/div&gt;
&lt;div class="MsoNormal"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="MsoNormal"&gt;
To illustrate, I will plot some PCs generated based on 125 AIMs for a recent study of ours. &amp;nbsp;I generated these using GCTA software and loaded the top 5 PCs into R using the read.table() function. &amp;nbsp;I loaded the top 5, but for continental ancestry, I've found that the top 3 are usually enough to separate groups. &amp;nbsp;The values look something like this: &amp;nbsp;&lt;/div&gt;
&lt;blockquote class="tr_bq"&gt;
&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&amp;nbsp; &amp;nbsp; new_ruid &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;pc1 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;pc2 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; pc3 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;pc4 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; pc5&lt;br /&gt;1 &amp;nbsp; &amp;nbsp;11596 &amp;nbsp;4.10996e-03 -0.002883830 &amp;nbsp;0.003100840 -0.00638232 &amp;nbsp;0.00709780&lt;br /&gt;2 &amp;nbsp; &amp;nbsp; 5415 &amp;nbsp;3.22958e-03 -0.000299851 -0.005358910 &amp;nbsp;0.00660643 &amp;nbsp;0.00430520&lt;br /&gt;3 &amp;nbsp; &amp;nbsp;11597 -4.35116e-03 &amp;nbsp;0.013282400 &amp;nbsp;0.006398130 &amp;nbsp;0.01721600 -0.02275470&lt;br /&gt;4 &amp;nbsp; &amp;nbsp; 5416 &amp;nbsp;4.01592e-03 &amp;nbsp;0.001408180 &amp;nbsp;0.005077310 &amp;nbsp;0.00159497 &amp;nbsp;0.00394816&lt;br /&gt;5 &amp;nbsp; &amp;nbsp; 3111 &amp;nbsp;3.04779e-03 -0.002079510 -0.000127967 -0.00420436 &amp;nbsp;0.01257460&lt;br /&gt;6 &amp;nbsp; &amp;nbsp;11598 &amp;nbsp;6.15318e-06 -0.000279919 &amp;nbsp;0.001060880 &amp;nbsp;0.00606267 &amp;nbsp;0.00954331&lt;/span&gt;&lt;/blockquote&gt;
&lt;div&gt;
I loaded this into a dataframe called pca, so I can plot the first two PCs using this command:&lt;/div&gt;
&lt;blockquote class="tr_bq"&gt;
plot(pca$pc1, pca$pc2)&lt;/blockquote&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-WuZRG08fmDU/TqheB_z7d_I/AAAAAAAABKc/tLKelT_-ctc/s1600/pc1and2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="238" src="http://1.bp.blogspot.com/-WuZRG08fmDU/TqheB_z7d_I/AAAAAAAABKc/tLKelT_-ctc/s320/pc1and2.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
We might also want to look at the next two PCs:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
plot(pca$pc2, pca$pc3)&lt;/blockquote&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-xqG7cSLxGvc/TqheV71q5oI/AAAAAAAABKk/Escu9Z6nUvE/s1600/pc2and3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="238" src="http://1.bp.blogspot.com/-xqG7cSLxGvc/TqheV71q5oI/AAAAAAAABKk/Escu9Z6nUvE/s320/pc2and3.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
It's probably best to look at all of them together:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
pairs(pca[2:4])&lt;/blockquote&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-qSfFRq3dZcw/TqheurajM8I/AAAAAAAABKs/Gq4eIeZmzGM/s1600/pairs.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="238" src="http://1.bp.blogspot.com/-qSfFRq3dZcw/TqheurajM8I/AAAAAAAABKs/Gq4eIeZmzGM/s320/pairs.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
So this is where my mind plays tricks on me. &amp;nbsp;I can't make much sense out of these plots -- there should be four ethnic groups represented, but it's hard to see who goes where. &amp;nbsp;To look at all of these dimensions simultaneously, we need a 3D plot. &amp;nbsp;Now 3D plots (especially 3D&amp;nbsp;&lt;i&gt;scatterplots&lt;/i&gt;)&amp;nbsp;aren't highly regarded -- in fact I hear that some&amp;nbsp;&lt;a href="https://lh6.googleusercontent.com/-qS7jJhy91Vg/Rn9pBLaq_JI/AAAAAAAAC8Y/3-ar4b8AGIA/s640/Mount2Sound%2525206-24-2007%2525203-28-17%252520PM.JPG"&gt;poor soul&lt;/a&gt;&amp;nbsp;at the University of Washington gets laughed at for showing his 3D plots &amp;nbsp;-- but in this case I found them quite useful.&lt;br /&gt;
&lt;br /&gt;
Using a library called rgl, I generated a 3D scatterplot like so:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
&amp;nbsp;plot3d(pca[2:4])&lt;/blockquote&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-x-K3ZcsMJOM/Tql4wiXVEwI/AAAAAAAABLA/_b3QVFccTT8/s1600/noclust.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-x-K3ZcsMJOM/Tql4wiXVEwI/AAAAAAAABLA/_b3QVFccTT8/s1600/noclust.gif" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
Now, using the mouse I could rotate and play with the cloud of data points, and it became clearer how the ethnic groups sorted out. &amp;nbsp;Just to double check my intuition, I ran a model-based clustering algorithm (&lt;a href="http://www.stat.washington.edu/fraley/mclust/tr504.pdf"&gt;mclust&lt;/a&gt;) on the data. &amp;nbsp;Different parameters obviously produce different cluster patterns, but I found that using an "ellipsoidal model with equal variances" and four clusters (G=4) identified the groups I thought should be there based on the overlay with the HapMap samples.&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
fit &amp;lt;- Mclust(pca[2:4], G=4, modelNames = "EEV")&lt;/blockquote&gt;
&lt;blockquote class="tr_bq"&gt;
plot3d(pca[2:4], col = fit$classification)&amp;nbsp;&lt;/blockquote&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-pgMAHiIWvuw/Tql5HIXNdRI/AAAAAAAABLI/I2zPF5cLRwQ/s1600/clust.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-pgMAHiIWvuw/Tql5HIXNdRI/AAAAAAAABLI/I2zPF5cLRwQ/s1600/clust.gif" /&gt;&lt;/a&gt;&lt;/div&gt;
Basically, the red sphere corresponds to the European descent group, the green indicates the admixed African American group, the black group corresponds to the Hispanic group, and the blue identifies the Asian descent group. &amp;nbsp;We are still a bit confused as to why the Asian descent samples don't form a tighter cluster -- it may be due to relatively poor performance of these AIMs in Asian descent groups. &amp;nbsp; Whatever the case, you might notice several individuals falling either outside a clear cluster or at the interface between two groups. &amp;nbsp;The ethnic assignment for these individuals is questionable, but the clustering algorithm gives us a very nice measure of cluster assignment uncertainty. &amp;nbsp;We can plot this like so:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
plot(pca[2:3], cex = fit$uncertainty*10)&lt;/blockquote&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-uaGDcHAHoIw/Tql_NAuNf1I/AAAAAAAABLQ/71PTT2lUyLA/s1600/uncertainty.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="238" src="http://1.bp.blogspot.com/-uaGDcHAHoIw/Tql_NAuNf1I/AAAAAAAABLQ/71PTT2lUyLA/s320/uncertainty.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;
I had to scale the uncertainty factor by 10 to make the questionable points more visible in this plot, shown as the hollow circles. &amp;nbsp;We will likely drop these samples from any stratified analyses. &amp;nbsp;We can export the cluster assignment by accessing the fit$classification vector, and we have our samples assigned to an ethnic group.&lt;br /&gt;
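&lt;br /&gt;
The export and filtering step could look roughly like the sketch below. It runs on simulated data (two well-separated groups on an arbitrary scale, not real PCs), and both the 0.1 uncertainty cutoff and the output file are hypothetical choices for illustration. Only the Mclust() call, fit$classification, and fit$uncertainty come from the analysis above.&lt;br /&gt;
&lt;pre&gt;
library(mclust)
set.seed(1)

# Simulated stand-in for the pca data frame: sample ID plus three "PCs"
pca &amp;lt;- data.frame(new_ruid = 1:200,
                  pc1 = c(rnorm(100, -3), rnorm(100, 3)),
                  pc2 = rnorm(200),
                  pc3 = rnorm(200))

fit &amp;lt;- Mclust(pca[2:4], G = 2, modelNames = "EEV")

# Attach the assignment and its uncertainty to each sample
pca$cluster     &amp;lt;- fit$classification
pca$uncertainty &amp;lt;- fit$uncertainty

# Keep only confidently assigned samples (cutoff is a hypothetical choice)
cutoff    &amp;lt;- 0.1
confident &amp;lt;- pca[pca$uncertainty &amp;lt; cutoff, ]
table(confident$cluster)

# Write sample IDs and cluster labels to a tab-delimited file
out_file &amp;lt;- tempfile(fileext = ".txt")
write.table(confident[c("new_ruid", "cluster")], file = out_file,
            sep = "\t", quote = FALSE, row.names = FALSE)
&lt;/pre&gt;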
&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-7000425127932039812?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/rPEXrjMv2kc" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/7000425127932039812/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/10/new-dimension-to-principal-components_27.html#comment-form" title="5 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7000425127932039812?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7000425127932039812?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/10/new-dimension-to-principal-components_27.html" title="A New Dimension to Principal Components Analysis" /><author><name>Will</name><uri>http://www.blogger.com/profile/09703349044940180835</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-WuZRG08fmDU/TqheB_z7d_I/AAAAAAAABKc/tLKelT_-ctc/s72-c/pc1and2.png" height="72" width="72" /><thr:total>5</thr:total></entry><entry 
gd:etag="W/&quot;Ak4DRH4-eip7ImA9WhdbGUk.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-1475248792265832281</id><published>2011-10-18T09:41:00.000-05:00</published><updated>2011-10-18T09:42:55.052-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-10-18T09:42:55.052-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Machine Learning" /><category scheme="http://www.blogger.com/atom/ns#" term="Recommended Reading" /><category scheme="http://www.blogger.com/atom/ns#" term="Twitter" /><category scheme="http://www.blogger.com/atom/ns#" term="GWAS" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>My thoughts on ICHG 2011</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-85pQXz9VB8E/TpyArwOEY6I/AAAAAAAABKA/bhYrzX2qAhM/s1600/ichg_pg.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-85pQXz9VB8E/TpyArwOEY6I/AAAAAAAABKA/bhYrzX2qAhM/s1600/ichg_pg.png" /&gt;&lt;/a&gt;&lt;/div&gt;
I’m a bit exhausted from a week of excellent science at ICHG.  First, let me say that Montreal is a truly remarkable city with fantastic food and a fascinating blend of architectural styles, all making the meeting a fun place to be…. Now on to the genomics – I’ll recap a few of the most exciting sessions I attended.   You can find a live-stream of tweets from the meeting by searching the &lt;a href="http://twitter.com/#!/search/realtime/%23ICHG2011"&gt;#ICHG2011&lt;/a&gt;&amp;nbsp;and &lt;a href="http://twitter.com/#!/search/realtime/%23ICHG"&gt;#ICHG&lt;/a&gt; hashtags.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
On Wednesday, Marylyn Ritchie (&lt;a href="http://twitter.com/#!/MarylynRitchie"&gt;@MarylynRitchie&lt;/a&gt;) and Nancy Cox organized “Beyond Genome-wide association studies”. &amp;nbsp;Nancy Cox presented some ideas on how to integrate multiple “intermediate” associations for SNPs, such as expression QTLs and newly discovered protein QTLs (more on pQTLs later).  This approach, which she called a &lt;i&gt;Functional Unit Analysis&lt;/i&gt;, would group signals together based on the genes they influence.  Nicholas Schork presented some nice examples of pros and cons of sequence level annotation algorithms. &amp;nbsp;Trey Ideker gave a very nice talk illustrating some of the properties of epistasis in yeast protein interaction networks.  One of the more striking points he made was that epistasis tends to occur between full protein complexes rather than within elements of the complexes themselves.   Marylyn Ritchie presented the ideas behind her &lt;a href="http://ritchielab.com/method.php?method=athena"&gt;ATHENA&lt;/a&gt; software for machine learning analysis of genetic data, and Manuel Mattheisen from Tim Becker’s group presented the methods in their &lt;a href="http://intersnp.meb.uni-bonn.de/"&gt;INTERSNP&lt;/a&gt; software for doing large-scale interaction analysis.  What was most impressive with this session is that there were clear attempts to incorporate underlying biological complexity into data analysis. &lt;br /&gt;
&lt;br /&gt;
On Thursday, I attended the second Statistical Genetics section called “Expanding Genome-wide Association Studies”, organized by Saurabh Ghosh and Daniel Shriner.  Having recently attended IGES, I feel pretty “up” on newer analysis techniques, but this session had a few talks that sparked my interest.  The first three talks were related to haplotype phasing and the issues surrounding computational accuracy and speed.  The basic goal of all these methods is to efficiently estimate genotypes for a common set of loci for all samples of a study using a set of reference haplotypes, usually from the HapMap or 1000 genomes data.  Despite these advances, it seems like phasing haplotypes for thousands of samples is still a massive undertaking that requires a high-performance computing cluster.  There were several talks about ongoing epidemiological studies, including the Kaiser Permanente UCSF cohort.  Neil Risch presented an elegant study design implementing four custom GWAS chips for the four targeted populations.  It looks like the data haven't started to flow from this yet, but when they do we’re sure to learn about lots of interesting ethnic-specific disease effects.  My good friend and colleague Dana Crawford presented an &lt;i&gt;in silico&lt;/i&gt; GWAS study of hypothyroidism.  In her best NPR voice, Dana showed how electronic medical records with GWAS data in the &lt;a href="https://www.mc.vanderbilt.edu/victr/dcc/projects/acc/index.php/Main_Page"&gt;EMERGE&lt;/a&gt; network can be re-used to construct entirely new studies nested within the data collected for other specific disease purposes.  Her excellent postdoc, Logan Dumitrescu, presented several gene-environment interactions between lipid levels and vitamins A and E from Dana’s EAGLE study.  
Finally Paul O’Reilly presented a cool new way to look at multiple phenotypes by essentially flipping a typical regression equation around, estimating coefficients that relate each phenotype in a study to a single SNP genotype as an outcome.  This rather clever approach called &lt;a href="http://cran.r-project.org/web/packages/MultiPhen/index.html"&gt;MultiPhen&lt;/a&gt; is similar to log-linear models I’ve seen used for transmission-based analysis, and allows you to model the “interaction” among phenotypes in much the same way you would look at SNP interactions.&lt;br /&gt;
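&lt;br /&gt;
To make the "flipped" regression concrete, here is a rough sketch in R on simulated data. It is not the MultiPhen package's own interface. It assumes that a genotype coded 0/1/2 can be treated as an ordinal outcome and fits it with polr() from MASS, and the variable names and effect sizes are made up for illustration.&lt;br /&gt;
&lt;pre&gt;
library(MASS)   # polr(): proportional-odds (ordinal) logistic regression

set.seed(1)
n      &amp;lt;- 500
snp    &amp;lt;- rbinom(n, 2, 0.3)       # simulated genotype: 0, 1 or 2 minor alleles
pheno1 &amp;lt;- 0.3 * snp + rnorm(n)    # phenotype associated with the SNP
pheno2 &amp;lt;- rnorm(n)                # phenotype unrelated to the SNP
dat &amp;lt;- data.frame(geno = factor(snp, levels = 0:2, ordered = TRUE),
                  pheno1, pheno2)

# Genotype is the outcome; the phenotypes are the predictors
fit &amp;lt;- polr(geno ~ pheno1 + pheno2, data = dat, Hess = TRUE)
coef(fit)

# Joint test of both phenotypes: likelihood ratio against a model with no
# predictors, whose log-likelihood is the multinomial one from genotype counts
counts  &amp;lt;- table(dat$geno)
ll_null &amp;lt;- sum(counts * log(counts / n))
lrt     &amp;lt;- 2 * (as.numeric(logLik(fit)) - ll_null)
pchisq(lrt, df = 2, lower.tail = FALSE)
&lt;/pre&gt;
Replacing the formula with geno ~ pheno1 * pheno2 adds a phenotype-by-phenotype term, which is the kind of "interaction" among phenotypes described above.&lt;br /&gt;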
&lt;br /&gt;
By far the most interesting talks of the meeting (for me) were in the Genomics section on Gene Expression, organized by Tomi Pastinen and Mark Corbett.  Chris Mason started the session off with a fantastic demonstration of the power of RNA-seq.  Examining transcriptomes of 14 non-human primate species, they validated many of the computational predictions in the &lt;a href="http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/"&gt;AceView&lt;/a&gt; gene build, and illustrated that most “exome” sequencing is probably examining less than half of all transcribed sequences.  Rupali Patwardhan talked about a system for examining the impact of promoter and enhancer mutations in whole mice, essentially using mutagenesis screens to localize these regions.  Ron Hause presented work on the protein QTLs that Nancy Cox alluded to earlier in the conference.  Using a high-throughput form of western blots, they systematically examined levels for over 400 proteins in the Yoruba HapMap cell lines.  They also showed that only about 50% of eQTLs identified in these lines actually alter protein levels.  Stephen Montgomery spoke about the impact of rare genetic variants within a transcript on transcript levels.  Essentially he showed an epistatic effect on expression, where transcripts with deleterious alleles are less likely to be expressed – an intuitive and fascinating finding, especially for those considering rare-variant analysis. Athma Pai presented a new QTL that influences mRNA decay rates.  By measuring multiple time points using RNA-seq, she found individual-level variants that alter decay, which she calls dQTLs.  Veronique Adoue looked at cis-eQTLs relative to transcription factor binding sites using ChIP, and Alfonso Buil showed how genetic variants influence gene expression networks (or correlation among gene expression) across tissue types.&lt;br /&gt;
&lt;br /&gt;
I must say, despite all the awesome work presented in this session, Michael Snyder stole the show with his talk on the “Snyderome” – his own personal –omics profile collected over 21 months.  His whole genome was sequenced by Complete Genomics, and processed using Rong Chen and Atul Butte’s risk-o-gram to quantify his disease risk.  His profile predicted increased risk of T2D, so he began collecting glucose measures, and lo and behold, he saw a sustained spike in blood glucose levels a few days after a common cold.  His interpretation was that an environmental stress knocked him into a pseudo-diabetic state, and his transcriptome and proteome results corroborated this idea.  Granted, this is an N of 1, and there is still lots of work to be done before this type of analysis revolutionizes medicine, but the take home message is salient – multiple -omics are better than one, and everyone’s manifestation of a complex disease is different.  This was truly thought-provoking work, and it nicely closed an entire session devoted to understanding the intermediate impact of genetic variants to better understand disease complexity.  


&lt;br /&gt;
&lt;br /&gt;
This is just my take on a really great meeting -- I'm sure I missed lots of excellent talks. &amp;nbsp;If you saw something good please leave a comment and share!&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-1475248792265832281?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/QRGgNIRXLSI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/1475248792265832281/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/10/my-thoughts-on-ichg-2011.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/1475248792265832281?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/1475248792265832281?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/10/my-thoughts-on-ichg-2011.html" title="My thoughts on ICHG 2011" /><author><name>Will</name><uri>http://www.blogger.com/profile/09703349044940180835</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-85pQXz9VB8E/TpyArwOEY6I/AAAAAAAABKA/bhYrzX2qAhM/s72-c/ichg_pg.png" height="72" width="72" /><thr:total>1</thr:total></entry><entry 
gd:etag="W/&quot;DEQEQX84fyp7ImA9WhdbEEU.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-7553789175644076917</id><published>2011-10-08T09:54:00.003-05:00</published><updated>2011-10-08T10:05:00.137-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-10-08T10:05:00.137-05:00</app:edited><title>Find me at ICHG!</title><content type="html">This week, I'm off to Montreal for the International Congress on Human Genetics and I hope to see you there!  &lt;br /&gt;&lt;br /&gt;If you are attending and are already a part of the &lt;a href="http://www.twitter.com"&gt;Twitterverse&lt;/a&gt;, bring a tablet or phone and tweet away about the meeting using the official hashtag,  #ICHG2011.  If you are new to twitter, go &lt;a href="http://twitter.com/"&gt;sign up&lt;/a&gt;!  Using nearly any twitter application, you can search for tweets that contain the #ICHG2011 hashtag and follow the thoughts of your fellow conference goers.  &lt;br /&gt;&lt;br /&gt;Using Twitter at an academic conference is a fascinating experience!  Not only can you get fantastic information about what is being presented at the multitude of sessions, you get lots of opinion on what is going on in the field, and sometimes practically useful tips, like the location of the nearest &lt;a href="http://maps.google.com/maps?hl=en&amp;prmd=imvns&amp;resnum=1&amp;bav=on.2,or.r_gc.r_pw.r_cp.,cf.osb&amp;biw=1212&amp;bih=794&amp;um=1&amp;ie=UTF-8&amp;q=Starbucks+Montreal&amp;fb=1&amp;gl=us&amp;hq=Starbucks&amp;hnear=0x4cc91a541c64b70d:0x654e3138211fefef,Montreal,+QC,+Canada&amp;ei=XWWQTriFKYHAtgeix6SlDA&amp;sa=X&amp;oi=local_group&amp;ct=image&amp;ved=0CAkQtgM"&gt;Starbucks&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;If you see me wandering aimlessly around the conference center, please come say hello!  
It's always great to make new academic friends.&lt;br /&gt;&lt;br /&gt;See you there!&lt;br /&gt;&lt;br /&gt;Will&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-7553789175644076917?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/X-FiRuyDvA0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/7553789175644076917/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/10/find-me-at-ichg.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7553789175644076917?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7553789175644076917?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/10/find-me-at-ichg.html" title="Find me at ICHG!" 
/><author><name>Will</name><uri>http://www.blogger.com/profile/09703349044940180835</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total></entry><entry gd:etag="W/&quot;AkYNQX09eip7ImA9WhdUE04.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-7225595639745584053</id><published>2011-09-29T11:41:00.009-05:00</published><updated>2011-09-29T18:16:30.362-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-29T18:16:30.362-05:00</app:edited><title>The Utility of Network Analysis</title><content type="html">Like most bioinformatics nerds (or anyone with a facebook account), I’m fascinated by networks.  Most people immediately think of protein-protein interaction networks, or biological pathways when thinking about networks, but sometimes representing a problem as a network makes solving problems easier.  &lt;br /&gt;&lt;br /&gt;Recently, some collaborators from the &lt;a href="http://www.pagestudy.org"&gt;PAGE study&lt;/a&gt; had a list of a few hundred SNPs gathered from multiple loci across the genome.  For analysis purposes, they were interested in quantifying the number of loci these SNPs represented – in other words, how many distinct signals were represented by their collection of SNPs.  &lt;br /&gt;&lt;br /&gt;We had linkage disequilibrium data from the HapMap for all pairs of SNPs, and we filtered this using an r-squared cutoff.  What we were left with was a mess of SNP pairs that could be tedious to sort through in a spreadsheet.  Instead, I represented each pair of SNPs as an edge in a network and loaded the data into &lt;a href="http://gephi.org/"&gt;Gephi&lt;/a&gt;, which provides some wonderful analysis tools.  
Suppose my LD data is structured like this:&lt;br /&gt;&lt;span style="font-family:arial"&gt;&lt;br /&gt;&lt;table border="0"&gt;&lt;tr&gt;&lt;td&gt;&lt;span style="font-weight:bold;"&gt;SNP1&lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-weight:bold;"&gt;SNP2&lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-weight:bold;"&gt;d-prime&lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-weight:bold;"&gt;r-squared&lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;16969968&lt;/td&gt;&lt;td&gt;1051730&lt;/td&gt;&lt;td&gt;0.98&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;2036534&lt;/td&gt;&lt;td&gt;1051730&lt;/td&gt;&lt;td&gt;0.92&lt;/td&gt;&lt;td&gt;0.205&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;578776&lt;/td&gt;&lt;td&gt;1051730&lt;/td&gt;&lt;td&gt;0.96&lt;/td&gt;&lt;td&gt;0.23&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;8034191&lt;/td&gt;&lt;td&gt;1051730&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0.961&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;8042374&lt;/td&gt;&lt;td&gt;1051730&lt;/td&gt;&lt;td&gt;0.99&lt;/td&gt;&lt;td&gt;0.205&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;...&lt;/td&gt;&lt;td&gt;...&lt;/td&gt;&lt;td&gt;...&lt;/td&gt;&lt;td&gt;...&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;/table&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In a spreadsheet application, I sorted and filtered the LD pairings I wanted using either the r-squared or the d-prime columns.  I then deleted any rows that didn’t meet my cutoff, renamed the header for SNP1 to “Source” and SNP2 to “Target”, and exported the file as a comma-separated file (.csv).  I opened &lt;a href="http://gephi.org/"&gt;Gephi&lt;/a&gt;, clicked the “Data Laboratory” tab, and Import Spreadsheet to load my data.  
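&lt;br /&gt;&lt;br /&gt;The same filtering and renaming step could also be scripted rather than done by hand in a spreadsheet. The following is only a minimal Python sketch: the tab-delimited layout, the 0.8 r-squared cutoff, and the function names are assumptions made for illustration, not part of the workflow described here.

```python
import csv
import io

# Hypothetical LD table mirroring the example rows above (tab-delimited).
LD_TEXT = """SNP1\tSNP2\td-prime\tr-squared
16969968\t1051730\t0.98\t1
2036534\t1051730\t0.92\t0.205
578776\t1051730\t0.96\t0.23
8034191\t1051730\t1\t0.961
8042374\t1051730\t0.99\t0.205
"""


def ld_to_edges(ld_text, r2_cutoff=0.8):
    """Return (Source, Target) pairs whose r-squared meets the cutoff."""
    reader = csv.DictReader(io.StringIO(ld_text), delimiter="\t")
    return [(row["SNP1"], row["SNP2"])
            for row in reader
            if float(row["r-squared"]) >= r2_cutoff]


def edges_to_csv(edges):
    """Serialize edges as CSV text with Source/Target headers."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return out.getvalue()


edges = ld_to_edges(LD_TEXT)
print(edges_to_csv(edges))
```

Writing the returned text to a .csv file should give something that can be loaded through Import Spreadsheet in the same way as the hand-edited file.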
&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-ygNQoXfodMI/ToT5gfVG4xI/AAAAAAAABIs/yISdP-0G7fs/s1600/gephi1.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 314px;" src="http://2.bp.blogspot.com/-ygNQoXfodMI/ToT5gfVG4xI/AAAAAAAABIs/yISdP-0G7fs/s400/gephi1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5657921368445346578" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Once loaded, I clicked on the “Overview” tab and I can see my graph.  The graph looks like a big mess, but we don’t really care how it looks – we’re going to run an analysis.  In the “statistics” tab on the right-hand side, you’ll see an option for “connected components”.  This runs an algorithm that picks apart and labels collections of nodes that are connected.  Running this only takes a second.  &lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-7P1BN7bX28o/ToT5uYkubvI/AAAAAAAABI0/t3GJ2acQGDM/s1600/gephi2.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 315px;" src="http://2.bp.blogspot.com/-7P1BN7bX28o/ToT5uYkubvI/AAAAAAAABI0/t3GJ2acQGDM/s400/gephi2.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5657921607149973234" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I then click on the “Data Laboratory” tab again, and I can see that my nodes are labeled with an ID.  This corresponds to the Locus those SNPs represent.  
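&lt;br /&gt;&lt;br /&gt;For readers curious about what a connected-components analysis does, here is a small Python sketch of the general idea. It is not Gephi's implementation; the SNP names and the function name are hypothetical, and it uses a plain depth-first search over the edge list.

```python
from collections import defaultdict


def connected_components(edges):
    """Map every node to a component ID using an iterative depth-first search."""
    neighbors = defaultdict(set)
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)

    component_of = {}
    next_id = 0
    for start in neighbors:
        if start in component_of:
            continue  # already labeled as part of an earlier component
        stack = [start]
        component_of[start] = next_id
        while stack:
            node = stack.pop()
            for other in neighbors[node]:
                if other not in component_of:
                    component_of[other] = next_id
                    stack.append(other)
        next_id += 1
    return component_of


# Hypothetical SNP pairs that passed the LD cutoff.
edges = [("rs1", "rs2"), ("rs2", "rs3"), ("rs4", "rs5")]
labels = connected_components(edges)
n_loci = len(set(labels.values()))
print(n_loci)
```

One caveat with any edge-list approach: a SNP with no partner passing the cutoff never appears in the edge list, so it would need to be counted separately as its own single-SNP locus.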
&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-5laCf3ZSY0M/ToT6h9gJfFI/AAAAAAAABI8/680gMCKJSzA/s1600/gephi3.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 314px;" src="http://3.bp.blogspot.com/-5laCf3ZSY0M/ToT6h9gJfFI/AAAAAAAABI8/680gMCKJSzA/s400/gephi3.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5657922493236214866" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;If you want to actually SEE how these relationships fall out, we’ll need to run a layout engine.  Back on the “Overview” tab, on the lower left-hand side, there is a drop-down allowing you to choose a layout engine.  I have found YifanHu’s Multilevel to be the quickest and most effective for separating small groups like these.  Depending on the size of your graph, it may take a moment to run.  Once it’s finished, you should be able to see the components clearly separated.  If you want, you can color code them by clicking the green “refresh” button in the “partition” tab in the upper left corner.  This reloads the drop-down menu and will provide you with an option to color the nodes by component ID.  Select this, and click apply to see the results!  
&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-INapvDuRwS0/ToT6pbD0E3I/AAAAAAAABJE/V-YDNOQuElY/s1600/gephi4.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 324px;" src="http://4.bp.blogspot.com/-INapvDuRwS0/ToT6pbD0E3I/AAAAAAAABJE/V-YDNOQuElY/s400/gephi4.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5657922621429519218" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I’ve used Gephi component analysis to do all kinds of fun things, like counting the number of families in a study using pairwise IBD estimates, looking at patterns of phenotype sharing in pedigrees, and even visualizing citation networks.  Sometimes representing a problem as a graph lets you find patterns more easily than examining tables of numbers.&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-7225595639745584053?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/RFisqV6uC7c" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/7225595639745584053/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/09/utility-of-network-analysis.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7225595639745584053?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7225595639745584053?v=2" /><link rel="alternate" type="text/html" 
href="http://gettinggeneticsdone.blogspot.com/2011/09/utility-of-network-analysis.html" title="The Utility of Network Analysis" /><author><name>Will</name><uri>http://www.blogger.com/profile/09703349044940180835</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-ygNQoXfodMI/ToT5gfVG4xI/AAAAAAAABIs/yISdP-0G7fs/s72-c/gephi1.png" height="72" width="72" /><thr:total>2</thr:total></entry><entry gd:etag="W/&quot;C0QCRno6cSp7ImA9WhdWFUw.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-6681360119986198582</id><published>2011-09-08T14:49:00.000-05:00</published><updated>2011-09-08T14:49:27.419-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-08T14:49:27.419-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Twitter" /><category scheme="http://www.blogger.com/atom/ns#" term="Announcements" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>I'm Starting a New Position at the University of Virginia</title><content type="html">I just accepted an offer for a faculty position at the University of Virginia in the Center for Public Health Genomics / Department of Public Health Sciences. Starting in October I will be developing and directing a new centralized bioinformatics core in the UVA School of Medicine. Over the next few weeks I'm taking a much-needed vacation next door in Kauai and then packing up for the move to Charlottesville. Posts here may be sparse over the next few weeks, but once I start my new gig I'll be sure to make up for it. 
And if you're bioinformatics-savvy and in the job market keep an eye out here - once I figure out what I need I will soon be hiring, and will repost any job announcements here.&lt;br /&gt;
&lt;br /&gt;
I've enjoyed my postdoc here at the University of Hawaii Cancer Center, and there is much I'll miss about island life out here in the Pacific. But I'm very seriously looking forward to getting started in this wonderful opportunity at UVA. Thank you all for your comments, suggestions, and help when I needed it. I'll be back online in a few weeks - until then, follow me on Twitter (&lt;a href="http://twitter.com/#%21/genetics_blog"&gt;@genetics_blog&lt;/a&gt;).&lt;br /&gt;
&lt;br /&gt;
Aloha!&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-6681360119986198582?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/OExPXrX_1aQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/6681360119986198582/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/09/im-starting-new-position-at-university.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/6681360119986198582?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/6681360119986198582?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/09/im-starting-new-position-at-university.html" title="I'm Starting a New Position at the University of Virginia" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>0</thr:total></entry><entry gd:etag="W/&quot;DUYGQ30-fip7ImA9WhdWFU0.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-1065081340277534395</id><published>2011-09-08T13:38:00.000-05:00</published><updated>2011-09-08T13:38:42.356-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-08T13:38:42.356-05:00</app:edited><category 
scheme="http://www.blogger.com/atom/ns#" term="Recommended Reading" /><category scheme="http://www.blogger.com/atom/ns#" term="Sequencing" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>True Hypotheses are True, False Hypotheses are False</title><content type="html">I just read Gregory Cooper and Jay Shendure's review &lt;a href="http://www.nature.com/nrg/journal/v12/n9/full/nrg3046.html"&gt;"Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data"&lt;/a&gt; in Nature Reviews Genetics. It's a good review about how to narrow down deleterious disease-causing variants from many, many variants throughout the genome when statistics and genetic information alone aren't enough.&lt;br /&gt;
&lt;br /&gt;
I really liked how they framed the multiple-testing problem that routinely plagues large-scale genetic studies, where nominal significance thresholds can yield many false positives when applied to multiple hypothesis tests:&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;
However, true hypotheses are true, and false hypotheses are false, 
regardless of how many are tested. As such, the actual 'multiple testing
 burden' depends on the proportion of true and false hypotheses in any 
given set: that is, the 'prior probability'
 that any given hypothesis is true, rather than the number of tests per 
se. This challenge can thus be viewed as a 'naive hypothesis testing' 
problem — that is, when in reality only one or a few variants are causal
 for a given phenotype, but all (or many) variants are &lt;i&gt;&lt;span class="i"&gt;a priori&lt;/span&gt;&lt;/i&gt;
 equally likely candidates, the prior probability of any given variant 
being causal is miniscule. As a consequence, extremely convincing data 
are required to support causality, which is potentially unachievable for
 some true positives.&lt;br /&gt;
&lt;br /&gt;
Defining the challenge in terms of hypothesis quality rather than 
quantity, however, points to a solution. Specifically, experimental or 
computational approaches that provide assessments of variant function 
can be used to better estimate the prior probability that any given 
variant is phenotypically important, and these approaches thereby boost 
discovery power.&lt;/blockquote&gt;
&lt;br /&gt;
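To make the role of the prior concrete, here is a small numerical sketch in Python. It is a textbook application of Bayes' rule rather than anything taken from the review, and the prior, power, and significance threshold are made-up values chosen only for illustration.&lt;br /&gt;

```python
def posterior_true(prior, power, alpha):
    """Probability that a significant result is a true positive (Bayes' rule)."""
    true_pos = power * prior             # causal variant and test is significant
    false_pos = alpha * (1.0 - prior)    # non-causal variant, significant by chance
    return true_pos / (true_pos + false_pos)


# Illustrative numbers only: the same test applied under two different priors.
naive = posterior_true(prior=1e-6, power=0.8, alpha=1e-4)
informed = posterior_true(prior=1e-3, power=0.8, alpha=1e-4)
print(round(naive, 4), round(informed, 4))
```

With the test held fixed, raising the prior (for example by using functional annotation to shortlist variants) sharply raises the chance that a significant hit is real, which is the point the quoted passage makes.&lt;br /&gt;
&lt;br /&gt;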
Check out the full review at Nature Reviews Genetics:&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.nature.com/nrg/journal/v12/n9/full/nrg3046.html"&gt;Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-1065081340277534395?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/m7aWES1mgHA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/1065081340277534395/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/09/true-hypotheses-are-true-false.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/1065081340277534395?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/1065081340277534395?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/09/true-hypotheses-are-true-false.html" title="True Hypotheses are True, False Hypotheses are False" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>0</thr:total></entry><entry 
gd:etag="W/&quot;DkMCQn8zcCp7ImA9WhdWFE8.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-7003954756642604275</id><published>2011-09-07T14:41:00.000-05:00</published><updated>2011-09-07T14:41:03.188-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-07T14:41:03.188-05:00</app:edited><title>Excel Template for Mapping Four 96-Well Plates to One 384-Well Plate</title><content type="html">Daniel Cook in &lt;a href="http://genetics.uiowa.edu/"&gt;Jeff Murray&lt;/a&gt;'s lab at the University of Iowa put together &lt;a href="http://www.stephenturner.us/96_to_384_platemapper.xlsx"&gt;this handy Excel template&lt;/a&gt; for keeping track of how samples from four 96-well plates are interleaved to configure a single 384-well plate using robotic liquid handling systems, like the &lt;a href="http://www.stephenturner.us/96_to_384_platemapper.xlsx"&gt;Hydra II&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
Paste in lists of samples on your 96-well plates:&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-tCfRzLKF62A/TmfH7P6QeJI/AAAAAAAAjmU/gJV0axFAXrA/s1600/2011-09-07_093420.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-tCfRzLKF62A/TmfH7P6QeJI/AAAAAAAAjmU/gJV0axFAXrA/s1600/2011-09-07_093420.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
And you'll get a map of the 384-well plate layout:&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-0ubwxDPGWS4/TmfIL1h3HgI/AAAAAAAAjmY/sG_aqexZrSQ/s1600/2011-09-07_093513.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="85" src="http://1.bp.blogspot.com/-0ubwxDPGWS4/TmfIL1h3HgI/AAAAAAAAjmY/sG_aqexZrSQ/s400/2011-09-07_093513.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
And a summary list:&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-d0YjwIGjysY/TmfITIFM7HI/AAAAAAAAjmc/jrT8cVFfEE4/s1600/2011-09-07_093600.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="253" src="http://1.bp.blogspot.com/-d0YjwIGjysY/TmfITIFM7HI/AAAAAAAAjmc/jrT8cVFfEE4/s400/2011-09-07_093600.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
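For anyone who would rather compute the mapping in code, here is a minimal Python sketch. It assumes the common quadrant interleaving pattern (plate 1 fills A1, plate 2 fills A2, plate 3 fills B1, plate 4 fills B2, and so on across the plate); that pattern and the function name are assumptions, and the template or your liquid handler may use a different arrangement.&lt;br /&gt;

```python
import string

ROWS_96 = string.ascii_uppercase[:8]    # rows A-H on a 96-well plate
ROWS_384 = string.ascii_uppercase[:16]  # rows A-P on a 384-well plate


def well_384(plate, well):
    """Map a 96-well position on plate 1-4 to its 384-well position.

    Assumes quadrant interleaving: plate 1 -> A1, plate 2 -> A2,
    plate 3 -> B1, plate 4 -> B2 (an assumption, not read from the template).
    """
    row = ROWS_96.index(well[0])
    col = int(well[1:]) - 1
    row_offset, col_offset = divmod(plate - 1, 2)
    return f"{ROWS_384[2 * row + row_offset]}{2 * col + col_offset + 1}"


print(well_384(1, "A1"), well_384(2, "A1"), well_384(3, "A1"), well_384(4, "H12"))
```

Check the output against your robot's actual transfer pattern before relying on it.&lt;br /&gt;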
&lt;br /&gt;You can &lt;a href="http://www.stephenturner.us/96_to_384_platemapper.xlsx"&gt;download the Excel file here&lt;/a&gt;. Thanks for sharing, Daniel. &lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-7003954756642604275?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/NCvAJY2c5EA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/7003954756642604275/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/09/excel-template-for-mapping-four-96-well.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7003954756642604275?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7003954756642604275?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/09/excel-template-for-mapping-four-96-well.html" title="Excel Template for Mapping Four 96-Well Plates to One 384-Well Plate" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-tCfRzLKF62A/TmfH7P6QeJI/AAAAAAAAjmU/gJV0axFAXrA/s72-c/2011-09-07_093420.png" height="72" width="72" 
/><thr:total>0</thr:total></entry><entry gd:etag="W/&quot;AkMFSXw8eip7ImA9WhdXGEw.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-3372228763074294102</id><published>2011-08-31T14:15:00.003-05:00</published><updated>2011-08-31T14:20:18.272-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-08-31T14:20:18.272-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Announcements" /><title>Personal Genomics and Data Sharing Survey</title><content type="html">I was recently contacted by a couple of German biologists working on a project evaluating opinions on sharing raw data from DTC genetic testing companies like 23andme. A handful of people like the gang at &lt;a href="http://www.genomesunzipped.org/data"&gt;Genomes Unzipped&lt;/a&gt;, the &lt;a href="http://www.personalgenomes.org/public/"&gt;PGP-10&lt;/a&gt;, and others at &lt;a href="http://www.snpedia.com/index.php/Genomes"&gt;SNPedia&lt;/a&gt; have released their own genotype or sequencing data into the public domain. As of now, data like this is scattered around the web and most of it is not attached to any phenotype data.&lt;br /&gt;
&lt;br /&gt;
These three biologists are working on a website that collects genetic data as well as phenotypic data. The hope is to make it easy to find and access appropriate data and to become a resource for a kind of open-source GWAS - similar to the &lt;a href="https://www.23andme.com/research/"&gt;research&lt;/a&gt; 23andMe performs in its walled garden right now.&lt;br /&gt;
&lt;br /&gt;
But because of privacy concerns, many people (myself included) hesitate to freely publish their genetic data for the world to see. These three biologists are conducting a survey to assess how willing people might be to participate in something like this, and for what reasons they would (or would not). The survey can be accessed at &lt;a href="http://bit.ly/genotyping_survey"&gt;http://bit.ly/genotyping_survey&lt;/a&gt;. It took about 2 minutes for me to complete, and you can optionally sign up to receive an email with their results once they've completed the survey.&lt;br /&gt;
&lt;br /&gt;
Although I'm still hesitant to participate in something like this myself, I like the idea, and I'm very interested to see the results of their survey. Hit the link below if you'd like to take the quick survey. &lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://bit.ly/genotyping_survey"&gt;Personal Genomics and Data Sharing Survey&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-3372228763074294102?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/mQ71wslo4sQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/3372228763074294102/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/08/personal-genomics-and-data-sharing.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3372228763074294102?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3372228763074294102?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/08/personal-genomics-and-data-sharing.html" title="Personal Genomics and Data Sharing Survey" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>0</thr:total></entry><entry gd:etag="W/&quot;DUIGSXo5cCp7ImA9WhdXFkk.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-7007398607800893924</id><published>2011-08-29T14:52:00.000-05:00</published><updated>2011-08-29T14:52:08.428-05:00</updated><app:edited 
xmlns:app="http://www.w3.org/2007/app">2011-08-29T14:52:08.428-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>Bioinformatics Posters Collection</title><content type="html">I mentioned &lt;a href="http://biostar.stackexchange.com/"&gt;BioStar&lt;/a&gt; in a &lt;a href="http://gettinggeneticsdone.blogspot.com/2011/02/get-all-your-questions-answered.html"&gt;previous post about getting all your questions answered&lt;/a&gt;. I can't emphasize enough how helpful the BioStar and other StackExchange communities are. Whenever I ask a statistics question on &lt;a href="http://stats.stackexchange.com/"&gt;CrossValidated&lt;/a&gt; or a programming question on &lt;a href="http://stackoverflow.com/"&gt;StackOverflow&lt;/a&gt;, I often get multiple answers within 10 minutes.&lt;br /&gt;
&lt;br /&gt;
Recently there was a &lt;a href="http://biostar.stackexchange.com/questions/10597/bioinformatics-posters-collection"&gt;question&lt;/a&gt; on BioStar from someone who was making a poster for a bioinformatics presentation and wanted some inspiration for design and layout. No fewer than 7 community members posted responses the same day, linking to sites where you can download poster presentations, including &lt;a href="http://vizbi.org/2011/Posters/Collection/?poster=A05"&gt;VIZBI 2011&lt;/a&gt; (workshop on visualizing biological data), &lt;a href="http://posters.f1000.com/PosterList?facID=8001"&gt;F1000 Posters&lt;/a&gt; (which collects posters from the Intelligent Systems for Molecular Biology conference), &lt;a href="http://precedings.nature.com/documents/type/poster/revisions"&gt;Nature Precedings&lt;/a&gt; (not specifically limited to bioinformatics), and several others.&lt;br /&gt;
&lt;br /&gt;
While you can see plenty of posters at the meeting you're attending, that isn't much help when you're trying to design and lay out your poster beforehand. I've used the same tired old template for poster presentations for years, and it's helpful to see examples of other bioinformatics posters for fresh ideas about design and layout.&lt;br /&gt;
&lt;br /&gt;
I would also encourage you to deposit some of your posters in places like &lt;a href="http://posters.f1000.com/Index?page=Deposit"&gt;F1000&lt;/a&gt; (deposit link) or Nature Precedings (&lt;a href="http://precedings.nature.com/documents/new"&gt;submission link&lt;/a&gt;). While these posters aren't peer-reviewed, depositing them can really increase the visibility of your work, and it gives you a permanent DOI (at least for Nature Precedings) that you can link to or reference in other scientific communication.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://biostar.stackexchange.com/questions/10597/bioinformatics-posters-collection"&gt;See this Q&amp;amp;A at BioStar for more&lt;/a&gt;. &lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-7007398607800893924?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/g1BAGi4G4Wo" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/7007398607800893924/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/08/bioinformatics-posters-collection.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7007398607800893924?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7007398607800893924?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/08/bioinformatics-posters-collection.html" title="Bioinformatics Posters Collection" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>0</thr:total></entry><entry 
gd:etag="W/&quot;CUIHSHo-eyp7ImA9WhdXEUw.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-3341529212990763250</id><published>2011-08-22T17:47:00.007-05:00</published><updated>2011-08-23T10:32:19.453-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-08-23T10:32:19.453-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Recommended Reading" /><category scheme="http://www.blogger.com/atom/ns#" term="Software" /><category scheme="http://www.blogger.com/atom/ns#" term="GWAS" /><title>Estimating Trait Heritability from GWAS Data</title><content type="html">Peter Visscher and colleagues have recently published a flurry of papers employing a new software package called GCTA to estimate the heritability of traits using GWAS data (GCTA stands for Genome-wide Complex Trait Analysis -- clever acronymity!).  The tool, supported (and presumably coded) by Jian Yang, is remarkably easy to use, and its interface is based in part on the familiar PLINK command-line interface.  The &lt;a href="http://gump.qimr.edu.au/gcta/"&gt;GCTA Homepage&lt;/a&gt; provides an excellent walk-through of the available options.  
&lt;br /&gt;
&lt;br /&gt;The basic idea is to use GWAS data to estimate the degree of "genetic sharing" or relatedness among the samples, computing what the authors call a genetic relationship matrix (GRM).  The degree of genetic sharing among samples is then related to the amount of phenotypic sharing using restricted maximum likelihood analysis (REML).  The result is an estimate of the variance explained by the SNPs used to generate the GRM.  Full details of the stats along with all the gory matrix notation can be found in their &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/21167468"&gt;software publication&lt;/a&gt;.
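&lt;br /&gt;
&lt;br /&gt;To make the GRM idea concrete, here is a minimal Python sketch of how such a matrix could be computed from a samples-by-SNPs table of allele counts.  This is not GCTA's own code: the function name and the toy data are made up for illustration, it assumes genotypes coded 0/1/2 with no missing values, and it applies the same standardized cross-product to every element, including the diagonal.

```python
import numpy as np


def genetic_relationship_matrix(genotypes):
    """Sketch of a GRM from an (n_samples, n_snps) array of 0/1/2 allele counts.

    Illustrative assumption: no missing genotypes. Monomorphic SNPs are dropped
    because they cannot be standardized.
    """
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0        # sample allele frequency of each SNP
    var = 2.0 * p * (1.0 - p)       # expected genotype variance of each SNP
    keep = var != 0                 # skip SNPs with no variation
    z = (x[:, keep] - 2.0 * p[keep]) / np.sqrt(var[keep])
    # Average the standardized cross-products over the retained SNPs.
    return z @ z.T / z.shape[1]


# Toy data (randomly generated, purely hypothetical): 5 samples, 200 SNPs.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.9, size=200)
toy_genotypes = rng.binomial(2, freqs, size=(5, 200))
grm = genetic_relationship_matrix(toy_genotypes)
print(grm.shape)
```

Each entry of the result measures how similar two samples are across the SNPs, relative to what the allele frequencies alone would predict.  The REML step then asks how much of the phenotypic similarity that matrix can account for.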
&lt;br /&gt;
&lt;br /&gt;The approach has been applied to &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed?term=Estimating%20Missing%20Heritability%20for%20Disease%20from%20Genome-wide%20Association%20Studies"&gt;several disorders studied by the WTCCC&lt;/a&gt; and to a recent study of &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/20562875"&gt;human height&lt;/a&gt;.  Interestingly, the developers have also used the approach to partition the trait variance &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/21552263"&gt;across chromosomes&lt;/a&gt;, resulting in something similar to population-based variance-components linkage analysis.  The approach works for both quantitative and dichotomous traits; however, the authors warn that variance estimates of dichotomous trait liability are influenced by genotyping artifacts.
&lt;br /&gt;
&lt;br /&gt;The package also includes several other handy features, including a relatively easy way to estimate principal components for population structure correction, a GWAS simulation tool, and a regression-based LD mapping tool.  &lt;a href="http://gump.qimr.edu.au/gcta/download.html"&gt;Download&lt;/a&gt; and play -- a binary is available for Linux, MacOS, and DOS/Windows.
&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-3341529212990763250?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/nG7N53-PSMQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/3341529212990763250/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/08/estimating-trait-heritability-from-gwas.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3341529212990763250?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3341529212990763250?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/08/estimating-trait-heritability-from-gwas.html" title="Estimating Trait Heritability from GWAS Data" /><author><name>Will</name><uri>http://www.blogger.com/profile/09703349044940180835</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>2</thr:total></entry></feed>

