<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss version="2.0">
  <channel>
    <title>NHGRI Active Grants</title>
    <link>http://www.genome.gov/10001799</link>
    <description>NHGRI-supported active grants and their descriptions.</description>
    <language>en-us</language>
	<lastBuildDate>Mon, 2 Nov 2009 10:07:52 EST</lastBuildDate>
	<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/NhgriActiveGrants" type="application/rss+xml" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><item>
		<title>Tool for annotation and analyses of human whole-genome s</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05619&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05619&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Current estimates place the number of personal variants at approximately 4 million per genome. Given the rapid advances in genome sequencing technologies and the future democratization of human genome sequencing, small groups and even individual scientists will soon be performing their own human genome projects. We believe that the ability to automatically annotate the millions of variants that these projects will produce, to combine data from multiple projects, and to recover subsets of annotated variants for diverse downstream analyses will become a critical analysis bottleneck. Despite the need, there are no publicly available tools that automate these procedures. In response to the NHGRI's RFA Development and Application of Statistical and Computational Data Analysis Methods for DNA Sequence, Variation, GWAS, Genomic Function, Chemical Biology and Related Genomic Data Sets, we propose in this GO grant to develop a standalone software tool called VAAST (Variant Annotation, Analysis and Selection Tool). This system will fulfill NHGRI's need for a technology to assess data quality and call variants, will allow for analysis of data from all sequencing centers, and will be usable for data from all sequencing platforms. We believe VAAST will fill a huge void in the software landscape by helping individual scientists to extract meaningful results from whole genome variant files. PUBLIC HEALTH RELEVANCE: It is now known that on average any two individual human genomes differ by approximately 4 million positions. These differences, called sequence variants, underlie the inherited physical differences between individuals, including their predisposition to develop certain diseases. This project proposes to develop a tool called VAAST (Variant Annotation, Analysis and Selection Tool). 
VAAST will help researchers sort through these millions of variants in their quest to identify which of them underlie different phenotypic traits of individuals and susceptibility to diseases.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Nanodiagnostics and Nanotherapeutics: Building Research</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05338&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05338&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This proposal on Nanodiagnostics and Nanotherapeutics: Building Research Ethics and Oversight responds to RFA-OD-009-0003, addressing Challenge Area 02: Bioethics and Challenge Topic Unique Ethical Issues Posed by Emerging Technologies, as set forth in 02-OD (OSP)-101. The ability to manipulate atoms and molecules at the nanoscale has catalyzed the emerging field of nanomedicine. While many biological phenomena occur at the nanoscale, nanomedicine denotes material fabricated at the scale of 1-100 nanometers (nm) to take advantage of novel properties (biological, optical, thermal, chemical, and mechanical) that manifest at the nanoscale. A focal area of development is nanodiagnostics and nanotherapeutics. These fields use nanotechnology to develop nanoscale tools for in vitro diagnostics, in vivo imaging agents, drugs and therapies, targeted drug delivery systems, nanoparticle and other nanoscale gene delivery vectors, biomaterials for enhanced tissue engineering, and multi-function medical devices. Nanomedicine poses enormous challenges for human subjects research and oversight particularly because the health, safety, and environmental impacts of nanomaterials are largely unknown and researchers are struggling to characterize these materials and develop adequate toxicology and assessment tools. Despite these unknowns, research on nanodiagnostics and nanotherapeutics is already being conducted with human participants. Institutional Review Boards (IRBs), Data Monitoring Committees (DMCs) and Data Safety Monitoring Boards (DSMBs), funders such as the National Institutes of Health (NIH), and oversight authorities at NIH, the Food and Drug Administration (FDA), and Office for Human Research Protections (OHRP) are already facing acute challenges posed by nanomedicine research, but without systematic guidance on how to address those challenges. 
Indeed, the FDA has already approved some nanomedical products for use in human beings. Work is urgently needed to address what substantive changes to the rules governing human subjects research and what procedural changes to research oversight are necessary. This 2-year project will examine current and emerging nanomedicine research on drugs, devices, and gene therapy in order to map the issues raised by nanodiagnostic and nanotherapeutic research and oversight and formulate much-needed guidance. We will collect and analyze existing guidance on human subjects research and oversight by the NIH, FDA, and OHRP in nanotherapeutics and nanodiagnostics as well as existing policy analysis. We will use that to inform normative work generating the first systematic recommendations to ensure adequate protections and oversight for human participants in nanomedicine research. This process will involve collaboration among Investigators and Working Group members who are directly involved in the science, medicine, policy, law, and ethics of nanotechnology. This project will have major impact by providing the first systematic and comprehensive guidance on the ethical conduct and oversight of human subjects research on nanotherapeutics and nanodiagnostics. 
Outcomes will be: (1) an assessment of publicly available documents indicating how researchers, IRBs, DMCs/DSMBs, NIH, FDA, OHRP, and relevant professional societies are currently approaching the ethics of human subjects nanomedicine research; (2) an assessment of how NIH (including the Office of Biotechnology Activities (OBA) and the Recombinant DNA Advisory Committee (RAC)), FDA, and OHRP are approaching the oversight of human subjects nanomedicine research; (3) the first comprehensive and systematic recommendations on the ethics and oversight of human subjects nanomedicine research, to be authored by the Investigators after critique by the Working Group; (4) additional individually authored papers by members of the Working Group; (5) a public conference with videotape archived for free public access; (6) a comprehensive bibliography; and (7) rich web-based resources. This proposed project will serve the goals of ARRA by creating 4 jobs and significantly augmenting a fifth to aid in retention of that job. This project brings together top experts on nanomedicine, biomedical engineering, law, policy, and bioethics to produce the first systematic and comprehensive recommendations on how to protect human participants in research on nanodiagnostics and nanotherapeutics, including drugs, devices, and gene therapy using nonviral nano-vectors. Research in these nano-fields is burgeoning, with research on human participants under way, but current research ethics and oversight have not yet adequately addressed key concerns including: difficulty predicting human response from animal data, uncertainty about how to assess risks, concerns about both participant and third-party safety both short-term and long-term, and challenges obtaining informed consent. 
The project group will use normative, empirical, and policy analysis to evaluate current approaches to nanomedicine research ethics and oversight, including at NIH, FDA, and OHRP, generating much-needed recommendations on ethics standards and oversight processes.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Microlysis Technology: Enabling Cell Type-Specific Prote</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05354&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05354&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This application addresses broad Challenge Area (06) Enabling Technologies and Specific Challenge Topic 06-HG-102: Technologies for obtaining genomic, proteomic, and metabolomic data from individual viable cells in complex tissues. Perhaps the greatest challenge in the area of proteomics is to develop methods that can report on the abundances and post-translational modification states of many different proteins in a single cell or cell type obtained with high spatial and temporal resolution from complex, living tissue. There are two fundamental issues that need to be addressed in order to meet this challenge. First, new and innovative sample collection methods are needed to enable the fast and efficient recovery of material from single cells embedded in live tissue. Second, highly sensitive analytical techniques are needed that can accurately quantify proteins in a multiplexed fashion in extremely small sample sizes (1-100 cells). Here, a new technology - termed microlysis technology - is described that enables the collection of lysates from single cells embedded in complex, living tissue. This technology uses mobile laminar flow of lysis buffer to efficiently lyse a single cell in approximately three to five seconds and to recover the lysate in a volume of approximately two nanoliters. First, a strategy is presented to build an instrument that automates this technology. Second, a plan is presented to couple this technology with lysate microarray technology in order to quantify protein abundances and post-translational modification states in a highly multiplexed and high-throughput fashion. These development efforts will be focused on the most challenging of tissues, the mammalian brain, which comprises thousands of distinct cell types and hence has defied most biochemical and proteomics efforts to characterize it at the molecular level. 
The technology presented here can realistically be developed in a two-year timeframe. If this work is successful, a new company will be launched at the end of the funding period to commercialize microlysis technology. Funding of this proposal will stimulate the economy through the immediate acquisition of instrumentation, through the hiring of two postdoctoral fellows, and through the founding of a new company, thereby creating additional jobs. On a scientific level, the ability to collect lysates from single cells embedded in complex living tissue will have a profound effect on the fields of genomics, proteomics, and metabolomics. To date, efforts in these areas have relied either on cultured cells, which have questionable physiological relevance, or on whole tissue lysates, which comprise dozens to hundreds of distinct cell types. Microlysis technology will enable the physiologically relevant study of biomolecules in virtually any solid tissue. One of the greatest challenges in biochemistry is to study dozens to hundreds of molecules simultaneously using single cells or in single cell types obtained from complex, living tissue. Here, we propose a new technology, termed microlysis technology, which enables the automated collection and quantitative analysis of single cells embedded in acute brain slices. This technology will have a profound impact on the fields of genomics, proteomics, and metabolomics since it will enable researchers to study biomolecules in a physiologically relevant fashion in virtually any solid tissue.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>A Data Analysis Center for integration of fly and worm m</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05639&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05639&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The aims of the ENCODE (Encyclopedia of DNA Elements) and modENCODE (model organism ENCODE) projects are to apply high-throughput, cost-efficient approaches to generate a catalog of functional elements in the human, worm, and fly genomes, which will serve as the basis for biomedical research advances. By their smaller genome size, powerful genetics, and ease of experimentation, D. melanogaster and C. elegans can help guide the study of functional elements in the human genome, reveal new insights into global gene regulation and embryo development, and enable experimental studies of gene function and regulation which are not accessible in mammalian systems. This proposal aims to enhance the value of these datasets by creating a Data Analysis Center (DAC) to support, facilitate, and enhance integrative analyses of the modENCODE consortium in fly and worm, to achieve a high-resolution annotation of all their functional elements, and to reveal new insights into the biology and gene regulation of animal genomes including the human. We foresee four central roles for the DAC, and have organized our aims around them. Aim 1: We will provide common computational guidelines for data processing in fly and worm, a common computational infrastructure and pipeline for common analysis and statistical tasks. Aim 2: We will facilitate and carry out element-specific integrative analyses to identify diverse classes of functional elements based on combinations of relevant datasets coming from multiple groups. This includes (a) enhancers, promoters, insulators, and other regions of regulatory importance, (b) protein-coding and non-coding genes, (c) regulatory networks of transcription factor and microRNA targeting, and (d) sequence features predictive of diverse classes of functional elements. 
Aim 3: We will carry out exploratory data analyses across different data types to discover potentially novel correlations and insights relating diverse classes of elements. In particular we will apply dimensionality reduction techniques to coordinate-based genome-wide genomic and epigenomic datasets, we will apply clustering and bi-clustering methods to identify functionally related sets of genes and modules, and we will analyze structural and dynamic properties of discovered networks. Aim 4: We will carry out comparative analyses across the two model organisms, and also with yeast and human. We will provide an ortholog resource between the species, compare regulatory relationships and dynamics for orthologous cell lines and developmental points, and carry over biological knowledge across model organisms and human. To achieve these four aims, we will work closely with members of the consortium, the modENCODE Analysis Working Group (AWG), consisting of all Principal Investigators and analysis groups, and the Data Coordination Center (DCC), responsible for all data sharing within the consortium and with the larger worm and fly communities. PUBLIC HEALTH RELEVANCE: The aims of the ENCODE (Encyclopedia of DNA Elements) and modENCODE (model organism ENCODE) projects are to apply high-throughput, cost-efficient approaches to generate a catalog of functional elements in the human, worm, and fly genomes, which will serve as the basis for biomedical research advances. By their smaller genome size, powerful genetics, and ease of experimentation, D. melanogaster and C. elegans can help guide the study of functional elements in the human genome, reveal new insights into global gene regulation and embryo development, and enable experimental studies of gene function and regulation which are not accessible in mammalian systems. 
This proposal aims to enhance the value of these datasets by creating a Data Analysis Center (DAC) to support, facilitate, and enhance integrative analyses of the modENCODE consortium in fly and worm, to achieve a high-resolution annotation of all their functional elements, and to reveal new insights into the biology and gene regulation of animal genomes including the human.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Defining the genetic basis of human respiratory chain di</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05556&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05556&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The mitochondrion is the center stage for energy metabolism, apoptosis, signaling, and ion homeostasis. Much of what we know about this organelle comes from studying mitochondrial respiratory chain disease (RCD). This devastating disease is due to genetic defects in the mtDNA or the nuclear DNA that give rise to a malfunctioning mitochondrial respiratory chain. Virtually all organ systems can be affected. RCD affects an estimated 1:5000 live births and is devastating: it is extremely difficult to diagnose, requiring consultation by multiple physicians and invasive biopsies, and at present no effective therapies are available. A small fraction of these disorders are maternally inherited, but the vast majority are due to mutations in nuclear genes, many of which have yet to be identified. Our research team has recently used integrative proteomics to define the ~1100 nuclear genes that encode the mitochondrial proteome; these genes represent a near-comprehensive collection of candidate genes for RCD. In the current application we have assembled a team consisting of leaders in mitochondrial medicine, computational genomics, and large-scale sequencing to comprehensively resequence all these ~1100 nuclear genes in a panel of ~120 patients with clinically characterized RCD. Through this project, we will solve the molecular bases for RCD; establish a facile, comprehensive DNA diagnostic test for RCD; and identify scores of new mitochondrial disease genes that will unlock new pockets of mitochondrial biology. PUBLIC HEALTH RELEVANCE: Mitochondrial disorders comprise one of the largest classes of inherited human disease, affecting both children and adults. We will analyze the DNA of patients with such disorders to discover their molecular basis. The results of this study may help us better diagnose and treat these devastating diseases.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Development of Electron Microscopy-based Nucleic Acid Po</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05592&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05592&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): We aim to provide a comprehensive foundation for development of an ultra-low-cost, ultra-fast nucleic acid polymer sequencing technology based on single-atom resolution transmission electron microscopy (TEM) of heavy atom-labeled nucleic acid polymers. Our particular approach is based on TEM imaging of ultra-dense (3 nm strand-to-strand spacing) parallel arrays of high molecular weight ssDNA molecules labeled base-selectively with heavy atoms. This will allow read lengths of at least ~150 kb and potentially as much as 2-4 Mb or more, with no special difficulties posed by highly repetitive DNA. With appropriate optimization, automation, and scaling, and with further funding beyond the scope of this proposal, this technology (TEM sequencing) will potentially enable human genome sequencing at significantly lower cost and with much greater speed and consensus accuracy/completeness than other proposed third-generation sequencing approaches. Our project will involve the optimization of our novel ssDNA array deposition protocol, improvement of imaging conditions and substrate quality, and subsequent design and building of a prototype TEM sequencing system with which we hope to demonstrate the approach's potential by delivering a human reference genome assembly that we believe may possess unprecedented consensus accuracy and completeness due to the inherently extreme read lengths and high coverage enabled by the approach. PUBLIC HEALTH RELEVANCE: The pace and impact of biomedical research on improving human health may be greatly increased by the development of ultra-low-cost, ultra-high-quality genome sequencing. 
Our electron microscopy-based approach employs preparation and readout unbiased by sequence content with extremely long read lengths (at least 150,000 bases and potentially as much as 2-4 Mb), suggesting that nearly gapless assemblies will be achievable, shedding light on previously unassembled long repetitive regions and structural variations with potentially important roles in complex disease. Furthermore, our models indicate that TEM sequencing may enable sequencing of whole human genomes to &gt;99.9999% consensus accuracy and completeness in &lt;10 minutes/genome, at a cost of &lt;$100, and thus its realization may have a broad, near-term, lasting impact on biomedical research.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Virtual Machines and Cloud Computing for automated and p</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05597&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05597&amp;cr_yr</guid>
		<description>Increasing amounts of sequence data are being generated in multiple biomedical research disciplines, particularly through the application of next-generation sequencing technologies to the genomic analysis of humans and their associated microbiome. However, the bioinformatic infrastructure necessary for sequence processing, requiring demanding software installations and access to powerful CPUs, currently presents a dramatic bottleneck for the further expansion of research in this field. Our proposal aims to address this problem through the generation of a portable, stand-alone Virtual Machine (VM) software package that combines all essential tools to perform basic Metagenomic, Viral and Eukaryotic sequence analysis. This VM will allow researchers to easily implement entire software pipelines locally or on server networks, independently of the operating system that is being used and without further installation steps. Furthermore, due to the mobility of the VM package, researchers will have the opportunity to outsource compute-intensive processing steps to external Cloud Computing networks that exist as free academic and commercial services. As part of the proposed project, we will assemble bioinformatics analysis pipelines for metagenomic, viral and eukaryotic genome sequencing projects with relevance to research on the human microbiome and the human host (Aim 1). Supported applications will include 16S rRNA-based phylogeny and community comparison, and identification, assembly, annotation and functional characterization of viral, bacterial and eukaryotic sequences from metagenomic HMP samples. 
In addition, pipelines for other increasingly relevant applications in this field will be provided for the removal of human DNA from HMP samples, eukaryotic and prokaryotic DNA and RNA sequence mapping, including SNP and splice site identification and de novo sequence assembly and annotation of eukaryotic microbes from the human microbiome, such as fungi and protists. Integrated analysis pipelines will be packaged into a VM, creating a stand-alone, push-button sequence analysis package (Aim 2), which will be optimized for performance on commercial and academic Clouds (Aim 3). Objective measures of the suitability of Clouds for next-generation sequence analysis will be determined and include runtime performance metrics, such as execution time and relative speedup, requirements on disk storage, memory, and data transfer throughput to and from Clouds. Outreach to the broad user community, including key collaborators utilizing Cloud platforms, will promote interoperability and development of standards for performing sequence analysis on Cloud platforms (Aim 4). Extensive documentation will be provided, including interactive online training seminars (webinars). User satisfaction, including software ease of use, quality of documentation, and value of instructional seminars will be surveyed. An advisory board will be established and used in reviewing all user statistics, performance metrics, technical developments and overall success of the project in conjunction with the Program Officer.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Mapping Transcription Factors Binding Sites in the Mouse</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05602&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05602&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Large-scale efforts are underway to systematically map transcription factor binding sites throughout the human genome. The ENCODE project has focused its initial attention on two cell lines, 1) K562 cells, a myeloid precursor cell line, and 2) GM12878, a lymphoblastoid cell line, and our laboratory has mapped the binding sites of a large number of transcription factors expressed in these cells. To study their conservation, to help provide functional information about these binding sites, and to determine if these sites are occupied in vivo, we propose two types of studies. First, we will map the binding sites of at least 30 transcription factor orthologs that have been analyzed in the human ENCODE project in mouse MEL and CH12 cells, which are analogous to K562 and GM12878 cells, respectively. Second, we will map the binding sites of Pol II and nine other factors in cells differentiated from human CD34+ cells and primary erythroid mouse cells. These studies will determine which transcription factor binding sites and gene targets are conserved in vertebrates and which are species-specific, as well as determine the extent to which targets mapped in cultured cell lines reflect in vivo binding sites. The information from these studies will be deposited into public databases and is expected to be extremely valuable to the large mouse and human genetic communities. PUBLIC HEALTH RELEVANCE: The ENCODE project has produced relatively large amounts of data on transcription factor binding and RNA expression in a limited number of human cell lines. We propose to extend these results by obtaining mouse cell lines at similar states of differentiation to human cell lines. We will then duplicate the experiments that have been done in human cells, and locate control elements based on sequence conservation and similarities in factor binding between the two species. 
We will also determine if elements identified in vitro are occupied in cells isolated from organisms.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Bioassay Ontology and Software Tools to Integrate and An</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05668&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05668&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Increasingly large and diverse data sets are being generated by publicly funded screening centers using various high- and low-throughput screening technologies. Much of this data is accessible. The largest public repository of small molecule screening results is PubChem, currently covering over 1,500 assays for 370,000 compounds. The number of publicly available assays is expected to grow more than 10-fold during the next five years. The utility of this invaluable resource is currently limited, because the knowledge contained in complex and diverse bioassay data sets is not formalized and therefore cannot be accessed for comprehensive computational analysis or integration with other data sources. This proposal aims to address this limitation. For the past ten years ontologies have been developed by biologists to facilitate the analysis and discussion of the massive amounts of information emerging from the various genome projects. An ontology is a controlled vocabulary representation of the objects and concepts and their properties and relationships. The purpose is to model and share domain-specific knowledge so that software agents can automatically extract and associate information. The aim of this proposal is to develop a bioassay ontology and software tools and to demonstrate their utility. The bioassay ontology will coherently describe diverse biological assays (such as those in PubChem) with a focus on complex cell-based assays and in particular high-content screening data. Software support and development includes modules to build ontology terms and to curate data sets, tools to map the ontology onto screening experiments and other ontologies, and tools to standardize, reformat, and aggregate data sets in the context of the ontology. We will demonstrate the utility of our approach by creating a PubChem-derived database and making it available to the community via a search interface. 
The ontology and software tools will facilitate the analysis of bioassay screening data in various contexts, for example signaling or metabolic pathways and indirectly human disease. The tools will enable one to extract data sets for modeling specific interactions between perturbing agents and biological targets (or pathways), or to model assay technology-dependent interferences. End user software needs to provide ease of use for biologists and chemical biologists to utilize the ontology in the context of their own and external data sets. It will be modular and open source. We will develop various collaborations to disseminate the bioassay ontology and software in the community and to facilitate their ongoing development. PUBLIC HEALTH RELEVANCE: This project will develop a bioassay ontology to coherently describe the hundreds of different assays used to study how perturbing agents, such as drugs, alter cell function. Along with new software to search existing assay databases, this will enable scientists to more effectively identify and prioritize chemicals for further development into chemical probes or starting points for therapeutics.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Carbon Nanotubes: A New Synthetic Nanopore for Sequencin</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05625&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05625&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Nanopore sequencing offers the possibility of rapid single molecule sequencing with long reads, almost no sample preparation, and direct electronic readout from a small, computer-chip-like device. Nanopores are orifices that are so small that electrophoretic translocation of DNA through them necessarily occurs one base at a time. All nanopores require some means of localizing DNA with atomic precision as well as controlling its speed of translocation through the pore. Here, we introduce an entirely new type of nanopore for the DNA translocation, the single walled carbon nanotube (SWCNT). SWCNTs are relatively homogeneous on an atomic scale, easy to manufacture with no special nanofabrication, and can form excellent electrodes, simplifying tunneling readout and opening the possibility of electrochemical readout. Their high aspect ratio (channel length/channel diameter) might permit trapping of the DNA in the tube, opening a new avenue for control of translocation speed. Molecular dynamics simulations have predicted that single-stranded DNA can be driven through a 2 nm diameter SWCNT by an electric field. We have confirmed this prediction experimentally, building devices in which a single SWCNT connects two fluid reservoirs and using PCR to verify DNA translocation. Arizona State University will direct the project and focus on device fabrication and DNA translocation. Oak Ridge National Laboratory will focus on multiscale modeling of ion and DNA transport through SWCNTs. Columbia University will focus on the construction of nm-scale gaps in SWCNTs. 
At the end of the two-year project we will: (a) have developed robust device fabrication procedures and made devices available to the research community; (b) have obtained an understanding of the factors controlling the transport of ions and DNA through the tubes; (c) have identified factors that affect the speed of DNA translocation and the length of polymers that can be passed in one read; and (d) have built prototype devices in which a tunnel gap is integrated into a single SWCNT nanopore device. These developments should accelerate development of nanopore technology towards the production of a cheap, fast, and reliable DNA sequencing chip. PUBLIC HEALTH RELEVANCE: If successful, the new technology could enable ultra-low cost, single molecule sequencing with long reads, making whole-genome studies available to the general population. Making a search of whole genomes for rare variants economically feasible has many implications for medicine.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Deep Sequencing Analysis of mRNA Isoform Expression Chan</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05624&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05624&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Myotonic dystrophy (DM) is the most common form of adult onset muscular dystrophy, with an incidence of about 1 in 8,000 adults. The most common form of the disease, DM1, is caused by an expanded CTG repeat in the 3' UTR of the DMPK gene, and CUG repeat RNAs from this gene fold into hairpins that accumulate in nuclear foci, resulting in effective depletion of the alternative splicing factor Muscleblind (MBNL1) and hyperactivation of the splicing factor CUG Binding Protein 1 (CUGBP1). Misregulation of splicing by these factors is central in the disease. Thus, characterization of the spectrum of changes in the transcriptomes of DM patients is central to understanding disease pathogenesis. This project seeks to understand the molecular basis of DM and to identify genes and mRNA isoforms suitable for therapeutic intervention using an approach based on next-generation sequencing of mRNAs. The project has the following specific aims: 1) To generate a comprehensive catalog of genes, exons and mRNA isoforms whose expression is altered in DM, and to assess the variability of these changes between individuals. 2) To characterize gene and mRNA isoform expression changes in mouse models of DM. 3) To associate gene and isoform changes with clinical and pathological features in DM. Achieving these aims will lay the foundation for a deeper understanding of DM and will generate leads for future molecular genetics and screening studies and is likely to identify candidate therapeutic targets. PUBLIC HEALTH RELEVANCE: This research project will comprehensively determine the changes in RNA and protein molecules that occur in the muscles of patients affected by myotonic dystrophy, which is the most common adult onset form of muscular dystrophy, affecting 1 in 8,000 adults. 
Knowledge of these molecular changes will help to identify which molecules and genes underlie specific symptoms of the disease and will aid in identifying targets for therapy.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Pharmacogenomics of Breast Cancer Adjuvant Chemotherapy</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05137&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05137&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This proposal represents a response to NIH RFA-HG-08-004 Genome-Wide Association (GWA) Studies of Treatment Response in Randomized Clinical Trials. We propose to perform a GWA study using DNA samples from the Success A breast cancer trial of adjuvant chemotherapy. Breast cancer, the most common cancer, and the second-leading cause of cancer death in women, is of major public health importance. Most women with breast cancer are treated with, and benefit significantly from, combination chemotherapy. However, patients with breast cancer display large individual variation in efficacy and the occurrence of drug-related adverse events in response to chemotherapy. The SUCCESS A trial enrolled 3754 breast cancer patients and had two arms. One arm involved treatment with standard chemotherapy consisting of 5-fluorouracil-epirubicin-cyclophosphamide (FEC) plus docetaxel, and the other involved FEC-docetaxel plus gemcitabine. Therefore, this clinical trial provides an ideal opportunity to perform a GWAS to identify genomic markers for variation in efficacy and toxicity of breast cancer chemotherapy, with and without the inclusion of gemcitabine. This proposal is based on an ongoing collaboration between the Success A multi-institutional breast cancer clinical trial group in Germany and the Mayo Clinic NIH Pharmacogenetics Research Network (PGRN) Center, a center with a commitment to pharmacogenomic studies of breast cancer. Therefore, the present proposal unites the strengths of a major breast cancer clinical trials group with those of a group with extensive experience in both clinical and laboratory-based breast cancer pharmacogenomic studies. 
Polymorphisms identified in the course of the proposed GWA Study will be replicated utilizing DNA samples from other breast cancer clinical trials and will also be pursued functionally and mechanistically with a pharmacogenomic model system that has been applied by the Mayo PGRN to obtain preliminary genome-wide data for drugs used clinically in the Success A trial. The goal of the proposed GWAS is to enhance the efficacy and to decrease the toxicity of breast cancer chemotherapy in order to move toward truly individualized therapy of the most common cancer of women. Public Health Relevance: Breast cancer is the most common cancer of women. Most of these women are treated with adjuvant chemotherapy, and display large variations in both efficacy and toxicity. We propose a genome-wide association study of the 3754 women who participated in the Success A breast cancer clinical trial to identify genomic markers that will make it possible to better individualize breast cancer chemotherapy.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Next Generation Mendelian Genetics</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05608&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05608&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This application addresses NHGRI RFA-OD-09-004 for Medical Sequencing Discovery Projects. The ultimate goal of this proposal is to scale a new approach to identify the candidate genes and mutations that underlie rare Mendelian diseases in humans by exome resequencing. For decades, linkage analysis has been the mainstay of human genetics. However, for rare Mendelian diseases where family collection is difficult or pedigrees are small, this approach is less useful. Although the molecular bases of more than 2,600 Mendelian diseases have been determined by linkage mapping or a candidate gene approach, a nearly equal number remain to be solved (OMIM). We have assembled a collection of rare pediatric and adult Mendelian diseases that are representative of this unsolved set. In every instance, the identification of the causal gene remains intractable to either linkage mapping or exhaustive candidate gene analysis. Exome resequencing offers a new way forward for dissecting the underlying causes of rare Mendelian diseases. In our preliminary studies, we show that selective capture of protein coding sequences across the human genome coupled with massively parallel resequencing to define coding variation can accurately identify the gene underlying a monogenic disorder. In this example, comparative analysis of exome variation data from as few as two unrelated individuals affected with the disease reduced the list of candidate genes to less than ten. The candidate list was further reduced to a single gene with exome data from as few as four unrelated cases. Once identified, each candidate gene will be screened for disease-causing variants by conventional methods in a larger set of cases. 
Discovery of the genetic basis of a large collection of rare disorders that have, to date, been unyielding to traditional analysis will substantially expand our understanding of the biology of the human genome, facilitate accurate diagnosis and improved management of these diseases, and provide the information needed for the development of novel therapeutics. If successful, this approach is likely to replace linkage analysis as the dominant paradigm for studying diseases exhibiting Mendelian inheritance patterns and will provide a new path forward for medical genetics. PUBLIC HEALTH RELEVANCE: As we enter an era of personalized medicine, DNA sequencing will be increasingly important to public health, contributing to our understanding of the genetic basis of human disease. The targeted capture and massively parallel sequencing of all protein coding regions in the human genome (the exome) has the potential to markedly accelerate human genetics research as an efficient method for identifying highly penetrant variants at a genome-wide scale. This project will apply and evaluate exome resequencing as a new tool to rapidly identify the causes of dozens of rare genetic diseases in humans.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Statistical Methods for Next-Gen Sequencing in Disease A</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05605&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05605&amp;cr_yr</guid>
		<description>Statistical Methods for Next-Generation Sequencing in Disease Association Studies. Through this project we propose to develop statistical approaches and software for genotype calling and association testing in next-generation sequence data. The field is driven by molecular advances that allow for affordable, massively parallel sequencing. The rapid development of statistical methods for next-generation sequence data in disease studies is necessary to keep pace with the advancing molecular technology. Next-generation sequencing is based on random, short-read technology; thus the coverage of any nucleotide is highly variable and subject to error. Distinguishing random error from truly variable sites is required for SNP-calling. One step beyond this is identifying the individual's actual genotype at the site. This is a highly statistical problem and we have yet to see this problem addressed in a statistically rigorous manner. The solution that we propose, and what makes our approach novel, assumes that we have a sample of individuals, each with next-generation sequence data. We anticipate that sequencing may ultimately replace GWAS SNP arrays for disease-association studies. While this may be several years away for whole-genome sequencing, sequencing enough people individually for a small association study is already becoming practical with target capture arrays. We can leverage the information from a sample of individuals with next-generation sequence data to more accurately estimate an individual's genotype and the position-specific error rate. Our approach is to express the genotype probabilities and error rate in a likelihood framework. We can then use standard statistical theory to help us call genotypes. This approach should perform better than calling genotypes for a single individual at a time based on an arbitrary filter as is currently done. 
A distinct advantage of this statistical framework is that the uncertainty in the genotype calls can be incorporated directly into our disease-association tests (e.g., case-control and rare variant analysis). In this way we will increase power of our association tests and reduce bias due to error or systematic missingness. Incorporation of next-generation sequence data into the association tests provides a complete analysis pipeline from sequence to association.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>A Universal Front End to Improve Assembly Outcomes for N</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05596&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05596&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): DNA sequencing is currently in the midst of disruptive technological shifts, with 454, Illumina, and SOLiD providing us with enormous throughput increases and large reductions in cost per base. Massively parallel technologies deliver a few Gbp of sequence per week as short fragments, or reads. New applications of sequencing only recently considered impractical are enabled: personal genome sequencing, metagenomic analysis of 'soups' containing several to hundreds of unique organisms, and finally, de novo sequencing of novel genomes of complex organisms. No matter how the sequencing is done, reads must be assembled computationally if they are to be useful. Given the read length and read quality limitations of new instruments and the massive volume of data generated, the computational assembly problem is becoming critical, with the cost of computational infrastructure and personnel exceeding reagent and instrument-related costs. Moreover, the results of assembly are currently far from ideal; for example, much of the human genome remains invisible due to a high percentage of repeats. We propose to develop a new front end to next-gen sequencers for DNA preparation, the Read-Cloud Method, which can reduce the computational cost of genome assembly by 2-3 orders of magnitude, produce more complete and accurate genomes, and make metagenomics tractable. We propose a hierarchical sequencing approach, without any need for bacterial cloning. We will achieve this by handling single DNA molecules, tiled across the genome with high redundancy, on microfluidic devices. 
We will design, prototype, and thoroughly test technology to (i) shear genomic DNA into 200-kbp fragments with narrow size distributions; (ii) randomly amplify each individual 200-kbp DNA in isolation, within a porous gel microcontainer that will be formed around the dsDNA molecule within a microdevice; (iii) digest micro-encapsulated DNA into small fragments of tunable size; (iv) bar-code the progeny of each 200-kbp DNA with a 12mer oligonucleotide, to identify each read as associated with a particular 200-kbp DNA. A planar microfluidic device will be fabricated to allow one unique bar-code sequence to be blunt-end-ligated to both DNA termini. Bar-coded DNA is pooled, and next-gen sequencing is done. The result is a highly reducible data set. The method and algorithm are universally applicable to next-generation platforms. The PIs (Batzoglou, Barron, Shaqfeh, Quake) will collaborate to make an efficient approach to hierarchical sequencing in microfluidic devices. PUBLIC HEALTH RELEVANCE: Project Narrative Gene sequencing is important to medicine. Our DNA sequencing method has the potential for reducing computational cost by orders of magnitude while making the assembled genomes significantly more complete and accurate. The key to this step is using microfluidic handling technologies to subdivide genomic DNA into 200-kbp fragments, which are then amplified in isolation from each other and uniquely labeled to form a highly reducible dataset for genomic assembly.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Genetics of Lipid Levels: Draft Sequencing of 1000 Genom</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05581&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05581&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This application responds to NHGRI's call for Medical Sequencing Discovery Projects that will use next-generation sequencing technology to tackle high-impact challenges in medical genetics. We propose to build on our successes in the study of blood lipid levels and other complex traits in a Sardinian population cohort by generating draft whole genome sequences for 1,000 individuals using whole genome shotgun approaches, as pioneered by the 1,000 Genomes Project. The proposed experimental plan poses many logistical, computational and statistical challenges, which we are uniquely poised to address - as evidenced by our track record in the organization and phenotyping of the population cohort from a highly inter-related town-dwelling subgroup of the founder Sardinian population, and in the development and deployment of tools for the analysis of cutting edge human genetics data. The proposed plan will allow us to evaluate the contribution of common (frequency &gt;5.0%) and rare (frequency 0.5 - 5.0%) single nucleotide polymorphisms, short insertions and deletions, large copy number polymorphisms and other structural variants to blood levels of low density lipoprotein cholesterol (LDL-c), high density lipoprotein cholesterol (HDL-c) and triglycerides (TG), all of which are key risk factors for cardiovascular disease. 
The isolated Sardinian population is ideal for this type of study for several reasons, and in particular because: (i) the bottleneck that occurred after colonization of the island attenuated natural selection against deleterious alleles, increasing the odds that deleterious alleles will reach modest frequencies (0.5 - 5.0%) and will be detected in the present study; (ii) our ascertainment of many relatives of the individuals to be sequenced enables us and our collaborators to investigate the effects of any rare alleles we identify by genotyping the relatives of sequenced individuals; (iii) sharing of long haplotype stretches surrounding rare variants will facilitate imputation-based analyses of shotgun sequence data, which improve the accuracy of individual genotype calls and thus increase power. The proposed research will help advance NIH's mission by furthering our understanding of the genetic factors contributing to blood lipid levels and coronary heart disease. In addition, these studies will result in experimental strategies and analysis tools that will be readily deployable by many laboratories to study the genomes of hundreds to thousands of individuals and further our understanding of the genetics and biology of many different traits and conditions. PUBLIC HEALTH RELEVANCE: In the past several years, genome-wide association studies have furthered our understanding of the molecular basis of blood lipid levels, which are key risk factors for the development of cardiovascular disease. The success of these studies resulted, in large part, from their ability to explore the genome in a comprehensive manner, systematically assessing the impact of common variation on the trait of interest. Here we propose to deploy high-throughput sequencing technologies to extend these systematic whole genome assessments to include rarer variants as well. 
To maximize our chances of success, we focus our study on an isolated founder population in Sardinia, which is ideal for the study of rare genetic variants. Our results should expand the understanding of the genetics of blood lipid levels and also result in strategies that can be widely deployed to study many human traits. In this way, the proposed research plan addresses several objectives of the Grand Opportunity call; it describes groundbreaking, innovative, high impact research that has the potential to accelerate human genetics research for a wide range of complex phenotypes.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Development of a Software Pipeline for Sequence Data</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05546&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05546&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Next-gen sequencing technologies are generating an incredible amount of data in a very short time span. While the raw sequence data is submitted to NCBI, at present there is no standard pipeline at NIH that can process this vast amount of data in a uniform, robust, fast and accurate manner to produce the variant calls needed for further biological research. For large collaborative projects, such as 1000 Genomes or TCGA, it is critical to the quality of the results that all data for the project be processed consistently, through a single, validated analysis pipeline. The pipelines must be able to validate the data, recalibrate error rates, merge data for each sample across multiple sources and technologies, align to a reference, and call SNPs and structural variants. Further, if increases in data production continue along current trajectories, these pipelines will need to process terabases of data per day. At present, every large project is coordinating its own pipeline infrastructure and analysis processes, or alternatively, reconciling results generated through inconsistent processes. Furthermore, next-generation technologies make it possible for small labs to generate huge datasets with only one or two instruments. But those labs are likely not equipped with the IT and informatics infrastructure needed to make full use of these data. They will therefore need to process their data at some external location to make the potential of these instruments a reality. We propose to build and deploy a massively-parallel, high-throughput analysis pipeline infrastructure to be managed by NCBI, and hosted at Amazon Web Services (the Amazon cloud). We will further develop several pre-configured analysis pipeline workflows to run common types of sequence analysis on that infrastructure. 
Users will be able to modify and extend the pre-configured pipeline workflows, or design and deploy new pipelines as new types of sequencing analyses develop, using tools we provide. Those new pipelines will be able to incorporate analysis algorithms implemented in a variety of programming languages, and will be able to use available compute resources to run as much as possible in parallel, thus reducing the time to delivery of results. Finally, we will provide a catalog of algorithm implementations, already configured to run within the pipeline infrastructure, from which new pipeline workflows can be constructed. These components will include quality recalibration steps, SNP detectors, and indel detection algorithms. PUBLIC HEALTH RELEVANCE: Next-generation sequencing technologies are generating an incredible amount of data in a very short time span, and analysis pipelines are needed to process this raw data to produce usable biological information. We propose to build and deploy a massively-parallel, high-throughput analysis pipeline infrastructure to be managed by NCBI. We will further develop several pre-configured analysis pipeline workflows to run common types of sequence analysis on that infrastructure.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Development of a semiconductor-based platform for genomi</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05094&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05094&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): We propose to develop a novel disposable semiconductor sensor and system able to directly and rapidly read gigabases of de novo sequence. The system is comprehensive, includes simplified and robust sample preparation technology, and produces data fully compatible with current standards. A new type of semiconductor sensor, an Ion Torrent Chip, has been designed and developed to directly detect polymerization of DNA without the need for ANY intermediate enzymatic reactions, chemiluminescence, fluorescence, optics, optical imaging, or other constraints of having to detect light or use unnatural reagents. The system consists of disposable Ion Torrent Chips, an integrated chip reader, and fluidics, and can be produced at extremely low cost and generate high quality assembled human genome sequence for less than $1,000. At the heart of the system is a semiconductor sensor with tens of millions of separate detectors, each capable of sequencing long stretches of DNA. Along with high-speed signal processing and base-calling algorithms, the system will be able to establish a new gold standard for low cost, diploid assembled genome sequences. Because the heart of the system is a novel sensor built and assembled using standard semiconductor fabrication methodologies, able to sequence without the need for intermediate enzymes or the constraints of having to image using light, the cost of genome sequencing will continue to fall with each successive generation of denser chips according to Moore's law. PUBLIC HEALTH RELEVANCE: This proposal aims to develop and allow the deployment of genome sequencing technology with a simplicity and cost point to truly democratize and make routine whole genome sequencing. This technology is expected to have a substantial and direct impact on lowering the cost of healthcare, fully enable consumer genomics and usher in the age of personalized medicine. 
It will also have a profound impact on our understanding of basic biological processes, with direct impact on the generation of biofuels, and other green technologies.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Miniaturized DNA Sequencer for Identification of Microbi</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R43HG05186&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R43HG05186&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Miniaturized DNA Sequencer for Identification of Microbial Pathogens Abstract. The goal of this project is to develop a compact and inexpensive DNA sequencer based on the implementation of sequencing-by-synthesis chemistry on a droplet-based digital microfluidic cartridge. The proposed device would be capable of sequencing tens to hundreds of base pairs using an inexpensive disposable cartridge and a compact and simple piece of equipment. The cartridge will also integrate sample preparation capability, including DNA amplification, to enable widespread use by non-specialists and provide rapid and reliable results at low cost. The initial application for this technology will be microbial pathogen identification. The rationale for this is that even a few tens of base pairs can provide discrimination of microbial pathogens while there is a genuine clinical and medical need for an instrument that can perform this type of analysis rapidly, automatically, and at low cost. As a proof of concept, a prototype system will be developed in Phase I and an evaluation will be performed to assess the ability of the prototype to identify a set of clinically relevant yeasts and moulds. PUBLIC HEALTH RELEVANCE: A compact and inexpensive instrument to sequence small amounts of DNA for the purpose of identifying microbial pathogens will be developed and tested. The proposed instrument would improve the ability of clinicians to diagnose and treat infectious diseases.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>New Resources for e-Patients</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R43HG05046&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R43HG05046&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): New Resources for e-Patients addresses the unmet medical needs of consumers who search for health and healthcare information online, currently a population of more than 160 million people in the U.S. It will fill gaps and address deficiencies in currently available online health information resources. It will maximize the value of public domain health information from U.S. Government sources. Textual consumer health information will be collected from NIH, FDA and other government sources. This information will be subjected to automated topic analysis and classification using methods of natural language processing and statistical text-mining to discover and extract topics on i) diseases and conditions; ii) treatments, benefits and risks; and iii) genomic risks and responses. These topics will be integrated and mapped to the most frequent health topics of interest to consumers. Personally-controlled electronic health records and personal genotypes will be studied for their potential contributions to personalized medicine for e-patients. Phase I of this project will achieve proof-of-principle and develop an advanced prototype as a foundation for construction of a new web-based resource in Phase II. PUBLIC HEALTH RELEVANCE: This project addresses the unmet medical needs of consumers who search for health and healthcare information online, currently a population of more than 160 million people in the U.S. It will fill gaps and address deficiencies in current online health information resources and also target new opportunities in genomic and personalized medicine. In the process, we will create consumer-friendly, automated systems that make online information search and retrieval more efficient and maximize the value of public domain health information from U.S. Government sources. 
The work will lead to more reliable, personalized and actionable information for a new generation of web-savvy and socially-networked e-patients and will lead to more efficient and productive encounters between patients and healthcare systems.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>African Diversity and the Genetics of Human Health</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=F32HG05292&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=F32HG05292&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Understanding the genetic and environmental factors influencing phenotypes is central to improving human health. However, most studies of genetic architecture involve populations living in the United States, Europe and Asia. Though large, these population samples capture only a tiny subset of the standing genetic and phenotypic variation in humans. Africa, on the other hand, contains tremendous phenotypic, cultural, linguistic, genetic and environmental diversity and is the source of the worldwide range expansion of all modern humans in the past 100,000 years. In fact, African populations show levels of genetic diversity and substructure equivalent to that seen at the global level. Some of these observed differences are thought to reflect local adaptation to the distinct diets, climates and exposure to pathogens experienced by each group. The broad objective of this study is to incorporate genetic and phenotypic information from Africa into the emerging picture of the genetics of health-related traits. We propose to analyze a unique dataset assembled by the Tishkoff lab that includes both DNA and phenotypic measurements for as many as 3324 individuals (depending on the trait) across 61 highly diverse African ethnic groups, a subset of which will be genotyped on the Illumina 1M SNP chip. We will first characterize the standing phenotypic variation in the data and identify groups of populations showing highly contrasting phenotypes that will be especially informative in further genetic analyses. Traits to be analyzed include: adult height, weight, body mass index (BMI), blood glucose level, resting blood pressure, resting heart rate, taste perception (PTC, Salicin and SOA) and lactase persistence. Second, using dense SNP marker data for a highly diverse subset of our sampled individuals, we will identify areas of the genome showing signatures of natural selection. 
Such genomic regions may harbor loci involved in adaptations to diet including altered carbohydrate, protein and lipid metabolism, to physical environmental factors including growth rate and body composition, and to infectious disease susceptibility including modified immune system function. Finally, using the same densely genotyped individuals we will directly explore genotype-phenotype relationships for our traits of interest focusing on known candidate genes/SNPs and regions showing evidence of natural selection.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Next-Generation Medical Resequencing of Gout Disease Gen</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05697&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05697&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This proposal responds to the GO Program ARRA Medical Sequencing Discovery Projects to establish next-generation technologies for medical resequencing in smaller academic laboratories compared to larger facilities like the Genomic Sequencing Centers. In this application, we propose to use next-generation sequencing for medical resequencing of genes that have shown highly significant associations with gout and serum uric acid levels in genome-wide association studies (GWAS) in the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE). Resequencing will characterize the overall genetic architecture by identification of both common functional variants that underlie GWAS statistical associations and rare variants with larger phenotypic effects. Our laboratories have established key elements of next-generation sequencing including a robust and cost-effective process for subgenomic capture to enrich gene targets for resequencing, as well as implementation of the SOLiD System v.3 (Applied Biosystems) and associated pipelines for data management and quality control. We have selected 11 genes from CHARGE GWAS results for resequencing of functional regions (promoters, exons, conserved regions) in gout cases and controls (total n=1,199) from the Atherosclerosis Risk in Communities (ARIC) cohort. After resequencing, we will genotype variants in the entire ARIC cohort (n=16,000) to verify resequencing results and to increase power for statistical analysis. Statistical analyses will include standard association studies for relatively common alleles, as well as analyses of rare variants by tests for differences in numbers of rare variant carriers in cases versus controls, and comparisons of mean uric acid concentrations in carriers versus the overall cohort. 
For both common and rare variants that show significant associations, we will use bioinformatics to identify possible functional consequences like non-conservative amino acid replacements and premature stop codons, disruption of normal mRNA splicing, or alterations in control elements that regulate gene expression. We propose to replicate our findings by genotyping and statistical analysis in two additional CHARGE cohorts, the Framingham Heart Study (FHS) and the Cardiovascular Health Study (CHS). PUBLIC HEALTH RELEVANCE: We propose to use next-generation DNA sequencing technologies to identify genetic variants that influence gout, one of the most common forms of arthritis, affecting nearly 3 million adults in the US. Our subjects are from the Atherosclerosis Risk in Communities (ARIC) study, a large multi-ethnic epidemiological cohort (16,000 subjects) that has been measured for multiple disease-related risk factors and clinical endpoints. The identification of genetic variants will provide an improved understanding of molecular mechanisms that regulate serum levels of uric acid (the major risk factor for gout), and eventually lead to novel drug targets to improve treatment of gout.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Mechanistic Signatures of Drug Responses in Cancer</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05693&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05693&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This application proposes a systematic effort to collect and analyze multi-factorial Pharmaco-Response Signatures (PRSs) for 15 therapeutic small molecules across a bank of 80 cancer cell lines for which genomic data is becoming available. The signatures will be used to elucidate response mechanisms, identify specific determinants of drug sensitivity or resistance at the cellular level, and create new response classifiers. PRSs will be based on high dimensionality measurements of phenotypes in single cells collected using high content microscopy supplemented by biochemical and plate-based assays, all performed at multiple times following exposure to drug at multiple doses (for a total of ca. 5 x 10^5 unique measurements). The data will be analyzed using a variety of mathematical modeling methods that incorporate more or less prior knowledge and will, in all cases, be combined with gene sequence and transcriptional data on pre-treatment state. Regions of the drug/cell line/dose response landscape that are particularly rich will be subjected to in-depth biochemical analysis aimed at creation of detailed mechanistic models of response pathways. By providing a more effective means to prioritize lead compounds, PRSs should help to overcome substantial obstacles to the development of therapeutic small molecules. Looking forward, such signatures should also be useful in monitoring patients during the course of therapy. For example, applying PRSs to measurements made on circulating tumor cells would fundamentally advance the personalization of cancer therapy. The assembly and analysis of sophisticated new signatures of drug response will involve close collaboration between seven investigators with expertise in medicinal chemistry, systems biology, genomics and high content screening and could not be undertaken in the absence of RC2 funding. 
Pharmaco-response signatures are expected to find a wide audience in industry and academe and to meet a critical knowledge gap; their creation will require new informatics approaches, reduction to practice of diverse measurement technologies and application of innovative mathematical modeling: the essence of a grand opportunity. PUBLIC HEALTH RELEVANCE: The proposed development of pharmaco-response signatures is directly relevant to NIH goals of developing better anti-cancer drugs and identifying those patients most likely to benefit from specific therapies; it also meets the GO (RFA-OD-09-004) requirement that a unique information resource be developed through collaborative attack on a fundamental problem in translational drug development. The work can be initiated immediately and substantially completed within two years.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Genetics of MRSA Infection</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05680&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05680&amp;cr_yr</guid>
		<description>Recent advances in large-scale DNA sequencing have the potential for rapid advances in correlating disease phenotypes with genotypes. Infectious disease research has long focused on the pathogen's role in infection but less so on the impact of the host genotype. The current epidemic of Methicillin-Resistant Staphylococcus aureus (MRSA) infection is a Grand Opportunity to apply new genetic methods to understanding this disease. There are multiple outcomes from MRSA infection, ranging from curable localized infection to devastating invasive infection. The project will study 500 subjects with MRSA infections by applying high throughput DNA sequencing to candidate genes involved in immune and other host responses to infection to correlate the outcome of infection with the host genotype. In addition, MRSA isolates from these same patients will be host immune responses and the arsenal of bacterial virulence factors on outcomes of community acquisition of MRSA is largely unknown. Our long-range objective is to seek a biological explanation for the wide spectrum of disease manifestations caused by CA-MRSA with the immediate goal of understanding the presentation of severe invasive infections in otherwise healthy individuals. We hypothesize that the outcome of CA-MRSA acquisition is determined by interactions between the genetic composition of the infecting bacterium and polymorphisms in relevant host genes involved in immune, inflammatory, and other host defense responses. 1. Development of a capture system for host defense genes. a. Selection of candidate genes and oligonucleotide probes; b. Development of solution capture strategy: WUCap or other state-of-the-art methods will be employed; c. Validation of the method with PCR-resequencing of a subset of genes; use of VarScan for data analysis. 2. Recruitment, phenotyping, and sampling of 500 subjects (infected and controls). a. 
Recruitment from up to 6 different clinical sites: i. Washington University School of Medicine will provide DNA samples from 200 subjects and 450 bacterial isolates from these subjects; ii. Texas Children's Hospital (Houston) will provide DNA samples from 150 subjects and, if needed, additional bacterial isolates from these subjects; iii. The University of Chicago will provide DNA samples from 150 subjects and, if needed, additional bacterial isolates from these subjects. b. Phenotyping of the clinical presentation of each infection; c. Sampling: isolation of human and bacterial DNA from these subjects. 3. Sequencing of host defense genes in 500 subjects using the capture methodology in Aim 1 and next generation sequencing such as Illumina or 454. 4. Bioinformatic analysis of sequences for candidate mutations in host defense genes using the Genome Center at Washington University variant detection pipeline, currently in use for analysis of whole genome sequences. a. SNP detection; b. Small indel detection; c. Prediction of mutations with functional consequences. 5. Validation of mutations using alternative sequencing methods such as PCR-based Sanger resequencing. 6. Sequencing of bacterial genomes using the Illumina and 454 platforms. 7. Analysis of bacterial sequences for variants using the Genome Center at Washington University variant detection pipeline as modified for bacterial genome analysis. 8. Validation of candidate bacterial variants using consensus data from at least two different next generation sequencing platforms or alternative sequencing methods such as PCR-based Sanger resequencing. 9. Correlation of host and bacterial variants with clinical phenotypes and assessment of statistical confidence in the correlations.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Enhancing ENCODE Through a Transcription Factor Tagging</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05679&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05679&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): We propose to use bacterial artificial chromosome (BAC) recombineering to epitope tag 40 transcription factors per year for chromatin immunoprecipitation (ChIP) followed by sequencing to map binding sites genome wide. A major hurdle for the ENCODE project is the availability of ChIP-grade antibodies for each factor to be analyzed. Epitope tagging of chromatin-associated proteins presents an alternative approach for ChIP, using the same epitope-specific antibody for each factor. Expressing epitope tagged factors from BACs ensures that the factors are expressed at near-physiological levels due to the presence of endogenous regulatory sequences that drive each tagged factor from its native local genomic context. We propose to analyze a diversity of transcription factors using this method, which we have already demonstrated for more than 20 nuclear receptor class proteins, a forkhead domain protein, Jun and Fos, and several other types of factors (Poser et al. 2008; Hua, Kittler and White 2009). The goal of this project is to integrate our approach with the ENCODE project, testing it for a wider diversity of transcription and chromatin-associated factors and scaling the approach to production levels necessary for ENCODE. The proposed project involves a formal collaboration between the White and Snyder labs, as well as integration with other funded ENCODE and human epigenome projects. PUBLIC HEALTH RELEVANCE: Using a BAC recombineering approach, we propose to systematically epitope tag transcription and chromatin associated factors for ChIP-seq to speed current large-scale mapping projects such as the Encyclopedia of DNA Elements (ENCODE) project by eliminating the laborious step of antibody production and testing. The technology presented here has the potential to facilitate the large-scale identification of the binding sites of mammalian transcription factors and other chromatin-binding proteins. 
This technology will also enable the ChIP analysis of proteins that are recalcitrant to ChIP grade antibody production and thus impractical to map using the conventional factor-specific antibody ChIP approach for mammalian cells.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Quantification of gene expression in targeted rare cells</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05428&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05428&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This application is intended for the (06) Enabling Technologies challenge area and (06-HG-102*) Technologies for obtaining genomic, proteomic, and metabolomic data from individual viable cells in complex tissue challenge topic. The specialized microenvironment surrounding stem cells or cancer cells is thought to be a major regulator of critical functions such as stem cell quiescence, self-renewal, proliferation and differentiation, as well as cancer progression. The amount of speculation implicating microenvironments in these processes currently exceeds the ability to generate supporting experimental data. This is, in part, due to the nature of microenvironments: they must be studied in vivo as in vitro recreations have not been convincing; the roles of individual cell types are often unclear; and there is a lack of techniques to systematically evaluate microenvironmental regulatory pathways in an unbiased manner. One means of evaluating cellular responses is to monitor changes in gene expression. We have established new methods that increase sensitivity and species-selectivity of TaqMan-based probes when combined with a PCR-based pre-amplification protocol. This allows homologous transcripts from two closely-related species to be distinguished, even when the mRNA from one species is up to 100,000,000-fold higher than in the other species. Thus, it allows the quantification of gene expression in as few as 1-10 cells of a specified type (e.g., muscle stem cells) as they reside in cellularly heterogeneous in vivo microenvironments in a distinct species. Similarly, this Q-PCR-based approach can be used to quantify the total contribution of a stem cell population to a specific tissue, allowing repopulating assays to be efficiently completed on solid organs. Finally, the utility of this approach to prospectively isolate satellite cells from skeletal muscle at the single cell level is explored. 
While these advances in Q-PCR will have a large impact on the study of up to 200 genes per sample, the current techniques are not scalable to genome-wide discovery. Several alternative approaches to scale up species-specific gene quantification are explored. While specific studies are proposed here, we believe that this powerful approach is broadly applicable to the study of a wide variety of biological processes such as tissue regeneration and cancer. While our primary rationale for creating this technology is to identify the molecular mechanisms regulating skeletal muscle stem cells, this technique has broad applicability to study the reciprocal interactions between stem cells and their niche as well as malignant cells within the supporting stroma. CHOP contributes substantially to the local economy. In 2008, CHOP's operations created and supported over 16,882 jobs in the region, and CHOP's total economic impact was over $2.01 billion. Moreover, through a combination of private donations, NIH funding, and allocations from its hospital operations, CHOP receives more total research support than any other children's hospital in the United States -- $180 million in fiscal year 2007-2008. The direct funding in this proposal will create or retain 4 full-time staff positions in academic research plus 2 trainee/work study positions for undergraduate students at the University of Pennsylvania. In addition, $390,000 in indirect costs will directly enable an additional 4 CHOP staff members to retain their employment. Furthermore, over 95% of the material and supplies will be purchased from American biotechnology companies. Approximately 50% of the supply budget will go to Applied Biosystems in Foster City, CA where it will create an estimated 1.3 jobs. Thus, this award will create or maintain approximately 11.3 U.S. jobs. The specialized microenvironments surround stem cells or cancer cells and are thought to be major regulators of cell behavior. 
We present here a technique to quantify expression of up to 100 genes in 1-10 cells residing in intact tissue in an animal. This powerful approach is broadly applicable to the study of a wide variety of biological processes. Specifically, we propose that these techniques can distinguish gene expression patterns in a selected cell type in gross tissue samples and allow the recognition of regulatory pathways and metabolic processes that are fundamental to stem cell and cancer cell behaviors.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Integrating Patient Generated Family Health History from</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05331&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05331&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by the applicant): Abstract: This application addresses broad Challenge Area (10): Information Technology for Processing Health Care Data, and specific Challenge Topic, 10-HG-101: New information technology and resources for disease prevention and personalized medicine. Background: The long established wisdom of including family health history as a key part of an individual's medical record has been invigorated by the new emphasis on personalized medicine. While in the past, family health history was used to understand an individual's disease risk and to focus disease prevention efforts, in 21st century medicine, family health history's importance will increase as it will be essential to put detailed personal genetic information into a clinical context, namely the context of how the shared code has played out in a person's closest relatives. This new need for family health history will demand a more comprehensive family history dataset for all patients, and the time limitations faced by healthcare providers demand a technology-driven solution whereby the patient performs primary data entry and the provider then refines these data. Solutions do not currently exist by which most Americans can organize their family health history and then place it into their electronic health record (EHR). My Family Health Portrait (MFHP) is an open source, electronic family history collection tool developed by the Surgeon General that offers interoperability with EHRs, yet to our knowledge has not been widely integrated because of limitations in the capacity of many EHRs to accept these data, and barriers to the systematic collection of these data in clinical practice. Additionally, obstacles exist for those individuals who are not computer literate or do not have access to a home computer. 
In order to capture patient-generated family history data across diverse patient populations, EHRs may need to offer patients a variety of data entry options which allow for differences in preference, convenience, computer literacy, and computer availability. This proposal seeks to develop new resources for family history data entry into the EHR. These resources will be developed, tested and validated in a primary care setting within a large complex healthcare system. Research Plan: The proposed project will examine the reach, effectiveness, adoption and implementation of three innovative portals to transfer and integrate patient generated family history data with an EHR. Specific Aim 1 (technical development) is to develop the three portals for entry of patient generated family history data integrated with an EHR. The pathways will include: (1) computer tablets in waiting rooms to complete the MFHP, (2) a secure internet portal to transfer data collected by patients at home using MFHP, and (3) an interactive voice response (IVR) system to collect the necessary data elements by phone. Each of these modalities will be designed to interface with the EHR of a large health delivery system using current data standards. Specific Aim 2 (content development and validation) is to evaluate facilitators and barriers to the adoption and implementation of these three electronic portals by assessing differences in patient preferences, privacy concerns, convenience, and understanding. The validity of the family history data collected by each of these three portals will also be assessed by a genetic counselor. Specific Aim 3 (pilot randomized controlled trial) is to conduct a 4-armed pilot randomized controlled trial (RCT) to measure the reach and effectiveness of integrating this family history data with a patient's EHR. 
The trial will examine and compare changes in family history documentation, patient-doctor discussion of family history, and patient and provider satisfaction with each data entry portal described in Aim 1, as well as a control arm. The trial will be conducted as a pilot cluster RCT in selected practices within the Brigham and Women's Primary Care Practice-Based Research Network. Potential Impact: The impact of obtaining accurate family history data and integrating this with an individual's health record is substantial, and will be of growing importance as our understanding of the genome advances. This project will ultimately contribute to a better understanding of how available technologies can be integrated with EHRs to obtain accurate family history in ways that allow for widespread acquisition and integration of accurate family history data in a variety of settings and diverse patient populations. The technology and lessons learned from this project will be exportable to healthcare settings throughout the United States. In the 21st century, the importance of family health history will increase as it will be essential to put detailed personal genetic information into the context of an individual's health, namely the context of how the shared code has played out in an individual and his/her closest relatives. These scientific developments in our understanding of genetics will demand a more comprehensive family history dataset for all patients, and the time limitations on healthcare providers demand a technology-driven solution that integrates an individual's knowledge of their family history with the medical records maintained by their health care providers. A solution does not currently exist by which most Americans can organize their family health history and then place it into their electronic health record (EHR). 
We propose to develop and compare three different ways of proactively collecting family history information from patients using computer technology independent of a health care visit, including telephone (interactive voice response technology), tablet computers in a physician's waiting room, and a secure internet portal at home. These tools will be based on the US Surgeon General's My Family Health Portrait, an electronic family history collection tool. Family history data will be transferred and integrated with a patient's EHR in a large primary care network. This project will seek to demonstrate that family history data can be accurately reported by diverse patients using these technologies, and that these data can be integrated to tailor an individual's health care based on their familial risk.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>An Ontology of Qualities for the Annotation of Biomedica</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04838&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04838&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Our proposal is to develop an ontology of qualities (i.e. distinguishing characteristics such as: long, short, increased, decreased, red, blue, and so on) and use it, in conjunction with ontologies of particular anatomies and biological processes, to describe phenotypic data from zebrafish, fruit fly, mouse, and human rigorously. This work will provide one of the integral components necessary for integrating phenotypic descriptions semantically with other aspects of biomedical knowledge. Currently these descriptions are recorded either as free text or using a terminology that is idiosyncratic to a single organism or research project. We have chosen these particular data sets because of the preliminary work that has already been carried out on these data, because they provide a wide spectrum of descriptive situations that will fully exercise the ontology, and particularly because, with the human data, we will be able to explore how unifying these data sets can enable translational research. We will also work with other database resources, such as BIRN [BIRN], WormBase [WormBase], dictyBase [dictyBase], and PhenoScape [PhenoScape], who are also adopting the approach we advocate (see letters of support). We will contribute PATO and the other ontologies that we develop to the OBO Foundry [OBO], and we will deposit our phenotypic annotations into the OBD [OBD] database of the National Center for Biomedical Ontology [NCBO] to facilitate comparisons between these and other data. The Quality Ontology [PATO] will provide one essential part of the unifying framework needed to surmount the difficult problem of integrating phenotypic data sets. PUBLIC HEALTH RELEVANCE: The underlying principle of this proposal is that the comparative approach can be used to further our understanding of the genetic and molecular bases of human diseases. 
Experiments that study the phenotypic consequences of mutations in non-human organisms are justified by the fact that they provide valuable models for a better understanding of human diseases.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Automated Integration of Biomedical Knowledge</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04836&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04836&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Today, ontologies are critical instruments for biomedical investigators, especially in those areas, such as cancer research, that require the command of a vast amount of information and a systemic approach to the design and interpretation of experiments. In fact, ontologies are proliferating in all areas of biomedical research, offering both challenges and opportunities. One of the principal challenges to the full realization of their potential stems from the fact that ontologies are developed in isolation, rendering it impossible to move, for instance, from genes to organisms, to diseases, to drugs. The National Center for Biomedical Ontology (NCBO) represents a fundamental endeavor in the collection, coordination and distribution of biomedical ontologies and offers an unparalleled opportunity to combine biomedical ontologies into a single search space where genetic, anatomic, molecular and pharmacological information can be seamlessly explored as a holistic representation of biomedical knowledge. Unfortunately, ontology integration using standard means of manual curation is a labor intensive task, unable to scale up and keep up with the current growth rate of biomedical ontologies. We have developed a systematic framework for automated ontology engineering based on information theory, and we have successfully applied it to the analysis and engineering of Gene Ontology, the development of gene and protein databases, and the identification of peripheral biomarkers of disease progression and drug response. This project brings together a unique group of competences, ranging from ontology engineering, statistics, artificial intelligence, bioinformatics, cancer research, and clinical pharmacogenomics, to develop a principled method, grounded on the mathematics of information theory, to automatically combine and integrate biomedical ontologies and implement it as part of the NCBO web services. 
Our framework and implementation will be evaluated by comparing its results to those obtained by human curation. The translational impact of this approach will be shown by combining disease, tissue, molecular and drug ontologies to reposition compounds for the treatment of colorectal cancer. This project will integrate biomedical knowledge along dimensions that are today isolated. In so doing, it will empower investigators with a new holistic understanding of disease, it will fast track the clinical translation of biological discoveries by revealing their implications , and it will change our approach to biomedical discovery, especially for those complex diseases that, like cancer, require a systemic view of their biological mechanisms. Ontologies are critical instruments for biomedical investigators especially in those areas, such as cancer research, that require a vast amount of information and a systemic approach to the design and interpretation of their experiments. In collaboration with the National Center for Biomedical Ontology (NCBO), this project will develop a principled method, grounded on the mathematics of information theory, to automatically combine biomedical ontologies. As a result, this project will integrate biomedical knowledge along dimensions that are today isolated and, in so doing, it will empower investigators with a new holistic understanding of disease, it will fast track the clinical translation of biological discoveries, and it will change the approach to discovery, especially for those diseases that, like cancer, require a systemic view of their biological mechanisms</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Increasing confidence and changing behaviors in primary</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05117&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05117&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Primary care physicians have almost no training in genetics, nor in the ethical, legal and social implications (ELSI) of genetic testing, diagnosis and therapy. Further, mere provision of curricular content fails to impact physician behavior. However, programs with elements that are based on established educational and adult learning principles have been shown to be effective in effecting behavioral change. We propose to evaluate whether participation in a web-based curriculum focusing on the ethical, legal and social issues in primary care genetics will improve primary care physicians' approach to genetic issues. Specifically, we will assess whether participation will improve physician knowledge, attitudes, and practice behaviors. We will recruit 120 primary care physicians (PCPs) across five health systems in California (UC Davis, Rural Health Network, UC Los Angeles, Kaiser Permanente, Sutter Medical Group Sacramento). PCPs will be randomized by practice site to control (paper curriculum) or active intervention with an interactive web-based curriculum previously developed with funding from the NHGRI. This curriculum utilizes visual tools, video clip vignettes, and other interactive content to illustrate key points about risk assessment, genetic screening, and shared decision-making (SDM). PCPs' encounters with Announced Standardized Patients Learning objectives will be used to assess PCPs' ELSI-related behavioral change. Evaluation will also include the effectiveness of the curriculum for genetic knowledge (testing of knowledge), skills (self-report, follow-through with learning plan), and attitudes (survey and self-efficacy assessments). If proven effective, these web-based tools can be easily disseminated around the country, and internationally, to improve knowledge and attitudes about genetic screening among physicians and patients. 
PUBLIC HEALTH RELEVANCE: Most primary care physicians have almost no training in genetics, nor in the ethical, legal and social implications (ELSI) of genetic testing, diagnosis and therapy. This project evaluates a novel web-based curriculum focusing on the ethical, legal and social issues in primary care genetics in order to improve primary care physicians' understanding of the implications of genetic disorders in patients' lives. If proven effective, these web-based tools can be easily disseminated to improve physicians' knowledge of the social context of genetics and improve their ability to engage in shared decision-making with patients about genetic screening.</description>
		<pubDate>Wed, 30 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Promoting genetic literacy in students and teachers: The</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05445&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05445&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This application addresses broad Challenge Area (12): Science, Technology, Engineering, and Mathematics Education (STEM) and specific Challenge Topic, 12-OD-104: Innovative approaches to STEM education. Title of project: Promoting genetic literacy in students and teachers: The effectiveness of non-classroom instructional strategies and settings. Rapid advancements in genetic technology, the popularity and coverage of genetics by the press, and the increased understanding of the role genetics plays in our health necessitate a basic understanding of the science for everyone. In spite of this increased exposure to genetics, a study by Bowling (2008) indicated that the public's genetics literacy remains relatively low. Studies looking specifically at the genetics knowledge of students in grades K-12 also show low levels of understanding. The 2000 National Assessment of Educational Progress tested ~49,000 students, and on average only ~30% of 12th graders could answer basic genetic questions correctly. This project proposes that the use of authentic patient scenarios and modern equipment in a non-traditional classroom setting makes science instruction relevant and more interesting to students. The intent is to improve genetic literacy and to ignite students' interest in continued education in the life sciences and, ultimately, the workforce. The Greenwood Genetic Center (GGC) proposes multiple objectives to promote genetic literacy: 1. Open a Genetics Learning Center on the main campus with a full offering of validated instructional modules in the areas of cytogenetics, biochemical genetics, molecular genetics, dysmorphology, and bioethics. The Learning Center will host high school biology classes for half or full day learning experiences. 
Existing modules are case studies of human disorders requiring students to analyze the scenario, define a list of differential diagnoses, perform laboratory testing, analyze test results, determine a diagnosis, identify any available treatment, and discuss ethical concerns. 2. Offer a course in human genetics for high school students. This course will introduce students to the basic principles of genetics, to traditional and non-traditional patterns of inheritance, to contemporary topics and ethical issues, and to technology in the laboratory. The course will include didactic instruction, case studies, and laboratory exercises. 3. Continue the Center's program of summer courses in human genetics for high school science teachers but in a new laboratory/classroom environment that allows for more activities. Evaluations from previous summer courses indicate a desire and need for more laboratory exercises. 4. Provide these same instructional modules to more distant areas of South Carolina through a Mobile Genetics Learning Center. The instructional modules are practical, different yet current, and allow students to experience the application of science content to health issues. Our hypothesis is that the use of authentic patient scenarios in genetics and laboratory testing in non-classroom environments will lead to increased student understanding of genetics and increased interest in healthcare careers and post-secondary studies. The availability of both an on-campus learning center and a mobile learning center will serve to strengthen the genetic literacy of students and teachers throughout the state of South Carolina. Public Health Relevance: The field of genetics is emerging as a key player in healthcare, public policy making, and social issues; there is an increasing need for public understanding of genetics. Genetic knowledge should no longer be restricted to science halls and laboratories. 
Through the use of both an on-site learning center and a mobile genetics learning center, this project will make genetics education a state-wide effort. If people hope to make informed healthcare decisions and to understand risks, if students hope to hold high-tech jobs and if teachers hope to prepare students for today's job market, then it is necessary to educate everyone about genetics and its integration into our social fabric.</description>
		<pubDate>Tue, 29 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Connectivity Map 100k</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05604&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05604&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): PROJECT SUMMARY: The Connectivity Map 100k project aims to forge a path toward a comprehensive 'functional look-up table' that links disease biology, genome function and small-molecule action. Such a Connectivity Map would enable researchers worldwide to generate testable hypotheses that might otherwise remain undiscovered. By using genomic signatures as a common language with which to describe different cellular states, a broad range of research applications would be enabled. We propose here an ambitious plan to generate 100,000 Connectivity Map profiles of genetic and pharmacologic perturbation that will serve two important purposes. First, it will generate an expanded Connectivity Map database that will further support biological discovery. Second, it will establish the parameters for a future, larger-scale Connectivity Map resource that might include, for example, profiles of perturbation of all human and mouse genes plus hundreds of thousands of small molecules in a large number of cell types. In Aim 1, we will extend our pilot pharmacologic Connectivity Map data to genetic perturbations. Specifically, we will use lentiviral shRNAs to knock down the expression of 1,000 human genes in 10 diverse cell types. These experiments will establish the feasibility of developing large-scale signatures of genetic perturbation, and will establish the extent of cellular context-dependence of the observed connections. In Aim 2, we will generate profiles of 1,500 small molecules (a blend of FDA-approved drugs and tool compounds being studied in NIH-sponsored chemical biology programs), again in 10 diverse cell types. By the end of this 2-year project, we expect a) to have generated 100,000 perturbational profiles made publicly accessible via a user-friendly web interface, and b) to have laid the foundation for the creation of a future, community-based, large-scale Connectivity Map public resource. 
RELEVANCE: The proposed project is expected to have a significant impact on a broad range of the biomedical research community. It has the potential to yield new approaches to genome functional annotation, to provide a path toward the elucidation of mechanism-of-action of small-molecule compounds, and to facilitate the discovery of drugs with unanticipated therapeutic effects on disease biology. PUBLIC HEALTH RELEVANCE: NARRATIVE: The Connectivity Map 100k project aims to forge a path toward a comprehensive 'functional look-up table' that links disease biology, genome function and small-molecule action. Through the generation of a database of 100,000 gene expression profiles of genetic and pharmacologic perturbation, the Connectivity Map will provide the biomedical community with a set of tools that facilitate systematic discoveries that link disease biology, genome biology and chemical biology. The proposed project will lay the groundwork for a future, large-scale, community-wide effort to create a public Connectivity Map resource that spans diverse perturbational profiles across a large number of cell types.</description>
		<pubDate>Tue, 29 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Computational &amp; Functional Annotation of the Zebrafish G</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05058&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05058&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Zebrafish, with its growing arsenal of tools that allow the generation of transgenics, gene knockdowns and knockouts, and mutant resources, coupled with its high throughput and cost efficiency, is quickly becoming the major animal model for drug screens and gene-related studies. However, as with other vertebrate genomes, the majority of the zebrafish genome (97%) is made up of non-genic sequences whose functional necessity remains largely unknown. One vital function that is clearly embedded in these regions is gene regulation, instructing genes when and where to turn on or off. However, unlike genes, for which we know the genomic location, the code, and the consequences of nucleotide changes within them, we lack that knowledge for gene regulatory sequences. This knowledge is vital, with a wide variety of clinical and molecular data supporting these sequences as an important driver of development, evolution, diversity, and disease. In this proposal, we will combine advanced computational tools with high-throughput zebrafish functional studies to annotate this noncoding terrain. Using and refining multiple vertebrate genome alignments, we have generated an unprecedented set of 166,693 zebrafish conserved noncoding elements (CNEs), with at least 8,805 regions having a direct ortholog in the human genome. Preliminary studies of a portion of these sequences using a zebrafish transgenic enhancer assay find 41% of these sequences to function as enhancers at 24 to 48 hours post fertilization. Taking advantage of this transgenic assay, we aim to screen 200 sequences a year for enhancer activity. These sequences will be selected from our large CNE set, from sequences whose enhancer activity and tissue-timepoint specificity will be predicted using sophisticated computational tools, and from community-requested sequences. 
This characterization will not only allow the functional annotation of these sequences, but will also generate a novel and extremely important toolkit of gene regulatory elements that can drive expression of any gene of interest at precise locations and precise developmental time points. In addition, we will also use the annotated regulatory landscape to discover novel genes with potentially important developmental function. This will be carried out by analyzing the expression patterns and functional consequences of knockdown of less characterized genes that lie in rich regulatory regions, a common sign of the existence of important developmental gene regulators. Additional computational techniques will be used to discover genes under tight regulation in novel tissue contexts, as well as pathways that are not currently studied in the contexts in which we find them enriched. All the data generated in this proposal, both computational and functional, will be made available to the community through a dedicated web browser (http://zebrafish.stanford.edu/) as well as integration into ZFIN, Ensembl, and the UCSC genome browser. Combined, our work will advance zebrafish as the major animal model for annotating and characterizing the noncoding portion of the vertebrate genome. PUBLIC HEALTH RELEVANCE: Computational &amp; Functional Annotation of the Zebrafish Genome Regulatory Toolbox. While genes make up less than 3% of our DNA, within the remaining 97% lie numerous other extremely important sequences, such as gene regulatory elements, that instruct the genes when and where to turn on or off. Mutations in these gene regulatory elements can have a great impact on human disease, yet their location and code still remain for the most part unknown. 
In this proposal we will take advantage of the unique properties of the zebrafish model organism to couple advanced computational tools with rapid functional zebrafish assays to annotate these sequences and obtain a better understanding of the vertebrate gene regulatory code, which will be of extreme importance to our comprehension of the genetic cause for numerous human diseases.</description>
		<pubDate>Tue, 29 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Digital DNaseI mapping and footprinting of the mouse gen</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05654&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05654&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The goal of this project is to produce comprehensive, high-definition maps of mouse regulatory DNA marked by DNaseI hypersensitive sites to parallel the human catalogue currently under production by the ENCODE Project. Digital DNaseI technology enables efficient genome-wide mapping of accessible chromatin and DNaseI hypersensitive sites. The core regions of DNaseI hypersensitive sites are constitutively populated by regulatory factor binding sites, the nucleotide-resolution footprints of which may be systematically exposed on a genome-wide scale by ultra-deep sequencing. DNaseI hypersensitive sites exhibit marked cell-type variability; accordingly, production of a comprehensive catalogue will require surveying a wide range of cell types. Cell types targeted under this proposal include murine analogues of the ENCODE Tier 1 and Tier 2 common reference cell lines; a broad spectrum of primary adult tissues; embryonic stem cells; and sentinel tissues amenable to sequential temporal profiling during development. The production of a parallel, high-quality, high-resolution compendium of mouse regulatory DNA will greatly enhance the value of the human ENCODE project and will provide a rich, independent and unique resource for evolutionary, functional, and model organism genomics. PUBLIC HEALTH RELEVANCE: Understanding the genetic basis of human disease requires detailed knowledge of the functional elements of the human genome which may be subject to polymorphism. The ENCODE Project seeks to identify all of the functional elements in the human genome, and the present project seeks to greatly increase the value of the ENCODE data by providing a parallel catalogue of regulatory DNA in the mouse genome. 
This project is therefore expected to provide key insights into the importance of elements in the human genome, and to provide an unprecedented resource for rational functional modeling of human disease in the mouse.</description>
		<pubDate>Tue, 29 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Pharmacogenomic studies in VISP: results &amp; implications</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05160&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05160&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): More than 750,000 Americans suffer stroke each year. Of these, nearly 160,000 die and hundreds of thousands are disabled. The burden on the public health is even greater given that 11 million subclinical strokes per year contribute to cognitive decline and dementia. Ischemic stroke accounts for 85% of all strokes. Established stroke risk factors play major roles in defining stroke risk at a population level, but prediction of individual risk remains unrealized. Identification of factors that place individuals at risk for ischemic stroke is central to the development of therapeutic preventative strategies. The Vitamin Intervention for Stroke Prevention (VISP) trial, an NIH-funded, multicenter, double-blind, randomized, controlled clinical trial, was designed to determine whether the daily intake of high-dose folic acid and vitamins B6 and B12 reduced recurrent cerebral infarction and a combined vascular endpoint. The question of benefit versus risk for B-vitamin supplementation remains a controversial topic. Concern that such interventions may incur measurable risk heightens the need to clarify who might benefit and who might be harmed by such therapies, particularly given population-level folate supplementation efforts. Our central hypothesis is that there are genetic variants that significantly correlate with risk of recurrent ischemic stroke in the setting of vitamin therapy. 
The specific aims of this proposal are to: 1) Identify genetic variants that influence the risk of recurrent stroke or combined vascular endpoints in response to vitamin therapy; 2) Determine whether the association between genetic variants and risk of recurrent stroke, MI or death is mediated via diet, inflammation, and/or coagulation; 3) Develop predictive models, incorporating genetic and clinical information, that can be applied to future clinical trial design; 4) Work collaboratively with other U01 awardees and the NHGRI to develop a paradigm for pharmacogenomic clinical trial design. The VISP trial provides a unique data set and a wealth of analytic opportunities. These analyses are expected to generate testable models predictive of stroke risk, impart insights into pathophysiologies underlying susceptibility to ischemic stroke, and be instructive in providing a framework to develop guidelines for genome-wide studies on future clinical trials. PUBLIC HEALTH RELEVANCE: More than 750,000 Americans suffer stroke each year. Of these, nearly 160,000 die and hundreds of thousands are disabled. The burden on the public health is even greater given that 11 million subclinical strokes per year contribute to cognitive decline and dementia. Beyond the human cost, the direct and indirect costs of ischemic stroke in the U.S. are projected to exceed $2.2 trillion in 2050.</description>
		<pubDate>Tue, 29 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Randomized Clinical Trials - Whole Genome Studies Coordi</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05157&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05157&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The Department of Biostatistics at the University of Washington proposes to establish a Coordinating Center for a series of genome-wide association studies of treatment response in randomized clinical trials. The Coordinating Center will be administered within the Center for Biomedical Statistics of the Department of Biostatistics and it will take advantage of the experience gained by departmental activities as Coordinating Center for the Geneva project. Geneva is a series of 14 genome-wide association studies within the Gene-Environment Initiative. The new RTC-WGA Coordinating Center will be co-directed by Bruce Weir, Chair of the Department of Biostatistics and PI of the Geneva Coordinating Center, and by Patrick Heagerty, Director of the Center for Biomedical Statistics. They will be joined by Biostatistics faculty Scott Emerson, Ken Rice, Lianne Sheppard, Jon Wakefield, Epidemiology faculty member Annette Fitzpatrick and by Medicine faculty member Bruce Psaty. These faculty have substantial experience in the conduct of both clinical trials and genetic studies, as well as in studying the effects of environmental exposures. They will be able to seek advice from a distinguished advisory panel of Lon Cardon, Tom Fleming, Dick Kronmal and Ross Prentice. The Coordinating Center will provide administration and coordination of all activities for the set of whole-genome analyses of data from participants in clinical trials at study sites. The Center will facilitate harmonization and sharing of phenotypic and genetic data across study sites, the genotyping centers and dbGaP. The Center will assure the integrity of data by implementing appropriate data management procedures and quality control activities. 
The Coordinating Center will provide statistical support for modeling and selecting options for replication and follow-up studies, and, as appropriate, for selecting targets for sequencing and functional studies as well as analytic support for issues inherent to randomized clinical trials. The Coordinating Center will serve as a resource to facilitate and support all NHGRI whole-genome association studies, including the training of researchers in appropriate statistical methodology and the provision of computer software for statistical analyses. Public Health Relevance: People may respond to treatments for disease in a way that depends on their genetic constitution. There would be many benefits if the possibility that they will have adverse reactions could be predicted on the basis of a simple blood test. The best way to determine the relationship between genetic constitution and response to treatment is with a randomized clinical trial. The coordinating center will manage the data from these trials.</description>
		<pubDate>Mon, 28 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Analysis and integration of expression patterns in embry</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=F32HG05192&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=F32HG05192&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Instructions for building tissues and organs are encoded into the genome, and at each step of embryonic development a series of transcriptional regulatory events is required for their accurate execution. Understanding the regulatory programs that pattern genes during embryogenesis is a key step in understanding developmental diseases and the mechanisms behind the formation and repair of healthy tissues and organs. The mechanisms that pattern tissues and organs are strikingly similar across species. Therefore, model organisms, like the fruit fly, provide convenient experimental systems to explore the basic principles that guide tissue morphogenesis. Despite the availability of numerous genomes and techniques to physically map transcription factor binding site locations and chromatin states, the regulatory programs implemented during tissue and organ development are challenging to assemble. A critical barrier is the incorporation of spatial and temporal gene expression data into the construction and analysis of regulatory networks. In this proposal, spatial and temporal expression pattern data will be integrated with a range of heterogeneous datasets in order to assemble and analyze developmental gene regulatory connections. The first aim of the proposal is to develop a language for the assembly of geometric models to represent gene expression patterns using a combination of simple shapes (primitives) and Boolean operations. In parallel, regulatory regions within the genome will be predicted using a combination of sequence, transcription factor binding locations, and chromatin states. The final aim of the proposal investigates connections between regulatory grammar and spatial expression domains. 
These approaches will be developed using spatial expression patterns for over 4000 genes patterned during Drosophila melanogaster embryogenesis and will be central to interpreting the datasets generated in the modENCODE project in the context of tissue and organ development. Relevance to public health: Studying the mechanisms by which healthy tissues and organs are patterned is central to understanding developmental disease and will provide important insight into how adult tissues can be repaired or organs can be engineered for transplantation. Examining tissue and organ development in model organisms, like the fruit fly, is directly relevant to human health, as many of the mechanisms that pattern tissues in a model organism are the same in humans.</description>
		<pubDate>Mon, 28 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Informed Consent and Data Access Issues in State-based B</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05439&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05439&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This application addresses broad Challenge Area (02) Bioethics and specific Challenge Topic, 02-HG-101: Informed consent and data access policies. The ethical, legal, and social issues (ELSI) underlying the development and implementation of state-sponsored birth cohort studies and their accompanying biobanks are complex and potentially volatile. Michigan and other states, such as Connecticut and California, are in the midst of investigating and deliberating on how to set up biobanks, and there is a pressing need for practical ELSI research and guidelines for these historic initiatives. Consequently, to facilitate the development of state-sponsored population birth cohort databases for a wide range of studies, including genetics, research is urgently needed to address how recruitment, informed consent, and data access issues are affected by community members' hopes, expectations, and anxieties about research use of newborn blood spots. Our application specifically addresses the Challenge Area 02-HG-101* Informed consent and data access policies. We propose the following specific aims to investigate whether a method of ameliorating these concerns through a new health information technology adequately addresses community members' needs. Aim 1: To develop and test a multi-level participant-centric informed consent, privacy, and data access educational system and protocol that utilizes an already existing on-line health information technology system called Private Access. Aim 2: To evaluate the impact of participant-driven levels of informed consent and data access on potential recruitment into studies (e.g., the Michigan Neonatal Biobank) using both in-person Town Hall meetings and on-line testing in 15 diverse Michigan communities in five geographical locations. 
Specifically, we will examine how demographics, types of research, types of researcher (government, academic, private company), consent options, types of privacy control, and data access options affect community leaders' and participants' knowledge, attitudes, and consent to participate in a large birth cohort and biobanking effort in the state of Michigan. Our proposed project will evaluate the impact of consumer-driven informed consent and data access on participation in a large birth cohort and biobanking effort being developed in the state of Michigan.</description>
		<pubDate>Sat, 26 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Transcriptomics from Single Circulating Tumor Cells</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05471&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05471&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This application addresses the broad Research Area: 06, Enabling Technologies And Challenge Topic: 06-HG-102* Technologies for obtaining genomic, proteomic, and metabolomic data from individual viable cells in complex tissues. Methods will be developed to analyze gene expression from single cancer cells enabling studies at an unprecedented level of sensitivity. We propose techniques to measure RNA transcription from individual circulating tumor cells (CTCs) isolated from the blood of cancer patients. By accessing confirmed cancer cells, separated from the large background of normal cells, it will be possible to gain unexplored knowledge about the disease process and its spread through the body by metastasis. Recent advances in several key technologies have made this approach feasible: 1) Researchers at the Scripps Research Institute have developed the means to fluorescently label CTCs so that they can be identified from human blood samples, 2) Technologies for isolating single cells have been combined with nucleic acid amplification methods at the J. Craig Venter institute enabling the first demonstration of DNA sequencing from single cells, and 3) Life Technologies (Invitrogen/Applied Biosystems) have combined methods for cDNA synthesis from single cells and whole genome expression profiling with their next generation SOLiD sequencing platform. These three institutions will collaborate to isolate single cancer cells, synthesize cDNA and sequence it by the SOLiD method. The goal is to observe cancer at the most basic level of the single cell. PUBLIC HEALTH RELEVANCE: We propose development of a method to measure gene expression in individual cancer cells obtained from human blood. These circulating cancer cells (CTCs) are very rare but can be responsible for the spread of cancer throughout the body by metastasis. With new strategies to fluorescently label CTCs it is possible to isolate them. 
Other new technology enables determination of mRNA expression levels from a single cell giving an understanding of the alterations to gene expression. A new understanding of the physiology of CTCs and how they lead to metastasis will be possible.</description>
		<pubDate>Sat, 26 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Toward a Framework for Policy Analysis of Microbiome Res</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04900&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04900&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): It is recognized that research on the human microbiome is important for its potential scientific and medical impact. The complexity of microbiome research, however, could change the way that genetics is studied and understood because it calls for a more complex, nuanced framework for defining and demonstrating causality. The understanding of the human microbiome could also disrupt traditional assumptions about definitions of species, self, disease and normality. It is also recognized that microbiome research can raise ethical, legal, and social issues. The mandate to study the ELSI issues of human microbiome research at this stage implicitly embraces the concept of preventive or prophylactic bioethics. While useful, such an approach can be less effective than desired at identifying ethical and social issues and minimizing harm if it occurs separately from the scientific community, or is conducted in the abstract and general rather than linked to actual features of planned or ongoing research. Our overall goal is to devise an approach to examine the ELSI issues associated with microbiome research. We propose to use the frameworks of Constructive Technology Assessment and Value-Sensitive Design because they are designed to evaluate research specifically incorporating social context and values, and are well suited to evaluating rapidly-moving and boundary-challenging technologies such as those used in microbiome research. We propose to use a dual concept of risk as a tool to link discussions of abstract questions about values and social implications with specific features of research. This analysis will be used to identify potential research design alternatives that could minimize value conflicts and could potentially be generalized to other genomic and biomedical research more broadly. Our specific aims are: AIM 1. 
To analyze how risk and benefit are conceptualized in research contributing to the understanding of the human microbiome and its applications, through: A) content analysis of scientific articles about microbiome-related research; B) content analysis of microbiome articles in the lay media. AIM 2. Determine the relationship between microbiome research questions or design, concepts of risk and benefit, and societal values, in order to inform research conduct, through: A) extended, structured interdisciplinary dialog with experts in microbiome research, technology assessment, and ethical and social analysis; B) writing and disseminating white papers and articles. PUBLIC HEALTH RELEVANCE: Our overall goal is to devise an approach to examine the ELSI issues associated with microbiome research. We propose to use the frameworks of Constructive Technology Assessment and Value-Sensitive Design because they are designed to evaluate research specifically incorporating social context and values, and to use the concepts of risk and benefit as a tool to link discussions of abstract questions about values and social implications with specific features of research.</description>
		<pubDate>Sat, 26 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Generating and Managing Large Scale Proteogenomic Data f</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05591&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05591&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The first human genome sequence was published in 2001, yet as of now, eight years later, major questions remain, such as how many genes are encoded by the genome, and of those genes, how many functional products are encoded due to phenomena like alternative splicing. The Encyclopedia of DNA Elements (ENCODE) project has been coordinated by National Human Genome Research Institute (NHGRI) to answer these questions by comprehensively classifying functional elements on the human genome. The pilot phase of the project studied 1% of the genome in detail, revealing extensive transcription well beyond that predicted by classical gene models. The biological function of a significant portion of the discovered transcripts is unclear. The ENCODE project is now scaling up to examine the whole human genome. It is likely that results will echo the pilot project, revealing extensive transcription, a significant fraction of which has unexplained function. Proteomic technologies can be applied, in a process called proteogenomic mapping, to determine which of the myriad transcripts encode proteins. This approach has been used to reveal new genes, new alternative splice variants, new start sites, and upstream open reading frames (ORFs). While substantive progress has been made in developing proteogenomic mapping technologies, a significant hurdle in using proteogenomics to assist with the ENCODE project is the lack of proteomic data sets that are coordinated with the ENCODE transcription mapping efforts. Here we propose to generate large-scale proteomic data sets directly from the same tier I ENCODE cell lines studied by the transcription efforts, coordinating the results with the transcription mapping efforts to determine which of the pervasive transcripts are translated. 
Our specific aims are to: 1) produce large scale proteomic data sets on ENCODE cell lines using the most advanced mass spectrometry methods, 2) use our database technologies to store, manage, and make accessible to the community all results of the project, and 3) use our software pipeline to map the results to the latest human genome drafts, producing a UCSC (University of California Santa Cruz) genome browser track with the results. We believe the result will be a significant advancement in knowledge about our genomes and the functional products they encode. PUBLIC HEALTH RELEVANCE: The human genome is the blueprint for human life and human health, but we do not yet understand its language - the language of genes. The ENCODE project is deciphering that language systematically, and the goal of this proposal is to accelerate that effort by revealing which parts of the blueprint contain instructions to build proteins.</description>
		<pubDate>Sat, 26 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Enhance human ENCODE by function comparisons to mouse</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05573&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05573&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Our goal is to discover and use relationships between mouse and human regulatory genomes to advance the ENCODE Project in its effort to map all functional elements in the human genome. Our comparative approach aims to uncover principles and solve problems that are proving difficult by studying the human genome alone. ENCODE is vigorously mapping hundreds of function-associated biochemical markers in selected cell lines, resulting already in tens of millions of reproducible biochemical features. Some observed protein:DNA interactions find and refine known transcriptional enhancers, promoters, silencers, together with associated chromatin structure, as was anticipated. But substantial questions arise as to how many of the myriad biochemical events are functional, what those functions are, which gene or genes are meaningful targets, etc. To highlight and sort functionally important biochemical marks from others, we will systematically identify the molecular events retained by both mouse and human since they diverged. We will then analyze how conservation of biochemical features relates to conservation of DNA sequence and conservation of regulated gene expression. By using the mouse, we can leverage decades of molecular genetics and manipulated mouse genomes that do not exist in any other mammal. In Aim 1 we execute genome-wide assays for biochemical signatures of functional DNA sequences in a few specific mouse cell types. By using well-studied mouse lines and cell states, we can interpret results in light of previously validated elements and in light of ENCODE human results. We will use ENCODE standards for high throughput, sequence-based assays to determine gene expression, DNase hypersensitive sites, histone modifications and selected transcription factor occupancy in seven mouse cell types. The eight selected features are the most informative ones for function, and thus most useful for comparison with human data. 
In Aim 2, we apply a genome-wide implementation of chromosome conformation capture to map the interactions between transcription factor binding sites and their responsive genes in two cell types. These results will be compared to those from an ENCODE developmental project. Comparative analysis in Aim 3 will ensure that the impact of the data we produce will go beyond the individual mouse cell systems per se. To do this we have organized a collaboration of investigators at multiple institutions, in which each group is expert in one or more critical aspects. Our data, made public and accessible via ENCODE, will fuel and accelerate many future studies after the 2-yr stimulus both in and beyond ENCODE. This responds to the NHGRI request for applications on Enhancement of the value of the human ENCODE Project by conducting a parallel effort on the mouse genome. The proposed work will improve the maps of biologically functional DNA sequences in humans, which in turn will help explain how variants in human genome sequences could be associated with human diseases, leading to candidates for novel avenues for effective therapy and prevention. PUBLIC HEALTH RELEVANCE: Every person differs in his or her response to pathogens and in the likelihood that they will suffer from complex diseases such as cancer, heart disease or diabetes. Individual susceptibility to disease is determined in part by genetics, and we can map with high precision the locations of DNA variants associated with disease susceptibility. In order to understand how these variants contribute to disease susceptibility, we need to identify the biological functions of all DNA sequences; the proposed work will help us map these functional DNA sequences.</description>
		<pubDate>Sat, 26 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Returning Individual Genetic Research Results to Parents</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05491&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05491&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This application addresses broad Challenge Area (02) Bioethics, 02-HL-101: Informing the ethical and practical guidelines for providing genetic research results to study participants. The mapping of the human genome has allowed researchers to discover new relationships between genotype and phenotype, and has provided the basis for genome-informed medical decision-making that will lead to diagnoses and therapies that are targeted, have reduced variability, maximize efficacy, and minimize adverse effects. As information of greater health significance is generated by genomic research, there is an emerging consensus that the ethical return of genomic information will be needed. The goal of this proposal is to understand the attitudes of participants in genomic research towards the return of research results in the setting where the participant is a child and the receiver of the information is the parent. We will take advantage of a large genotype-phenotype project initiated by our group at Children's Hospital Boston (CHB) based on the Informed Cohort, a new paradigm for genomic research that we developed. The Informed Cohort is a model for the ethical recruitment of participants into a longitudinal genotype-phenotype registry and reconciles the paradox of maintaining participant privacy, yet providing results. We call the implementation of the Informed Cohort model at CHB the Gene Partnership Project (GPP). GPP is a longitudinal genotype-phenotype registry that uses a messaging system through the CHB personally-controlled health record (PCHR) to facilitate the disclosure of research results back to the participants. The Informed Cohort Oversight Board (ICOB) will provide the crucial oversight of the communication of results back to participants. 
While the enrollment of children might seem to present additional ethical obstacles, we see the natural participation of the family unit that occurs in pediatric hospital as an advantage for GPP. While we have implemented GPP, we do not know what factors will maximize its benefit and appeal for participants and families. We therefore propose a multi-step evaluation of the GPP and the messaging system. We do not know how parents view studies where they receive research results back on their children. To address this issue we will assess the interest of parents to participate in a genotype-phenotype study focused on their children, and determine if they would want to receive genetic information back from the study about their child. We also do not know how participants will perceive the messaging when it occurs. Thus, we will assess the use of the PCHR-based messaging system by parents of participants enrolled in GPP. Finally, a functioning ICOB will be paramount for the process of returning research results to be successful, yet we do not know how the ICOB will function. Since further work is required to ensure that the ICOB is a workable model for decision-making, we will develop the Informed Cohort Oversight Board. The knowledge gained from the proposed project on the ethical return of research results to participants will be critical as we move towards a paradigm where individuals directly benefit from the genomic research they participate in, and eventually from genomic medicine. This project addresses the ethical issues associated with the return of genetic information to the parents of children participating in genomic research. It is critical that participants in genomic research benefit from the studies that they are participating in, especially if there are research results that pertain to their current or future health. 
Yet the return of such research information needs to be done in a manner that assures validity of the results, and is thoughtful and sensitive. The goals of this project are to understand parental attitudes toward receiving genetic research results back on their children, and to determine the effectiveness of an oversight board that will address the questions of how to ethically return results to participants. This study will be important as genetic research moves towards participants truly realizing the benefits of the research they are a part of.</description>
		<pubDate>Sat, 26 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Dynamically scalable accessible analysis for next genera</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05542&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05542&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Project Summary Wide availability of next-generation sequencing (NGS) instruments has enabled any investigator, for a modest cost, to produce enormous amounts of DNA sequence data. However, working with these raw sequences presents significant problems for individual investigators, small labs, or core facilities. For an experimental group with no computational expertise, simply running a data analysis program is a barrier, let alone building a compute and data storage infrastructure capable of dealing with NGS data. Fortunately, a computational model - Cloud computing - has recently emerged and is ideally suited to the analysis of large-scale sequence data. In this model, computation and storage exist as virtual resources, which can be dynamically allocated and released as needed. Importantly, cloud resources can provide storage and computation at far less cost than dedicated resources for certain use cases. However, formidable challenges need to be addressed to make these resources available to individual investigators. Specifically, although cloud computing provides a way to acquire computational resources on demand, the resources provided are either virtual machines on the Internet or specific programming libraries, which are unusable for experimentalists. Thus, a viable analysis solution needs to be accessible and deployable without informatics expertise; it must efficiently and automatically use dynamically scalable resources, while taking into account time and cost; it must include appropriate analysis tools and easily support addition of new tools as they emerge. We have previously developed a software system - Galaxy (http://galaxyproject.org) - that provides a robust framework for addressing these needs. Here we propose to significantly extend this framework to allow any experimentalist to perform large-scale NGS analyses utilizing the power of cloud computing infrastructure. 
In particular, we will modify the existing Galaxy framework to run entirely within the cloud. We will adapt the way Galaxy schedules and executes jobs to make effective use of cloud-style. We will provide a mechanism for individual users to create and deploy custom Galaxy instances on a cloud through an entirely web-based interface. Finally, we will test our approach by applying the developed facilities to the existing human re-sequencing data in order to uncover hidden patterns of mutations causing human genetic disease on a very large scale. PUBLIC HEALTH RELEVANCE: Project Narrative Increasingly available and inexpensive high-throughput DNA sequencing holds great promise for biomedical research, but informatics challenges block the full realization of the potential of this transformative technology. In particular, progress is limited by the informatics and engineering expertise of biomedical researchers, and the availability of sufficient computational infrastructure to analyze these enormous datasets. This project will address these problems by bringing together Galaxy, a system for making complex computational analysis accessible and reproducible, with cloud computing, an infrastructure model where computing resources are purchased on demand as needed, making it possible for investigators with no informatics expertise to perform data-intensive analysis using cloud resources.</description>
		<pubDate>Fri, 25 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>A new system for mammalian cell genetics using a haploid</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG04938&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG04938&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): While powerful, phenotype driven genetic screens in mammalian cells are limited by the diploid state of their genome and the inability to perform genetic crosses. Genetic screens reliant on RNA interference (RNAi) are commonly thought to surpass these limitations. This supposition is called into question however, when one considers problems inherently associated with gene silencing by RNAi, such as frequently observed off-target effects and the failure to completely silence gene expression. Genome-wide screening efforts, especially, may in fact founder from the accumulation of these insurmountable complications. Here we propose to bypass or alleviate such complications using a novel system allowing efficient genetic screening based on a haploid genomic context. The objective of this project is to develop a powerful new method for mammalian cell genetics for use in the identification of undiscovered genetic networks in cancer and other diseases. The specific aims are: 1. To generate a new platform for loss-of-function genetics. We will use a cell line haploid for all but one human chromosome and efficient gene-trap mutagenesis as an alternative methodology for phenotype-based, recessive genetic screens. 2. To use gene-trap screens in haploid cells to identify cancer relevant genes. We will employ this unique technology to discover novel genetic networks operative in cancer cell biology; our initial efforts aim to elucidate unknown genetic components of death-receptor induced apoptosis and therapeutic response to Gleevec. PUBLIC HEALTH RELEVANCE: Unbiased genetic screens are powerful means for elucidating poorly understood biological processes. RNA interference strategies comprise the most recent and widely used application for genetics in cultured mammalian cells. These approaches are undermined by the inherent rate of false positives and negatives. 
Here we propose to develop a new system for mammalian cell genetics using a cell system that is haploid for nearly all human chromosomes. We will further validate the utility of this approach by identifying novel, cancer-relevant genetic pathways. Following its development and validation, we foresee this novel strategy for mammalian genetic screens would be pertinently useful in more broad applications of biomedical research.</description>
		<pubDate>Fri, 25 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>On-Chip Protein Synthesis Based on Directional Droplet-E</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG05118&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG05118&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The proposed research is to synthesize a 2-D array of proteins with various peptide sequences on a solid substrate on demand, using a nozzleless, directional droplet-ejector array and MEMS (microelectromechanical systems) technology. The proposed research is to lay down the underlying technologies for a portable and affordable system for synthesis of protein probe arrays on a solid substrate. The synthesis system for protein probe array is to make protein chip technology available to individual laboratories so that the laboratories may be able to immobilize thousands of peptides on a solid substrate for bioassay and screening, and have far greater flexibility in chip design and faster turnaround in fabricating new chips. The envisioned portable protein synthesis system is based on a nozzleless, directional ejector array and MEMS technology to enable the rapid and facile synthesis of a two-dimensional array of any protein sequence on a porous or nonporous planar substrate using small quantities of proteinogenic amino acids. The envisioned PECM (Protein Ejector Chip Machine) is designed for use in any biological research laboratory; it will be no more difficult to operate than a standard protein synthesizer. The proposed technique is entirely different from the technology that produces protein probes pre-made at factory, and completely different also from microarray techniques that spot pre-made protein sequences. The PECM uses novel microfluidic management techniques and an array of self-focusing acoustic transducer (SFAT) for ejecting the 20 proteinogenic amino acids in any desired sequence to any location in order to form a 2-D array of protein probes on a solid chip. 
The PECM will be a flexible, compact and low-cost system that consists of a 2-D array of SFAT ejectors along with microfluidic components, a wash station, a printed circuit board (that contains the control and pulsed-sinusoidal-wave-generation circuits), servomotors and mechanical fixtures, all integrated on a single platform. To demonstrate proof-of-principle, an array of 10 x 10 sets of 20 directional SFAT ejectors will be integrated with MEMS-based microfluidic components for producing 100 protein probes of amino acids of arbitrarily specified sequence on a cellulose membrane functionalized with appropriate linker/spacer chemistry. A single location on the membrane will be inked by 20 directional SFAT ejectors that can eject droplets in any direction off from the straight vertical direction so that a spot can be inked by 20 ejectors without any mechanical motion of the ejectors. Consequently, the envisioned PECM will have few moving parts and non-stringent alignment requirements. PUBLIC HEALTH RELEVANCE: The proposed research will demonstrate 2-D array synthesis of proteins with various peptide sequences on a solid substrate, using a directional ejector array to eject nanoliter droplets of proteinogenic amino acids on demand. The research is to lay down a firm technical foundation for a portable, flexible and affordable protein synthesis system that will produce any protein probe array on a solid substrate (from small amounts of proteinogenic amino acids). With the envisioned protein synthesis system, proteomics scientists will be able to generate protein probes on a chip in any way they desire (at their sites within hours), as they carry out bioassays.</description>
		<pubDate>Fri, 25 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Chromatin and RNA Expression Maps of the Zebrafish Genom</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05111&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05111&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Studies in zebrafish (Danio rerio) have provided important insights into vertebrate biology, but genomics tools and resources lag behind other model systems. This study aims to systematically annotate the RNA transcripts and chromatin domains in the zebrafish genome by an integrated experimental and computational strategy. Previous studies in mouse and human have demonstrated that the combination of tiling microarrays, RNA expression profiling, chromatin immunoprecipitation, gene set enrichment analysis and gene module analysis can identify and annotate large numbers of novel genomic elements. We will apply these approaches to experimentally annotate the zebrafish genome for functional genomic elements (Aim 1) and generate a publicly available atlas of coding and non-coding genes and their functional relationships (Aim 2). Taken together, this project will provide a comprehensive and dynamic map of genomic elements in zebrafish. PUBLIC HEALTH RELEVANCE: Zebrafish is a vertebrate model organism that shares many biological processes with humans. Research in zebrafish has provided important insights into birth defects, behavior and disease. By creating tools and resources to better study the zebrafish genome, the proposed work will further enhance the utility of this model system.</description>
		<pubDate>Fri, 25 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Nanopore-based Electrical Device for DNA Sequencing</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05110&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05110&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The technologies that make sequencing DNA fast, cheap and widely available have the potential to revolutionize bio-medical research and herald the era of personalized medicine. Being able to sequence human genomes for $1000 will enable comparative studies of variations between individuals in both sickness and health. Ultimately it can improve the quality of medical care by identifying patients who will gain the greatest benefit from a particular medicine, and those who are most at risk of adverse reactions. Nanopore-based sequencing technologies attempt to thread a long DNA molecule through a few nanometer wide nanopore and use physical differences between the four base types to read the sequence of bases in DNA. The two major potential benefits of nanopore sequencing are the high speed and the low price. Nanopore sequencing does not need slow and expensive chemistry, therefore electrical-only sequence readout can proceed at highest rates achievable by modern electronics. At present, the nanopore sequencing is still a promise - no single nucleotide resolution has as yet been achieved experimentally. It is very likely that the ability to localize a DNA molecule inside a nanopore with a single base resolution would provide a sufficient time for read-out electronics to determine the base type. We propose a nano-electro-mechanical device (DNA Transistor) capable of controlling the translocation of a single DNA molecule inside a nanopore with single nucleotide accuracy. This function is based on interaction of discrete charges, localized on phosphate groups along the backbone of a DNA molecule, with the externally controlled electric field confined inside the nanopore. The design of the DNA Transistor relies on well researched thin film deposition techniques from semiconductors industry. The device is a stack of metal and dielectric layers, each a few atoms thin, with a nanopore penetrating through the stack. 
The electric potentials applied to the metal layers trap the DNA molecule inside the nanopore. Pulsing these potentials allows controlled translocation of the molecule with a single base resolution. IBM Research is uniquely positioned to implement the proposed idea. Our experimental effort will rely on in-house, industry-leading semiconductor device fabrication facilities. The experimental component of the effort will be complemented by a modeling and simulation component that will rely on in-house Blue Gene supercomputing capabilities. Our first aim is to fabricate the DNA transistor and demonstrate its capability to translocate the DNA molecule through the nanopore with a single base resolution. The second aim is to electrically differentiate bases of the localized DNA molecule. The final aim is to develop cost-effective DNA Transistor fabrication methods suitable for mass production. PUBLIC HEALTH RELEVANCE: IBM proposes the design, characterization and production of a nano-electro-mechanical device (the DNA Transistor) that forms the basis of a fast technology to sequence human genomes for $1000. The widespread availability of this technology will enable comparative studies of variations between individuals in both sickness and health. Ultimately it can improve the quality of medical care by identifying patients who will gain the greatest benefit from a particular medicine, and those who are most at risk of adverse reactions.</description>
		<pubDate>Fri, 25 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Genetics Education: Resources for Health Professionals</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=P41HG04693&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=P41HG04693&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The goal of the proposed project is to address a well-documented lack of genetics education among health professionals, which is a rate-limiting step in the integration of genetics and genomics into mainstream health care. The five-year project will create, disseminate, and evaluate a set of educational resources for health professionals who are not trained in genetics. Those resources will acquaint health professionals with the application of genetics and genomics to prevention, diagnosis, and treatment of disease and will raise awareness of the ethical, legal, and social implications of genetically based health care. All resources will be available free of charge. The specific aims are: 1. Create, disseminate, and evaluate genetics-education resources that are scientifically accurate and clinically relevant, increase awareness of ELSI issues, and are freely available for use by students, faculty, and practitioners in the health professions. 2. Provide an infrastructure that will leverage collective expertise to raise awareness of the role of genetics in mainstream medicine and reduce duplication of effort in genetics education for health-care professionals. The research design and methods will employ a well-tested process for development and evaluation of educational materials for adult learners. The process is intended to maximize input from content experts and the end users and to help ensure that the resulting resources are 1) scientifically accurate, 2) clinically relevant, 3) educationally effective, and 4) not significantly duplicative of other efforts. The project's relevance to public health lies in improving health professionals' understanding of genetics and genomics and their applications to health, disease, and clinical practice.</description>
		<pubDate>Fri, 25 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>GWAS of Hormone Treatment and CVD and Metabolic Outcomes</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05152&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05152&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The results of the early-terminated WHI hormone trials (WHI-HT) led to a dramatic 50% reduction in hormone use by post-menopausal women in the ensuing 5 years since the trial was terminated. Yet, controversy persists, and several important clinical and scientific questions remain unresolved. Individuals are known to vary considerably in response to drug therapy and other interventions, both in magnitude of treatment effect and in risk of adverse events. Much of this variation in drug response has been hypothesized to have a genetic basis. Based on previous data from animal models, observational human studies, and accruing candidate gene and plasma biomarker and proteomic data from WHI-HT, estrogen and progesterone have multi-factorial effects on atherosclerotic and metabolic traits. Nonetheless, the specific genetic factors that govern the overall risk of vascular and metabolic disorders in response to hormone therapy are largely unknown. Because it requires fewer a priori assumptions, a genome-wide approach may increase the likelihood of identifying risk variants and may reveal novel mechanisms. Moreover, the breadth and depth of genomic linkage disequilibrium coverage on current 1 million SNP whole-genome platforms allow for more focused analysis and follow-up of candidate genes based on prior biologic hypotheses. Therefore, a comprehensive evaluation of SNPs using current generation genome-wide association (GWA) technology is the next logical step to screening susceptible populations. As such, the value of using existing epidemiologic data and biologic specimens from the large, population-based randomized hormone trials of WHI will be substantially increased by the addition of GWA genotyping studies through this proposal. 
We hypothesize that it is possible to identify common variants that reproducibly alter risk of vascular events (CHD, stroke, and VTE) and diabetes after exposure to estrogen with or without progestin in postmenopausal women. Public Health Relevance: Information generated from this study will be critical to determine the health impact of genetic variants on the balance of benefits and risks associated with hormone therapy in post-menopausal women. Findings may also provide valuable insights into disease pathways and mechanisms, and identify novel targets for disease screening, prevention, and treatment of cardiovascular events and diabetes in women.</description>
		<pubDate>Fri, 25 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>A Strategy for High Quality Clinical Resequencing of the</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05570&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05570&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The overall goal of this proposal is to develop a strategy towards clinical resequencing of the human genome, which we define as the application of human genome resequencing to large clinical populations for disease gene discovery research as well as a clinical application that has diagnostic potential. Our strategy relies on third generation single molecule DNA sequencing as utilized by the Pacific Biosciences platform. We anticipate that single molecule DNA sequencing technology will ultimately be the sequencer of choice given these systems' rapid collection of massive amounts of DNA sequences from patient samples with clinical outcome annotation, and cost-effective operation at a fraction of the current costs of other sequencing technologies. The first goal focuses on developing what we term an intermittent segment resequencing approach. A set of interspersed reads will be generated from a single DNA molecule, much the same way as mate paired sequence works, but having internal interval sequences rather than a terminal sequence pair as generated by paired-end approaches. The second goal involves testing in-solution based approaches to increase substantially the fold coverage of genomic regions of high clinical interest in combination with genome shotgun sequencing. Given their random distribution of sequences, shotgun genome sequencing efforts treat all regions of the genome equally. Increasing the fold-coverage of these regions of interest selectively, in comparison to the rest of the genome, would improve the detection of important clinical variants. PUBLIC HEALTH RELEVANCE: Creating a personalized medicine strategy for each individual requires decoding the DNA sequence of a person's cancer, which involves identifying genetic variations for any patient and discovering the clinical significance of these variants. 
The composition and spectrum of these variants can vary significantly from individual to individual, thus pointing out the importance of identifying all of the critical ones with clinical ramifications. New sequencing technologies have opened the way for creating personalized genetic signatures which can be used to tailor the clinical management of individuals, but even more cost-effective approaches are needed given that thousands of individuals with specific diseases have their genomes sequenced.</description>
		<pubDate>Fri, 25 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Providing the $1000 genome via improved single molecule</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05598&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05598&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): DNA sequencing technology has improved dramatically over recent years with orders of magnitude improvements in the yield of sequence data having resulted from applying massively parallel approaches. Helicos has taken next generation sequencing even further with its single molecule technology, allowing simpler sample prep, lower sample quantities, and lower costs than competing platforms. However, this cost is still not low enough to make whole genome sequencing routinely available for medical or other applications. For sequencing whole genomes to become widespread, it will be necessary to further lower the cost to $1000 or less. With the improvements that can be accomplished via this grant over the next two years, this goal could be realized. Sequencing of a whole human genome using single-molecule sequencing with a HeliScope has already been accomplished for less than $100,000 outside of a genome center and technical improvements to that system should make it possible to bring the price down to less than $1000. During the span of our current grant (now in the final year of NIH grant 5R01HG004144, High Accuracy Single Molecule DNA Sequencing by Synthesis), Helicos has been able to surmount all the critical technical barriers for single-molecule sequencing at the genome-wide scale. We have taken a very early stage technology and driven it to a commercial instrument which, even in its initial state, provides sequence data at an unprecedented low cost. The next two years are critical for affirming the commercial viability of this platform and technology. Attainment of additional improvements in yield, read length, and error rates based on improvements to the sequencing surface and better nucleotides would allow us to drive costs into the &lt;$1000 range for a whole human genome. By developing an ordered surface of DNA primers, up to 5x increase in sequencing yield is possible. 
Similarly, dyes with better fluorescence yield will allow reduced imaging time and higher throughput. Additionally, many other biological applications that are not possible to carry out with other technologies will be enabled with these improvements. Because many potential improvements are additive, it is not necessary to make substantial gains in all areas but, rather, measured improvements in the proposed areas would be sufficient. As such, we aim to attack the issues of cost and throughput on multiple fronts in order to improve all aspects of the process and allow attainment of the long sought $1000 genome. PUBLIC HEALTH RELEVANCE: Massively parallel DNA sequencing has revolutionized many areas of biology by providing orders of magnitude more sequence data than previously possible. The majority of this sequence data has been generated using systems that require complex sample preparation methods, which limits the applications that can be attempted and increases the cost. By improving the current true single-molecule sequencing system to lower costs and improve yield, additional applications will become possible. These applications will likely include basic research into disease pathogenesis, as well as clinical and applied research. Therefore, the proposed project will open new research avenues, lead to a better understanding of the biological mechanisms underlying disease states, and ultimately aid in identifying revolutionary new ways to diagnose, treat and prevent human disease.</description>
		<pubDate>Fri, 25 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Human Microbiome Research and the Social Fabric</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04856&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04856&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Learning more about the human microbiome is likely to change the way medicine is practiced. It may also have implications for our society and our legal system and important implications for how we conceive and address the ethics of medicine and biomedical research. The goal of our project will, therefore, be identifying the ethical, social, and legal implications raised by the study of the human microbiome so as to provide insight and guidance for scientists who will be engaged in the work and members of our society who will be asked to cooperate in the studies and to live with the consequences. Our project will bring together an interdisciplinary team of 27 health professionals, scientists, and scholars from the humanities and social sciences to explore key issues through an intense process of mutual education, group discussion, consensus formation, writing, critiquing, and confirming our views. With that background we will go on to engage a broader community in a series of discussions of the topics. This series of Community Conversations on Developing Science will be designed to provide skill-based education to our audience, to engage participants in a dialogue about the issues, and to elicit their views in a process that could be called community consultation. Combining what we learn from our group's research and discussions with the input that we gather from community consultation, we will prepare a volume for publication on the ethical, legal and social implications of the human microbiome and a set of materials to be used by others to inform scientists and our society about these matters. The working hypothesis of our project is that the human microbiome may or may not raise entirely unique issues, but that considering theoretical issues from the vantage point of the human microbiome will allow us to reexamine policies and positions in a new light. 
We will be trying to locate our understanding of the microbiome within the existing rich and intricately textured social fabric by identifying relevant models and points of comparison for grounding our responses. Seeing issues from the new vantage point of research on the human microbiome will spur us to ask and answer questions about the conceptual foundation of accepted principles and distinctions, about the relative importance of previously accepted commitments, and about how they fit within the warp and weft of two broadly shared values: individual liberty and the social good. We envision several distinct ethical, legal, and social domains in which research on the human microbiome is likely to have significant implications: human subject research; sample banking and biobanking; public health; privacy; property and commercialization; personhood, personal identity, and normalcy. PUBLIC HEALTH RELEVANCE: The human microbiome is a factor in many diseases. Learning more about it through sample banking and translational research is likely to advance healthcare through personalized medicine and to have an impact on public health by improving our capability in disease prevention, surveillance and tracking. Research on the human microbiome will also have repercussions for our society and our legal system and important implications for how we conceive and address the ethics of medicine and biomedical research.</description>
		<pubDate>Thu, 24 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Nanopore sequencing of DNA with MspA</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05115&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05115&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The objective of this project is to engineer a new protein pore, MspA, for nanopore DNA sequencing. MspA's short and narrow constriction, its extreme stability against denaturation and its tolerance to mutations make this protein an ideal, inexpensive and novel nanopore sequencing development platform. We have obtained exciting results that demonstrate the feasibility of our proposal. We designed and made MspA mutants that pass DNA. Importantly, mutated MspA can already nearly resolve single nucleotides using co-passing current alone. Molecular dynamics simulation of MspA agrees excellently with experiment. A prototype fast, low-noise current amplifier was built specifically for nanopore sequencing experiments. Our specific aims are to (i) rationally design, produce and test MspA mutants to improve DNA base recognition and reduce translocation speed; (ii) use molecular dynamics simulation to understand how DNA interacts with MspA and to optimize MspA for nanopore sequencing; (iii) construct a single chain protein to further improve DNA base sensitivity and control of DNA motion in an asymmetric MspA pore; (iv) construct a highly sensitive electronic amplifier and a practical bilayer apparatus. We have formed a team of three outstanding labs with complementary expertise in protein science, protein simulation, single-channel experiments, molecular biology, and instrumentation to realize these aims. It is our goal to develop a system that can sequence a human genome for under $1000. PUBLIC HEALTH RELEVANCE: This three-university team is engineering a novel pore from mycobacteria, MspA, for nanopore DNA sequencing. MspA has an ideal shape for nanopore sequencing. The protein pore is remarkably tolerant of mutations, so it can be exactly tailored to be sensitive to individual nucleotides when DNA passes through it.</description>
		<pubDate>Wed, 23 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Single Cell Single Molecule Digital mRNA Profiling with</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05613&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05613&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The economical sequencing of a number of individual human genomes has been made possible by next-generation sequencing (NGS) methods. These same sequencing methods are now being applied to gene expression profiling. However, next-generation methods, when applied to mRNA sequencing, rely on PCR, which introduces bias, distorts the overall mRNA distribution, and cannot generally be applied to individual cells. Capitalizing on our group's recent work on single-molecule DNA sequencing by synthesis with fluorogenic dNTP substrates, we propose a novel method for multiplex sequencing of individual mRNA molecules using a reverse transcriptase that employs fluorogenic nucleotide substrates to sequence mRNA directly during the synthesis of cDNA. Upon incorporation of a non-fluorescent, terminal phosphate labeled nucleotide substrate by the reverse transcriptase, a fluorogenic polyphosphate molecule is released, and subjected to fast enzymatic digestion, yielding a single fluorophore, the color of which reports the identity of the incorporated dNTP. To allow single-molecule fluorescence detection, the sequencing reaction takes place continuously in a sealed sub-femtoliter nanoreactor, in which there is only one (or no) confined mRNA molecule. Using soft lithography, we fabricate an array of nanoreactors that allow simultaneous, real-time monitoring of many thousands of isolated sequencing reactions with a fluorescence microscope and CCD camera. We will integrate a microfluidic system that processes, isolates and delivers mRNAs from a single lysed cell to a single-molecule sequencer. The easy sample preparation, low cost, and rich information afforded by this new technique will have a broad impact on biological and medical research. PUBLIC HEALTH RELEVANCE: We propose a new approach for system-wide analyses of mRNAs of a single cell with single-molecule sensitivity. 
By eliminating PCR, this method circumvents the amplification error and bias associated with PCR for low copy number genes, and offers long read lengths and easy sample preparation. This capability will provide a powerful tool for diagnosis and discovery in biomedical research.</description>
		<pubDate>Tue, 22 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>NOVEL METHODS FOR VAST INCREASE IN THROUGHPUT AND ACCURA</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05332&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05332&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This Application addresses broad Challenge Area (08), Genomics, and Specific Challenge Topic 08-HG-101 Technology and resources for high throughput functional analysis of functional elements in genomic sequences. A two year project is proposed to expand, to exploit, to extend, and to apply on a demonstration basis a completely novel approach to high throughput cis-regulatory analysis. We have very recently developed the initial component of this new technology as a tool for rapid validation of sea urchin embryo gene regulatory network models, and demonstrated its efficacy in facilitating the discovery and in measuring the quantitative activity of previously unknown cis-regulatory modules with a throughput of up to 100x that of traditional methods. The essential principle of this approach is use of sequence-tagged barcoded vectors which can be introduced together in large number in a single experiment and de-convolved later. But there remain many additional spinoffs and additional developments to be brought to practice, and the Challenge Grant program offers the opportunity to mount a crash program and bring these opportunities on line in the immediate future. In addition, the methods we have so far developed measure quantitative cis-regulatory output and not spatial activity. We propose additional technological developments to generate higher quality spatial expression data than obtainable by any other means and a high throughput method of recovering large sets of cis-regulatory modules operating in any given spatial regulatory state. 
The specific aims of this proposal include adapting the sequence tag method to NanoString technology to permit simultaneous assessment of activity of &gt; 100 different cis-regulatory modules; demonstrating the use of this method to obtain temporal output profiles of large numbers of cis-regulatory modules simultaneously; developing a very high accuracy method of determining spatial expression profiles of unknown cis-regulatory modules by use of NanoString measurements; and tuning for general use a high throughput technology for isolating all cis-regulatory modules of a large unknown set which operate in a given time-space domain of the organism. Two additional comments are important: first, there is no a priori reason why these technologies should not be transferable to any other system in which gene transfer by direct DNA injection is utilized; and second, in order to accomplish these objectives we shall have to build a new research subgroup. This will require hiring additional personnel, and in this respect both the scientific and organizational aspects of the proposal synergize with the objectives of the A.R.R.A. NIH initiative. This work is about finding the causal lines of control that determine how fundamental life processes are executed according to the instructions encoded in the genomic regulatory system. The most powerful approach to general solutions to complex disease states requires solid understanding of their control circuitry. Our practice must get beyond struggling to ameliorate effects rather than altering causes. This research shows the way to discovery of structure and function in causal genomic control systems.</description>
		<pubDate>Tue, 22 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Applying transposon mapping technology to describe human</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05359&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05359&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This application addresses broad Challenge Area (06) Enabling Technologies and specific Challenge Topic 06-HG-103: Methods to sequence highly variable, repeat-rich regions of complex genomes. A remarkable 45% of our genome consists of repetitive elements - more than 40-fold the mass contribution of protein-coding sequences. Retrotransposons termed long interspersed elements (LINEs) are among the most predominant and dynamic of these. More than 500,000 LINEs, both intact 6kb elements and fragments, comprise 17% of the human genome. Their presence is intriguing because LINEs are major forces in the evolution of mammalian genomes, with the potential to significantly alter neighboring gene expression levels and mRNA structure. The youngest LINEs, known as T(a)LINEs (transcriptionally active LINEs), retain retrotransposition activity, creating significant genetic differences across human populations and inherited disease by germ-line integration, as well as somatic transforming mutations in cancer. Each of our genomes harbors about 500 T(a)LINEs, approximately 100 of which are intact, autonomous transposons capable of copy-and-paste retrotransposition. Finally, their insertion into both coding and noncoding regions has been associated with a wide variety of functional effects, implicating them as a potentially major source of human phenotypic diversity. There is a fundamental lack of understanding surrounding the role of retrotransposons in human disease largely because of the massive numbers of LINEs that exist in our genomes, as well as their large size. Of necessity, LINEs have been excluded from array based copy number variation studies and next generation whole genome sequencing efforts. My laboratory recently published a method to map repetitive elements in S. cerevisiae by a coupled vectorette PCR-microarray method we have termed transposon insertion profiling (TIP-Chip). 
We have demonstrated that this technology enables mapping of human T(a)LINEs, and propose the first major comprehensive survey of T(a)LINEs in reference DNA samples to begin to characterize this underexplored aspect of our genomes. Much of our DNA is derived from LINE retrotransposons. The youngest family of these elements, T(a)LINEs, remain mobile and are poorly characterized, major sources of genomic structural variation across human demographics. Moreover, their insertion into both coding and noncoding regions has been associated with a wide variety of functional effects, implicating them as a potentially major source of human phenotypic diversity. A method newly developed in this laboratory for identifying locations of T(a)LINEs will be exploited to comprehensively map these sequences in 120 reference DNA samples and generate a public database of insertional sites and frequencies. This effort will add a new dimension to our understanding of the human genome and provide the basis for future biomedical research investigating the impact of transposable elements on human health and disease.</description>
		<pubDate>Tue, 22 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Controlling Large DNA Fragments During Nanopore Sequenci</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05553&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05553&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The potential use of nanopores for DNA sequencing has gained significant momentum. This is partly due to innovative solid state techniques, but more so due to breakthroughs using biological pores. One promise of nanopore sequencing has been very long read lengths; however, we and others have performed most of our experiments using short synthetic DNA oligomers. In this proposal, we present experiments designed to test how efficiently nanopores can control and process long DNA templates (up to 2500 nt in length) as they are catalytically modified by DNA polymerases. Our work will focus on T7 DNA polymerase (T7 DNApol) and the Klenow fragment of DNA polymerase I (KF), coupled to the alpha hemolysin biopore (α-HL). There are three aims: Aim 1. Limit DNA replication to template strands captured one-by-one in the nanopore. To ensure efficient serial analysis of individual DNA templates during catalysis, we will optimize a new strategy developed in our laboratory that quantitatively blocks DNA replication in bulk phase buffer bathing the nanopore, and that activates replication of individual DNA templates exclusively at the nanopore. Aim 2. Quantify the effect of electrical force and DNA/pore interactions on DNA polymerase-dependent replication. Our objective is to determine the length of DNA template that can be reproducibly replicated on the nanopore. Three conditions (see figure below) will be examined to address three independent properties that could influence replication efficiency: a) Polymerase replication of long DNA templates captured in the nanopore under no load; b) Polymerase dependent replication of long DNA templates against resistive forces that arise from DNA/pore interactions; c) Polymerase dependent replication against a resistive electrical force. Aim 3. Determine the effect of voltage on registry of large DNA templates in the nanopore at single nucleotide precision. 
Nanopore sequencing of intact DNA templates presupposes maintenance of single nucleotide spatial register during the time a base is read. For DNA-polymerase-controlled translocation this would be in the range of 1 to 100 milliseconds per measurement. At low voltages that are likely to permit polymerase catalysis, it is unclear if registry can be maintained. PUBLIC HEALTH RELEVANCE: High speed DNA sequencing is fundamental to understanding human diseases including cancer and heart disease. This proposal addresses fundamental questions about one promising new DNA sequencing technique based on biological nanopores.</description>
		<pubDate>Tue, 22 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Modular Software for Sequence Data Quality Checking Alig</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05552&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC2HG05552&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Rapid technological advances mean that sequencing technologies can now be deployed on a genomic scale to further our understanding of human traits and diseases. Massive-throughput genotyping technologies have provided a tool for large-scale assessment of the contribution of common genetic variants (particularly SNPs) to complex traits. It is expected that next-generation sequencing technologies will substantially broaden the scope of genetic variants to include rarer SNP variants, as well as short insertion and deletion polymorphisms, copy number polymorphisms and other large structural variants. Extracting the full benefits of these new sequencing technologies will require new analytical tools and data processing pipelines, both because (a) approaches and implementations designed to handle more modest amounts of data generated by earlier technologies cannot always handle the orders of magnitude more voluminous high-throughput data, or provide only cumbersome ways for doing so; and because (b) the nature of the data and quality control issues generated by new sequencing technologies are often substantially different from existing technologies. We propose to build on our extensive knowledge and understanding of next generation sequencing technologies and of the analysis of large genetic association studies to construct a data processing pipeline that can be deployed by the NCBI and by scientists wishing to process and analyze large amounts of next generation sequence data. This data processing pipeline will provide (1) quality assessment and validation of short read sequence datasets; (2) mapping of sequencing reads to the genome; (3) variant calling with accurate quality scores; and (4) tools for data export and visualization. All component software tools of our modular pipeline will support standard data formats. 
The pipeline will be extensively tested and documented to ensure it is ready for widespread production-level deployment. We believe that the proposed data analysis pipeline will enhance the value of a variety of planned and future sequencing experiments, including cancer sequencing, and we are committed to delivering these tools in a timely and standards-compliant manner. PUBLIC HEALTH RELEVANCE: We are developing a computer software pipeline to analyze and discover genetic differences between individual human genome sequences. This pipeline will be installed at the NIH and used to analyze data submitted by large genome sequencing projects. The methods we are developing will enhance the study of human genetic variability, contribute to gene mapping and, ultimately, the understanding of heritable human diseases.</description>
		<pubDate>Tue, 22 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Statistical methods for estimation of copy number from n</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05482&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05482&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This application addresses broad Challenge Area (08) Genomics and specific Challenge Topic, 08-DA-102 Improved Bioinformatics Analysis for Deep Sequencing. The number of human samples undergoing whole-genome sequencing is expected to increase dramatically in the next few years, as advances in next-generation sequencing technologies continue to lower the cost of sequencing. In addition to detection of sequence variation, these data can be used to estimate DNA copy number variation and subsequently to examine correlation between copy number and phenotype. In this proposal, we aim to develop a series of computational steps and an integrated analysis pipeline for accurate estimation of copy number from next-generation sequencing data. This involves efficient processing of the sequencing data, including appropriate alignment procedures and correction for experimental artifacts. For estimation of the copy number along chromosomal location, we will develop novel segmentation procedures, both for a single sample and for multiple samples, to take advantage of the specific nature of sequencing data. Importantly, we also address issues in experimental design, especially the effect of depth of sequencing (genome coverage) and read length on the resolution and accuracy of copy number profiles. We use data from a number of platforms including Solexa, SOLiD, and Complete Genomics for our studies. The pipeline developed in this proposal will be implemented on a powerful distributed computing system and will be made available freely to the research community. The results of this project will thus enable efficient extraction of copy number from whole-genome sequencing data and will facilitate rapid translation of next-generation sequencing technology to identify structural variations associated with normal or disease phenotypes.</description>
		<pubDate>Tue, 22 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Assessing Attitudes and Experiences of Early Adopters of</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05369&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05369&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This application addresses broad Challenge Area (02) Bioethics and specific Challenge Topic, 02-HG-102 Direct to Consumer (DTC) Personal Genomics-Ethical, Legal and Social Implications Research. Completing the Human Genome and the Human HapMap Projects has enabled studies associating genetic variation with complex diseases such as various cancers, coronary artery disease, and diabetes. This has led to the emergence of direct-to-consumer testing companies offering genomic profiling to inform individuals about their risk for dozens of diseases and traits. Such testing is being offered with the assumption that identification of an increased risk could lead to preventative measures to reduce a person's risk for developing disease or to improve disease outcome. Although personalized medicine is gaining clinical and policy attention and appears to be technically feasible, little is known about the public's understanding and perceptions of such care, nor about their assessment of its risks and benefits. We are proposing a project that capitalizes on the expertise of researchers at the University of Pennsylvania to investigate public response to personalized medicine. The proposed study will take advantage of the Coriell Personalized Medicine Collaborative (CPMC) conducted at the Coriell Institute in Camden, NJ. The CPMC aims to determine the clinical utility of personalized medicine by offering participants a personalized genomic risk assessment for a variety of diseases and collecting data on health outcomes. While not a direct-to-consumer company the CPMC study offers a unique opportunity to assess the social, behavioral, and ethical implications of direct availability of personalized genomic risk assessment. 
The specific aims of our project are to: 1) Assess motivations and perceived utility of personalized genomic risk assessment among individuals who express interest in the CPMC; 2) Explore participant understanding of their results, the use of the information, and educational needs; and 3) Develop policy recommendations for the ethical offering of personalized genomic disease risk assessment. We will use a mixed methodology for addressing these study aims. For specific aim 1, we will survey approximately 1000 individuals who register for a CPMC informed consent session, regardless of whether they actually attend or provide a sample for testing. For specific aim 2, we will interview 60 CPMC participants 3-6 months after they receive their results. For specific aim 3, we will work with members of the research-to-policy core of the Penn Center for the Integration of Genetic Healthcare Technologies (Penn CIGHT) to develop and disseminate policy recommendations for the responsible and ethical offering of genomic tests that take into account the misperceptions, concerns, and educational needs of consumers. As direct-to-consumer genomic testing becomes more common, policies and educational materials are needed to ensure that health care consumers derive maximal benefit from this type of testing. This project will identify misperceptions, concerns and educational needs of a diverse group of health care consumers who are offered the opportunity to have genomic risk assessment. Findings from this study will be the basis of policy recommendations for the ethical offering of genomic testing directly to the consumer.</description>
		<pubDate>Tue, 22 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Impact of Data Access Policies on Biobank Participation</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05468&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05468&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This application addresses broad Challenge Area (02) Bioethics, and specific Challenge Topic, 02-HG-101: Informed Consent and Data Access Policies. Our project title is The Impact of Data Access Policies on Biobank Participation. The NIH data access policy for genetic research provides enormous opportunities for genetic investigators but also raises a number of challenges for educating and recruiting participants into large-scale genetic research studies. The NIH released a final data access and sharing policy for genome-wide association studies (GWAS) in August 2007. The policy requires that specific phenotypic and genetic data from GWAS be deposited into a government controlled, limited access database. The goal of this study is to examine how NIH data access policies will impact participation in large-scale genetic research. In parallel with the emergence of GWAS, medical centers and research institutions worldwide are developing biobanks that house large numbers of participant DNA samples and data. Large scale participant recruitment initiatives are required to sustain and grow these resources that will increasingly be tapped to support GWAS. Challenges for the investigators establishing these and similar collections include: the most appropriate manner to educate potential research participants about genetic research and the data sharing policy, how to inform them about known and potential risks, and how to do this in an efficient and scalable manner. These challenges make it increasingly difficult to individually consent research participants as well as to obtain consent for each future use. Broad consent, one in which the participants consent to uses of their biospecimen and data for unspecified future research of prospectively collected cohorts, is generally accepted in the US and more so in Europe. 
However, controversy exists over how much control to offer participants regarding future access to and uses of samples and data. An opt-out model, one in which participants may indicate that they do not wish for leftover de-identified clinical samples or medical record data to be used for research, is being incorporated into biobanks. Although the practice is controversial, its use may become more common as US investigators seek more efficient means of amassing large scale collections necessary for GWAS and other genome wide analytic approaches. Potential participants' understanding of both broad consent and opt-out models must be assessed in the context of genetic research and existing data access policies. To date, participant knowledge about data access policies and practices and their potential impact on participation, or their participation preferences, has not been assessed. We will ascertain novel educational strategies needed to help a patient population consisting of patients from a metropolitan Chicago hospital clearly understand the data access policy for genetic research, the role of the government as the holder of the data, the privacy protections included in the policy and the known and potential risks to privacy. Specifically, we propose to: (1) Investigate whether data access policies affect the willingness of patients to participate in a prospective hospital-based biobank, (2) Assess whether widespread data sharing policies for genetic research impact participants' preferences for two consent models: broad consent and opt-out approaches, and (3) Develop recommendations to help future patients to better understand GWAS and data access policies. To address these aims, we will conduct semi-structured interviews on a random sample of patients ascertained from Northwestern Memorial Hospital and affiliated outpatient clinics. 
The interviews will address in greater depth preliminary data obtained from focus groups on data sharing and genetic research and will be analyzed according to qualitative research methods. Interview results will inform development of a survey to test educational messages on data sharing and to address patients' interest in participating in genetic research based on a presentation of the two consent models. The results of the survey will provide a basis for recommendations of educational messages and consent models for participants involved in studies in which data will be shared. The goal of this study is to examine how the NIH data access policies for sharing genetic research information will impact participation in large-scale genetic research. The NIH data access policy encourages wide sharing of genetic research information among investigators to speed the translation of genetic research into improving human health. This policy has yet to be assessed for its impact on participation in genetic research or for research participants' understanding of and preferences regarding it.</description>
		<pubDate>Tue, 22 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Integrative analysis of genomic and epigenomic datasets</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05334&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=RC1HG05334&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This application addresses broad Challenge Area (08) Genomics, and specific Challenge Topic, 08-OD-101, Computational approaches for epigenomic analysis. While the primary DNA sequence of the human genome is ultimately responsible for the encoding and functioning of each cell, a plethora of chromatin and DNA modifications have been described in recent years that can modulate the interpretation of this primary sequence. These epigenetic modifications lead to the diversity of function across different human cell types, and play key roles in the establishment and maintenance of cellular identity during development, and also in health and disease. The human ENCODE project, the NIH Epigenome Roadmap, and several other large-scale experimental efforts are currently underway to map dozens of histone and DNA modifications across multiple human cell types and disease states, generating a diversity of rich epigenomic datasets. This creates a pressing need for the development of rigorous computational methods for the systematic integrative analysis of epigenomic datasets, and for understanding their relationship to other genomic datasets, including gene expression, disease association, and phenotypic profiling. In this proposal, we will develop and apply graphical probabilistic models for describing chromatin modifications, based on multivariate hidden Markov models. We will use these models to discover the set of underlying chromatin states, based on recurrent combinations of epigenetic marks across the entire genome (Aim 1). We will validate and functionally characterize these states based on their enrichments and positional biases with respect to existing functional elements, as well as large-scale gene expression and disease association datasets (Aim 2). 
Lastly, we will extend these methods to study dynamics of chromatin state across both healthy and disease cell types, and study how these correlate with functional differences between the observed cell types (Aim 3). We will work closely with the scientists involved in data production and facilitate communication and data integration across them, and also with data analysis and coordination centers already established to facilitate sharing of methods and results across the ENCODE and Epigenome Roadmap consortia, and with the larger community. Overall, the proposed integrative analysis of large-scale genomic and epigenomic datasets will provide a unified view of current and planned epigenomic datasets, towards a systematic understanding of gene and genome regulation in health and disease. While the primary DNA sequence of the human genome is ultimately responsible for the encoding and functioning of each cell, a plethora of chromatin and DNA modifications have been described in recent years that can modulate the interpretation of this primary sequence, leading to the diversity of function across different human cell types. This project will create a computational framework and resource to integrate large-scale genomic and epigenomic datasets, to understand their functional role in health and disease, and to understand their dynamics across different cell lines and disease states. The knowledge gained can play key roles in understanding the establishment and maintenance of cellular identity during healthy development, and how dysregulation of these processes can lead to the onset of disease.</description>
		<pubDate>Tue, 22 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Filling the data processing gap for exon-region specific</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05211&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05211&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): We propose to develop, implement and streamline an informatics pipeline to fill the gap between production and analysis for gene-region specific high coverage data from the full-scale 1000 Genomes Project. The developed pipeline aims to process data generated from exomes using direct capture technologies and next-generation sequencing as a major part of the 1000 Genomes Project, to identify and catalog SNPs and indels that enable a detailed understanding of the distribution of genetic variants within coding regions among the human population. We will develop and improve several software packages for read mapping, variant discovery, and data quality assurance in terms of statistical rigor and software engineering aspects so they will be suitable for general usage as a toolkit. We expect that both the genetic variation information from exomes and the toolkit will play a critical role in future genetic medical research. We propose three specific Aims: Aim 1. QC metrics for gene-region data across different samples, populations and technological platforms allowing for full data integration. Here we will explore the various possible approaches to deal with duplicate reads and their effects. An informatics pipeline for applying these metrics to QC gene-region specific data will be implemented. Aim 2. Develop and optimize a gene-region specific pipeline for detection of genetic variations, and derive common quality metrics for variations regardless of the technological platform. The focus of this particular data processing pipeline is to reliably discover nearly all genetic polymorphisms (down to 0.1% MAF) within the coding sequences. We will optimize our Atlas software for SNP and INDEL discoveries, using Pilot 3 data as an exercise for validation. 
We will also carry out genotyping and sequencing experiments for quality assessment on SNP/INDEL discoveries, and then evaluate and compare its performance with other different available approaches. Aim 3. Coordinate with DCC to implement gene-region specific data processing pipeline. We will closely collaborate with DCC to implement and streamline this particular data processing pipeline so it is readily applicable for processing the gene-region data from the full-scale project. We will facilitate the effort of integrating the genetic variations and individual genotypes obtained from different components of the 1000 Genomes Project. Public Health Relevance: The developed pipeline will process gene-region specific data as a major part of the 1000 Genomes Project, to catalog SNPs and INDELs within coding regions of the human genome. Once such a high quality data set becomes available, we expect that the list of novel rare non-synonymous SNPs will be immediately included and characterized in any disease association study.</description>
		<pubDate>Fri, 18 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Joint SNP and CNV calling in 1000 Genomes sequence data</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05208&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05208&amp;cr_yr</guid>
		<description>DESCRIPTION: (provided by applicant): The 1000 Genomes Project is developing data resources and analytical methods required for the next stage of human genetics research: (a) discovering millions of novel polymorphisms with frequencies 0.5%-10%, which can then be tested for association to disease via imputation and direct genotyping in patients, and (b) bringing together genome centers, technology companies and population geneticists in a collaborative framework to develop data formats, analytical methods and standards for sensitive, accurate genome-wide resequencing for rare variants. The Project's goals are aggressive as detection of rare variants requires unprecedented accuracy, multiplied by the inclusion of multiple rapidly-evolving next generation sequencing platforms. Key tasks for Data Processing include (a) defining the biases and error processes characteristic of each sequencing platform, (b) determining how to use properly calibrated data to discover and genotype variants (SNP and structural), including making use of population genetics and prior array data for each sample, and (c) making it easy for users to browse the resulting data, and integrate it in statistical genetic analysis of disease samples. As members of the 1000 Genomes Project Analysis Group, we propose three Aims. First, to develop, implement and apply methodology to convert raw intensity data from each platform into accurate four-base probabilities, refining and calibrating the underlying base-call probabilities, and increasing accuracy. Second, to develop and implement an integrated approach to SNP and CNV detection that utilizes these probabilities, combines information across multiple samples, and exploits existing information from genotyping arrays, increasing sensitivity and accuracy for both SNPs and structural variants. 
Third, to develop user-friendly software for browsing and applying 1000 Genomes Project data in disease research, making Project data on sequence variation and linkage disequilibrium accessible and easily usable to the wider genetics community. We have assembled an experienced and skilled team of statistical and population genetic analysts and software engineers, with a track record of contributions to the SNP Consortium, HapMap project, and disease association studies. If funded, we will develop improved methods for interpreting raw next generation sequencing data, and software tools that speed the application of data from the Project to the genetics community. PUBLIC HEALTH RELEVANCE: The data for the 1000 Genomes Project will provide the underpinnings for the execution and interpretation of all complex human disease genetic research that follows. As this constitutes the most prodigious investment in human variation resource generation to date, it is imperative that the data from this project be processed and analyzed as accurately as possible, as the raw data are of such a scale that they cannot be maintained permanently. In addition, the methods developed in this proposal will be directly applied beyond the 1000 Genomes Project to medical sequencing efforts to unlock the genetics of complex disease.</description>
		<pubDate>Wed, 16 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Structural Genomic Variation Analysis for the 1000 Genome</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05209&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05209&amp;cr_yr</guid>
		<description>DESCRIPTION: (provided by applicant): The 1000 Genomes Project is an initiative to sequence the complete genomes of over 1000 individuals and create a reference set of common and uncommon genetic variation among various ethnic populations. This project aims to more comprehensively identify all types of genetic variation, including single nucleotide polymorphisms (SNPs) and structural genome variants (SVs), which include regions that have been duplicated, deleted, inverted, or translocated through the course of human evolution. Some of these structural variants have been correlated with many different disease phenotypes and thus play a major role in human health. In the course of the pilot phase of this project, numerous diverse, yet complementary, analytical methods have been developed to detect these types of variation on multiple sequencing platforms. However, there remains a need to coalesce these approaches in an optimal fashion to apply to the large amounts of genomic sequence data that will be produced during the production phase. Our consortium includes members of the structural genomic variation analysis group for the 1000 Genomes Project, who have been conducting analysis for the 1000 Genomes pilot project 2 over the past year. We will conduct a concerted effort to coordinate our resources to develop a unified process to analyze these data. We will research new ways of integrating and optimizing our existing methods of detection, and will cooperate with similar international and industrial efforts in order to provide a set of high quality structural variants to the biomedical research community. Specific Aim 1: Facilitate and coordinate computational analysis to provide structural variation data on data being generated by the 1000 Genomes Project. Specific Aim 2: Research and develop new methods for structural genomic variation data integration and processing.</description>
		<pubDate>Wed, 16 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Genetic Testing Immigration and Family Reunification: Im</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=F31HG05201&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=F31HG05201&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This project will address the ethical, legal and social implications of the use of genetic testing as part of US immigration procedures for family reunification. Last year, approximately two-thirds of immigrants who came to the US as legal permanent residents were family sponsored under the family reunification provision. Under this provision a sponsor, who must be a US citizen or permanent resident, petitions the US Citizenship and Immigration Services (USCIS) to bring his or her family members (spouse, children, parents or siblings) to the U.S. As part of the application process, the sponsor is required to show proof of alleged family relationships. This is typically done through documentation (e.g. birth certificates). But when documents are lacking or insufficient, or fraud is suspected, US immigration officials may suggest DNA testing (parentage or sibling testing) as a way to verify family relationships. In the past several years, DNA testing has become more frequent in immigration procedures, but the impact such testing may have on immigrants, their families or their communities is not known. The objective of this study is to explore the positive and negative effects DNA testing may have on Mexican families, the largest immigrant group (14 percent) entering the US in the past year, particularly how test results might impact family relationships, social adaptability, and psychological well-being. The specific aims of the study are to use interviews with Mexican families to (1) Identify how Mexican immigrant families define and understand their familial ties to one another; and (2) Examine Mexican immigrant families' perceptions about the potential effects (positive and negative) of using genetic testing to prove alleged family relationships as part of the family reunification process in immigration. 
Findings from this research will be used to (3) Develop educational materials including (a) an informational brochure for immigrants planning to sponsor a family member under family re-unification provisions and (b) an ethical points-to-consider document to inform policy-makers, advocates and immigrant communities about findings of the study and their implications for the use of DNA testing in family re-unification. Relevance to public health: Family reunification policies are based on the understanding that the reunion of immigrant family members with their close relatives supports their psychological well-being and in so doing promotes the health and welfare of the United States. This study will assess the implications of new genetic technology for its potential positive and negative effects on this important immigration policy. PUBLIC HEALTH RELEVANCE: My ultimate career goal is to become an academic researcher focusing on the ethical, legal and social implications of genetic technologies in society. My particular research goal in this proposal, supported by my experiences as an immigrant, is to explore the intersection of law and policy with genetic science in the development of immigration policy. The proposed research will consider the use of genetic testing in immigration procedures for family re-unification, and will explore the potential impact of genetic testing in the context of immigrant perspectives on how families are defined and family ties constructed. This project will enable me to develop research and analytical skills in social science and law. Research and writing done through this training grant will be part of my doctoral dissertation, which will more broadly consider how the use of genetic testing influences the health and social structure of immigrant families and communities. 
In addition to developing specific research skills and producing a body of knowledge about Mexican immigrant perspectives on the use of genetic testing in family re-unification, this project will help to prepare me for a career as an independent investigator addressing ethical and policy questions relating to genetic science. After completion of my PhD, I anticipate pursuing a post-doctoral fellowship in a setting where I can continue policy-relevant research related to genomics and the needs and concerns of immigrant and other minority communities, and subsequently seeking a faculty position where I can continue and further develop this work.</description>
		<pubDate>Wed, 16 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Data Analysis and Coordinating Center (DACC) for Researc</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05155&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05155&amp;cr_yr</guid>
		<description>DESCRIPTION: (provided by applicant): The National Human Genome Research Institute (NHGRI) has launched a series of research training activities in Genomics and Ethical, Legal, and Social Issues (ELSI), comprising the Research Training Consortium (RTC), with a focus on increasing the successful participation of individuals from Under Represented Minorities (URM). While the various grantees are involved in evaluating and tracking progress towards their goals for their own programs, this has not been done systematically, making it difficult to compare and pool data across programs. The major goal of RFA-HG-08-006 is to establish a Data Analysis and Coordinating Center (DACC) to facilitate and streamline the evaluation and tracking component for the entire MAP initiative. An interdisciplinary team of investigators with considerable strengths and experience in areas of direct relevance to the RFA from Washington University in St. Louis is submitting this application in response to the RFA. With a strong sense of commitment to accomplish the goals of the RFA, we propose 4 specific aims: First, we will develop complementary evaluation and tracking tools for the entire RTC after reviewing the evaluation and tracking instruments currently implemented within each of the training centers. Second, we will create a web-based Data Entry System (DES), a prototype of which has already been created for one of our Summer Institutes, for establishing a centralized database for deposition of the RTC data from grantees that will become the source of all data analysis. Third, we will periodically perform rigorous analyses of the data using standard methods and generate reports that would assess progress and outcomes, report the findings at annual meetings of the grantees and training coordinators and bi-annual meetings of the National Advisory Council of the NHGRI, and provide written reports periodically. 
Fourth, we will provide leadership as necessary in other areas as may be customarily expected of coordinating centers and work closely with the RTC and the NHGRI in achieving the goals of the RFA. We believe that we have the strengths and experience as well as the people skills and familiarity with the RFA goals, and yet we bring a sense of freshness, objectivity, and flexibility to the table by virtue of not being directly affiliated with any of the research training centers. PUBLIC HEALTH RELEVANCE: The National Human Genome Research Institute (NHGRI) has launched a series of research training activities in Genomics and Ethical, Legal, and Social Issues, with a focus on increasing the successful participation of individuals from Under Represented Minorities (URM). This Data Analysis and Coordinating Center will facilitate and support a systematic evaluation and tracking program to assess whether and to what extent the NHGRI is progressing toward its goal of increasing the number of individuals of URMs.</description>
		<pubDate>Tue, 15 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Data Processing and Visualization for 1000 Genomes</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05210&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05210&amp;cr_yr</guid>
		<description>DESCRIPTION: (provided by applicant): The International 1000 Genomes Project (1000 Genomes) aims to leverage the emergence of next generation sequencing technologies to catalogue common human genetic variability. The ambitious goals and timeline of 1000 Genomes will require highly coordinated collaboration by multiple research groups. While aspects of our development efforts will involve all three platforms, a major proportion of our proposal will focus on optimal integration of SOLID data into the 1000 Genomes data production pipeline. Our goal is to ensure that the unique capabilities of the platform are maximized during data processing, while adhering to common data and analytical standards established across the 1000 Genomes. In Aim 1, we will develop tools for monitoring data quality, focusing partly on tools for detecting experimental biases and partly on developing better quality metrics. In Aim 2, we will develop tools for detection of genetic variation through recursive use of alignment and variant discovery. In Aim 3, we will further develop client/server software capable of simultaneous viewing of sequence data across multiple sites for the purpose of quality control and variant inspection. The deliverable of our proposal is a series of stand-alone software utilities that can be integrated into the software pipelines developed by the 1000 Genome DCC's and that fit within the collective analytical framework of 1000 genomes participants. This collaborative proposal includes teams from academia (UCLA), industry (Applied Biosystems) and non-profit research institutes (TGen). PUBLIC HEALTH RELEVANCE: The purpose of this proposal is to develop tools for monitoring and interpreting data from the 1000 Genomes Project.</description>
		<pubDate>Fri, 11 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Quality Control Genotype Calling and Study Design for 10</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05214&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U01HG05214&amp;cr_yr</guid>
		<description>DESCRIPTION: (provided by applicant): The 1000 Genomes Project aims to achieve a nearly complete catalog of common human genetic variants by generating high-quality sequence data surveying the genomes of &gt;1000 individuals. This catalog will include SNPs, copy number variants, and short insertion and deletion polymorphisms. By cataloging and describing the relationships between these variants, the Project will provide important benefits to genetic association studies of complex disease. Specifically, availability of very complete lists of candidate functional variants will: (a) accelerate fine-mapping efforts in gene regions identified through genome-wide association studies or candidate gene studies; (b) improve the power of future genetic association studies by enabling design of next generation genotyping microarrays that more fully represent human genetic variation, and (c) enhance the analysis of ongoing and already completed association studies by improving our ability to impute or predict untyped genetic variants. This application supports the execution of several tasks essential to the completion of the 1000 Genomes Project. Specifically, we propose working with production centers to finalize the design of the project (for example, by deciding the depth of sequencing required for each individual that is examined or the read length and insert size for the associated sequencing libraries) and to evaluate the trade-offs from different choices of individuals to sequence; we also propose to monitor the data generated to provide regular summaries of data quality and to identify problems with sample tracking before data is released; finally, we will help generate genotype and haplotype calls and prepare submissions of project results to public databases. 
We believe that timely completion of these tasks, in collaboration with other groups participating in the analysis of project data, is critical to ensure the genetics community obtains maximum benefit from the project. PUBLIC HEALTH RELEVANCE: Reconstructing the genome sequence of many individuals will allow the 1000 Genomes Project to deliver catalogs of common genetic variants and the relationships between these variants in the population. These catalogs are an essential component of genetic association studies focused on complex diseases such as diabetes, asthma, cancer and aging-associated disorders. In this application, we propose to help design a data collection strategy for the project, to monitor the quality of the primary sequence data, and to analyze the primary sequence data to deliver a processed dataset that is useful to the genetics community at large.</description>
		<pubDate>Fri, 11 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>GenomeSpace: A community web environment for integrative</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=P01HG05062&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=P01HG05062&amp;cr_yr</guid>
		<description>DESCRIPTION: (provided by applicant): The genomics research community is facing a time of tremendous opportunity and challenge as it deals with the increasing amounts of data produced by ever improving technology platforms. The vast amounts of data of diverse types (sequence, expression, epigenetic, RNAi screens, etc.) offer the prospect of integrating different views of biological states, and the promise of a deeper mechanistic understanding. However, such integration is out of reach for most biologists. The increasing need to use multiple sophisticated computational methods and tools to gain insights from the wealth of available data poses a major barrier. The number of computational tools and methods has also grown quickly over the past few years increasing the difficulty of finding the appropriate ones to use and getting them to work easily together. As a result, many biologists rely on the few tools they may already be accustomed to and thus are unable to take advantage of more advanced approaches that could transform biomedical research. In this Program Project we propose to tackle the challenge of integrative genomic analysis by leveraging the power of new methods for sharing data and applications that are appearing on the World Wide Web in other domains. We will develop GenomeSpace, a modular and extensible computational environment where a wide range of analytical methods and tools can interoperate and be made accessible to biologists. GenomeSpace will be a community-driven site where tool developers can easily share their methods and users can adopt and use them. Thus, while tools will continue their independent development efforts and retain their own look and feel, they will also be able to interoperate with a wealth of other methods, tools, and genomics resources. We will drive the GenomeSpace development in two ways. 
Six successful and popular software packages (Genomica, GenePattern, Cytoscape, the Integrative Genomics Viewer, Galaxy, and the UCSC Genome Browser) will seed GenomeSpace with the capability for integrative genomic analysis, while guiding the development of the infrastructure to ensure the widest range of architectures and applications can participate. Three Driving Biological Projects - in cancer genomics, epigenomics, and hematopoiesis - will test the ability of GenomeSpace to perform important analyses in the context of real biological problems. Together, they will ensure we provide unprecedented accessibility for computational analysis for genomics data, thus transforming integrative genomics analysis and biomedical research. PUBLIC HEALTH RELEVANCE: The GenomeSpace community Web environment will put the universe of genomic analysis tools within the reach of all biomedical researchers. Through the integrative analysis of genomic data from diverse sources and types, GenomeSpace users will be able to address a variety of problems at the forefront of biomedical research including patient diagnosis and prognosis, identification of new drug targets, and understanding biological mechanisms.</description>
		<pubDate>Fri, 11 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Federal Regulation of Probiotics: An Analysis of Existin</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05171&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05171&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Federal Regulation of Probiotics: An Analysis of the Existing Regulatory Framework and Recommendations for Alternative Frameworks. This proposal requests funding to support an evaluation of existing regulatory frameworks and their appropriateness for the regulation of new probiotic products that are available in the market or will be available in the near future. The project will include a literature review, review of existing relevant statutes and regulations, and three day-long meetings that will each bring together 15-20 invited participants from the regulatory, scientific, biotechnology, government and academic communities to discuss whether the existing regulatory framework of foods, dietary supplements, and drugs is sufficient to ensure the safety of probiotics and accuracy of health-related claims made by sellers of products with probiotic components. The focus of the proposal is the regulation of probiotic products that are or will be available to consumers for human use without a prescription and probiotic products that are or will be available to human patients in the clinical setting. Additionally, the focus of the study will be only those commercial and clinical products that promote themselves as having, or make health-related claims based on, probiotic properties. The first meeting will focus on the science of probiotics - including the current state of probiotic research, current and future clinical applications of probiotics, current and future commercial uses of probiotics, and a discussion of the benefits and risks of consuming or using probiotics. The second meeting will focus on the adequacy of the current regulatory framework for consumer products that contain probiotics components. The third meeting will focus on developing an appropriate framework for the regulation of probiotics and making regulatory policy recommendations. 
Following the final meeting, the Investigators will prepare a paper or series of papers, with the help of meeting participants, that will include the following topics: a) current regulation of probiotics; b) documented benefits and risks of probiotics use; c) the adequacy of the current regulatory frameworks for the regulation of probiotics; and d) if appropriate, suggested alternative regulatory models for the regulation of probiotics. PUBLIC HEALTH RELEVANCE: The field of probiotics is a new field that may or may not fit into the current regulatory framework in the United States that is in place to regulate food and drugs for human use. As with all new technologies, it is critical that an interdisciplinary discussion of possible product risks and appropriate regulation take place before uptake of the new technology makes it too burdensome or unrealistic to impose a new regulatory structure on the new technology. This project will explore the current state of probiotics research and identify current and potential uses of probiotics in consumer products and clinical applications. The Investigators will bring together experts in the area of health policy, human microbiome science, and food and drug regulation to discuss the adequacy of the current regulatory frameworks and any potential alternatives in order to provide policy makers with regulatory options designed to protect the public from misleading claims or any potential risks that might derive from the use of probiotic products.</description>
		<pubDate>Fri, 4 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Sequencing by nanopore mass spectrometry</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG05100&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG05100&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The goal of this project is to test the feasibility of a new single-molecule DNA sequencing strategy that combines solid-state nanopores with mass spectrometry. The idea is to sequentially cleave each nucleotide or base from a DNA molecule as it transits a nanopore, then identify each one by determining its mass-to-charge ratio in a mass spectrometer. Identifying the bases of a translocating DNA molecule by mass spectrometry is appealing because: 1) It is an extremely sensitive technique that can easily distinguish the four DNA bases from their significantly different masses; 2) Modern ion detectors can detect the impact of single ions with a quantum efficiency approaching unity; 3) Those same ion detectors register ions as ~ 20 ns electrical pulses, offering a high detection bandwidth that may obviate any need to control the DNA translocation speed; 4) The sequence of DNA is revealed by the order in which ions of different mass impact the detector, and is not affected by variations of translocation speed; 5) Mass measurements are expected to be insensitive to the orientation of a base in the nanopore, which is difficult to control. The success of our strategy hinges on whether ionized bases or nucleotides can be controllably cleaved from the leading end of a DNA molecule as it translocates the nanopore, and transferred into a mass spectrometer that is housed in a vacuum chamber. This project consequently focuses on assembling a nanopore mass spectrometry instrument, and using it to understand and control ionization and molecular fragmentation processes at the liquid-vacuum interface. The specific aims are to: 1) Detect DNA mononucleotides in a quadrupole mass spectrometer coupled to a µm-scale, chip-based pore; 2) Demonstrate mass spectrometry of single DNA bases ejected from a nanopore; 3) Identify efficient DNA fragmentation and ionization mechanisms; 4) Sequence short DNA homopolymers. 
Obtaining high quality sequence information (e.g. Q20 bases or better) from DNA homopolymers will demonstrate the viability of the nanopore mass spectrometry technique. This would justify developing a second-generation system, capable of sequencing long, heterogeneous DNA molecules. The $1000 per genome objective would be well within reach. Public Health Relevance: This project aims to develop a single-molecule DNA sequencing technology that would permit a full human genome to be sequenced for under $1000 and in under one day. This would have a profound impact on the life sciences by enabling genetic studies of exceptionally wide scope, and it would enable revolutionary health care options through personal genomics.</description>
		<pubDate>Tue, 1 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Single Molecule DNA Sequencing by Fluorescent Nucleotide</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05109&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05109&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The ability to sequence a human genome with high accuracy and speed, and at low cost, is critical to the emerging field of personalized medicine. In response to this demand, our research team developed the novel method of DNA sequencing-by-synthesis (SBS) on a solid surface, which has been recognized as a successful new paradigm for deciphering DNA sequences. In this grant application, we will use molecular engineering approaches to take our successful SBS strategy to the next level by adapting it for single molecule sequencing using fluorescent reversible terminators. Template DNA molecules will be attached to a glass surface modified by covalent attachment of PEG-primers under conditions where as many as 1 billion clearly separated single molecules are attached to the slide and their location registered by the presence of a cleavable fluorescent moiety. SBS will then be conducted using reversible blocked nucleotides with an appropriate set of cleavable fluorophores. We have also developed a walking strategy that permits re-use of the template multiple times to increase SBS read length. We will modify a TIRF microscope to create a device with an enhanced microfluidic flow cell platform to permit large-scale detection of single molecules during each cycle of SBS. Finally, we have designed a number of DNA library construction methods that avoid amplification and paired-end sequencing strategies compatible with the single molecule SBS approach. This will permit us to test the system with real genomic DNA, cDNA and other templates from ongoing biomedical research collaborations. With a billion DNA templates immobilized on a chip at single molecule resolution, even 30 to 50 base reads will cover the entire human genome at good coverage on a single chip. 
Public Health Relevance: The realization of the need for personalized medicine has encouraged the development of technologies able to sequence the human genome with high accuracy and speed at low cost. To approach this goal, we have combined the concepts of our successful sequencing by synthesis and sequence walking method with the ability to utilize single molecules. The latter avoids the necessity of cloning or otherwise amplifying DNA before sequencing, which is in fact one of the most expensive and time consuming parts of the process, and can lead to undesirable biases in the DNA sequences. With a billion DNA molecules immobilized on a chip at single molecule resolution, even read lengths of 30 or 50 bases will provide the ability to sequence the entire human genome at high accuracy on a single sequencing chip.</description>
		<pubDate>Tue, 1 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Real-time single-molecule nucleic acid sequencing with f</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05097&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05097&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Capitalizing on our group's experience in single molecule enzymology, we propose a novel method for multiplex sequencing of individual nucleic acid molecules using a sequencing-by-synthesis approach that employs fluorogenic nucleotide substrates. Upon incorporation of a non-fluorescent, terminal phosphate-labeled nucleotide substrate by a polymerase, a fluorogenic polyphosphate molecule is released and subjected to fast enzymatic digestion, yielding a single fluorophore, the color of which is dependent on the identity of the incorporated nucleotide. To facilitate single molecule fluorescence detection, an individual nucleic acid molecule is confined in a sealed sub-femtoliter nanoreactor, in which the sequencing reaction takes place continuously. Using conventional soft lithography, we fabricate an array of nanoreactors that allow simultaneous, real-time monitoring of thousands of isolated sequencing reactions with a fluorescence microscope and CCD camera. Our new approach offers low reagent cost, long read lengths, easy sample preparation, and high throughput at several megabases per minute. We also propose the integration of a massively parallel single molecule fluorogenic sequencer with microfluidic devices that process and deliver genetic material from a single cell. PUBLIC HEALTH RELEVANCE: This project will develop new methods of sequencing DNA at the single molecule level, providing a new path towards human genome sequencing for less than $1000. This ability to economically sequence full genomes will usher in a new era of personalized medicine.</description>
		<pubDate>Tue, 1 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>99.99% Accuracy Direct DNA Sequencing via the Protein Na</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05095&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05095&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Our aim is to develop a DNA sequencing device based upon the blockade of ionic current in the transmembrane protein pore α-hemolysin (αHL). Compared to other proposed nanopore-based approaches, the protein pore current blockade (PPCB) method is arguably the simplest, both conceptually and technologically, but recently has been passed over in favor of more complex methods. Recent electronic readout and bilayer lipid membrane advances by the proposers have greatly alleviated prior signal-to-noise ratio and robustness issues. New experimental data and an accurate Stochastic Model for DNA Motion (SMDM) within αHL now indicate threshold feasibility for sequencing by the PPCB method. Inspection of calculated SMDM responses to known input DNA sequences shows that the principal issue for sequencing via the PPCB method is random variance in the order the bases pass through the pore, leading to three quantifiable sources of error. This program will make specific structural and measurement parameter modifications to the present apparatus to reduce each type of error and to produce a minimal overall sequencing error. The final apparatus will be evaluated by sequencing up to 30 Mbases of natural DNA. PUBLIC HEALTH RELEVANCE: This project offers a path to low cost DNA sequencing. Determining the DNA sequence is useful in basic research studying fundamental biological processes, as well as in applied fields such as diagnostic or forensic research. The advent of DNA sequencing has significantly accelerated biological research and discovery, but current methods are complicated and expensive, thus limiting their applicability. Simple, inexpensive DNA sequencing offers the capability to apply the benefits of the process to everyday medical and forensic applications.</description>
		<pubDate>Tue, 1 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Advancing High Throughput Flow Cytometry</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05066&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05066&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Flow cytometry (FCM) is a versatile multicolor technology for multi-parameter analysis which is unparalleled in its facility and capacity for bioassay multiplexing, the performance of multiple assays in a single sample volume. By increasing multi-sample throughput 20-fold, recently developed HyperCyt technology has overcome historic barriers to application of this and other powerful features of FCM in the high throughput screening (HTS) drug and probe discovery enterprise. Specific aims here address the long term goal of a robust FCM platform capable of extended, fully automated operation in the HTS environment. In Aim 1 the HyperCyt platform will be modified to process four 384-well quadrants of a 1536-well plate in parallel, using four sampling probes to send samples to four separate flow cytometers by positive pressure sample delivery. This approach will take advantage of powerful, low-cost flow cytometers to quadruple throughput to about 8,400 samples/hour. Aim 2 will evaluate an alternative approach in which samples will be pulled rather than pushed through the cytometer observation chamber. Exploiting a cytometer that uses negative pressure for sample intake, an automated XYZ stage will introduce air-bubble separated bioassay samples directly into the cytometer from 1536-well plates without need of the HyperCyt peristaltic pump. This will eliminate cell exposure to compressive pump forces, shorten sample transit distance to reduce fluid carryover between samples, and allow sample fluid to be moved at higher velocity without loss of optical resolution to achieve an expected doubling of throughput for a single cytometer. Aim 3 will beta-test each platform in the University of New Mexico Center for Molecular Discovery (UNMCMD). Established multiplexed assays will be used to optimize and validate platform performance vs. mature 384 well FCM HTS technology. 
Samples containing 20 or more single-plex elements (i.e., ≥ 20-plex) will be used to define the limits of and balance between multiplex composition and sample throughput. By the end of the project period, these novel platforms will be positioned for integration into the automated process stream of the UNMCMD, allowing HTS FCM to take its place in the armamentarium of robust HTS capabilities for the Molecular Libraries Probe Centers Network and for the international scientific community. PUBLIC HEALTH RELEVANCE: Flow cytometry has proven to be a pivotal technology in biological investigations of health and disease (AIDS, Genome Project, stem cells, autoimmune disease, and cancer, to name a few). Two proposed innovative platforms promise complementary solutions by which to significantly increase the speed and efficiency of flow cytometry in small molecule high throughput screening, advancing discovery of novel drugs and biological probes.</description>
		<pubDate>Tue, 1 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>DNA Sequencing Using Intrinsic Base Fluorescence</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG05090&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG05090&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): There are extensive ongoing efforts to develop high-throughput low-cost DNA sequencing with the eventual goal of $1000 for an individual genome. Some approaches use physical properties to identify the bases, but most methods use fluorescent probes as extrinsic labels. Such extrinsic probes are needed because of the low quantum yield of the DNA bases. We propose to develop metallic nanostructures which will increase the brightness of intrinsic nucleotide emission, decrease the background, and efficiently direct the emission toward a detector. Additionally, these structures will provide spectral separation for base calling. These effects are possible due to through-space near-field interactions of the bases with electron clouds in the metal, which are called plasmons. To accomplish single-molecule intrinsic emission base calling we propose: Specific Aim 1. Use theoretical modeling, primarily the finite-difference time-domain (FDTD) method, to design geometries which enhance base fluorescence, provide directional emission, and which are practical for high throughput sequencing. Specific Aim 2. Measure the photophysical properties of DNA nucleotides near metal particles which increase the quantum yield. Specific Aim 3. Determine the detectability and maximum count rates for nucleotides in or near nanoholes in metal films. Specific Aim 4. Fabricate and test metallic structures which provide directional emission and spectral separation for base calling. We will determine the detection efficiency and accuracy of the base calling. PUBLIC HEALTH RELEVANCE: The NIH has set a goal of developing DNA sequencing at low cost. The goal is to sequence an individual's genome for $1000. This ambitious goal requires revolutionary approaches to sequencing which combine high throughput with high accuracy. A wide variety of chemical, physical and spectroscopic approaches are under investigation. 
The majority of these approaches use spectroscopic detection and identification of the bases or terminated oligomers, which typically requires labeling with extrinsic fluorophores. The need for extrinsic labeling increases the cost and complexity of sequencing, and has prevented the use of some promising methods. One of these methods is the use of an exonuclease to sequentially remove DNA bases from a single strand of DNA. This approach requires complete labeling of all the DNA bases in the strand to be sequenced, which has prevented the widespread use of exonuclease sequencing. Nonetheless, the potential for massive parallel throughput has maintained a high interest in this approach. The goal of this proposal is to develop a method to detect and identify single nucleotides released by exonuclease using the intrinsic fluorescence from the DNA bases. Intrinsic base emission is extremely weak. However, we have developed the use of metallic nanostructures to increase the quantum yields and photostability of visible fluorophores and recently we showed that our approach can work at the UV wavelength characteristic of DNA emission. We have also shown that structured metallic surfaces can be used to focus emission towards a detector, and can be used to suppress background emission. Our preliminary results suggest that it will be possible to increase the intrinsic base emission from DNA and to identify the bases with high accuracy. During this project we will use experimentation and simulations to design metallic structures for label-free calling of single nucleotides.</description>
		<pubDate>Tue, 1 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Physical and Functional Interactions of Cis Regulatory M</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG05149&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG05149&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Sequence-specific transcription factors (TFs) regulate gene expression through their interactions with DNA sequences in the genome. These interactions control critical steps in development and responses to environmental stimuli, and their dysfunction can contribute to the progression of various diseases. In this project, as our model system we will examine primary human skeletal muscle myoblasts differentiating into myotubes. In metazoans, regulatory motifs tend to co-occur within stretches of noncoding sequence referred to as cis regulatory modules (CRMs) that regulate expression of the nearby gene(s). In this project, we will assess the physical and functional interactions of CRMs according to their interactions with their immediately adjacent target genes and their effects on reporter gene expression. In particular, we will examine the distant and potentially combinatorial regulatory nature of CRMs, through their physical interactions with the promoters of more distantly located target genes and with each other ('CRM-CRM interactions') and through potentially synergistic effects on regulation of gene expression. The results of this project may reveal trends in how far away the genes are that are regulated by, but not adjacent to, the regulating CRM. This could have major implications for the prediction and experimental study of the regulatory roles of CRMs, as it is currently unknown how often this phenomenon may occur. The results of this project may also reveal trends in how frequently multiple CRMs work together to regulate their target genes, and what their combined effects are. 
This could have major implications for the prediction and experimental study of the regulatory roles of CRMs, as currently investigators focus on finding the [single] CRM that confers a given expression pattern and assign the result of a reporter assay to the CRM tested on its own, and rarely examine it in the context of another candidate CRM with which it may synergize. Finally, the results of this project may reveal whether 'split' CRMs occur (i.e., CRMs composed of physically separated cis regulatory regions that must come together in order to affect gene expression), and may reveal trends in how the different 'parts' of the 'split' CRMs are organized in the genome (i.e., how far apart the 'parts' are, whether the parts are on the same or different chromosomes, etc.). This could have major implications for the prediction and experimental study of the regulatory roles of CRMs, including our understanding of the evolution of genomic regulatory elements, and would require new computational CRM prediction algorithms to be developed in the future since current algorithms consider CRMs to be contiguous stretches of sequence that function as independent regulatory units. PUBLIC HEALTH RELEVANCE: This project is focused on better understanding the genomic organization of DNA regulatory elements that regulate gene expression. In this project, we will examine differentiating skeletal muscle myoblasts, a biomedically important cell type. The findings from this project will provide a better understanding of gene regulatory mechanisms in these cell types.</description>
		<pubDate>Tue, 1 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Defining the genomic architecture of expression quantita</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=F32HG05176&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=F32HG05176&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): What is the genomic architecture of a quantitative trait? Despite substantial effort to answer this question in humans and model organisms, we remain far from understanding how many genes, gene-gene interactions, and gene-environment interactions underlie most polygenic traits, such as human disease. Global gene expression studies, which treat each transcript in the genome as a quantitative trait, have provided crucial insights into the genetic basis of trait differences between individuals. In particular, a cross of the BY lab strain and the RM wine strain of the budding yeast Saccharomyces cerevisiae that was done by the lab of Leonid Kruglyak, where I am presently a postdoctoral fellow, has illuminated the genetic complexity underlying expression differences between two individuals. However, even within this cross, the genetic variance for the majority of the transcripts in the genome remains incompletely mapped, with the summed effects of detected linkages often explaining only a small fraction of a transcript's expression variance. I am developing a new method that, for many polygenic architectures, will facilitate the mapping of all linkages in the genome that underlie a transcript difference between two yeast strains in a single environment, potentially with a gene-level mapping resolution. This approach exploits aspects of the recently developed Synthetic Genetic Array (SGA) technology to create extremely large pools (~10^5 to 10^7) of recombinant MATa haploids from a single cross. Bulk segregant analysis (BSA) on these large populations, which can be done by using parents that harbor translational fusion fluorescent reporters and cell sorting/recapture on the segregant pool, will facilitate the mapping of the genomic architecture of target transcripts. Once working, this approach can be extended to multiple environments, other selectable traits (e.g. drug resistance), and new backgrounds. 
AIM 1: To develop a robust methodology for mapping the genomic architecture of expression quantitative traits in large pools of segregants. AIM 2: To apply this method to 25 transcripts that have previously been shown to exhibit heritable variation across segregants of a BY X RM cross in a glucose-limited environment. AIM 3: To validate the genomic architecture of one transcript by doing all necessary allele replacements in both the BY and RM backgrounds. Public Health Relevance: Many diseases are influenced by multiple genes, with the number of carried risk alleles varying from person-to-person. Understanding how many genes contribute to risk for a particular disease remains a major challenge for medical genetics. The experiments I propose on yeast gene expression can provide critical information about how many genes underlie trait variation, such as disease risk.</description>
		<pubDate>Tue, 1 Sep 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Phylogenetic Binning of Metagenomic Sequence Data</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG05107&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG05107&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Culture-independent metagenomic studies are essential for understanding our relationship with the organisms comprising the human microbiome, defining optimal microbial composition to maintain health, and devising selective treatment strategies to eliminate pathogens without harming beneficial species. To use metagenomic data effectively, raw DNA sequence data (reads) must be processed computationally (assembled) to obtain longer sequences (contigs). Existing software packages for this purpose are quite inefficient when presented with large, taxonomically diverse samples, resulting in considerable wastage of reads that cannot be assembled. Efforts to maximize assembly efficiency by relaxing stringency can lead to inappropriate joining of sequences from unrelated organisms (chimeric artifacts), compromising data accuracy and usefulness. Taxonomic binning of raw reads as a pre-filtering step is expected to improve metagenomic sequence assembly efficiency, reducing statistical noise due to sample complexity and allowing incorporation of raw reads into longer, more informative contigs without incurring chimeric artifacts. Benefits should be especially significant for less abundant species in complex mixtures. We have developed methods to quantify taxonomic binning program performance and assembly improvements in real metagenomic data sets, including reproducible calibration standards, to enable efficient parameter optimization for existing software and provide reliable benchmarks for future software development. 
Our specific aims are to 1) develop new computational methods for large-scale taxonomic classification of metagenomic sequence data, applicable to raw reads as well as assembled contigs; 2) develop software and protocols to use taxonomic data binning as a pre-treatment to increase efficiency of existing sequence assembly software; 3) benchmark performance enhancement for different assembly software programs using quantitative, statistical tests with both artificially created models and real-life metagenomic data sets of varying size and complexity; 4) make new computational methods and performance evaluation tools available to the general scientific community.</description>
		<pubDate>Mon, 24 Aug 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Multi-Dimensional Separation of Bacteria</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG04854&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG04854&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Efforts to understand the complex relationship between microbes and their hosts are complicated by the large number of nonculturable organisms, and the heterogeneity even within each species. While modern metagenomics approaches admirably sample the identity of microbes, bulk studies limit the other inferences that can be derived from each bacterium. In order to overcome this problem, yet retain clues to the diversity of the original population, we propose to: Model, design, build, and test a multidimensional, microfluidic sorter based on both structural (size and shape and Electrophoresis) and functional (Adhesion, Chemotaxis) parameters to separate a complex bacterial mixture into bins containing bacteria that share common properties. The identities of the sorted bacteria will be obtained through metagenomic studies. The device will enable determination of the heterogeneity both between and within species in a complex mixture. This microfluidic separation device will utilize 1. Asymmetric pinched flow fractionation to separate bacteria based on size and shape. 2. Electrophoretic-based flow fractionation to separate bacteria based on surface charge. 3. Functionalized magnetic beads to separate bacteria based on adhesion to extracellular matrix (ECM) components. 4. Chemotaxis to separate bacteria based on their motile response to chemical stimuli, and lastly, 5. Multi separation modalities to separate bacteria based on size, shape, adhesion, response to chemical stimuli, and surface charge. We will use microfluidic approaches since (i) the feature sizes of microfluidic systems are compatible with the size of the bacteria; (ii) complicated flow paths can be machined with ease and at low cost; (iii) many sorting modules utilizing diverse principles can be integrated into a single device. 
Within each specific aim we rely heavily on direct numerical simulation of particle movement using code that reflects 2-dimensional geometry. As part of the experimental plan, we will expand functionality of our custom Particle Mover program to a full 3-D simulation. Once design parameters have been established, devices will be fabricated and tested rigorously using particles, mixtures of known bacteria, and for the 3- and 4-stage devices, complex mixtures from human subjects. We will make use of a modular architecture that facilitates interchangeability of modules. The long term goal is to add other separation modalities into the device and to integrate into the device modules for single cell isolation, DNA isolation and amplification on-chip, to permit high-throughput analysis of complex mixtures. These studies will lead to devices that not only capture the diversity of complex mixtures, but also permit direct assignment of the heterogeneity of structural and functional properties, genes and gene products within each single species in the mixture, and aid understanding of human disease. PUBLIC HEALTH RELEVANCE: This proposal represents a new collaborative effort between three established scientists with unique and complementary interests. By focusing our expertise in fluid dynamics (Hu), microfluidic design (Bau), and clinical and experimental bacterial infection (Worthen) we propose to: Model, design, build, and test a multidimensional, microfluidic sorter based on both structural (size and shape and Electrophoresis) and functional (Adhesion, Chemotaxis) parameters to separate a complex bacterial mixture into bins containing bacteria that share common properties. The identities of the sorted bacteria will be obtained through metagenomic studies. 
The device will enable determination of the heterogeneity both between and within species in a complex mixture, such as in clinical infectious illnesses (we are particularly interested in bronchiectasis and necrotizing enterocolitis) whose pathogenesis is obscure. We also are interested in contributing to an understanding of how tools such as fluid dynamics (for which a new program of numerical simulation will be presented to the scientific community) and microfluidics intersect with medicine.</description>
		<pubDate>Mon, 17 Aug 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Mathematical modeling of the voltage driven translocatio</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04842&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04842&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): A number of applications of nanometer scale pores have emerged recently in biotechnology inspired partly by their function in the biological context. This proposal relates to an exploratory method for sequencing DNA and other linear polyelectrolytes of biological significance by driving them through a single nanopore using an electric field. Analogous field mediated transfer across nanopores is believed to be part of the normal molecular level machinery of eukaryotic cells. A fair amount of experimental data has accumulated since the landmark 1996 paper by Kasianowicz, Brandin, Branton and Deamer [Proc. Natl. Acad. Sci. 93, 13770 - 13773 (1996)] demonstrating the possibility of detecting the passage of individual molecules through a nanopore by observing the current signal. Recent atomic level molecular dynamic simulations have enhanced our understanding of the detailed mechanism behind these translocations. Simple analytical models that result in explicit formulas for such things as translocation speeds and scaling exponents are however rare. This proposal seeks to address this gap in our knowledge. It is proposed that the translocation process may be understood as a problem of electrophoresis of charged objects through confined spaces in an ionic fluid. Using such a description, coupled with a drift diffusion model to describe the Brownian fluctuations, quantitative predictions are sought that can be directly compared to experimental data. Preliminary results indicating the success of such an approach are presented. The expected broad impacts of the proposed activity are: (A) The theoretical models being developed would provide much needed guidance in the experimental efforts to develop a viable ultra rapid DNA sequencing technology based on the nanopore idea. The ability to sequence DNA at a fraction of the cost of current methods has vast and broad implications for human health that are well known. 
(B) The theoretical models would lead to valuable insight on a vital and essential part of the molecular level functioning of a living cell. Such understanding has a variety of known and as yet unknown practical implications: it could, for example, lead to improved methods of injecting DNA into the cell nucleus in gene therapy, it could result in the development of new drugs based on the principle of disrupting the protein translocation step in the cell cycle of pathogens and conversely cure diseases caused by the failure of the translocation process by designing chemicals to rectify the defect. PUBLIC HEALTH RELEVANCE: The proposed activity is aimed at furthering fundamental understanding of the biophysical process of voltage mediated transfer of linear polyelectrolytes across nanopores. This process is the basis for a proposed ultra-fast DNA sequencing method that has been the subject of intensive research for about a decade and is also a basic step in the normal functioning of a eukaryotic cell. Besides the ultra-fast DNA sequencing technology, the enormous potential impact of which has been widely discussed, improved understanding of the translocation process could lead to improvements in techniques of gene therapy as well as the designing of drugs meant to promote or disrupt (in case of pathogens) the translocation process in cells.</description>
		<pubDate>Mon, 17 Aug 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Species-by-Species Dissection of Microbiomes using Phage</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04852&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04852&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Metagenomics is a new scientific discipline that has developed in the last several years. It is both a set of research techniques, comprising many related approaches and methods, and a research field. As a scientific field, metagenomics attempts to resolve four tiers of questions: 1) what micro-organisms are present in a particular complex microbiome, such as the human gut? 2) in what proportions? 3) what are they doing? and 4) how will they react to environmental changes, such as a change in diet? Currently, the approach to answer these questions has been one of brute force shotgun sequencing and 16S sequence surveys. However these technologies can only hint as to which kinds of bacteria are present, and the information provided tends to be biased to the commonest species. The majority of bacteria in complex microbiomes cannot presently be cultured and sequenced in a conventional way. Whole genome amplification (WGA) from single cells has been used in several studies. However, in the best cases, only 60% of a genome can be covered with the DNA obtained with WGA from a single cell. Studies have shown that the bias from WGA is random and coverage can be improved by adding more copies of the same genome. We propose here to change the metagenomics paradigm: rather than extracting all DNA in bulk without any independent information on the species that comprise it, we propose to develop tools to be able to analyze species one by one. This will be carried out by using phage display to select antibodies that recognize species in the population, and then to use such selected antibodies to characterize the abundance of the species by flow cytometry, purify it, and if necessary deplete the population of the species in order to repeat the process. The purified bacteria will be used as starting material for whole genome amplification, species characterization by rRNA analysis, and sequencing, if necessary. 
The antibodies developed within this proposal will be used to carry out the analyses indicated. Those developed within the context of the analysis of the human gut microbiome will also be very useful within the context of clinical studies in which bacterial composition may play an etiological role. An artificial bacterial mixture of E. coli and several other bacterial species will be used at the first stage of method development, and the microbiota in human gut will be analyzed in the later portion of the project. PUBLIC HEALTH RELEVANCE: Humans exist in collaboration with the bacteria that live within them, most of which are benign. However, it has recently been shown that the composition of the microbiome can have profound effects on the health of individuals. By developing new tools to analyze human microbiomes, we will provide additional methods to study and characterize different bacteria and elucidate their role in human disease.</description>
		<pubDate>Mon, 17 Aug 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Targeted genomic characterization of uncultured bacteria</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04857&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04857&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): A major goal of the Human Microbiome Project is to identify all of the organisms that are associated with the human body (the human microbiota) and determine the genomic sequence of most if not all of them. The detected diversity of the human microbiota reaches thousands of species and strains, the vast majority of which have not been isolated in pure culture. Our goal is to develop a robust and rapid approach for the targeted genomic characterization of any uncultured constituent of the human microbiota at single cell level and also to allow population genetic studies of selected groups of organisms that may have some cultured isolates. Our strategy utilizes the high phylogenetic resolution that the small subunit ribosomal RNA (SSU rRNA) provides in distinguishing microbial phylotypes. We plan to label and isolate single cells representing uncultured microbial lineages as well as populations of cells of specific phylotypes from complex microbiota samples and amplify their DNA to levels that enable genomic sequencing. This approach will bridge the gap between sequencing the limited number of individual cultured organisms and whole community shotgun sequencing (metagenomics) which generally does not provide sufficient depth and resolution to comprehensively sequence the microbiome. Initial feasibility studies indicate that our approach can be applied to any microbial consortia and is not dependent on the abundance of the target organism. Based on this, the focus of this proposal is to determine optimum experimental design and improved technical procedures for targeted single cell and population genomics of microbes from the human microbiota. The specific aims are to: 1. Aim 1. Separate single cells and populations of target uncultured microbial phylotypes from gut microbiota samples. 
We will use fluorescence in situ hybridization (FISH) combined with flow cytometry to obtain single cells and populations of targeted phylotypes, uncultured or with few cultured representatives. 2. Aim 2. Amplification and sequencing of genomes from single cells representing the uncultured gut microbiota. We will amplify the genomes of target cells using multiple displacement amplification and sequence the DNA to obtain draft genomic assemblies. The experimental and computational approaches will be optimized for the human microbiota characteristics. 3. Aim 3. Pangenomic characterization of targeted populations of uncultured and cultured microbial phylotypes. We will isolate populations of specific bacterial phylotypes representing uncultured organisms as well as cell populations representing species/genera that have representatives in culture and one or few genomes sequenced. We will amplify and sequence the cell population genomic DNA to obtain composite genomes/pangenomes. PUBLIC HEALTH RELEVANCE: Targeted genomics of uncultured microbiota will enable selective access to the genetic blueprint of any of the organisms that inhabit the human body. This approach, which relies upon specific isolation of single cells or specific cell populations from complex human microbial consortia and sequencing their genomes or a part thereof, complements whole genome sequencing from individual cultivated organisms and global community metagenomics.</description>
		<pubDate>Mon, 17 Aug 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Software for genomic and methylation-specific MLPA probe</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R03HG05121&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R03HG05121&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): MLPA (multiplex ligation-dependent probe amplification) and MS-MLPA (methylation-specific MLPA) have rapidly evolved as promising methods for copy number variation and DNA methylation studies. The goal of this proposal is to continue the development of software for genomic and methylation-specific MLPA probe design. We have developed H-MAPD, a web based probe design tool that automates the generation and selection of probes for human genomic MLPA. We propose to expand the software in two ways: 1) To extend H-MAPD to support mouse and rat genomic MLPA probe design. 2) To develop a separate tool to handle MS-MLPA probe design. One major obstacle for MLPA and MS-MLPA is the difficulty in probe design. By automating the tedious probe design process, our software will become a valuable tool in the implementation of MLPA and MS-MLPA. PUBLIC HEALTH RELEVANCE: An increasing number of human diseases are now known to be associated with copy number variation and/or aberrant DNA methylation. The proposed software provides a computational tool that will facilitate research in these two areas.</description>
		<pubDate>Mon, 17 Aug 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Functional Sorting of Microbial Cells From Complex Micro</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG04895&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG04895&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Microbial communities play a significant role in maintaining human health. However, understanding the complex relationship between a human host and its resident microbial flora presents a considerable challenge. For example, the total number of microbial cells residing in an individual is estimated to far outnumber the individual's somatic cells. Unfortunately, the identity, distribution, and functional significance of the majority of these microorganisms are unknown. The situation is further complicated by the inability to culture many of these organisms in the laboratory. Here, we propose the development of a microfabricated device that enables the parallel culturing and characterization of individual members of a microbial community. Single cells will first be encapsulated into alginate gel microdroplets to allow for small-scale growth of thousands of isolated cells in parallel. Segregation of a cell into a gel bead will also facilitate subsequent sorting and selection based on the metabolic profile of the cell, which will be assessed using fluorogenic enzyme substrates. By employing a panel of different substrates, a large number of different species can be distinguished based on their metabolic properties. This approach will allow for quantification of the relative distribution and functional capabilities of the different members of the consortia and for subsequent genetic analyses. The scale, throughput capabilities, and sensitivity of the proposed technology address the key challenges facing the analysis of microbial consortia. Demonstration of this front-end sample preparation technique will greatly facilitate subsequent genome sequencing and interpretation of the complex relationship between a human host and its resident microbial flora. PUBLIC HEALTH RELEVANCE: Microbial communities play a significant role in maintaining human health by bestowing metabolic capabilities and disease resistance. 
However, the components, composition, and dynamics of these communities cannot be assessed effectively using current techniques. The proposed research will result in new technology that will facilitate the identification, sorting, and characterization of these microbial communities.</description>
		<pubDate>Mon, 17 Aug 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Wisconsin Center of Excellence in Genomics Science</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=P50HG04952&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=P50HG04952&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The successful completion of the human genome and model organism sequences has ushered in a new era in biological research, with attention now focused on understanding the way in which genome sequence information is expressed and controlled. The focus of this proposed Wisconsin Center of Excellence in Genomics Science is to facilitate understanding of the complex and integrated regulatory mechanisms affecting gene transcription by developing novel technology for the comprehensive characterization and quantitative analysis of proteins interacting with DNA. This new technology will help provide for a genomewide functional interpretation of the underlying mechanisms by which gene transcriptional regulation is altered during biological processes, development, disease, and in response to physiological, pharmacological, or environmental stressors. The development of chromatin immunoprecipitation approaches has allowed identification of the specific DNA sequences bound by proteins of interest. We propose to reverse this strategy and develop an entirely novel technology that will use oligonucleotide capture to pull down DNA sequences of interest, and mass spectrometry to identify and characterize the proteins and protein complexes bound and associated with particular DNA regions. This new approach will create an invaluable tool for deciphering the critical control processes regulating an essential biological function. The proposed interdisciplinary and multi-institutional Center of Excellence in Genomics Science combines specific expertise at the Medical College of Wisconsin, the University of Wisconsin Madison, and Marquette University. 
Technological developments in four specific areas will be pursued to develop this new approach: (1) cross-linking of proteins to DNA and fragmentation of chromatin; (2) capture of the protein-DNA complexes in a DNA sequence-specific manner; (3) mass spectrometry analysis to identify and quantify bound proteins, determine posttranslational modifications, and characterize protein complex stoichipmetry; and (4) informatics to develop tools enabling the global analysis of the relationship between changes in protein-DNA interactions and gene expression. The Center will use carefully selected biological systems of increasing complexity from three species (yeast, mouse, human) to develop and test the technology in an integrated genome-wide analysis platform that includes efficient data management and analysis tools. As part of the Center mission, we will combine our technology development efforts with an interdisciplinary training program for students and fellows designed to train qualified scientists experienced in cutting-edge genomics technology. Data, technology, and software will be widely disseminated by multiple mechanisms including licensing and commercialization activities. PUBLIC HEALTH RELEVANCE The development of this novel technology to comprehensively identify and characterize genpmic protein- DNA interactions will help in the interpretation of a core function of the genome, gene transcription. The tools and technologies developed as part of the CEGS will be broadly available to the scientific community, and will help decipher crucial molecular mechanisms regulating the transcriptional levels of all genes across the genome, and how these regulatory mechanisms are altered in disease. This knowledge will revolutionize our understanding of genome biology, and impact how genomic information will be used in clinical medicine.</description>
		<pubDate>Wed, 12 Aug 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Improving genotype accuracy and haplotypic analysis for</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04960&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04960&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Genome-wide association studies (GWAS) are an effective tool for identifying common genetic variants that contribute to disease and heritable traits. These studies use high-density oligonucleotide arrays to assay hundreds of thousands of diallelic genetic markers in each individual. However, genome-wide association studies can also produce hundreds of spurious disease-gene associations caused by genotyping error. This research will develop statistical and computational methods that use inter-marker correlation to substantially improve genotype accuracy. All existing methods for calling genotypes for large-scale data ignore the correlation between genetic markers. This correlation is highly informative, but exploiting inter-marker correlation is computationally difficult because it requires inference of the marker alleles inherited from a single parent (the haplotype phase). Recently, we have developed a novel method of haplotype phase inference for large-scale data sets of unrelated individuals that is orders of magnitude faster and more accurate than competing methods. The next step will be to improve haplotype phase inference and genotype calling by performing both tasks simultaneously. This will enable genotype uncertainty to be taken into account when inferring haplotype phase and inter-marker correlation to be taken into account when calling genotypes. Our methods will improve genotype accuracy, improve haplotype phase inference accuracy, decrease false positive associations due to genotyping error, and increase power to detect true genetic associations. We will extend these novel methods to call genotypes and phase haplotypes for parent-offspring trios where the additional relatedness information will lead to even larger gains in accuracy. The improved genotype accuracy and phased haplotypes from our methods will contribute to improved understanding of the genetic contribution to human disease. 
Our research will also address one of the main impediments to haplotypic analysis: the difficulty in interpreting analysis results. We will develop interactive methods for visualizing haplotype structure and haplotype-trait associations. These new data exploration methods will greatly simplify the task of identifying sequences of genetic variants that are associated with a trait. PUBLIC HEALTH RELEVANCE: Heritable genetic variants contribute to many common diseases, such as cardiovascular disease and diabetes. This research will develop new methods and tools that improve the accuracy of genetic data and that improve our ability to identify genetic variants that increase risk of disease. These methods and tools will contribute to the prevention, diagnosis, and treatment of heritable diseases in the United States and throughout the world.</description>
		<pubDate>Fri, 7 Aug 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Defining the Human Microbiome</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U54HG04969&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U54HG04969&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The Human Microbiome consists of all the organisms that comprise the many distinct microbial communities found on and in the human body. Early individual studies and the concerted initial steps of the Human Microbiome Project (HMP) have begun to define the extent of the diversity within these communities and the importance of their structure and temporal variation on human health. To establish the necessary foundation to measure the variation of communities and the functional consequences, we must: i) establish baseline measurements of these communities, including cataloging both the presence and prevalence of specific organisms and their gene content; ii) determine the physiological capabilities of each organism in the community by elaborating their gene content, to be able to interpret the functional significance of changes in community structure; iii) continue to develop new methods to characterize microbial communities and the organisms therein; and, iv) promote research in human metagenomics by engaging the research community and creating publicly available data and resources. We will take advantage of the explosive advances in new sequencing technologies to rapidly create a catalog of over 400 reference microbial genome sequences, including those of bacteria, eukaryotes, and viruses, as well as generate extensive community profiles from a number of human body sites with a focus on the oral and vaginal microbiome. We will also aggressively develop and apply new methods and technologies to further reduce the cost and expand the reach of the HMP. These data will be rapidly released to the public and, along with those of our collaborating partners in the HMP network, will define the Human Microbiome communities and provide baseline measurements for a wealth of subsequent experimental research. 
Public Health Relevance: The ultimate goal of the HMP is to understand how microbial community structure and function determine normal human health, development, and disease, and can contribute to new diagnostic or therapeutic tools. To do this we must first generate large, public data sets that define the organisms commonly found in healthy human subjects, and measure the extent to which they vary and the reasons for this.</description>
		<pubDate>Thu, 30 Jul 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Methods for Efficient Detection of Mutations in Zebrafis</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04819&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04819&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): There is a need in the zebrafish research community for efficient identification of mutations in known genes. Recently several groups have successfully identified mutations in genes by target selection, generation of libraries of mutation-carrying zebrafish and the re-sequencing of PCR-amplified exons. While this approach is effective, it is prohibitively expensive to undertake at a large scale. This proposal is designed to develop and apply a new sequencing technology, which promises higher efficiency and reduced cost in the detection of useful mutations, and to target recovery of mutations in all known zebrafish transcription factors. Over the next five years, new generation high-throughput sequencing such as Solexa/Illumina machines will be used to identify mutations in 25,000 distinct 70bp target sites across a library of 8400 individuals. We expect to recover at least 1500 new nonsense or disruptive splice-site alleles. The results of this pilot project could be applied to identify mutations in every protein-coding gene in zebrafish. We expect the cost to be between one-fifth and one-tenth the cost of conventional capillary sequencing. Every allele recovered under this project will be made freely available to the research community. PUBLIC HEALTH RELEVANCE: There are many human diseases that are caused by loss of normal gene functions. Increasingly, zebrafish are used to model human genetic diseases because zebrafish are easy and inexpensive to manipulate and maintain. Moreover, the availability of the genome sequence means that the zebrafish counterpart of the human gene can be readily identified. This project is designed to identify disruptive mutations in a large number of zebrafish genes encoding proteins that normally regulate gene expression. These mutations will provide models of human genetic disease that may be used to identify new drugs and therapies.</description>
		<pubDate>Tue, 28 Jul 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Statistical methods for assessing copy number variation</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=K99HG05015&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=K99HG05015&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Two forms of genetic variation are common and can be measured on a genomic scale using recent high throughput genotyping platforms: single nucleotide polymorphisms (SNPs) and copy number variants (CNVs). Unlike high throughput genotyping algorithms that are highly accurate, copy number estimates are very imprecise and tools for estimating copy number and inferring regions of CNV are still under development. My immediate scientific goals are to provide first generation algorithms for each of the following tiers of estimation problems: (i) By locus: estimate the raw copy number at each locus on the array and quantify the uncertainty, (ii) By sample: infer regions of CNV, and (iii) Between samples: assess the contribution of CNV to disease susceptibility. My long term goal is to establish an interdisciplinary research lab in biostatistics and human genetics that supports creative computational and statistical solutions to high throughput genomic data. This Award will facilitate the necessary training and skills to transition to independent research through formal coursework in statistical genetics and computational biology, leadership opportunities in structured career development activities, such as the GWAs@JohnsHopkins working group, new collaborations from multiple research institutes, and presentations at national conferences, including epidemiological (American Heart Association), methodological (Joint Statistical Meetings), and topical (e.g., a copy number variant workshop). A scientific advisory panel of internationally recognized experts will oversee my research. New technologies and applications for genomic research developed during the course of this Award will lead to exciting new opportunities for biostatistical research, as well as R01 funding opportunities that I will actively pursue. 
PUBLIC HEALTH RELEVANCE - Genetic variation between individuals is common and has been linked to common diseases such as diabetes and cancer. I propose to develop statistical methods for new genome-scale technologies to identify genetic variants and to characterize their contribution to disease susceptibility.</description>
		<pubDate>Fri, 24 Jul 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Computational and Experimental Analysis of RNA structure</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG05129&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG05129&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): mRNA polyadenylation is a key processing event for almost all mRNAs in eukaryotic cells. It involves cleavage of maturing mRNAs at the 3' end and addition of a poly(A) tail. The poly(A) tail influences many aspects of mRNA metabolism, including mRNA stability, mRNA transport, and translation. Over half of all human genes have multiple polyadenylation sites, or poly(A) sites, leading to transcript variants containing distinct mRNA cis-regulatory elements and/or encoding protein isoforms. Alternative polyadenylation has been shown to be regulated in tissue- and condition-specific manners. A growing number of human diseases have been associated with altered polyadenylation activity. While the core elements for polyadenylation have been well characterized, little is known about the auxiliary elements that modulate polyadenylation activity. In particular, the role of RNA secondary structures in regulation of polyadenylation is completely unclear. The long term goal is to fully understand the mechanisms by which mRNA polyadenylation is regulated under different biological conditions. The specific aims of this study are 1) to systematically analyze different types of RNA structures associated with mammalian poly(A) sites by bioinformatics, and 2) to examine how RNA structures regulate mRNA polyadenylation by experimental assays. PUBLIC HEALTH RELEVANCE: mRNA polyadenylation is a key processing event for almost all mRNAs in eukaryotic cells. It involves cleavage of maturing mRNAs at the 3' end and addition of a poly(A) tail. The poly(A) tail influences many aspects of mRNA metabolism, including mRNA stability, mRNA transport, and translation. Over half of all human genes have multiple polyadenylation sites, or poly(A) sites, leading to transcript variants containing distinct mRNA cis-regulatory elements and/or encoding protein isoforms. 
Alternative polyadenylation has been shown to be regulated in tissue- and condition-specific manners. A growing number of human diseases have been associated with altered polyadenylation activity. While the core elements for polyadenylation have been well characterized, little is known about the auxiliary elements that modulate polyadenylation activity. In particular, the role of RNA secondary structures in regulation of polyadenylation is completely unclear. The long term goal is to fully understand the mechanisms by which mRNA polyadenylation is regulated under different biological conditions. The specific aims of this study are 1) to systematically analyze different types of RNA structures associated with mammalian poly(A) sites by bioinformatics, and 2) to examine how RNA structures regulate mRNA polyadenylation by experimental assays.</description>
		<pubDate>Wed, 22 Jul 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Model-Based Methods for Analyzing ChIP Sequencing Data</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05119&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05119&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Protein-DNA interaction constitutes a basic mechanism for genetic regulation of target gene expression. Deciphering this mechanism is challenging due to the difficulty in characterizing protein-bound DNA on a genomic scale. The recent arrival of ultra-high throughput sequencing technologies has revolutionized this field by allowing quantitative sequencing analysis of target DNAs in a rapid and cost-effective way. ChIP-Seq, which couples chromatin immunoprecipitation (ChIP) with next-generation sequencing, provides millions of short-read sequences, representing tags of DNAs bound by specific transcription factors and other chromatin-associated proteins. The rapid accumulation of ChIP-Seq data has created a daunting analysis challenge. Here we propose a hidden Markov model (HMM)-based algorithm to detect genomic regions that are significantly enriched by ChIP-Seq. Our method will address complications such as sequencing bias and read alignment uncertainty. We also propose a multi-level hierarchical HMM that will allow integration of data from both ChIP-Seq and ChIP-chip. Next, we will build model-based de novo motif finding strategies that utilize ChIP-Seq data. We believe efficient mining of all sequences identified by ChIP-Seq will allow us to precisely characterize the protein-DNA interaction sites. Our long term biomedical research interest is in prostate cancer. We will apply ChIP-Seq and the data analysis tools developed in this project to investigate prostate cancer transcription (dys-) regulation. We believe effective data integration under a coherent probability framework will eventually lead to an in-depth understanding of mechanisms mediating transcription regulation in prostate cancer progression. PUBLIC HEALTH RELEVANCE: Transcription regulation plays an important role in cancer progression. 
The development of statistical and computational strategies proposed here will help us gain in-depth understanding of mechanisms mediating transcriptional regulation in prostate cancer progression.</description>
		<pubDate>Wed, 22 Jul 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>DNA Copy Number Variation in Dogs</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG04663&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG04663&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Largely due to their population genetics, dogs are uncommonly powerful animal models of human disease. Recent advances have spawned a new era of dog genetics. Here we add an important tool to the canine genetics toolbox: DNA copy number genetics. In normal humans, thousands of gene-spanning Copy Number Variants (CNVs) were recently identified. Some of these have been shown to have strong roles in disease predisposition. We will use state of the art methods to create the first high resolution CNV map of the dog genome. We have validated the technology and find a level of &gt;50 kb copy number variation that is similar to that reported for primates and rodents. We will molecularly define a high-priority set of CNVs and determine their frequencies in a second panel of dogs. Our work will enable the identification of breed-specific microdeletions and other CNVs that are associated with disease susceptibility and diverse traits. In future studies Single Nucleotide Polymorphisms (SNPs) in linkage disequilibrium with these CNVs can be used to genotype them. Thus our resource and the canine 27k SNP platforms under development will be complementary. We will co-ordinate with other centers to assemble web based distribution of CNV data and maps. PUBLIC HEALTH RELEVANCE: Natural diseases of dogs offer outstanding animal models from which to learn about human disease. Here we propose to develop powerful new approaches and resources for finding dog disease genes. We expect this work will lead to improved health care in dogs, and to the improved understanding of human disease mechanisms.</description>
		<pubDate>Mon, 20 Jul 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Scaling up coalescent linkage disequilibrium mapping</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04839&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04839&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Family-mapping studies are often able to locate a disease gene within an area of 5 cM, but such areas may contain 50+ genes. Methods based on linkage disequilibrium (LD) in population data have the potential to pinpoint disease-associated genes. This proposal begins with an existing LD mapping algorithm which is computationally limited to areas of 0.5 cM or less, and develops three approaches to making it usable for human disease-mapping studies. (1) Simplify the model of recombination, tracking fewer recombinations by disregarding fine recombinational structure between adjacent SNPs. (2) Construct the map in overlapping windows along the chromosome, rather than attempting to analyze the entire region simultaneously. (3) Pre-compute the ancient genealogical relationships for a region of the genome (one of the ENCODE regions will be used as a proof of concept). Essentially all modern samples will share portions of their deep genealogy; pre-computation will greatly reduce the redundant work done by different groups seeking disease loci in the same chromosomal region. This proposal will transform a powerful but computationally expensive mapping algorithm into one of practical use. Finding the specific genes which contribute to development of a disease is important in diagnosis, understanding, and treatment. Diagnostic tests built on a rough idea of a disease gene's location often work only in the ethnicity for which they were developed; tests informed by the actual causative gene or genes can work in all populations. Knowledge of the causative genes can also illuminate the mechanisms of disease and provide targets for treatment design. Public Health Relevance: Finding the precise gene or genes contributing to a human disease is important for diagnosis, understanding, and treatment. 
Family-based studies often identify a large chromosomal region containing 50+ genes; population-based studies are needed to narrow the location further. This proposal will extend a fine-scale gene-location algorithm based on population data so that it can be used in larger studies and across wider areas of uncertainty.</description>
		<pubDate>Mon, 20 Jul 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Indigenous Communities and Human Microbiome Research</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05172&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG05172&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This project is an investigation of the implications of research on ancient and contemporary human microbiomes for the social and ancestral identities of indigenous people. It will engage indigenous communities on the U.S. Southern Plains (Apache, Caddo, and Kiowa nations) and in the Andean region of Peru (Aymara, Quechua and Uros-descended communities). Community members will take part in focus groups, individual survey interviews, and public meetings to discuss the ways in which local variations in human microbiomes related to differences in environment, lifestyle and culture may have implications for health disparities, population histories, and social and ancestral identities. Local communities also will be engaged in discussions about how to conduct ethically and culturally appropriate microbiome research using contemporary samples from some members. PUBLIC HEALTH RELEVANCE: This project will advance our understanding of the relationship between social and ancestral identities and biology as well as develop a model for engaging indigenous communities in human microbiome studies. Both those goals will contribute to reducing health disparities in populations with histories of exploitation and economic and political disadvantages.</description>
		<pubDate>Fri, 10 Jul 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Ethical Legal and Social Dimensions of Human Microbiome</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04853&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG04853&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): We propose an exploratory survey, parallel to the Human Microbiome Project (HMP), of the emergent ethical, legal, and social issues associated with human microbiome research. We will implement this study using in-depth interviews with key stakeholders in the HMP, including individuals who are recruited to the HMP but decline participation, study participants, and investigators and project leaders involved in planning for and conducting the first phases of the HMP. The overall goal of this project is to identify and analyze ethical, legal, and social issues related to human microbiome research and to develop ethically sound and empirically informed strategies for managing these issues in future research. This project has three Specific Aims: (1) Describe recruits' and participants' ideas about the HMP (and participants' experiences of the HMP) as it relates to them physically, socially, and culturally and as it relates to their notions of health and disease, (2) Describe the ethical, legal, and social challenges of conducting the HMP from the perspective of study investigators and project leaders at the NIH, and (3) Provide a forum for interdisciplinary exchange with representative stakeholders (including study participants, members of the research team, and outside experts) to develop recommendations for the responsible management of ethical, legal, and social issues identified in Specific Aims 1 and 2. PUBLIC HEALTH RELEVANCE: The Human Microbiome Project (HMP) is a large study to learn more about the collections of bacteria, viruses and other tiny organisms that live in and on our bodies. 
We will interview three groups of people: (1) those who are asked to participate in the HMP but say no, in order to find out why they said no, (2) those who decide to participate in the HMP, in order to explore their thoughts about the project and their experience with participating in the project, and (3) scientists who are conducting the HMP, in order to understand what ethical or social issues are related to this type of research. We will also host a workshop with members of our research team, study participants, and outside experts to discuss the issues that were identified in the interviews and develop recommendations for managing them.</description>
		<pubDate>Fri, 10 Jul 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Conference Support for Genetic Alliance</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R13HG05190&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R13HG05190&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): This grant application requests support for annual conferences sponsored and coordinated by Genetic Alliance over a five-year period. Genetic Alliance was established in 1986 and is a network of 1,000 disease-specific organizations. The mission of Genetic Alliance is to transform health through genetics to accelerate the translation of research to services. This year's conference will be held on July 17-19, 2009, at the Bethesda North Marriott in Rockville, MD, a handicapped accessible hotel. This conference theme is Discovering Openness in Health Systems. The overall goal of each of the Conferences is to provide a forum for discussion about the various challenges, solutions, and opportunities in the research translation pipeline. Participants will discuss best practices in basic and translational research, encourage and solidify systemic connections, and examine methods for collaboration among researchers, clinicians and disease advocates. The annual conferences held by Genetic Alliance have consistently drawn a large, diverse, and dynamic group of participants; last year's annual conference was attended by more than 200 people including advocates, health professionals, policymakers, researchers, industry professionals, and community leaders. The target group for participants is disease communities. We define these communities as composed of the advocate leaders, the clinician-scientists and the basic researchers for each disease. The types of diseases represented at the meeting are diverse: cardiovascular, mental health, skin, eye, liver, kidney, pancreas, bone, cancer, brain, metabolism and blood. These diseases affect individuals across the lifespan. Our conference attendance is typically 65% women, and 15% minority participants. 
The aim of this and upcoming conferences is to create a forum for the cross-fertilization of ideas between highly diverse groups within the genetics community in order to accelerate research and treatment development. Genetic Alliance is committed to increasing collaboration across various fields - including diseases and disease pathways, federal agencies, and institutions. We build each year upon the prior year's conference. All of the tools we develop are open source and publicly available, including all materials from the conference. Previous tools have included the Genetic Alliance BioBank, Disease InfoSearch, the Resource Repository and various best practices.</description>
		<pubDate>Thu, 9 Jul 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>MIT/Whitehead/Broad Computational Genetics Training Prog</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=T32HG04947&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=T32HG04947&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): We propose to establish a new interdisciplinary research training program in Computational Genetics as a collaborative effort between MIT, the Whitehead Institute, and the Broad Institute of MIT and Harvard. The goal of this program is to train MIT students to be effective interdisciplinary scientists, working as team members with biologists to develop new algorithms, tools, and approaches for analyzing genomic and genetic data and expressing this analysis in the form of principled predictive models. The program faculty will consist of five MIT EECS and Mathematics faculty, four Whitehead faculty members, and four members of the Broad Institute of MIT and Harvard. The major research disciplines of this program include: 1) the development of new approaches and algorithms for the analysis of data from genomics and genetics based experiments and studies; 2) approaches for the principled design of studies based upon past data; 3) the construction of computational models that explain complex phenotypes and biological phenomenon; 4) and the development of approaches for interpreting genomic, genetic, and clinical data relevant to human health and disease. It is proposed that four pre-doctoral trainees be supported in this program, each for a period of two years (a total of 8 slots). We have been running a training program in this area for over seven years, and our students to date have made substantial contributions to the field. Among our recent graduates are faculty at Stanford, Berkeley, Univ. of Washington, Princeton, Duke, and CMU. Our pool of applicants is unusually strong, with 592 applicants in 2008 in relevant sub-areas of Computer Science. 
Trainees in our proposed research training program will have a very rigorous technical and quantitative foundation from the MIT graduate program in Computer Science, combined with formal interdisciplinary course work and a co-mentorship arrangement between a Computer Science and a Biology faculty member. The strong technical skills present in our pre-doctoral students have provided an excellent foundation for the creation of groundbreaking new approaches and algorithms in Computational Genetics. PUBLIC HEALTH RELEVANCE - We will train scientists who can discover links between genetic information and risks for human disease. These studies can suggest appropriate therapies for certain diseases and give clues towards the development of new therapeutics. As more data from Genome Wide Association Studies becomes available, we expect that genetic information will become an important component of preventative medicine.</description>
		<pubDate>Wed, 8 Jul 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Mobile Elements in Mammalian Genomes</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R13HG05181&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R13HG05181&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Partial support is requested for the third FASEB Summer Conference, Mobile Elements in Mammalian Genomes, which will be held in Snowmass, CO, July 5-10, 2009. The objective of the meeting is to bring together a diverse group of investigators with a common interest in mobile DNA to discuss recent advances in how mobile elements have impacted and continue to influence the functional expression and evolution of mammalian genomes. The program comprises all aspects of mobile DNA including bioinformatics, biochemical, evolutionary, and population genetic studies on mammalian transposable elements, as well as work to exploit transposable elements for saturation mutagenesis and gene delivery. Comparative genomics provides strong evidence for mobile DNA amplification, repression and extinction in evolutionary time. Recently, detailed studies of the small inhibitory RNA pathway have provided strong evidence for derepression of mobile DNA in mutants of this pathway; the emerging interface of these two fields will be represented by a new session at this meeting. There will also be increased representation of studies that define a variety of roles for sequences derived from transposable elements in genome architecture and function. This timely conference will facilitate the sharing of exciting new findings, experimental challenges, and outstanding questions in this rapidly evolving field among students, post-doctoral trainees, and junior and senior principal investigators. PUBLIC HEALTH RELEVANCE: Mobile DNA has played, and continues to play, a major role in shaping the structure and function of the mammalian genome. 
Retrotransposition deposits new copies of mobile DNA throughout the genome which can lead to gene disruption, modified expression of adjacent genes, and transduction of neighboring DNA; at the DNA level the resulting interspersed repeated sequences may give rise to novel regulatory networks, or provide a substrate for homologous recombination of mispaired sequences, leading to gene duplication, deletion, exon shuffling and chromosome translocation. Any one of the dynamic events caused by the presence and movement of mobile DNA in the human genome may lead to disease, including meiotic failure and infertility, inherited genetic diseases such as hemophilia and muscular dystrophy, and somatic diseases such as breast and colon cancer.</description>
		<pubDate>Thu, 2 Jul 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Genome Informatics Conference</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R13HG05183&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R13HG05183&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The Cold Spring Harbor Laboratory conference on Genome Informatics will focus on the provision and utilization of large scale genomic data and annotations. Genomic resources provide the fundamental descriptions of an increasing array of organisms at the molecular level. This meeting forms part of a series that alternates annually between Cold Spring Harbor, USA and Hinxton, UK. The goal of the series is to explore both the latest provision of these resources, and perhaps most importantly, their use as engines of biological discovery. This ranges from the storage of data and their associated data models, to the design of effective algorithms to uncover non-obvious aspects of these datasets, to ontologies to concisely describe biological information, and software systems to support curation, visualization, and exploration. The conference has expanded its remit over the last few years, to ensure it remains current with the latest applications of informatics, all the while ensuring a strong focus on biological informatics. In particular we are increasing our focus on new sequencing technologies which will revolutionize the genomics field and rely heavily on bioinformatics tools. The conference brings together some of the leading scientists in this growing field, and we strongly encourage researchers from other large scale information handling disciplines to attend. Public Health Relevance: The Cold Spring Harbor Laboratory conference on Genome Informatics will focus on the provision and utilization of large scale genomic data and annotations. Genomic resources provide the fundamental descriptions of an increasing array of organisms at the molecular level. The goal of the series is to explore both the latest provision of these resources, and perhaps most importantly, their use as engines of biological discovery. 
This ranges from the storage of data and their associated data models, to the design of effective algorithms to uncover non-obvious aspects of these datasets, to ontologies to concisely describe biological information, and software systems to support curation, visualization, and exploration.</description>
		<pubDate>Wed, 1 Jul 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Statistical Models of Maternal and Offspring Genetic Eff</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=F31HG05056&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=F31HG05056&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The development of statistical methods to detect complex biological or genetic mechanisms is very important to the furthering of our knowledge of the etiology of complex diseases. One example where genes operate through complex biological mechanisms to increase disease risk is through an interaction between maternal and offspring genes termed maternal-fetal genotype (MFG) incompatibility. MFG incompatibility arises from maternal-fetal genotype combinations that predispose for maternal immunological processes that adversely affect the developing fetus, and thereby increase susceptibility to disease. Although the adverse environment takes place early in life, the effects of it can be studied decades later because of its genetic origins. Using population designs that collect only offspring and their mothers, interactions between mothers and offspring gene effects are confounded with the effects of a susceptibility risk allele in the offspring or the mother. To solve this problem, my research group developed the original MFG test to estimate MFG interaction effects by adapting the log-linear method for estimating genotypic relative risks for trios (parents and affected offspring). However, there are still some fundamental limitations to the current version of the MFG test that must be overcome. The current test is limited to 2-generation families, it cannot handle all forms of MFG incompatibility, and it cannot be used with quantitative outcomes. This proposed work will expand upon existing statistical methodology (the MFG Test) that estimates disease relative risk for MFG incompatible offspring. 
Specifically this research will (1) Develop an extended MFG test (EMFG Test) to accommodate arbitrary family structures, (2) Generalize the EMFG test so that one statistical method can be used to fit a variety of MFG incompatibility models, (3) Allow for interactions between specific covariates and MFG incompatibility in the arbitrary family setting, (4) Extend the EMFG test for use with quantitative traits. PUBLIC HEALTH RELEVANCE: This proposed work is relevant to public health because it will provide new statistical methods that can be used by others to test for the joint effects of maternal and offspring genes in a wide range of diseases. Ultimately it will help the scientific community better understand the mechanisms by which environmental and genetic risk factors interact to increase risk to disease.</description>
		<pubDate>Wed, 1 Jul 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>A Turnkey Solution for Next Generation Sequence Data Ana</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG05133&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R21HG05133&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Modern biology continues to be revolutionized by high throughput data production technologies. Nowhere is this more obvious than in the case of next-generation DNA sequencing technologies, which have dramatically higher throughput and lower cost than previous approaches. Not only do these technologies make genome sequencing and resequencing more widely available, they have driven the development of a variety of novel genome-wide (and data-intensive) functional assays. But are these methods really accessible for experimentalists? Although the financial cost of sequencing has been substantially reduced, there is still a significant barrier preventing experimental biologists from making effective use of this data. Translating the data generated by these new technologies requires sophisticated computational infrastructure - both for large-scale data management and analysis - that is accessible to experimentalists. Genomic data discovery is no longer the limiting factor for much genomic research; instead the problem lies in providing the data, analysis tools, and protocols in a form that is usable for bench biologists, so that they can take full advantage of their data. We have developed a framework - Galaxy - that makes it easy to provide accessible interfaces to computational tools, and provides experimental biologists with an intuitive and consistent interface for performing sophisticated analyses with minimal effort, regardless of the scale of data involved. Here we propose to build, using this existing framework, a complete turnkey solution for accessible management and analysis of next-generation sequence data. This solution will allow data produced by sequencing instruments to be automatically made available to bench biologists through Galaxy's user-friendly analysis environment. 
Into this environment we will integrate a large set of tools for sequence data analysis, along with pre-defined best-practice workflows for common analysis problems. The entire solution will be provided as a pre-configured ready-to-run package which any lab or provider of sequencing services can easily deploy, enabling their users to truly realize the promise of next-generation sequencing technologies. PUBLIC HEALTH RELEVANCE: A new generation of high-throughput DNA sequencing technologies has made a variety of novel data-intensive genome-scale experiments both possible and relatively inexpensive, putting these techniques within the reach of many more labs. However, these dramatic improvements in the availability and cost of sequencing have not yet been matched with easy-to-use, scalable, integrated and flexible data analysis capabilities. The proposed project will develop an integrated data management and analysis solution that allows biomedical researchers to easily and efficiently work with the data produced by these revolutionary new technologies.</description>
		<pubDate>Wed, 1 Jul 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Whole-genome shotgun sequencing strategy and assembly</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG03474&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R01HG03474&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Genome sequence, basic to biomedical research, is efficaciously produced by whole-genome shotgun (WGS) sequencing. Although WGS sequencing is a major NIH activity, we lack answers to fundamental questions about sequencing strategy and assembly of WGS data. Our work and the community's have focused on assembly of particular data sets and development of assembly algorithms. This grant focuses on mathematical underpinnings and rigorous analysis of genome sequencing and assembly, to improve our assembly tools and approaches. We will develop general methodology for optimally choosing specific sequencing strategies for new and varied organisms, fully exploiting data from emerging technologies. So that assembly is also optimal, we will develop algorithms that exploit the data's exact information content, retaining intrinsic ambiguity, and allowing assembly of genomes beyond current capabilities. We will develop strict internal consistency tests, guaranteeing accuracy and completeness of assembly units. A new assembly quality markup tool will label assembly regions from finished to inconsistent, by their inferred accuracy. This will guide finishing work (improving efficiency) and clearly describe reliability of particular assembly regions to end-users. In short, the work will produce better quality genome sequence at lower cost, marked to show reliability, thereby increasing utility for downstream analysis and laboratory experimentation.</description>
		<pubDate>Wed, 1 Jul 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Exploring Concepts of Population in Two Human Genetic Sa</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R03HG05030&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R03HG05030&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The proposed project explores a new direction in our larger research on the use of human genetic variation studies in the search for biomedically related genetic markers. Broadly, the aim of the new add-on project and the original project is to understand how human genetic variation researchers operationalize the concept of a human population. Together, these studies will provide empirical information that will help geneticists and bioethicists to understand whether there may be potential downstream social and biomedical consequences of different conceptualizations. Through these studies we are investigating how the concept of population is operationalized during sample collection and analysis both within and across scientific research projects. Our research methodology involves primarily ethnographic observation, interviews, and documentary analyses, which will be employed in the proposed study. The proposed project will study how data in different human genetic variation studies are collected and analyzed and how different aspects of scientific work (e.g. sample collection design and practices, statistical analysis of data, etc.) influence the definition and operationalization of genetic populations. We are especially interested in scientists' efforts to avoid the reification of social categories such as race in human genetic variation studies, especially as they relate to disease variant research. The proposed project will provide data from two laboratory field sites that complement the laboratory field sites we are currently studying. These two new sites are crucial for understanding how the selection of groups from whom DNA samples are collected for genome wide association studies impacts the way researchers operationalize the concept of human populations. 
It will provide information on DNA sample collection processes, for comparison with DNA analysis processes, and will add research on several ethnic groups that are absent from our current research sites. These include groups of individuals in the U.S. who self identify as Latinos/Hispanics, Native Hawaiians, and Japanese. PUBLIC HEALTH RELEVANCE - The proposed project will investigate the conceptualization and operationalization of the concepts of population in the data collection design and practices at the front end of human genetic variation research on disease related genetic variants. That research aims to provide better tools for health related genetic research. The aim of the proposed project is to provide information and analyses that will be useful towards improving the reliability of these public health-related genetic variation studies.</description>
		<pubDate>Fri, 26 Jun 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>A Microbial Genome Reference Platform for Metagenomics</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U54HG04973&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U54HG04973&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The composition of the complex microbial communities inhabiting the human body has a tremendous influence on human health and disease. New DNA sequencing technologies offer the opportunity to study these communities by directly identifying the genome of individual microbial strains. Ideally, new sequences from metagenomic sampling can be compared to the known, full sequences of individual strains. It is estimated that there are more than 1,000 such reference strain sequences that are required and ~600 that are either already determined, or else are in the process of being sequenced. The primary aim of this proposal is to generate reference DNA sequences of the remaining ~400 strains to complete this reference catalogue. First, all 400 strains will be sequenced and assembled by shotgun methods. At least 60 will be finished to high quality using established methods and new approaches to convert the majority of the remainder to similarly high quality will be developed. The sequencing will use next-generation methods, based upon platforms developed by 454 Life Sciences, Illumina/Solexa and Applied Biosystems. All of the sequences will be annotated by an automated pipeline, and the 60 that are 'finished' will also be manually curated. The sequencing methods will also be applied to a selection of viral and fungal targets. Metagenomic sampling approaches will be tested using existing samples and methods, and these data will be refined by development of new technologies for selective DNA isolation and 16S and WGS sequencing of metagenomic samples, including microarray DNA chip capturing, electrophoretic techniques and cDNA tests. When all technical advances are combined, the cost of sequencing an individual bacterial genome will be less than $1,000.</description>
		<pubDate>Fri, 19 Jun 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>International Conference on Biomedical Ontology</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R13HG05049&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=R13HG05049&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): Justification for the Proposed Conference. The use of ontologies to describe both clinical and experimental data is now a standard technique in integrative translational research. When data from different sources are described using shared, logically structured, controlled vocabularies, this makes the data more easily retrievable and navigable, and it also enhances the degree to which they can be analyzed and combined to serve new purposes. Following a strategy pioneered by the Gene Ontology, ontologies are now being developed for the description of biological and biomedical data of almost every type. To achieve their goal in a maximally effective way, these ontologies must work well together, and as ontologies become ever more commonly used, the problems involved in achieving coordination in ontology development become ever more urgent. To address these problems there is an urgent need for a general overarching conference involving representatives of all the major communities involved in the development and application of ontologies in biomedicine. Qualifications of Organizers and Participants. The organizers have considerable experience in directing major interdisciplinary events at the borderlines of biology, biomedical informatics and clinical and translational research. Invited speakers will be leading figures in the field of biomedical ontology. The majority of presentations will be selected on the basis of submissions refereed by the Program Committee. Steps are being taken to encourage submissions from under-represented groups. Papers accepted for presentation will be published in a volume of conference proceedings. Scientific Plans for the Proposed Conference. The workshop program is designed to ensure coverage of the entire domain of biomedical research, from biomolecules to populations. 
Principal anticipated long-term outcomes are: (1) heightened mutual awareness of the work being undertaken by the many groups involved in developing and using ontologies to serve the ends of clinical and translational research; (2) establishment of new cross-disciplinary collaborations among these groups. Rationale for Funding Support. The sorts of cross-disciplinary expertise in biology, biomedicine, biomedical informatics and computer science that are the presupposition of successful work in biomedical ontology are not yet adequately addressed in university training programs. There is accordingly a shortage of junior researchers with the needed awareness and expertise. We hope that this conference will serve as an initial step to addressing this shortfall. We are accordingly requesting funding to enable students and early career scientists, especially from under-represented groups, to participate in the ICBO conference. PUBLIC HEALTH RELEVANCE: The use of ontologies to describe both clinical and experimental data is now a standard technique in integrative translational research. We request funding in the form of early career participant stipends for the first general conference involving representatives of all the major communities involved in the development and application of ontologies in biomedicine.</description>
		<pubDate>Tue, 9 Jun 2009 12:00:00 EST</pubDate>
	</item> <item>
		<title>Sequencing the Human Microbiome</title> 
		<link>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U54HG04968&amp;cr_yr</link>
		<guid>http://www.genome.gov/page.cfm?pageID=17015407&amp;display_abstract=on&amp;query_grantid=U54HG04968&amp;cr_yr</guid>
		<description>DESCRIPTION (provided by applicant): The human microbiome is the collection of microorganisms that colonize the human body. Over the last year the analysis of the microbiome of 250 healthy subjects was initiated, focusing on five body sites (nasal, oral, vaginal, dermal, and gut) and many subsites. This study began to build a catalog of genome sequences of culturable bacteria that inhabit the human body and to describe the community structure formed by microbial species, allowing diversity to be assessed between body sites and between individuals. The goal of this study is to extend these initial probes in multiple directions. The study will add 1,000 genomes to the catalog of bacterial reference sequences of the human microbiome by using automated methods to isolate organisms and high throughput DNA sequencing to construct genome sequences. The study will also further characterize the metagenomes of the 250 individuals by directly comparing DNA sequences from organisms in the microbiome, thus determining what is common or variant between subjects. Going beyond bacteria, the project will characterize the virome, the component of the human microbiome formed by viruses, and take a census of eukaryotic microbes in the human microbiome. The last goal is to analyze the transcriptional patterns of microbes found in metagenomic communities, and by this means determine what functions are expressed in the different body niches. These studies will advance our growing knowledge of the human microbiome in healthy subjects, providing a baseline for disease studies as well as new tools for probing the microbiome. Public Health Relevance: The human microbiome is the collection of microorganisms that colonize the human body. Over the last year the analysis of the microbiome of 250 healthy subjects was initiated, focusing on five body sites. 
This study began to build a catalog of genome sequences of organisms that inhabit the human body and to describe the diversity found between body sites and individuals. The goal of this study is to add 1,000 genomes to the catalog of reference sequences of the human microbiome, to further characterize differences in the metagenomes of 250 individuals by directly comparing shotgun sequences, to characterize the virome component of the human microbiome, to take a census of eukaryotic microbes in the human microbiome, and to analyze the transcriptional pattern found in metagenomic communities. These studies will advance our growing knowledge of the human microbiome in healthy subjects, providing a baseline for disease studies as well as new tools for probing the microbiome.</description>
		<pubDate>Fri, 22 May 2009 12:00:00 EST</pubDate>
	</item> 
  </channel>
</rss>
