<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" version="2.0"><channel><atom:id>tag:blogger.com,1999:blog-2155351992730855318</atom:id><lastBuildDate>Sat, 03 Mar 2012 13:39:42 +0000</lastBuildDate><category>patents</category><category>BACE1</category><category>BACE2</category><category>protein homology</category><category>Ensembl</category><category>IPI</category><category>citations</category><category>bio</category><category>fish</category><category>ORFs</category><category>latamase</category><category>PubChem</category><category>target</category><category>ChEMBL</category><category>albumin</category><category>TTD</category><category>chem</category><category>gene trees</category><category>DrugBank</category><category>glyptins</category><category>INNs</category><category>Mw</category><title>Bio &lt;-&gt; Chem</title><description>&lt;strong&gt;Technical notes from the interface between bioinformatics and cheminformatics by Chris Southan&lt;/strong&gt;</description><link>http://cdsouthan.blogspot.com/</link><managingEditor>noreply@blogger.com (Chris Southan)</managingEditor><generator>Blogger</generator><openSearch:totalResults>19</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/Bio-Chem" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="bio-chem" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/"
/><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2155351992730855318.post-4468269305471889240</guid><pubDate>Wed, 29 Feb 2012 15:24:00 +0000</pubDate><atom:updated>2012-03-03T05:39:42.366-08:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">BACE1</category><category domain="http://www.blogger.com/atom/ns#">Ensembl</category><category domain="http://www.blogger.com/atom/ns#">albumin</category><category domain="http://www.blogger.com/atom/ns#">BACE2</category><category domain="http://www.blogger.com/atom/ns#">bio</category><title>Get your kicks on Ensembl 66</title><description>&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;Cultural reference prelude:&amp;nbsp; Many of you may know the Rolling Stones rendition of&amp;nbsp; (get your kicks on)&amp;nbsp; &lt;a href="http://www.youtube.com/watch?v=UyhkBg8wOBo"&gt;Route 66&lt;/a&gt;&amp;nbsp; but I chose the &lt;a href="http://www.youtube.com/watch?v=tg2EbJy-9dc"&gt;Chuck Berry&lt;/a&gt; and &lt;a href="http://www.youtube.com/watch?v=mQwb4SpNuKo"&gt;Manhattan Transfer&lt;/a&gt; versions for my MP3 collection. &lt;br /&gt;
&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;Ensembl is fantastic for many reasons.&amp;nbsp; It is also a clean retrieval tag so we can count over 7 million specific Google hits, which goes up to&amp;nbsp; 437 million if you just add the “e”.&amp;nbsp;&amp;nbsp;&amp;nbsp; Analogously, you can find&amp;nbsp; 434 mentions&amp;nbsp; in PubMed, including&amp;nbsp; one from me&amp;nbsp; (&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/15174140"&gt;PMID: 15174140 &lt;/a&gt;).&amp;nbsp; I can count myself as one of the first adopters (hey, EB even bought me a drink once) but something from those early days comes to mind.&amp;nbsp;&amp;nbsp; While checking EST contigs from what was then the SmithKline Beecham gene patenting assembly line against these early human Ensembl ORFs,&amp;nbsp; based largely on GeneWise and GenScan,&amp;nbsp; I read off some similarity matches against a set of cleanly predicted (Met-to-stop)&amp;nbsp; but rather strange proteins.&amp;nbsp;&amp;nbsp; The only 100% match for these anywhere was&amp;nbsp; against some Actinomyces sequences in the patent databases.&amp;nbsp;&amp;nbsp; Thus, either something had blown in the window at the sequencing center or some other kind of mix-up had dropped&amp;nbsp; these contigs into an early assembly,&amp;nbsp; but, after regaling TH on the phone (maybe that's why he's never bought me a drink....), they were duly expunged.&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
Moving swiftly on to&amp;nbsp;&amp;nbsp;&lt;a href="http://www.ensembl.info/blog/2012/02/24/ensembl-66-released/"&gt;Ensembl 66&lt;/a&gt;&amp;nbsp; there has been massive progress and it's good to see the &lt;a href="http://www.ensembl.org/Latimeria_chalumnae/Info/Index?db=core;g=ENSLACG00000015003;r=JH126698.1:1152666-1177335;t=ENSLACT00000017156"&gt;Coelacanth&lt;/a&gt; join the genome zoo. &amp;nbsp; However, while exogenous contamination is likely to be a thing of the past, there are still reasons to be circumspect about the capability of automated genome annotation pipelines to generate authentic canonical proteomes&amp;nbsp; (in the Swiss-Prot sense of&amp;nbsp; correct,&amp;nbsp; all-exon ORFs).&amp;nbsp;&amp;nbsp; Given this release came a week or so after my post on &lt;a href="http://cdsouthan.blogspot.com/2012/02/fishy-bace1-and-bace2-story.html"&gt;BACE1 and 2 in fish&lt;/a&gt;&amp;nbsp; I decided to look at the 66 data in more depth.&amp;nbsp;&amp;nbsp; For human (or any genome used as a reference)&amp;nbsp; there is a cool feature to generate a complete set of orthologous alignments and download the statistics from the CLUSTAL zip-ups. You can read off a rough proxy for the evolutionary distance directly off the Excel plot for the whole set.&amp;nbsp; I’ll skip the details (you can work them out simply by clicking around in the Ensembl links)&amp;nbsp; but below you can see the result for 48 Albumin orthologues &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000163631;r=4:74262831-74287129"&gt;(ENSG00000163631&lt;/a&gt;) .&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-_RPQ-CCu8G8/T05CueeAIzI/AAAAAAAAAUg/t7poZwfhTNA/s1600/Ensembl_66_01.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="312" src="http://1.bp.blogspot.com/-_RPQ-CCu8G8/T05CueeAIzI/AAAAAAAAAUg/t7poZwfhTNA/s640/Ensembl_66_01.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;Assuming these are generally the same length (i.e. ~610, and the data set supports this) the % id is essentially the reciprocal&amp;nbsp; match between “Query” (human in this case)&amp;nbsp; and “Target”.&amp;nbsp;&amp;nbsp; So, as expected, we see the primates in the high 90s, mammals in the 70s and the expected drop-off down to the 50s for the birds.&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
But we can spot anomalies.&amp;nbsp; The first are five obvious cases of id mismatch (blue way above red target id) where the species appears to slide down the evolutionary scale because of truncation and/or X-runs in the ORF.&amp;nbsp;&amp;nbsp; Thus the Coelacanth&amp;nbsp; albumin looks correct whereas it's dodgy for the two Shrews, Zebrafish is C-terminally truncated to 347 and&amp;nbsp; Cat is N-terminally truncated.&amp;nbsp;&amp;nbsp; OK, so errors in 5 out of 48 protein/gene models might not be so bad but there are other examples, such as Armadillo,&amp;nbsp; where the ids are full length and reciprocal but include substantial&amp;nbsp; Xs as translated n-runs&lt;/span&gt;&lt;span style="font-size: large;"&gt; from automated exon gap-filling (the InterPro-like protein graphic judiciously marks these up as "low complexity"). &lt;/span&gt;&lt;span style="font-size: large;"&gt;Some of what look like&amp;nbsp; false-negatives (i.e.&amp;nbsp; Ensembl species we might expect to have serum albumin) may be evolutionarily real,&amp;nbsp; for example the missing &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/20432704"&gt;fish&lt;/a&gt; .&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
OK, so let's take a look at the analogous plot for BACE2&amp;nbsp; &lt;a href="http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000182240;r=21:42539728-42648524"&gt;(ENSG00000182240&lt;/a&gt;).&amp;nbsp; I have restricted the data set to 48 vertebrates because the ur-BACEs from Ciona and Lamprey complicate the analysis and the set of &lt;i&gt;C. elegans&lt;/i&gt; cathepsins are not really orthologous and produce erroneously low id numbers because of over-gapping.&amp;nbsp; &amp;nbsp; &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-uIsHGtWGwlI/T05C_SWeIHI/AAAAAAAAAUo/9r3-83upFwQ/s1600/Ensembl_66_02.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="278" src="http://2.bp.blogspot.com/-uIsHGtWGwlI/T05C_SWeIHI/AAAAAAAAAUo/9r3-83upFwQ/s640/Ensembl_66_02.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;Firstly, there are more species, mainly because Ensembl “finds” a BACE2 in all the fish (but splits it in two in the Zebrafish).&amp;nbsp; Secondly, the graphical mismatches are much higher.&amp;nbsp;&amp;nbsp; Notwithstanding&amp;nbsp; an element of evolutionary unpredictability regarding length differences between paralogues and possible loss-of-function by truncation or pseudogene conversion,&amp;nbsp; the most parsimonious assumption is that a substantial proportion of these ORFs are wrong.&amp;nbsp; Given the contentiousness of this claim I dutifully scrolled through all the &lt;a href="http://www.ensembl.org/Homo_sapiens/Gene/Compara_Ortholog/Alignment?db=core;g=ENSG00000182240;r=21:42539728-42648524"&gt;alignments&lt;/a&gt; under the graph.&amp;nbsp; Even giving the occasional&lt;/span&gt;&lt;span style="font-size: large;"&gt; small indel or frame-skip the benefit of the doubt,&amp;nbsp; &lt;/span&gt;&lt;span style="font-size: large;"&gt;I could only judge nine to be correct.&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;The plot for BACE1 is shown below.&amp;nbsp; &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-OQG1nmvAZR4/T08s74qHr5I/AAAAAAAAAUw/YGD35UwUz8o/s1600/Ensembl_66_03.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="258" src="http://3.bp.blogspot.com/-OQG1nmvAZR4/T08s74qHr5I/AAAAAAAAAUw/YGD35UwUz8o/s640/Ensembl_66_03.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;The ORF quality seems about midway between ALB and BACE2 with&amp;nbsp; 11 obvious mismatches including &lt;a href="http://cdsouthan.blogspot.com/2011/08/alas-poor-kamilah-erroneous-ensembl.html"&gt;Kamilah the Gorilla&lt;/a&gt;. However,&amp;nbsp; the Medaka fish is a true false-negative because I found a clutch of ESTs covering about 430 residues and a match to an ORF from the NCBI genomic pipeline as &lt;a href="http://www.ncbi.nlm.nih.gov/protein/348523664?report=genbank&amp;amp;log$=protalign&amp;amp;blast_rank=1&amp;amp;RID=MYJ9PXU0013"&gt;XP_003449343&lt;/a&gt; &amp;nbsp;&lt;/span&gt;&lt;span style="font-size: large;"&gt; (but this was also incorrect at the extended N-terminal, so no one's perfect...) &lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
This is not the place to expound on the causality and consequences of these observations&amp;nbsp; but we can touch on some basics.&amp;nbsp;&amp;nbsp; Three genes do-not-a-survey-make but I perceive no reason why these should be worst-cases.&amp;nbsp;&amp;nbsp; Intuitively, we might predict that&amp;nbsp; gene/protein pipeline output accuracy for a species is related to&amp;nbsp; a) the cDNA-derived protein database coverage&amp;nbsp; b) assembly quality&amp;nbsp; and c)&amp;nbsp; the evolutionary distance between that species&amp;nbsp; and the ones with more transcript coverage.&amp;nbsp; Thus,&amp;nbsp; BACE1 fares better than BACE2 because it is more conserved and has higher transcript coverage.&amp;nbsp; Obviously, we would rather Ensembl provide us with the gene predictions from the &lt;a href="http://www.ensembl.org/info/docs/genebuild/2x_genomes.html"&gt;low coverage assemblies&lt;/a&gt; than not, but the caveats we must accept include those shown above.&amp;nbsp; However, proteome comparisons (InterPro scans and dN/dS metrics for a start)&amp;nbsp; will be confounded until there is considerable improvement. What does seem paradoxical is that we are continually hearing about NGS sequencing acceleration (e.g. &lt;a href="http://www.bio-itworld.com/news/02/17/12/Oxford-strikes-first-in-DNA-sequencing-nanopore-wars.html"&gt;nanopores&lt;/a&gt;), plummeting costs and new genome completions (e.g. &lt;a href="http://www.citeulike.org/user/cdsouthan/article/10394061"&gt;Ötzi the Iceman&lt;/a&gt; ) but some of the Ensembl assemblies languish for years without an update, despite the&amp;nbsp;&lt;a href="http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/index.shtml"&gt;Genome Reference Consortium&lt;/a&gt;.&amp;nbsp; In addition, while there are RNAseq efforts to profile new species (e.g. the Transcript Assembly Shotgun division in GenBank)&amp;nbsp; they don't seem to be synchronised with the Ensembl genomes or feeding new cDNAs through to UniProt. 
&amp;nbsp; For example,&amp;nbsp; the &lt;i&gt;Fugu &lt;/i&gt;genome, with a first-draft assembly from 2002, still has a mere 173 Swiss-Prot and 1,349 TrEMBL entries.&amp;nbsp;&lt;/span&gt;&lt;span style="font-size: large;"&gt;There are a couple of additional things I might pick up on related to Ensembl protein counting. In the meantime I'm still a big fan but let’s hope 67 does better than 66 on the ORF front.&amp;nbsp; &lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155351992730855318-4468269305471889240?l=cdsouthan.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://cdsouthan.blogspot.com/2012/02/geting-your-kicks-on-ensembl-66.html</link><author>noreply@blogger.com (Chris Southan)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-_RPQ-CCu8G8/T05CueeAIzI/AAAAAAAAAUg/t7poZwfhTNA/s72-c/Ensembl_66_01.JPG" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2155351992730855318.post-3886960209850720864</guid><pubDate>Thu, 23 Feb 2012 10:16:00 +0000</pubDate><atom:updated>2012-03-02T02:27:28.009-08:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">ChEMBL</category><category domain="http://www.blogger.com/atom/ns#">chem</category><category domain="http://www.blogger.com/atom/ns#">Mw</category><category domain="http://www.blogger.com/atom/ns#">PubChem</category><category domain="http://www.blogger.com/atom/ns#">TTD</category><title>Therapeutic Target Database  in PubChem</title><description>&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the last few days PubChem has a new source that merited a newsflash and has bumped the &lt;a href="http://cdsouthan.blogspot.com/2012/01/suddenly-its-raining-patent-chemistry.html"&gt;IBM patent structures&lt;/a&gt;&amp;nbsp; off the top slot.&amp;nbsp; While this is a new entrant into the PubChem fold &lt;a href="http://bidd.nus.edu.sg/group/ttd/ttd.asp"&gt;TTD&lt;/a&gt; &lt;i&gt;per se&lt;/i&gt; is well established and their recent update is published &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/21948793"&gt;(PMID 21948793&lt;/a&gt;).&amp;nbsp; &lt;/span&gt;&lt;span style="font-size: large;"&gt;I can take a micro-credit for encouraging this to happen  via my contacts with both sides, although the respective teams did all the real work of&amp;nbsp; implementation.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-RE3uH9SLJG8/T0YR3t9mhjI/AAAAAAAAAUA/to_w41iO8ws/s1600/TTD_01.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="266" src="http://3.bp.blogspot.com/-RE3uH9SLJG8/T0YR3t9mhjI/AAAAAAAAAUA/to_w41iO8ws/s640/TTD_01.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;I will just present some overview statistics here because the beauty of now having the structures as a selectable source is that anyone can - slice, dice, compare, profile, intersect or diff -&amp;nbsp; to their heart's content.&amp;nbsp;&amp;nbsp; As it happens,&amp;nbsp; we included TTD in a &lt;a href="http://www.citeulike.org/user/cdsouthan/article/8637704"&gt;database comparison exercise for compounds and targets&lt;/a&gt;&amp;nbsp;&amp;nbsp; some time ago but, to our chagrin, the book chapter is taking inordinately long to appear. Because of intervening database updates, including a big expansion of TTD chemistry, it will consequently be somewhat stale beer from the statistics angle but the methodology should still be of interest. I shall duly announce here when the flat beer finally gets served......&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;The first thing we can do with a new source is a skim check.&amp;nbsp; TTD gives a SID:CID ratio of 1.009. This should not be over-interpreted but being close to 1 suggests a satisfactory chemistry rules match. &amp;nbsp; The covalent count query of 2 or greater gives 718 mixtures, mainly salts, but note 181 of these have a MeSH pharmacology term. One of the largest is Fomivirsen (&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=56603661&amp;amp;loc=ec_rcs"&gt;CID 56603661&lt;/a&gt;) with no fewer than 15 sodiums.&amp;nbsp; Another useful check (thanks to EB for the query)&amp;nbsp; is the "uniqueness" count of source =1. This indicates 165 TTD-only CIDs. Many of these have canonical connectivity to other CIDs so we might not consider them as novel,&amp;nbsp; but at least one is unique&amp;nbsp; and happens to come at the top of the list as&amp;nbsp; &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=56603797&amp;amp;loc=ec_rcs"&gt;CID 56603797&lt;/a&gt; . It is actually a native tetra-peptide&amp;nbsp; YVAD but occurs in chemically modified forms in other CIDs. &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;The snapshot below includes the useful category splits you see from any PubChem Compound query results. In this case I have included TTD, &lt;a href="http://www.drugbank.ca/"&gt;DrugBank &lt;/a&gt;, and the union,&amp;nbsp; as these are the most obvious initial comparisons to make. &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-dGdybnHExWc/T0YS2uWcXDI/AAAAAAAAAUI/IdtzI4ddvY8/s1600/TTD_03.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="288" src="http://2.bp.blogspot.com/-dGdybnHExWc/T0YS2uWcXDI/AAAAAAAAAUI/IdtzI4ddvY8/s640/TTD_03.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;The first feature to note is the additive coverage indicated by the union being nearly 20K. Thus,&amp;nbsp; these two sources have captured different slices of the overall drug chemistry landscape.&amp;nbsp;&amp;nbsp; In this context BioAssay counts are dominated by ChEMBL compounds-in-common (see below). Interestingly DB is lower in these but much higher in PDB intersects, as would be expected from their emphasis on extracting ligands from this source (&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/21059682"&gt;PMID 21059682&lt;/a&gt;).&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="font-size: large;"&gt;The fly in the ointment here is that both the top-ranked PDB structures are artefactual in that they merely contain the highest heteroatom counts from each source (6 and 11)&amp;nbsp; but are not authentic drug ligands.&amp;nbsp; There is a lower Rule-of-five (ROF) proportion in TTD (66%) compared to DrugBank  (75%). We can investigate one of these rules by plotting the Mw frequency as shown below. &lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-HU26AGpImu0/T0a1VBmKVjI/AAAAAAAAAUY/HhDMY7ejQYE/s1600/TTD_04.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="458" src="http://3.bp.blogspot.com/-HU26AGpImu0/T0a1VBmKVjI/AAAAAAAAAUY/HhDMY7ejQYE/s640/TTD_04.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;DrugBank shows a broader spread &lt;/span&gt;&lt;span style="font-size: large;"&gt;(mean 354,&amp;nbsp; median 320) &lt;/span&gt;&lt;span style="font-size: large;"&gt;but TTD (mean 437, median 321) &lt;/span&gt;&lt;span style="font-size: large;"&gt;shows a spike around 2000 that pushes up the mean. Taking a Mw cut of 1000 to 3000 in TTD covers 841 CIDs but only 74 in DrugBank.&amp;nbsp; Most of these entries seem to be peptides with various modes of action.&amp;nbsp; However, there is a paradox here in that while TTD is lower for ROF, it is significantly higher in MeSH pharmacology, of which those big peptides only contribute 64 entries. &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;The snapshot below is a small selection of intersects with sources also expected to include extensive coverage of drugs. &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-T9zRQDPE0Pg/T0YS9H-0d1I/AAAAAAAAAUQ/WaH7x3Mqg8w/s1600/TTD_02.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="382" src="http://3.bp.blogspot.com/-T9zRQDPE0Pg/T0YS9H-0d1I/AAAAAAAAAUQ/WaH7x3Mqg8w/s640/TTD_02.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;The 93% overlap with the ChEMBL compounds extracted from medicinal chemistry journals is encouragingly corroborative, notwithstanding the likelihood&amp;nbsp; that the TTD curators may have used this source for cross-checking and you can see some records point to the same paper.&amp;nbsp; A similarly high overlap with BindingDB is also expected because this mirrors the ChEMBL target-mapped data.&amp;nbsp; Thomson (Reuters) Pharma is predominantly patent extraction but also structures from journals and meeting disclosures. The rest of the intersects in this chart are lower than I expected but would need detailed analysis to see what differential capture strategies and/or chemical representation rules were responsible.&amp;nbsp; According to KEGG they have&amp;nbsp;&lt;a href="http://www.kegg.jp/kegg/docs/statistics.html"&gt;9,773 drugs&lt;/a&gt; so the 2,326 overlap seems low. Similarly the &lt;a href="http://journals.prous.com/journals/servlet/xmlxsl/pk_journals.xml_about_pr?p_JournalID=2"&gt;Drugs of the Future&lt;/a&gt;&lt;/span&gt;&amp;nbsp; &lt;span style="font-size: large;"&gt;journal (formerly Prous now Thomson Reuters) is mainly development compounds so we might have expected more than only 21% of these to match TTD.&amp;nbsp; Also surprising is that the exact match with DB is only 1,548 but consists mainly of approved drugs (see end of post).&amp;nbsp; A less-well known drug source in PubChem is &lt;a href="http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=1195&amp;amp;loc=ea_ras"&gt;AID 1195&lt;/a&gt;. This is not actually an assay but a set from the DSSTox (FDAMDD) FDA Maximum (Recommended) Daily Dose Database.&amp;nbsp; I can't work out why the overlap between this and both the other drug databases is so low. Taking the compounds-in-common between TTD and DrugBank (1548) only 567 intersect with AID 1195. Any ideas ?&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;The arrival of another curated drug-centric resource&amp;nbsp; in PubChem is of high value for consolidating drug information space, particularly because the chemistry capture and curatorial focus is complementary. It will also be interesting to see what the take up of their QSAR models is.&amp;nbsp; If you want to get a detailed feel for the coverage differences I recommend the inspection of the same DPPIV target in TTD, DrugBank, ChEMBL and BindingDB.&amp;nbsp; I looked at the DrugBank mappings for this target in a &lt;a href="http://cdsouthan.blogspot.com/2011/08/compound-to-target-mappings-part-i.html"&gt;previous post&lt;/a&gt; . I may revisit this important topic of drug data comparison because, word has it, selectable classifications &lt;/span&gt;&lt;span style="font-size: large;"&gt;may soon be added to&lt;/span&gt;&lt;span style="font-size: large;"&gt; the TTD source comment fields&amp;nbsp; (e.g. approved, clinical trial, experimental).&amp;nbsp; You can already use this to select 1392 "approved" CIDs from DrugBank (from starting with slightly more SIDs)&lt;/span&gt;&lt;span style="font-size: large;"&gt; which have 1138 matches from TTD.&amp;nbsp; A cursory inspection of the missing 254 indicates most have canonical connectivities between DrugBank and TTD with the interesting exception of&amp;nbsp; Lutein (&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=5368396&amp;amp;loc=ec_rcs"&gt;CID 5368396)&lt;/a&gt;. This is marked up as an "approved" nutraceutical in DrugBank, making it clear one needs to use additional Boolean selects from the comment fields. &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155351992730855318-3886960209850720864?l=cdsouthan.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://cdsouthan.blogspot.com/2012/02/therapeutic-target-database-comounds.html</link><author>noreply@blogger.com (Chris Southan)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-RE3uH9SLJG8/T0YR3t9mhjI/AAAAAAAAAUA/to_w41iO8ws/s72-c/TTD_01.JPG" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2155351992730855318.post-7362297698033891412</guid><pubDate>Wed, 15 Feb 2012 10:14:00 +0000</pubDate><atom:updated>2012-02-24T02:22:21.794-08:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">BACE1</category><category domain="http://www.blogger.com/atom/ns#">fish</category><category domain="http://www.blogger.com/atom/ns#">BACE2</category><category domain="http://www.blogger.com/atom/ns#">bio</category><title>A fishy BACE story</title><description>&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
(Update: 24 Feb. It was nice to get a mail response from &lt;a href="http://zfin.org/"&gt;Zfin&lt;/a&gt; the Zebrafish database so we'll see what ensues) &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;As part of a rather long manuscript gestation I keep an eye out for new sequences relevant to the evolution of &lt;a href="http://en.wikipedia.org/wiki/Beta-secretase_1"&gt;BACE1&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Beta-secretase_2"&gt;BACE2&lt;/a&gt;.&amp;nbsp;&amp;nbsp; You can see the bare bones of the story up to 2009 in this &lt;a href="http://www.slideshare.net/cdsouthan/southanposterprotevojan09"&gt;poster&lt;/a&gt; from JH and&amp;nbsp; myself .&amp;nbsp; Fish occupy an important position because this is where the post whole-genome duplication event resulted in the emergence (and consequent persistence) of the BACE1 and BACE2 paralogues&amp;nbsp; duplicated from the single Ur-BACE&amp;nbsp; ~ 450 million years ago.&amp;nbsp;&amp;nbsp; Fish genomes are not yet so well sampled as our furry land-dwelling cousins, so I pick up the latest ones until the tree gets dense on those branches.&amp;nbsp;&amp;nbsp; An odd thing about the Noah's Ark of genomes is that some creatures don’t make it to Ensembl because they start off as independent projects.&amp;nbsp; I picked up three new fish for the tree in the last few months.&amp;nbsp;&amp;nbsp; Cod made the bigger splash in&amp;nbsp; the news, we already had the Nile tilapia but we now also have a draft &lt;a href="http://en.wikipedia.org/wiki/Coelacanth"&gt;Coelacanth&lt;/a&gt;, although not yet in Ensembl.&amp;nbsp; The &lt;a href="http://www.dinofish.com/discoa.htm"&gt;discovery&lt;/a&gt; of this fish makes a fascinating story because it is the best-known example of a "living fossil" or Lazarus taxon, a species thought to have disappeared from the fossil record 80 million years ago only to be found in South Africa in 1938. The wanted poster below, in Portuguese, French,&amp;nbsp; and English,&amp;nbsp; was used to encourage local fishermen to find the second one, but this took until 1952. &amp;nbsp; &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-tdVhD2JGfCA/TzuHC_rZRyI/AAAAAAAAASE/jBXHO0UXef4/s1600/FishBACE_05.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="521" src="http://1.bp.blogspot.com/-tdVhD2JGfCA/TzuHC_rZRyI/AAAAAAAAASE/jBXHO0UXef4/s640/FishBACE_05.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;One of the interesting things about sequence phylogeny is you get to have at least superficial encounters with&amp;nbsp; new and strange creatures; indeed there are no fewer than 32,200 species at &lt;a href="http://www.fishbase.org/home.htm"&gt;Fishbase&lt;/a&gt;.&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;On the protein sequence side, collating a BACE aquarium is difficult because few of the gene models seem to be complete. Most are suspiciously short but the cDNA and EST coverage can sometimes be used to fix them.&amp;nbsp; As an example, &lt;/span&gt;&lt;span style="font-size: large;"&gt;the Medaka BACE1 is absent from the Ensembl genome  build.&amp;nbsp; I reconstructed&amp;nbsp; most of it via BLAT (a fast genomic  TBLASTN-like search) using the &lt;a href="http://genome.ucsc.edu/"&gt;Santa Cruz&lt;/a&gt; browser, followed by EST corroboration,&amp;nbsp; but part of the problem here is that there have  been&amp;nbsp; no Medaka assembly updates since 2005.&amp;nbsp; The NCBI pipeline produced a 606-residue BACE1 version as &lt;a href="http://www.ncbi.nlm.nih.gov/protein/348523664"&gt;XP_003449343&lt;/a&gt; but this is a chimera at the N-terminal.&amp;nbsp; The Ensembl&amp;nbsp; &lt;a href="http://www.ensembl.org/Danio_rerio/Transcript/ProteinSummary?db=core;g=ENSDARG00000044781;r=15:6348899-6373322;t=ENSDART00000065824"&gt;Zebrafish BACE2&lt;/a&gt; is still truncated at&amp;nbsp; 239 residues despite this being the 9th genome assembly update.&amp;nbsp; We can’t completely rule it out as an expressed pseudogene until someone purifies an active protein but I have walked out the ORF to a plausible 503 residues via TBLASTN against the ESTs.&amp;nbsp; You can see (as Query) the match to human BACE2 below, with the active site Asps in red. &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-_xcpMnnHG4E/Tz0EzBBw_II/AAAAAAAAAT0/4Q9DK7JRhzM/s1600/Zebrafish_hum_BACE2_cons.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="460" src="http://3.bp.blogspot.com/-_xcpMnnHG4E/Tz0EzBBw_II/AAAAAAAAAT0/4Q9DK7JRhzM/s640/Zebrafish_hum_BACE2_cons.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;/div&gt;&lt;span style="font-size: large;"&gt;It is useful to corroborate simple alignments with feature annotation engines.&amp;nbsp; The first of these is an &lt;a href="http://www.ebi.ac.uk/Tools/pfa/iprscan/"&gt;InterPro scan&lt;/a&gt; and, given the crucial role of the C-terminal transmembrane domain in the biochemistry and evolution of these enzymes, I also run &lt;a href="http://www.cbs.dtu.dk/services/TMHMM/"&gt;TMHMM&lt;/a&gt;. You can see the results for Zebrafish/&lt;i&gt;Danio&lt;/i&gt; BACE2 below.&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-0W3zOodzLxo/Tzur4h4S2hI/AAAAAAAAAS0/DcOdor3SyHU/s1600/Danio_BACE2_interpro.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="376" src="http://2.bp.blogspot.com/-0W3zOodzLxo/Tzur4h4S2hI/AAAAAAAAAS0/DcOdor3SyHU/s640/Danio_BACE2_interpro.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt; &lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt; &lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-fH_qG5pi1yU/TzusC8o5O3I/AAAAAAAAAS8/Z_FUXMtHrms/s1600/Danio_BACE2_TM.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="300" src="http://2.bp.blogspot.com/-fH_qG5pi1yU/TzusC8o5O3I/AAAAAAAAAS8/Z_FUXMtHrms/s640/Danio_BACE2_TM.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;It seems to add up here in the sense that we get a signal peptide at one end, the TM domain at the other,&amp;nbsp; with the two catalytic Asps and the specifically BACE-diagnostic PRINTS signatures &lt;/span&gt;&lt;span style="font-size: large;"&gt; (green) &lt;/span&gt;&lt;span style="font-size: large;"&gt;in the middle.&amp;nbsp; The equivalent analyses for Cod BACE2 are shown below.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-e7shTToji3U/TzuzBbS-01I/AAAAAAAAATc/dBoZEkY_vHk/s1600/Cod_BACE2_InterPro.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="386" src="http://1.bp.blogspot.com/-e7shTToji3U/TzuzBbS-01I/AAAAAAAAATc/dBoZEkY_vHk/s640/Cod_BACE2_InterPro.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-IdUO0HdsH40/TzuSzvOlNoI/AAAAAAAAASc/alYxS8OqUr8/s1600/Cod_BACE2_03.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="304" src="http://1.bp.blogspot.com/-IdUO0HdsH40/TzuSzvOlNoI/AAAAAAAAASc/alYxS8OqUr8/s640/Cod_BACE2_03.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;The interesting feature for Cod BACE2 is the appearance, as detected by the algorithm, of a distinct TM-triplet.&amp;nbsp; Whether this turns out to be biochemically real (and note the first pair are too close together to span the membrane) or just hydrophobic runs remains to be established, but I have seen these "vestigial" TM-like signals, as in Zebrafish above, in other BACE2s.&amp;nbsp; &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;After putting together a FASTA file of 16 fish BACE1 and BACE2 pairs, I could upload it to a choice of multiple alignment tools.&amp;nbsp; The cladogram below was done with &lt;a href="http://www.ebi.ac.uk/Tools/msa/mafft/"&gt;MAFFT&lt;/a&gt; (&lt;b&gt;M&lt;/b&gt;ultiple  &lt;b&gt;A&lt;/b&gt;lignment using  &lt;b&gt;F&lt;/b&gt;ast  &lt;b&gt;F&lt;/b&gt;ourier  &lt;b&gt;T&lt;/b&gt;ransform).&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://1.bp.blogspot.com/-Jg8lv8DuxEs/TzuWC08GUhI/AAAAAAAAASk/-JwY5JzeNuU/s1600/FishBACE_maffit_tree.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="202" src="http://1.bp.blogspot.com/-Jg8lv8DuxEs/TzuWC08GUhI/AAAAAAAAASk/-JwY5JzeNuU/s640/FishBACE_maffit_tree.JPG" width="640" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;There is no room to discuss details here but the branch lengths already suggest BACE2 has been less constrained than BACE1. Considering the ORF equivocality, it's not a bad set to slot into the eventual bigger picture and further analyses.&amp;nbsp; There are many reasons why any individual protein family won't recapitulate the species phylogeny derived from more global criteria, but I found a simple version of the latter, shown below. &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-zjhqq2UJZfs/TzuviGY5ifI/AAAAAAAAATU/OXvL1pmmQhg/s1600/FishBACE_04.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="380" src="http://4.bp.blogspot.com/-zjhqq2UJZfs/TzuviGY5ifI/AAAAAAAAATU/OXvL1pmmQhg/s640/FishBACE_04.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;So, given they have a potential beta-secretase, do fish get Alzheimer's? This would be difficult to test &lt;i&gt;per se&lt;/i&gt; but you can see a fish-specific cluster of APP-like genes in TreeFam &lt;a href="http://www.treefam.org/cgi-bin/TFinfo.pl?ac=TF317274"&gt;TF317274&lt;/a&gt;. From a cursory inspection it looks like these went through another round of duplication before the four paralogues we have in mammals, and indeed there are literature reports of two APPs. The gamma secretase presenilin components are definitely there in &lt;a href="http://www.treefam.org/cgi-bin/TFinfo.pl?ac=TF317274"&gt;TF317274&lt;/a&gt;, as a consequence of which inhibitors have been tested in Zebrafish (&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/21056616"&gt;PMID 21056616&lt;/a&gt;).&amp;nbsp; As you can see below, the Zebrafish-human BACE1 similarity is certainly high enough to consider testing BACE1 inhibitors as well.&amp;nbsp; Indeed this recent paper (&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/21271309"&gt;Modeling neurodegeneration in zebrafish&lt;/a&gt;) suggests they can be used for studying this disease even if they don't actually get it.&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-LFbfCybIVqs/Tz0CNuqe8XI/AAAAAAAAATk/Ue0Cwk4_nRs/s1600/Zebrafish_hum_BACE1_cons.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="410" src="http://3.bp.blogspot.com/-LFbfCybIVqs/Tz0CNuqe8XI/AAAAAAAAATk/Ue0Cwk4_nRs/s640/Zebrafish_hum_BACE1_cons.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155351992730855318-7362297698033891412?l=cdsouthan.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://cdsouthan.blogspot.com/2012/02/fishy-bace1-and-bace2-story.html</link><author>noreply@blogger.com (Chris Southan)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-tdVhD2JGfCA/TzuHC_rZRyI/AAAAAAAAASE/jBXHO0UXef4/s72-c/FishBACE_05.JPG" height="72" width="72" /><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2155351992730855318.post-534305401157669847</guid><pubDate>Mon, 16 Jan 2012 20:30:00 +0000</pubDate><atom:updated>2012-01-20T09:46:15.873-08:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">patents</category><category domain="http://www.blogger.com/atom/ns#">chem</category><title>Suddenly,  it’s raining patent chemistry</title><description>&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
New sources and tools, both public and commercial,  for  the automated extraction of chemical structures from patents, now termed Chemical Named Entity Recognition (CNER), are being declared with increasing frequency.  I’ve pointed to some of the public ones in a &lt;a href="http://cdsouthan.blogspot.com/2011/11/chemicalize.html"&gt;previous post&lt;/a&gt;&amp;nbsp;  and our recent paper (&lt;a href="http://www.citeulike.org/user/cdsouthan/article/9935552"&gt;Sorel et al 2011&lt;/a&gt;) includes a comparison between commercial large-scale automated and manual curation sources for patent chemistry (see fig. 2c).   You can also find some background in this article on target names in patent titles &lt;/span&gt;&lt;span style="font-size: large;"&gt; by our very able Chalmers M.Sc. student &lt;/span&gt;&lt;span style="font-size: large;"&gt;(&lt;a href="http://www.citeulike.org/group/664/article/8637426"&gt;Suriyawongkul  et al 2010&lt;/a&gt;).&amp;nbsp; The flurry of more recent papers includes &lt;a href="http://www.citeulike.org/user/cdsouthan/article/10162989"&gt;Sayle et al 2011&lt;/a&gt; on identifying chemical names in free text,&amp;nbsp; &lt;a href="http://www.citeulike.org/user/cdsouthan/article/9912305"&gt;Jessop et al. 2011&lt;/a&gt; on extraction of chemical reactions and a new public database of structures from &lt;a href="http://www.uspto.gov/patents/init_events/cwupilot.jsp"&gt;USPTO complex work units&lt;/a&gt; (CWUs)  by   &lt;a href="http://www.citeulike.org/user/cdsouthan/article/10081073"&gt;Heifets &amp;amp; Jurisica 2011&lt;/a&gt; . Considering the primary generation of CWUs is manual, the classification of this undertaking as CNER might be a moot point but it does need advanced workflows to populate databases with the extracted individual molecular structures.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;These disclosures were added to right at the end of 2011 &lt;/span&gt;&lt;span style="font-size: large;"&gt;by the deposition into PubChem from &lt;a href="http://www.prnewswire.com/news-releases/ibm-contributes-data-to-the-national-institutes-of-health-to-speed-drug-discovery-and-cancer-research-innovation-135275888.html"&gt;IBM&lt;/a&gt;&amp;nbsp;of &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?term=%22IBM%22[sourcename]&amp;amp;cmd=search&amp;amp;db=pcsubstance%20%20http://www.ncbi.nlm.nih.gov/pccompound?term=%22IBM%22[sourcename]"&gt;2,480,917&lt;/a&gt; chemical structures extracted from patents up to the year 2000. The press release has now acquired 66K Google exact matches.&amp;nbsp; There is some useful technical background in this slide set by &lt;a href="http://cactus.nci.nih.gov/presentations/meeting-08-2011/Fri_Aft_Steve_Boyer.ppt"&gt;SB&lt;/a&gt;.&amp;nbsp; Not only that, but this was actually preceded (by one day according to the submission dates) &lt;/span&gt;&lt;span style="font-size: large;"&gt;by &lt;a href="http://www.ncbi.nlm.nih.gov/pccompound?term=%22sling%20consortium%22[SourceName]"&gt;9,609&lt;/a&gt; CIDs from the European Patent Office, collapsed from 100,656&lt;/span&gt;&lt;span style="font-size: large;"&gt; SIDs, complete with &lt;a href="http://gb.espacenet.com/search97cgi/s97_cgi.exe?Action=FormGen&amp;amp;Template=gb/EN/home.hts"&gt;Esp@cenet&lt;/a&gt; outlinks.&amp;nbsp; These disappeared as source name "EPO" but have now reappeared as "sling consortium" (see &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://www.sling-fp7.org/workpackage/enhancing-patent-data-acquisition"&gt;SLING&lt;/a&gt;)&lt;/span&gt;&lt;span style="font-size: large;"&gt;.&amp;nbsp; I look forward to some kind of announcement and hope it includes details of their extraction process.&amp;nbsp; &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;The links above will give you an overview of where things are at, but on the public side there are now a lot of interesting new comparisons to be made and utilities to discern. For now I’ll just present some initial assessments from inside PubChem.&amp;nbsp; There are sources (in the SID sense) already in PubChem that have open structure-to-patent document links (S&amp;gt;P) but these are generally restricted to approved drugs in DrugBank and ChEBI (the manual assignments rather than the automated ones). &lt;/span&gt;&lt;span style="font-size: large;"&gt;The two largest sources of manual document extractions in PubChem&lt;/span&gt;&lt;span style="font-size: large;"&gt; are Discovery Gate and Thomson Pharma.&amp;nbsp; They both include the Derwent World Patent Index Pharmaceutical “B” derived structures and the formal intersect (from an AND between the two compound sources) is just a tad over 2.7 million.&amp;nbsp; These should be predominantly exemplars from patents but with some journals-in-common structures. However, you do need a subscription to &lt;/span&gt;&lt;span style="font-size: large;"&gt;one of &lt;/span&gt;&lt;span style="font-size: large;"&gt;the respective web applications (or both if you've got enough dosh) to get out to the S&amp;gt;P links. I've intersected these with the IBM and SLING sets at the bottom of this post but first you can see a few feature comparisons.&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;One of the utilities of the new IBM set in PubChem is the provision of S&amp;gt;P links, shown below for &lt;/span&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=54738977&amp;amp;loc=ec_rcs"&gt;&lt;span style="font-size: large;"&gt;CID 54738977 &lt;/span&gt;&lt;/a&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-Aq7ryJ5iUb4/TxSSJsMcvbI/AAAAAAAAAP8/rkNUDB0voRg/s1600/Pats+1.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="610" src="http://3.bp.blogspot.com/-Aq7ryJ5iUb4/TxSSJsMcvbI/AAAAAAAAAP8/rkNUDB0voRg/s640/Pats+1.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt; &lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-kP8UyVOyI70/TxSUq4DukoI/AAAAAAAAAQE/y3BbML0dpfE/s1600/Pats+4.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="606" src="http://4.bp.blogspot.com/-kP8UyVOyI70/TxSUq4DukoI/AAAAAAAAAQE/y3BbML0dpfE/s640/Pats+4.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;The&amp;nbsp;&lt;/span&gt;&lt;span style="font-size: large;"&gt;link sequence&amp;nbsp; (&lt;/span&gt;&lt;span style="font-size: large;"&gt;somewhat &lt;/span&gt;&lt;span style="font-size: large;"&gt;clunky it has to be said)&lt;/span&gt;&lt;span style="font-size: large;"&gt; is CID &amp;gt; SID &amp;gt; IBMID &amp;gt; download with the patent and/or PMID number(s) suffixed to the IBMID &amp;gt; search the patent &lt;/span&gt;&lt;span style="font-size: large;"&gt;(on entries with multiple hits such as atorvastatin these are long concatenated strings).&amp;nbsp; This CID&amp;nbsp; turns out to have some interesting connectivity &lt;/span&gt;&lt;span style="font-size: large;"&gt;(it must be the way I pick 'em)&lt;/span&gt;&lt;span style="font-size: large;"&gt;.&amp;nbsp; The first was a handy-size analogue series (the "similar compounds" link)&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-RDibtB2MY_0/TxSZwbWCkQI/AAAAAAAAAQU/UTS2Z0RVaAc/s1600/Pats+2.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="244" src="http://4.bp.blogspot.com/-RDibtB2MY_0/TxSZwbWCkQI/AAAAAAAAAQU/UTS2Z0RVaAc/s640/Pats+2.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;It turns out that these all come from the same patent but one of them &amp;nbsp;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=54696526&amp;amp;loc=ec_rcs"&gt;CID 54696526&lt;/a&gt; has assay data and three "older" sources (i.e. SIDs pre-dating the IBM submission) &lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-Zik7uuFwT7U/TxSaPxwPTfI/AAAAAAAAAQc/ohYsMiz2MJw/s1600/Pats+3.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="466" src="http://3.bp.blogspot.com/-Zik7uuFwT7U/TxSaPxwPTfI/AAAAAAAAAQc/ohYsMiz2MJw/s640/Pats+3.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;Both the ChEMBL and Scripps BioAssay entries are anti-malarial hits (possibly from different assays but you would need to carefully check the AIDs to find out). As&amp;nbsp; an older CibaGeigy filing I guess&amp;nbsp; this Novartis compound made it into the GNF collection.&amp;nbsp; &lt;/span&gt;&lt;span style="font-size: large;"&gt;There is a nice completion of the circle as this turns out to have a ChEMBL-to-ChemSpider-to-SureChem link for the same patent number. &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;Notwithstanding the fact that it's a rather small set so far, the SLING entries also have some neat features as shown below, and they cover at least some patents published in 2011. &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt; &lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-04jz0bEW_YE/Txa0GWI3oiI/AAAAAAAAAQs/pIKRkCzf7KQ/s1600/Pats+6.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="524" src="http://1.bp.blogspot.com/-04jz0bEW_YE/Txa0GWI3oiI/AAAAAAAAAQs/pIKRkCzf7KQ/s640/Pats+6.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt; &lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt; You have to use the CID &amp;gt; SID route but the patent number is in the header and linked to &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://gb.espacenet.com/search97cgi/s97_cgi.exe?Action=FormGen&amp;amp;Template=gb/EN/home.hts"&gt;Esp@cenet&lt;/a&gt;&lt;/span&gt;&lt;span style="font-size: large;"&gt;&amp;nbsp;&lt;/span&gt;&lt;span style="font-size: large;"&gt;via the patent title at the bottom of the SID record.&amp;nbsp; In this case, (an insecticide filing to make a change from pharmaceuticals) while you have to use a "substance"&amp;nbsp; exact match query including the kind-code, you can answer the question "what compounds (have SLING extracted) from this patent number"&amp;nbsp; which is 267 in this case.&amp;nbsp; While I'm not sure how general this is I also got an answer to "which patents is inulin in"&amp;nbsp; via a MeSH lookup (I'd never heard of inulin either).&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;However, there's an interesting wrinkle here.&amp;nbsp; While IBM effectively spawns one unique SID-to-many documents (i.e. SIDs = CIDs)&amp;nbsp; the SLING relationship is one document-to-many SIDs (i.e. ~ 10:1 SIDs:CIDs).&amp;nbsp; The effect of this for common extractions (e.g. known drugs) is a multiplexing of SIDs-times-documents.&amp;nbsp;&amp;nbsp; For example the atorvastatin &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=60823&amp;amp;loc=ec_rcs"&gt;CID 60823&lt;/a&gt; now has one new IBM&amp;nbsp; &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=129742546&amp;amp;loc=es_rss"&gt;SID 129742546&lt;/a&gt; that links to 18 patent numbers&amp;nbsp; but also gains 29 new SIDs from the SLING links (i.e. one for each EPO patent number).&amp;nbsp; &lt;/span&gt;&lt;br /&gt;
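&lt;span style="font-size: large;"&gt;The two deposition models can be sketched in a few lines of Python. This is only an illustration of the counting, not how either source actually builds its records: the function names and the document labels are made-up placeholders, and only the counts (18 and 29) and the CID come from the atorvastatin example above.&lt;/span&gt;&lt;br /&gt;
&lt;pre&gt;
```python
def ibm_style(cid, patents):
    """One SID per structure; that single SID carries every patent number."""
    return [{"cid": cid, "patents": list(patents)}]


def sling_style(cid, patents):
    """One SID per (structure, document) pair."""
    return [{"cid": cid, "patents": [p]} for p in patents]


# Placeholder document labels (hypothetical); only the counts are from the post.
ibm_patents = [f"ibm-doc-{i}" for i in range(1, 19)]    # 18 documents
sling_patents = [f"epo-doc-{i}" for i in range(1, 30)]  # 29 documents

ibm_records = ibm_style(60823, ibm_patents)
sling_records = sling_style(60823, sling_patents)

# Number of new SIDs the CID gains under each model
print(len(ibm_records), len(sling_records))
```
&lt;/pre&gt;
&lt;span style="font-size: large;"&gt;Under these assumptions the same compound picks up a single record in the first model and one record per document in the second, which is the multiplexing effect described above.&lt;/span&gt;&lt;br /&gt;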
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;I'll round off this post by showing some stats generated &lt;/span&gt;&lt;span style="font-size: large;"&gt;(at the CID level) &lt;/span&gt;&lt;span style="font-size: large;"&gt;from just a few minutes in PubChem. &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-g5bxF9F-vq0/TxarOS_ITsI/AAAAAAAAAQk/vFsvtFONT0g/s1600/Pats+5.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="174" src="http://2.bp.blogspot.com/-g5bxF9F-vq0/TxarOS_ITsI/AAAAAAAAAQk/vFsvtFONT0g/s640/Pats+5.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;You can extend these by subtraction (or adding the NOT queries) to complete a three-way Venn.&amp;nbsp; I'll leave you to cogitate on the interpretation of these intersects and diffs &lt;/span&gt;&lt;span style="font-size: large;"&gt;(or of course just bash on with your own PubChem slicing-and-dicing) &lt;/span&gt;&lt;span style="font-size: large;"&gt;but just to point out these should be considered apples-and-pears sets from many aspects.&lt;/span&gt;&lt;br /&gt;
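&lt;span style="font-size: large;"&gt;If you prefer to do the same set logic offline on downloaded CID lists, a minimal sketch in Python follows. The three small sets are made-up stand-ins (not real CIDs or real counts) and the function name is my own; the point is only that the seven regions of a three-way Venn fall out of intersections and differences.&lt;/span&gt;&lt;br /&gt;
&lt;pre&gt;
```python
# Hypothetical CID sets standing in for three PubChem sources.
ibm = {101, 102, 103, 104, 105}
sling = {104, 105, 106}
manual = {103, 105, 106, 107}  # stand-in for the two-source manual intersect


def three_way_venn(a, b, c):
    """Return the seven disjoint regions of a three-set Venn diagram."""
    ab = a.intersection(b)
    ac = a.intersection(c)
    bc = b.intersection(c)
    return {
        "a_only": a.difference(b, c),
        "b_only": b.difference(a, c),
        "c_only": c.difference(a, b),
        "a_and_b_only": ab.difference(c),
        "a_and_c_only": ac.difference(b),
        "b_and_c_only": bc.difference(a),
        "all_three": ab.intersection(c),
    }


regions = three_way_venn(ibm, sling, manual)
for name, cids in regions.items():
    print(name, sorted(cids))
```
&lt;/pre&gt;
&lt;span style="font-size: large;"&gt;A quick sanity check is that the sizes of the seven regions add up to the size of the union of the three sets.&lt;/span&gt;&lt;br /&gt;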
&lt;span style="font-size: large;"&gt; &lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155351992730855318-534305401157669847?l=cdsouthan.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://cdsouthan.blogspot.com/2012/01/suddenly-its-raining-patent-chemistry.html</link><author>noreply@blogger.com (Chris Southan)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-Aq7ryJ5iUb4/TxSSJsMcvbI/AAAAAAAAAP8/rkNUDB0voRg/s72-c/Pats+1.JPG" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2155351992730855318.post-7526491322217629842</guid><pubDate>Mon, 09 Jan 2012 21:13:00 +0000</pubDate><atom:updated>2012-01-17T14:54:35.024-08:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">DrugBank</category><category domain="http://www.blogger.com/atom/ns#">ChEMBL</category><category domain="http://www.blogger.com/atom/ns#">latamase</category><category domain="http://www.blogger.com/atom/ns#">bio</category><title>On the persistence of compound-to-target mis-mappings</title><description>&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The first part of this story is - sort of - our fault.&amp;nbsp; Once upon a time, we wuz digging novel human proteases out of EST data and filing patents as fast as we could clone ‘em (which wasn’t that fast in fact).&amp;nbsp; One of these happened to be a homologue of a bacterial serine beta lactamase so we duly filed it as &lt;a href="http://www.ebi.ac.uk/citexplore/citationDetails.do?externalId=WO9957286&amp;amp;dataSource=PAT"&gt;WO9957286&lt;/a&gt; but, because it did not look like the next big drug project target, we were permitted to publish. By that time TS had actually crunched out a mouse genomic clone along with the mouse and human cDNAs so we were able to write up &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/11707067"&gt;“Identification, genomic organization, and mRNA expression of LACTB, encoding a serine beta-lactamase-like protein with an amino-terminal transmembrane domain”&lt;/a&gt; in 2001 and submitted the sequences to the databases.&amp;nbsp; It ended up with the Approved HGNC name “lactamase beta” and the symbol &lt;a href="http://www.genenames.org/data/hgnc_data.php?hgnc_id=16468"&gt;LACTB&lt;/a&gt;.&amp;nbsp; There are more stories to tell about this protein, including an interesting case of symbol hijack, filament formation and a reported association with metabolic syndrome, but another time.&amp;nbsp; However, just for the record: it is not a mitochondrial ribosomal subunit; UniProt gave it a different name from HGNC, “Serine beta-lactamase-like protein”; and it’s a “dead” peptidase because it lacks catalytically crucial residues.&amp;nbsp; Also, back then, the idea “if we name it after the prokaryotic homologue this will cause future annotation ambiguities” did not occur to us.&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
If you search ChEMBL targets with “lactamase” the 95 matches include, as we might expect, 94 plausible bacterial targets of many beta lactamase inhibitory compounds including the famous &lt;a href="http://en.wikipedia.org/wiki/Clavulanic_acid"&gt;clavulanic acid&lt;/a&gt;.&amp;nbsp; However, our human lactamase is also there as the 95th entry, as an (auto) curation "target" &lt;a href="https://www.ebi.ac.uk/chembldb/target/inspect/CHEMBL3791"&gt;CHEMBL3791&lt;/a&gt; with 5 IC50s.&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://4.bp.blogspot.com/-NE4A9Vco9gY/TwtaeSWlT1I/AAAAAAAAAOs/OQzkzesEEEg/s1600/Lact1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="102" src="http://4.bp.blogspot.com/-NE4A9Vco9gY/TwtaeSWlT1I/AAAAAAAAAOs/OQzkzesEEEg/s640/Lact1.jpg" width="640" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;Given the majority of probably correct compound-to-target mappings, one error in 95 is not bad going. To be fair, there were more of these in ChEMBL some months back but, after I alerted JPO by phone, they were all removed except this one.&amp;nbsp; However, I decided to do some cross-checking.&amp;nbsp; Sure enough, via their collaboration for data mirroring, BindingDB had picked up “our” lactamase entries from ChEMBL and was still maintaining the links to 508 entries &lt;a href="http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_ki.jsp?tag=pol&amp;amp;submit=Search&amp;amp;target=serine%20beta-lactamase-like%20protein&amp;amp;polymerid=50000604"&gt;(Enz. Inhib. hit(s) with Target = 'Serine beta-lactamase-like protein'&lt;/a&gt;)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://1.bp.blogspot.com/-wdbpiiDcmBw/Twta-ur1MXI/AAAAAAAAAO0/YqPlBQ-6uv0/s1600/Lact2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="134" src="http://1.bp.blogspot.com/-wdbpiiDcmBw/Twta-ur1MXI/AAAAAAAAAO0/YqPlBQ-6uv0/s640/Lact2.jpg" width="640" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
At this point I decided to google “serine beta-lactamase-like protein target” to see where else these relationships might have propagated to.&amp;nbsp; I found three hits. One of these was in a new target database, &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://druggable.com/"&gt;Druggable.com&lt;/a&gt;, &lt;/span&gt;&lt;span style="font-size: large;"&gt;that I’d never heard of (it does boast a &lt;a href="http://soundcloud.com/jswetnam/swetnam-drugable-bioit-podcast"&gt;podcast&lt;/a&gt; but, at only 126 hits, I may not be the only one this resource is new to).&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://4.bp.blogspot.com/-Tz8WtUqR4vI/TwtcfFEqR2I/AAAAAAAAAPE/339LMc06gWk/s1600/Lact3.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="182" src="http://4.bp.blogspot.com/-Tz8WtUqR4vI/TwtcfFEqR2I/AAAAAAAAAPE/339LMc06gWk/s640/Lact3.jpg" width="640" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;They use ChEMBL as one of their sources so this is the explanation&lt;/span&gt;.&amp;nbsp; &lt;span style="font-size: large;"&gt;This also goes for the second match to the&amp;nbsp;&lt;/span&gt; &lt;span style="font-size: large;"&gt;&lt;a href="http://zinc.docking.org/results/annotation?annotation.name=Q46991_ENTCL&amp;amp;annotation.type=B1&amp;amp;page.format=targets"&gt;ZINC&lt;/a&gt; chemistry database where they report the target annotations were picked up in Dec 11.&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://1.bp.blogspot.com/-Z4fTlu3PVWQ/Twte5_969pI/AAAAAAAAAPM/4OL5MzEP0HM/s1600/Lact4.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="134" src="http://1.bp.blogspot.com/-Z4fTlu3PVWQ/Twte5_969pI/AAAAAAAAAPM/4OL5MzEP0HM/s640/Lact4.jpg" width="640" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&amp;nbsp; The protein target has also somehow got into PubChem BioAssay &lt;a href="http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=43596"&gt;AID 43596&amp;nbsp; &lt;/a&gt;although the correct target is clear from the assay title.&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://1.bp.blogspot.com/-3HtGELx2_c4/Twt1ACO5RaI/AAAAAAAAAPU/RiqloYqAifE/s1600/Lact5.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="612" src="http://1.bp.blogspot.com/-3HtGELx2_c4/Twt1ACO5RaI/AAAAAAAAAPU/RiqloYqAifE/s640/Lact5.jpg" width="640" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;Now I was just about to finish this post when I remembered those nice folk at DrugBank had fixed an entry I commented on when I was reviewing drug-to-target mappings for a &lt;a href="http://www.slideshare.net/cdsouthan/compoundstotargets-bioit2011"&gt;BioIT presentation&lt;/a&gt;.&amp;nbsp; You can see the mail below from March 2011. &lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://4.bp.blogspot.com/-w94zRK0sdI0/Twt5da_O-qI/AAAAAAAAAPc/3rJHDKmNYGc/s1600/Lact6.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="402" src="http://4.bp.blogspot.com/-w94zRK0sdI0/Twt5da_O-qI/AAAAAAAAAPc/3rJHDKmNYGc/s640/Lact6.JPG" width="640" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;This was fixed and&amp;nbsp; http://www.drugbank.ca/drugs/DB01867&amp;nbsp; became duly deadlinked. However, googling revealed the persistence of this deprecated entry after 8 months. For example, the Wikipedia entry is shown below.&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-Dx0QRhbhIms/Twt8jzfy2fI/AAAAAAAAAPk/AWNoqCdp_a0/s1600/Lact7.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="640" src="http://2.bp.blogspot.com/-Dx0QRhbhIms/Twt8jzfy2fI/AAAAAAAAAPk/AWNoqCdp_a0/s640/Lact7.JPG" width="234" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;It had also made it into Druggable&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-garLi0kJuDA/TwuA253wKfI/AAAAAAAAAPs/ZxMMGm4n6S8/s1600/Lact8.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="312" src="http://4.bp.blogspot.com/-garLi0kJuDA/TwuA253wKfI/AAAAAAAAAPs/ZxMMGm4n6S8/s640/Lact8.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;Paradoxically, the entry even persists as a &lt;a href="http://www4.wiwiss.fu-berlin.de/drugbank/page/drugs/DB01867"&gt;URI&lt;/a&gt;&amp;nbsp; on the Berlin D2R Server publishing the DrugBank data as linked data on the Semantic Web, along with pointers to the 154 non-targets as PDB entries with glycol hetero atoms in the crystal structures. &lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt; &lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;To give them the benefit of the doubt, some of these sources could have assumed&amp;nbsp; that DrugBank has major releases every couple of years and therefore would not schedule a refresh cycle, not expecting deprecations,&amp;nbsp; but I wonder if some of them are refreshing at all...&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;To conclude&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
1)&amp;nbsp; The first-naming of (inevitably) partially characterized genes, and avoiding the notoriously confounding “-like” in the new name,&amp;nbsp; is a challenge.&amp;nbsp; After a decade we should probably now call it “mitochondrial respiratory complex I subunit X”.&lt;br /&gt;
2) Dead enzymes that have evolutionarily drifted into doing something completely different, after endosymbiont horizontal transfer in this case,&amp;nbsp; are problematic for annotation, hence the perpetual homology-based classification of this as a protease. &lt;br /&gt;
3) The increasing circularity of database connectivity has utility&amp;nbsp; but, as we all know, errors can consequently propagate rapidly, inexorably and globally, and then persist.&lt;br /&gt;
4)&amp;nbsp; Given I made the effort to contact the ChEMBL primary source for the LACTB and send the mail to DrugBank about their toxic drug, life’s too short for me to contact all the secondary sources that picked these entries up, even though I happen to know some of them (unless they happen to read this or, preferably, buy me a drink at a conference). However,&amp;nbsp; I’ll see how long these errors persist.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155351992730855318-7526491322217629842?l=cdsouthan.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://cdsouthan.blogspot.com/2012/01/our-human-beta-lactamase-is-not_09.html</link><author>noreply@blogger.com (Chris Southan)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-NE4A9Vco9gY/TwtaeSWlT1I/AAAAAAAAAOs/OQzkzesEEEg/s72-c/Lact1.jpg" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2155351992730855318.post-4370515789714648935</guid><pubDate>Tue, 15 Nov 2011 22:55:00 +0000</pubDate><atom:updated>2012-01-05T13:33:49.843-08:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">patents</category><category domain="http://www.blogger.com/atom/ns#">chem</category><title>Chemicalize and tropical disease patents</title><description>&lt;span style="font-size: large;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;This post is the result of a&amp;nbsp; conjunction between two events. The first was that chemicalize.org came up in the &lt;a href="http://www.linkedin.com/groups/Cheminformatics-131234"&gt;LinkedIn Cheminformatics group&lt;/a&gt; just recently so I&amp;nbsp; decided to give it a re-spin having been impressed with earlier versions.&amp;nbsp; The second was the announcement last week of &lt;a href="http://www.wipo.int/research/en/"&gt;WIPO Re:Search&lt;/a&gt;. So I've combined trying them both out, synergistically you might say. &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt; &lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;WIPO Re:Search is a&amp;nbsp; public database of intellectual property assets and resources for neglected tropical diseases (NTDs).&amp;nbsp;&amp;nbsp; This is a big thing from a few angles, the main one being that the parties signing up, such as AstraZeneca,&amp;nbsp; Novartis,&amp;nbsp; GSK, Sanofi and NIH among others,&amp;nbsp; will allow, under certain terms, licence-free use of their IP portfolios (and other stuff surfaced in the database) for NTD research.&amp;nbsp;&amp;nbsp; It has to be said the database has only 140 entries so far but there were a few interesting patents such as&amp;nbsp; &lt;a href="http://www.wipo.int/research/en/details.jsp?id=688"&gt;US20110207784&lt;/a&gt; “ &lt;b&gt;Compositions and methods for the treatment of Schistosomiasis in mammals. The various compositions are based on a number of compounds derived from 1,2,5-oxadiazole that are potent inhibitors of thioredoxin glutathione reductase (TGR), a critical parasite redox protein”&lt;/b&gt;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt; &lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;This caught my eye not only because it was an NIH patent but also because I had sequenced a glutathione transferase from the same organism many moons ago. The utility of this database to flag up relevant patents notwithstanding, it does nothing (yet) to make the data any easier to get from the documents (and NIH use Markush nests just&amp;nbsp; like pharma companies – via the same drafting attorneys ?).&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;Thus,&amp;nbsp; this seemed a real-world case to try out a simple workflow to get some chemistry out of the patent (bypassing the grunt work of sketching).&amp;nbsp;&amp;nbsp; The first&amp;nbsp; step was&amp;nbsp; locating the patent in a full-text source.&amp;nbsp; While these have sprung up like mushrooms lately, including Google, &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://www.freepatentsonline.com/"&gt;Freepatentsonline&lt;/a&gt;&amp;nbsp;&lt;/span&gt;&lt;span style="font-size: large;"&gt; is a good choice as this was the “substrate” used by SureChem before they moved to&amp;nbsp; DigitalScience and it also has a useful interface where you can combine metadata queries with free text matches. &amp;nbsp; The second step was&amp;nbsp; simply popping the freepatents url into &lt;a href="http://www.chemicalize.org/?url=http%3A%2F%2Fwww.freepatentsonline.com%2Fy2011%2F0207784.html&amp;amp;source=fp"&gt;chemicalize&lt;/a&gt;.&amp;nbsp; In this case it worked well and you can spot&amp;nbsp; the line of drug-like exemplars on the scroll bar and locate these as the IUPAC exemplifications in the document. &lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-53HmFsyym8E/TsLxRvjSWPI/AAAAAAAAAM0/IgvMJD1KKDo/s1600/Chemicalize01.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="312" src="http://3.bp.blogspot.com/-53HmFsyym8E/TsLxRvjSWPI/AAAAAAAAAM0/IgvMJD1KKDo/s640/Chemicalize01.jpg" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt; &lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;The third step of my workflow is a PubChem check to see  where the extracted structure&amp;nbsp; might fit in the space of 30 million CIDs that includes about 2.7&amp;nbsp; million extracted from patents by Derwent and dropped  in via Thomson Pharma and Discovery Gate.&amp;nbsp;&amp;nbsp; I do this by &lt;/span&gt;&lt;span style="font-size: large;"&gt;popping the SMILES&amp;nbsp; that chemicalize generated (O=N1=C(C#N)C(=NO1)C1=CC=C(C=C1)C1=CC=CC=C1), into the PubChem structure search box.&amp;nbsp;&amp;nbsp; However, there was a wrinkle here because I got the following warning:&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-iNtwUZ-dlcQ/TsLzM5tKq9I/AAAAAAAAAM8/iWirtWFa1Tg/s1600/Chemicalize02.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="372" src="http://2.bp.blogspot.com/-iNtwUZ-dlcQ/TsLzM5tKq9I/AAAAAAAAAM8/iWirtWFa1Tg/s640/Chemicalize02.jpg" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt; &lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;After a bit of messing around with the Sketcher and SMILES string without getting very far I decided to try &lt;a href="http://opsin.ch.cam.ac.uk/"&gt;OPSIN&lt;/a&gt;.&amp;nbsp; It converted and rendered the structure in a different way with the balanced N dipole charges&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-tRIihIYOa50/TsLzyEGthYI/AAAAAAAAANE/aWlCU1QpR-w/s1600/Chemicalize03.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="388" src="http://2.bp.blogspot.com/-tRIihIYOa50/TsLzyEGthYI/AAAAAAAAANE/aWlCU1QpR-w/s640/Chemicalize03.jpg" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;This SMILES&amp;nbsp; C(#N)C1=[N+](ON=C1C1=CC=C(C=C1)C1=CC=CC=C1)[O-]&amp;nbsp; renders OK in the PubChem search and nicely whacks&amp;nbsp; &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=25269735"&gt;CID 25269735&lt;/a&gt; as an identity. This squares the circles nicely when you see the four SIDs under the CID&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-11SdaP-b21k/TsMEXyIWp1I/AAAAAAAAANM/ubP61iHIk_Y/s1600/Chemicalize04.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="640" src="http://3.bp.blogspot.com/-11SdaP-b21k/TsMEXyIWp1I/AAAAAAAAANM/ubP61iHIk_Y/s640/Chemicalize04.jpg" width="590" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
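&lt;span style="font-size: large;"&gt;For anyone who would rather script this identity check than paste into the search box, here is a minimal sketch in Python. It only builds the lookup URL and sends nothing. It assumes the PubChem PUG REST service at the base address shown, and the function name smiles_to_cid_url is made up for illustration rather than taken from any PubChem client. The SMILES is passed as a URL argument because characters such as # and + would otherwise be misread in the path.&lt;/span&gt;&lt;br /&gt;
&lt;pre&gt;
```python
from urllib.parse import quote, urlencode

# Assumed base address of the PubChem PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def smiles_to_cid_url(smiles):
    """Build a PUG REST URL asking for the CIDs that match a SMILES string.

    Hypothetical helper for illustration; it does not contact PubChem.
    """
    # Percent-encode every reserved character so '#', '=', '+' and the
    # brackets in a SMILES string survive being sent as a URL argument.
    query = urlencode({"smiles": smiles}, quote_via=quote)
    return PUG_REST + "/compound/smiles/cids/TXT?" + query


# The charge-separated SMILES from OPSIN quoted above.
opsin_smiles = "C(#N)C1=[N+](ON=C1C1=CC=C(C=C1)C1=CC=CC=C1)[O-]"
print(smiles_to_cid_url(opsin_smiles))
```
&lt;/pre&gt;
&lt;span style="font-size: large;"&gt;Fetching the printed URL should, if the service behaves as assumed, return any matching CIDs as plain text. Whether the pentavalent-nitrogen SMILES from chemicalize would be accepted the same way is not something this sketch tests.&lt;/span&gt;&lt;br /&gt;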
&lt;span style="font-size: large;"&gt;Thomson Pharma were first to drop it in 2009 followed by&amp;nbsp; Discovery Gate,&amp;nbsp; ChEMBL picked it up from a publication and BindingDB picked it up from ChEMBL.&amp;nbsp; Following &lt;a href="https://www.ebi.ac.uk/chembldb/index.php/compound/inspect/CHEMBL576082"&gt;CHEMBL576082&lt;/a&gt; connects to a publication from the NCGC folks about some of the same structures we have just dug out of the patent.&amp;nbsp; It could have saved time if &lt;/span&gt;&lt;span style="font-size: large;"&gt;WIPO Re:Search had linked to this in the first place, but note that even though CiteXplore makes a good job of including the 9 pointers to ChEMBL structures (see below), chemicalize pulls more exemplars out of the patent than are in the paper (but the SAR is more explicit in the latter).&amp;nbsp; I'm still puzzled by the dates though, because it looks like the paper was before the patent ? &amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="font-size: large;"&gt; &lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-IntDMMrEm0k/TsQl1hJzjBI/AAAAAAAAANc/bTPJx3unWPQ/s1600/Chemicalize05.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="398" src="http://3.bp.blogspot.com/-IntDMMrEm0k/TsQl1hJzjBI/AAAAAAAAANc/bTPJx3unWPQ/s640/Chemicalize05.jpg" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-yAZCRP9sVYg/TsMFalsyqwI/AAAAAAAAANU/xC6dG8zjXGs/s1600/Chemicalize05.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;br /&gt;
&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;Notwithstanding,&amp;nbsp; there was an unexpected bonus.&amp;nbsp; This paper had a recent citation specifying additional inhibitors in &lt;a href="http://www.ebi.ac.uk/citexplore/citationDetails.do?externalId=20440624&amp;amp;dataSource=MED"&gt;PMID 20440624&lt;/a&gt;&amp;nbsp; &lt;/span&gt;          &lt;b&gt;&lt;span id="atitle"&gt; &lt;span style="font-size: large;"&gt;"Biological and immunological activity of new imidazolidines against adult worms of Schistosoma mansoni" &lt;/span&gt;&lt;/span&gt;&lt;/b&gt;&lt;span id="atitle"&gt;&lt;span style="font-size: large;"&gt;.&amp;nbsp; The wrinkle here was that the string for one of the two leads in the abstract,&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span style="font-size: large;"&gt;5-(4-chloro-benzylidene-3-(4-nitrebenzyl)-4-thioxo-imidazolidin-2-one,&amp;nbsp;&lt;/span&gt;&lt;span id="atitle"&gt;&lt;span style="font-size: large;"&gt; would not convert in OPSIN or chemicalize&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-size: large;"&gt;, but the latter executes a web search with the IUPAC string and hits the same abstract.&amp;nbsp;&amp;nbsp; I managed to "fix" this to&lt;/span&gt; 5-4-Chlorobenzylidene-3-(4-nitrobenzyl)-4-thioxo-imidazolidin-2-one, &lt;/span&gt;&lt;span id="atitle"&gt;&lt;span style="font-size: large;"&gt;which then converted in both tools. &amp;nbsp; The difference in this case is that it (left hand side below) is PubChem negative for an exact match but does hit a close analogue (right hand side below), &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=9999435&amp;amp;loc=ec_rcs"&gt;CID 9999435.&lt;/a&gt;&lt;/span&gt;&lt;/span&gt; &amp;nbsp;&lt;span id="atitle"&gt;&lt;span style="font-size: large;"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-EkGmp5_YctA/TsQ4jCctc7I/AAAAAAAAANk/CSa9Sycm0xo/s1600/Chemicalize06.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="310" src="http://1.bp.blogspot.com/-EkGmp5_YctA/TsQ4jCctc7I/AAAAAAAAANk/CSa9Sycm0xo/s640/Chemicalize06.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span id="atitle"&gt;&lt;span style="font-size: large;"&gt;This has no information in the links but was dropped in by Thomson Pharma in 2006.&amp;nbsp; So far so good: we have done a citation "walk" to another anti-schistosomal structure, but not with particularly rich pickings.&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
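&lt;span style="font-size: large;"&gt;The name-to-structure step used on the imidazolidine above can also be scripted. The sketch below is a hedged illustration only: it assumes OPSIN exposes a web service that takes a percent-encoded chemical name followed by a format extension such as .smi, and both the base address and the helper name opsin_url are assumptions to check against the OPSIN site. It builds the request URL for the "fixed" name and does not send it.&lt;/span&gt;&lt;br /&gt;
&lt;pre&gt;
```python
from urllib.parse import quote

# Assumed address of the OPSIN name-to-structure web service.
OPSIN_BASE = "https://opsin.ch.cam.ac.uk/opsin/"


def opsin_url(name, fmt="smi"):
    """Build a request URL for converting a chemical name with OPSIN.

    Hypothetical helper; 'fmt' is the assumed output extension (smi = SMILES).
    """
    # Encode the whole name, including parentheses, so it is a single path segment.
    return OPSIN_BASE + quote(name, safe="") + "." + fmt


# The manually "fixed" name from the abstract, as quoted above.
fixed_name = "5-4-Chlorobenzylidene-3-(4-nitrobenzyl)-4-thioxo-imidazolidin-2-one"
print(opsin_url(fixed_name))
```
&lt;/pre&gt;
&lt;br /&gt;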
&lt;span id="atitle"&gt;&lt;span style="font-size: large;"&gt;OK so let's bypass &lt;/span&gt;&lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://www.wipo.int/research/en/"&gt;WIPO Re:Search&lt;/a&gt; and go straight in to freepatentsonline with the following example.&amp;nbsp; Toggle on the full set of patents and set "malaria" in any field.&amp;nbsp; The result is 40,404 so we can restrict to IPC code C07D for a reasonable specificity for medicinal chemistry patents, to get 9,102.&amp;nbsp; The last step is to look for IC50 OR Ki in the description as a screening filter.&amp;nbsp; High up in the ranking you see these.&amp;nbsp; &lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-Pezqsyx400E/TsrZ7N1NloI/AAAAAAAAANw/Y33XUiyk64E/s1600/Chemicalize10.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="208" src="http://3.bp.blogspot.com/-Pezqsyx400E/TsrZ7N1NloI/AAAAAAAAANw/Y33XUiyk64E/s640/Chemicalize10.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;Given the similar titles and relevance score this turns out to be a patent family from GSK&amp;nbsp; (that's OK then 'cause they are in the WIPO Re:Search club).&amp;nbsp; Pop the URL of any of these into Chemicalize and you get a bumper crop of no less than 333 structures. What you do next is drop the whole lot out as an SDF file and then load that file up to PubChem structure search where it duly renders the whole lot (bar 3 in this case).&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-FaSgeU7eQW8/Tsra327YRAI/AAAAAAAAAN4/gSDWfotNFe0/s1600/Chemicalize07.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="426" src="http://1.bp.blogspot.com/-FaSgeU7eQW8/Tsra327YRAI/AAAAAAAAAN4/gSDWfotNFe0/s640/Chemicalize07.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;But before you fire the whole lot against PubChem set a filter for Mw 300 to remove most of the reagents.&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&amp;nbsp;&lt;/span&gt;&lt;span id="atitle"&gt;&lt;span style="font-size: large;"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-WVcfH_ZpE3Y/TsrcWOwXueI/AAAAAAAAAOI/Z8U4SUHnV-4/s1600/Chemicalize08.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="454" src="http://3.bp.blogspot.com/-WVcfH_ZpE3Y/TsrcWOwXueI/AAAAAAAAAOI/Z8U4SUHnV-4/s640/Chemicalize08.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;span id="atitle"&gt;&lt;span style="font-size: large;"&gt;Voila,&amp;nbsp; we lose 28 because they are not in PubChem but pick up 74 which not only look like an SAR analogue series but also cluster nicely.&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-WLBCzNg5cgU/TsrcwvRMZ6I/AAAAAAAAAOQ/vv2Hsoiy3U4/s1600/Chemicalize09.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="508" src="http://1.bp.blogspot.com/-WLBCzNg5cgU/TsrcwvRMZ6I/AAAAAAAAAOQ/vv2Hsoiy3U4/s640/Chemicalize09.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;span id="atitle"&gt;&lt;span style="font-size: large;"&gt;Now if you are used to this you recognize that the sequential CID set must have come from the same patent as they are all from Thomson Pharma. Technically&amp;nbsp; you don't&amp;nbsp; know which patent (without a subscription) but it's a good bet it's from the family we started with. &amp;nbsp; I could not spot any ChEMBL links suggesting inclusion in a journal article. However, in this case you can still do a neat circle squaring by choosing one compound, &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=11848054&amp;amp;loc=ec_rcs"&gt;CID 11848054&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;span id="atitle"&gt;&lt;span style="font-size: large;"&gt;, opening up to the four SIDs and selecting the&amp;nbsp; ChemSpider link at &lt;a href="http://www.chemspider.com/Chemical-Structure.10022528.html"&gt;10022528&lt;/a&gt;. You can then make the connection below.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-fE1Agv9qY30/Tsrjcwz0mxI/AAAAAAAAAOY/uuxSiy66GrU/s1600/Chemicalize11.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="302" src="http://3.bp.blogspot.com/-fE1Agv9qY30/Tsrjcwz0mxI/AAAAAAAAAOY/uuxSiy66GrU/s640/Chemicalize11.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;span id="atitle"&gt;&lt;span style="font-size: large;"&gt;The ChemSpider record makes an InChI look-up across to SureChem so you can open up a patent from the same family.&amp;nbsp; Last but not least you will find some SAR data in the body of these patents but the parasite IC50 assays report ranges,&amp;nbsp; not discrete values.&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id="atitle"&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id="atitle"&gt;&lt;span style="font-size: large;"&gt;N.B.&amp;nbsp; Chemicalize archives all structures converted from urls put in by users so the rest of that series from &lt;/span&gt;&lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://www.wipo.int/research/en/details.jsp?id=688"&gt;US20110207784&lt;/a&gt; for &lt;/span&gt;&lt;span style="font-size: large;"&gt;Schistosomiasis&lt;b&gt; &lt;/b&gt;&lt;/span&gt;&lt;span style="font-size: large;"&gt;and the &lt;a href="http://www.freepatentsonline.com/WO2006094799A3.html"&gt;WO2006094799&lt;/a&gt; anti-malaria set described above are both there &lt;/span&gt;&lt;span id="atitle"&gt;&lt;span style="font-size: large;"&gt;to be picked up &lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span id="atitle"&gt;&lt;span style="font-size: large;"&gt; &lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155351992730855318-4370515789714648935?l=cdsouthan.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://cdsouthan.blogspot.com/2011/11/chemicalize.html</link><author>noreply@blogger.com (Chris Southan)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-53HmFsyym8E/TsLxRvjSWPI/AAAAAAAAAM0/IgvMJD1KKDo/s72-c/Chemicalize01.jpg" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2155351992730855318.post-8703707314716210230</guid><pubDate>Tue, 27 Sep 2011 20:54:00 +0000</pubDate><atom:updated>2011-09-28T13:15:20.915-07:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">citations</category><category domain="http://www.blogger.com/atom/ns#">bio</category><title>So whats it like to get to 50?</title><description>&lt;span style="font-size: large;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;Well,&amp;nbsp; I&amp;nbsp; had my 50th Party at the &lt;a href="http://whiteswantwickenham.com/"&gt;White Swan&lt;/a&gt; Pub in Twickenham (pictured below), which was&amp;nbsp; nice but actually some time ago, ‘nuff said.&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-Lz5E0C0erIc/ToI5H0sqaJI/AAAAAAAAAMY/qc6jnuASsys/s1600/White_Swan_Twick.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="277px" src="http://2.bp.blogspot.com/-Lz5E0C0erIc/ToI5H0sqaJI/AAAAAAAAAMY/qc6jnuASsys/s400/White_Swan_Twick.jpg" width="400px" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;However, what happened more recently, at the beginning of this September,&amp;nbsp; was the appearance of my &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed?term=Southan%20C"&gt;50th entry in PubMed&lt;/a&gt;&amp;nbsp; which was our “&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/21878981"&gt;Minimum information about a bioactive entity (MIABE)&lt;/a&gt;”&amp;nbsp; collaborative effort that made a bit of a splash because of the&amp;nbsp; &lt;a href="http://www.ebi.ac.uk/Information/News/press-releases/press-release-30082011-MIABE.html"&gt;EBI press release&lt;/a&gt;.&amp;nbsp;&lt;/span&gt;&lt;span style="font-size: large;"&gt; There is a duplicate in the list as &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/21205783"&gt;PMID 21205783&lt;/a&gt; but this was planned as such by the BioDB working group and, in any case,&amp;nbsp; I'm pleased to report publication 51 is only a few weeks away.&amp;nbsp;&lt;/span&gt;&lt;span style="font-size: large;"&gt; In fact&amp;nbsp;&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed?term=%22Southan%20C%22[Author]"&gt;Southan C [Author]&lt;/a&gt;&amp;nbsp; currently only brings back 49 publications because it misses&amp;nbsp;&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed?term=%22Southan%20CD%22[Author]"&gt;Southan CD [Author]&lt;/a&gt; which happens to be my highest cited paper at 785 by Google Scholar and 613 from ISI.&amp;nbsp; I&amp;nbsp; slightly regret not using my second initial from the outset of my publishing efforts because my patent entries are Christopher Donald&amp;nbsp; Southan (in full) which was where my colleagues got my name from for the draft.&amp;nbsp; As it happens this Asp2 paper (a.k.a. beta secretase a.k.a. 
&lt;a href="http://www.ncbi.nlm.nih.gov/gene/23621"&gt;BACE1&lt;/a&gt;) was written in somewhat of a rush (over a weekend because we had been pipped to the post by earlier publications in that year) so the D (from my maternal Grandfather) stayed in.&amp;nbsp; While there are five other "Southans" in PubMed there is no other "Southan C" (yet) and in any case I have my Universal Author ID (below). &lt;/span&gt;&lt;span style="font-size: large;"&gt;In that link you can see the ISI statistics including the citation graph below and my &lt;a href="http://en.wikipedia.org/wiki/H-index"&gt;h-index&lt;/a&gt; of 21.&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;
&lt;span id="badgeCont816597" style="font-size: large; width: 126px;"&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;script src="http://labs.researcherid.com/mashlets?el=badgeCont816597&amp;amp;mashlet=badge&amp;amp;showTitle=false&amp;amp;className=a&amp;amp;rid=B-1550-2010"&gt;
&lt;/script&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&amp;nbsp; &lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-rZYqMHdT_vU/ToJBx-aCPHI/AAAAAAAAAMo/SQHlJGwL25g/s1600/Citation_graph_Sep_2011.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="262px" src="http://1.bp.blogspot.com/-rZYqMHdT_vU/ToJBx-aCPHI/AAAAAAAAAMo/SQHlJGwL25g/s640/Citation_graph_Sep_2011.JPG" width="640px" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;I’m pretty satisfied with these, particularly as I have been in the commercial sector since the 15th entry.&amp;nbsp; Regrets ? Well, I wish I’d knocked out a few more HPLC papers in that flat spot of 92-95, and, it’s a cliché of course but we had to be runners-up behind Nature papers (due to patents in the pipe) for&amp;nbsp;&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/8624782"&gt;PMID 8624782&lt;/a&gt;, &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/9494101"&gt;PMID 9494101&lt;/a&gt; and the BACE1 paper &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/10656250"&gt;PMID 10656250&lt;/a&gt;.&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: large;"&gt;So what does this stuff cover? What you can see below are two "&lt;a href="http://www.wordle.net/"&gt;Tag Clouds&lt;/a&gt;" generated from pasted-in abstracts and titles. The top one is from the most recent 25 PubMed entries, the lower one from the first 25. &lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-kDQkIeH6L4k/ToI8NY4gOqI/AAAAAAAAAMc/5NNa9Y3kVzs/s1600/PubMeds_last_25.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="354px" src="http://4.bp.blogspot.com/-kDQkIeH6L4k/ToI8NY4gOqI/AAAAAAAAAMc/5NNa9Y3kVzs/s640/PubMeds_last_25.JPG" width="640px" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-l90lYMCqFV4/ToI8bFombwI/AAAAAAAAAMk/KwAxnr_oREU/s1600/PubMeds_first_25.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="350px" src="http://3.bp.blogspot.com/-l90lYMCqFV4/ToI8bFombwI/AAAAAAAAAMk/KwAxnr_oREU/s640/PubMeds_first_25.JPG" width="640px" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;These were filtered for common English words but they indicate I should&amp;nbsp;cut&amp;nbsp;back&amp;nbsp;on, or find synonyms for,&amp;nbsp;my usage of "showed" and "however".&amp;nbsp;&amp;nbsp; You can see clear theme differences but&amp;nbsp;there is&amp;nbsp;some overlap, for example in proteases and sequences.&amp;nbsp; This&amp;nbsp;division into two halves makes an approximate split between my &lt;i&gt;in vitro&lt;/i&gt; and &lt;i&gt;in silico&lt;/i&gt; careers, although these ran in parallel for some years&amp;nbsp;between my first bioinformatics publication&amp;nbsp;(&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/2792366"&gt;PMID 2792366&lt;/a&gt;) and my last HPLC paper (&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/10419630"&gt;PMID 10419630&lt;/a&gt;).&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155351992730855318-8703707314716210230?l=cdsouthan.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://cdsouthan.blogspot.com/2011/09/so-whats-it-like-to-get-to-50-well-i.html</link><author>noreply@blogger.com (Chris Southan)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-Lz5E0C0erIc/ToI5H0sqaJI/AAAAAAAAAMY/qc6jnuASsys/s72-c/White_Swan_Twick.jpg" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2155351992730855318.post-4063492211617649122</guid><pubDate>Thu, 15 Sep 2011 20:14:00 +0000</pubDate><atom:updated>2011-10-08T08:15:40.195-07:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">chem</category><category domain="http://www.blogger.com/atom/ns#">INNs</category><title>Checking  some summer USANs</title><description>&lt;span style="font-size: large;"&gt;This list of&amp;nbsp; &lt;a href="http://chembl.blogspot.com/2011/07/july-2011-usans.html"&gt;July USANs&lt;/a&gt; has been copied 
over (with thanks) from the ChEMBL blog to follow up a few things.&amp;nbsp; I chose it because, as a small list, I could quickly do three things: 1) an (exact match) Google count of the name,&amp;nbsp; 2) the same with the research code and 3) a PubChem mapping, either by a name hit or by scraping the IUPAC name out of the PDF, pasting it into OPSIN and then pasting the resulting SMILES into the PubChem search box. You can see the slightly surprising results below.&lt;/span&gt;&lt;br /&gt;
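&lt;span style="font-size: large;"&gt;The name-hit half of step 3 could also be scripted against the PubChem PUG REST service rather than typed into the search box. What follows is only a minimal Python sketch under that assumption, not how the table was actually made: the function names are my own, and there is no error handling or request throttling.&lt;/span&gt;&lt;br /&gt;
&lt;pre&gt;
```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_to_cids_url(name):
    """Build the PUG REST URL that looks up compound IDs (CIDs) by name."""
    # Percent-encode the name so spaces and slashes are safe in the URL path
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/cids/JSON"


def parse_cids(response_text):
    """Pull the list of CIDs out of a PUG REST JSON reply (empty if none)."""
    payload = json.loads(response_text)
    return payload.get("IdentifierList", {}).get("CID", [])


def fetch_cids(name):
    """Query PubChem over the network and return the CIDs for one name."""
    with urlopen(name_to_cids_url(name), timeout=30) as response:
        return parse_cids(response.read().decode("utf-8"))
```
&lt;/pre&gt;
&lt;span style="font-size: large;"&gt;Calling fetch_cids with a USAN or a research code returns whatever CIDs PubChem currently maps to that name, so the result can differ from the snapshot in the table below.&lt;/span&gt;&lt;br /&gt;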
&lt;br /&gt;
&lt;br /&gt;
&lt;table border="1" style="height: 587px; width: 854px;"&gt;&lt;tbody&gt;
&lt;tr&gt; &lt;td style="text-align: center; width: 96px;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;USAN &lt;/b&gt;&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 69px;"&gt;&lt;span style="font-size: large; font-weight: bold;"&gt;G-hits &lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 118px;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Research Code &lt;/b&gt;&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 101px;"&gt;&lt;span style="font-size: large; font-weight: bold;"&gt;G-hits&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 303px;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;PubChem &lt;br /&gt;
&lt;br /&gt;
&lt;/b&gt;&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 127px;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Notes &lt;br /&gt;
&lt;br /&gt;
&lt;/b&gt;&lt;/span&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt; &lt;td style="text-align: center; width: 96px;"&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://www.ama-assn.org/resources/doc/usan/alisporivir.pdf"&gt;alisporivir&lt;/a&gt;&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 69px;"&gt;&lt;span style="font-size: large;"&gt;16.7K&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 118px;"&gt;&lt;span style="font-size: large;"&gt;DEB-025, Debio-025&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 101px;"&gt;&lt;span style="font-size: large;"&gt;6.7K&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 303px;"&gt;&lt;span style="font-size: large;"&gt;CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=49767348%5Buid%5D"&gt;49767348&lt;/a&gt;,&amp;nbsp; CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=11513676%5Buid%5D"&gt;11513676&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="width: 127px;"&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Alisporivir"&gt;Wikipedia&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt; &lt;td style="text-align: center; width: 96px;"&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://www.ama-assn.org/resources/doc/usan/efinaconazole.pdf"&gt;efinaconazole&lt;br /&gt;
&lt;/a&gt;&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 69px;"&gt;&lt;span style="font-size: large;"&gt;103&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 118px;"&gt;&lt;span style="font-size: large;"&gt;KP-103&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 101px;"&gt;&lt;span style="font-size: large;"&gt;false&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;pos&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 303px;"&gt;&lt;span style="font-size: large;"&gt;CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=489181%5Buid%5D"&gt;489181,&lt;/a&gt; CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=489129%5Buid%5D"&gt;489129&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=9819704%5Buid%5D"&gt;9819704&lt;/a&gt;&lt;/span&gt;&lt;/td&gt; &lt;td style="width: 127px;"&gt;&lt;span style="font-size: large;"&gt;&amp;nbsp;&amp;nbsp; As&amp;nbsp; (2R,3R), (2S,3R)&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;and flat&lt;/span&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt; &lt;td style="text-align: center; width: 96px;"&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://www.ama-assn.org/resources/doc/usan/esomeprazole-strontium.pdf"&gt;esomeprazole&lt;br /&gt;
strontium&lt;/a&gt;&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 69px;"&gt;&lt;span style="font-size: large;"&gt;3.1K&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 118px;"&gt;&lt;span style="font-size: large;"&gt;FM-0F67&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 101px;"&gt;&lt;span style="font-size: large;"&gt;1&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 303px;"&gt;&lt;span style="font-size: large;"&gt;CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=16048656%5Buid%5D"&gt;16048656&lt;/a&gt;, CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=44464000%5Buid%5D"&gt;44464000&lt;/a&gt;&lt;/span&gt;&lt;/td&gt; &lt;td style="width: 127px;"&gt;&lt;span style="font-size: large;"&gt;R &amp;amp; S tetrahydrate&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt; &lt;td style="text-align: center; width: 96px;"&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://www.ama-assn.org/resources/doc/usan/mapracorat.pdf"&gt;mapracorat&lt;br /&gt;
&lt;/a&gt;&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 69px;"&gt;&lt;span style="font-size: large;"&gt;3.6K&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 118px;"&gt;&lt;span style="font-size: large;"&gt;ZK-245186&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 101px;"&gt;&lt;span style="font-size: large;"&gt;787&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 303px;"&gt;&lt;span style="font-size: large;"&gt;CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=11711522%5Buid%5D"&gt;11711522&lt;/a&gt;&lt;/span&gt;&lt;/td&gt; &lt;td style="width: 127px;"&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt; &lt;td style="text-align: center; vertical-align: top; width: 96px;"&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://www.ama-assn.org/resources/doc/usan/olodaterol.pdf" target="_blank"&gt;olodaterol&lt;/a&gt;&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 69px;"&gt;&lt;span style="font-size: large;"&gt;5K&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 118px;"&gt;&lt;span style="font-size: large;"&gt;BI-1744&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 101px;"&gt;&lt;span style="font-size: large;"&gt;4.1K&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 303px;"&gt;&lt;span style="font-size: large;"&gt;CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=11504295%5Buid%5D"&gt;11504295&lt;/a&gt;&lt;/span&gt;&lt;/td&gt; &lt;td style="vertical-align: top; width: 127px;"&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt; &lt;td style="text-align: center; width: 96px;"&gt;&lt;span style="font-size: large;"&gt;&amp;nbsp;&lt;a href="http://www.ama-assn.org/resources/doc/usan/olodaterol-hydrochloride.pdf"&gt;olodaterol&lt;br /&gt;
hydrochloride &lt;/a&gt;&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 69px;"&gt;&lt;span style="font-size: large;"&gt;6&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 118px;"&gt;&lt;span style="font-size: large;"&gt;BI 1744 CL&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 101px;"&gt;&lt;span style="font-size: large;"&gt;9.7K&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 303px;"&gt;&lt;span style="font-size: large;"&gt;CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=11711522%5Buid%5D"&gt;11711522&lt;/a&gt;,&amp;nbsp; CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=11654738%5Buid%5D"&gt;11654738&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;(flat)&lt;/span&gt;&lt;/td&gt; &lt;td style="width: 127px;"&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt; &lt;td style="text-align: center; width: 96px;"&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://www.ama-assn.org/resources/doc/usan/teriflunomide.pdf"&gt;teriflunomide&lt;br /&gt;
&lt;/a&gt;&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 69px;"&gt;&lt;span style="font-size: large;"&gt;48K&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 118px;"&gt;&lt;span style="font-size: large;"&gt;HMR-1726&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 101px;"&gt;&lt;span style="font-size: large;"&gt;5K&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 303px;"&gt;&lt;span style="font-size: large;"&gt;CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=5479847%5Buid%5D"&gt;5479847&lt;/a&gt;&lt;/span&gt;&lt;/td&gt; &lt;td style="width: 127px;"&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Teriflunomide"&gt;Wikipedia&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;/tbody&gt; &lt;/table&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
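&lt;span style="font-size: large;"&gt;To compare the two G-hits columns numerically, the abbreviated counts need converting to plain numbers first. This is a minimal Python sketch with function names of my own choosing; the values are copied from the rows above that have a usable count in both columns.&lt;/span&gt;&lt;br /&gt;
&lt;pre&gt;
```python
def parse_hits(text):
    """Convert an abbreviated count such as '16.7K' or '2,390K' to an integer."""
    cleaned = text.replace(",", "").strip().upper()
    if cleaned.endswith("K"):
        # 'K' means thousands; round() absorbs floating point noise
        return round(float(cleaned[:-1]) * 1000)
    return int(cleaned)


def name_to_code_ratio(name_hits, code_hits):
    """How many times more hits the USAN gets than the research code."""
    return parse_hits(name_hits) / parse_hits(code_hits)


# (USAN, name hits, research code hits) as given in the table
rows = [
    ("alisporivir", "16.7K", "6.7K"),
    ("mapracorat", "3.6K", "787"),
    ("olodaterol", "5K", "4.1K"),
    ("teriflunomide", "48K", "5K"),
]

for usan, name_hits, code_hits in rows:
    print(usan, round(name_to_code_ratio(name_hits, code_hits), 1))
```
&lt;/pre&gt;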
&lt;span style="font-size: large;"&gt;The first surprise, given that you might expect all of these to have had approximately equivalent public exposure up to the point of INN/USAN approval, was the big difference in the Google hit counts (despite the caveats of such counts).&amp;nbsp; For example, efinaconazole had only 103 matches, nearly all domain name grabs,&amp;nbsp; even though KP-103 had been in MeSH for over 10 years via a 2001 publication "&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/12222928"&gt;KP-103, a novel triazole derivative, is effective in preventing relapse and successfully treating experimental interdigital&amp;nbsp; and tinea corporis in guinea pigs&lt;/a&gt;"&amp;nbsp; (but athletic guinea pigs?).&amp;nbsp; This was a direct PubMed substance link back-mapped (i.e. the 2001 MeSH name was retrospectively linked to the 2005 PubChem entry) to the (2R,3R) form, CID 489181, oddly enough with the "flat" version, CID 9819704, coming in over a year later. We can contrast this with teriflunomide, which has 48K hits.&amp;nbsp; This INN was declared over a year ago, which might have given it time to garner the many Google hits and a Wikipedia entry, but here the MeSH entry for HMR-1726 has no tree icon.&amp;nbsp; The prodrug leflunomide also has its own INN.&amp;nbsp; &lt;br /&gt;
&lt;br /&gt;
I was intrigued to see esomeprazole strontium getting a new USAN. Last time I checked PubChem (just now) Nexium = esomeprazole magnesium was the INN/USAN.&amp;nbsp; It looks like this on the packet (top, below)&amp;nbsp; and in CID 2112130 (left),&amp;nbsp; but a bit different from esomeprazole strontium as represented in CID 44464000&amp;nbsp; (right), which has one more water and renders the stereo wedges differently.&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-7K9tTCNSoYg/TnJ6_PHZifI/AAAAAAAAAL8/cqS1VxdNfX4/s1600/Nex_pack_pic.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="432" src="http://2.bp.blogspot.com/-7K9tTCNSoYg/TnJ6_PHZifI/AAAAAAAAAL8/cqS1VxdNfX4/s640/Nex_pack_pic.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;So how come the hemi-magnesium trihydrate and the hemi-strontium tetrahydrate both get INNs? From the informative &lt;a href="https://extranet.who.int/tools/inn_online_application/"&gt;WHO site&lt;/a&gt; it transpires that the first condition is that you have to fork over $9000, then you need to get a CAS number (!) and thirdly have a clinical trial in the offing. The top of the "esomeprazole strontium" Google hits reveals some interesting details. It seems &lt;a href="http://www.hanmipharm.com/eng/index.asp"&gt;Hanmipharm&lt;/a&gt; from Korea have made an &lt;a href="http://www.who.int/intellectualproperty/topics/ip/incremental_drugs/en/index.html"&gt;Incrementally Modified Drug application&lt;/a&gt; and they also have some patent filings such as&amp;nbsp; &lt;a href="http://www.freepatentsonline.com/EP1780207.html"&gt;"Crystalline esomeprazole strontium hydrate, method for preparing the same and pharmaceutical composition containing the same&lt;/a&gt;".&amp;nbsp; Surprisingly,&amp;nbsp; you can also Googlewhack straight into a SureChem record, as seen below (but you can't open it).&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-v1MZWdS1PYM/TnKJnW5GUvI/AAAAAAAAAMA/Q0uWFjpEGiE/s1600/Es_St_pat.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="274" src="http://3.bp.blogspot.com/-v1MZWdS1PYM/TnKJnW5GUvI/AAAAAAAAAMA/Q0uWFjpEGiE/s640/Es_St_pat.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
But note that the name-to-structure conversion renders in this case (I think, because it's a rather small graphic) just one parent, one strontium and two waters.&amp;nbsp;&amp;nbsp; At this point,&amp;nbsp; when one wanders into IP territory, prudence dictates reporting nothing beyond the facts of what Google lays bare, but this does include the reference to a &lt;a href="http://patentdocs.typepad.com/files/az-v-hanmi.pdf"&gt;patent dispute&lt;/a&gt;; 'nuff said.&amp;nbsp; &lt;br /&gt;
&lt;br /&gt;
OK, so what other salt forms can we find and do they have INNs/USANs?&amp;nbsp; You can start back with the esomeprazole parent, CID 9568614, and link out to 126 substance mixtures. These collapse to 60 compound mixtures.&amp;nbsp; You can open these out to 42 components that include a lot of metal cations.&amp;nbsp; You can then inspect the 60 to see what combinations there are by recognising the metal colour rendering.&amp;nbsp; The upshot of this was the table below.&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
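&lt;span style="font-size: large;"&gt;Spotting the cations by eye from the colour rendering works for 60 records, but the same inspection could in principle be done on the SMILES strings, where the components of a mixture are separated by dots. The Python below is a rough sketch of that idea, not what I actually did: the metal list is just the counter-ions that turn up in this post, the function names are hypothetical, and the SMILES in the usage note is sodium acetate rather than one of the esomeprazole records.&lt;/span&gt;&lt;br /&gt;
&lt;pre&gt;
```python
import re

# Counter-ions seen in this post; an assumption, extend for other salts
METALS = {"Mg", "Sr", "Ba", "Na", "K", "Ti"}


def split_components(smiles):
    """Split a mixture SMILES into its dot-separated components."""
    return [part for part in smiles.split(".") if part]


def metals_in(smiles):
    """List the metal symbols from METALS written as bracket atoms."""
    # Bracket atoms look like [Na+] or [88Sr+2]; skip any isotope digits
    symbols = re.findall(r"\[\d*([A-Z][a-z]?)", smiles)
    return sorted(set(symbols).intersection(METALS))
```
&lt;/pre&gt;
&lt;span style="font-size: large;"&gt;For example, split_components("CC(=O)[O-].[Na+]") gives the acetate and the sodium as two separate components, and metals_in on the same string picks out Na.&lt;/span&gt;&lt;br /&gt;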
&lt;table border="1" style="height: 587px; width: 656px;"&gt;&lt;tbody&gt;
&lt;tr&gt; &lt;td style="font-weight: bold; text-align: center; vertical-align: top; width: 117px;"&gt;&lt;span style="font-size: large;"&gt;Name&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="font-weight: bold; text-align: center; vertical-align: top; width: 104px;"&gt;&lt;span style="font-size: large;"&gt;Google&lt;br /&gt;
hits &lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="font-weight: bold; text-align: center; vertical-align: top; width: 69px;"&gt;&lt;span style="font-size: large;"&gt;USAN&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 187px;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;PubChem&lt;br /&gt;
&lt;/b&gt;&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 147px;"&gt;&lt;span style="font-size: large;"&gt;&lt;span style="font-weight: bold;"&gt;Notes&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt; &lt;td style="text-align: center; vertical-align: top; width: 117px;"&gt;&lt;span style="font-size: large;"&gt;esomeprazole&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 104px;"&gt;&lt;span style="font-size: large;"&gt;2,390K (without
Mg++)&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 69px;"&gt;&lt;span style="font-size: large;"&gt;yes &lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: left; width: 187px;"&gt;&lt;span style="font-size: large;"&gt;CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=9568614%5Buid%5D"&gt;9568614&lt;/a&gt;&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 147px;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;/b&gt;&lt;/span&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt; &lt;td style="text-align: center; vertical-align: top; width: 117px;"&gt;&lt;span style="font-size: large;"&gt;esomeprazole&lt;br /&gt;
magnesium&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 104px;"&gt;&lt;span style="font-size: large;"&gt;589K&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 69px;"&gt;&lt;span style="font-size: large;"&gt;yes&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="width: 187px;"&gt;&lt;span style="font-size: large;"&gt;CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=49767348%5Buid%5D"&gt;49767348&lt;/a&gt;&lt;br /&gt;
CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=11513676%5Buid%5D"&gt;11513676&lt;/a&gt;&lt;br /&gt;
CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=9842290%5Buid%5D"&gt;9842290&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 147px;"&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Alisporivir"&gt;Wikipedia&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt; &lt;td style="text-align: center; vertical-align: top; width: 117px;"&gt;&lt;span style="font-size: large;"&gt;esomeprazole&lt;br /&gt;
barium &lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 104px;"&gt;&lt;span style="font-size: large;"&gt;15&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 69px;"&gt;&lt;span style="font-size: large;"&gt;no&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="vertical-align: top; width: 187px;"&gt;&lt;span style="font-size: large;"&gt;CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=9956914%5Buid%5D"&gt;9956914&lt;/a&gt;&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 147px;"&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt; &lt;td style="text-align: center; vertical-align: top; width: 117px;"&gt;&lt;span style="font-size: large;"&gt;esomeprazole&lt;br /&gt;
sodium &lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 104px;"&gt;&lt;span style="font-size: large;"&gt;115K&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 69px;"&gt;&lt;span style="font-size: large;"&gt;yes&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="width: 187px;"&gt;&lt;span style="font-size: large;"&gt;CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=23674541%5Buid%5D"&gt;23674541&lt;/a&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 147px;"&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
Intravenous Nexium  &lt;/span&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt; &lt;td style="text-align: center; vertical-align: top; width: 117px;"&gt;&lt;span style="font-size: large;"&gt;esomeprazole&lt;br /&gt;
strontium&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 104px;"&gt;&lt;span style="font-size: large;"&gt;3.1K&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 69px;"&gt;&lt;span style="font-size: large;"&gt;yes&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="width: 187px;"&gt;&lt;span style="font-size: large;"&gt;CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=16048656%5Buid%5D"&gt;16048656&lt;/a&gt;&lt;br /&gt;
CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=44464000%5Buid%5D"&gt;44464000&lt;/a&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; width: 147px;"&gt;&lt;span style="font-size: large;"&gt;R &amp;amp;S&lt;br /&gt;
tetrahydrate&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt; &lt;td style="text-align: center; vertical-align: top; width: 117px;"&gt;&lt;span style="font-size: large;"&gt;esomeprazole&lt;br /&gt;
potassium&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 104px;"&gt;&lt;span style="font-size: large;"&gt;2,890K&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 69px;"&gt;&lt;span style="font-size: large;"&gt;yes&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="vertical-align: top; width: 187px;"&gt;&lt;span style="font-size: large;"&gt;CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=23672364%5Buid%5D"&gt;23672364&lt;/a&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 147px;"&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr&gt; &lt;td style="text-align: center; vertical-align: top; width: 117px;"&gt;&lt;span style="font-size: large;"&gt;esomeprazole&lt;br /&gt;
titanium&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 104px;"&gt;&lt;span style="font-size: large;"&gt;36&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 69px;"&gt;&lt;span style="font-size: large;"&gt;no&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;td style="vertical-align: top; width: 187px;"&gt;&lt;span style="font-size: large;"&gt;CID &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=25177609%5Buid%5D"&gt;25177609&lt;/a&gt;&lt;/span&gt;&lt;/td&gt; &lt;td style="text-align: center; vertical-align: top; width: 147px;"&gt;&lt;span style="font-size: large;"&gt;tetrahydrate&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;/tbody&gt; &lt;/table&gt;&lt;br /&gt;
&lt;span style="font-size: large;"&gt; Following up these in Google indicates that out of the six salts no fewer than four have INNs (so someone has forked over the cash).&amp;nbsp; This illustrates the name mapping problem of INNs being not only multiplexed into salt and drug/prodrug forms but also used synonymously for parent monomers.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155351992730855318-4063492211617649122?l=cdsouthan.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://cdsouthan.blogspot.com/2011/09/checking-some-summer-usans.html</link><author>noreply@blogger.com (Chris Southan)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-7K9tTCNSoYg/TnJ6_PHZifI/AAAAAAAAAL8/cqS1VxdNfX4/s72-c/Nex_pack_pic.JPG" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2155351992730855318.post-4141103712784021460</guid><pubDate>Mon, 29 Aug 2011 21:49:00 +0000</pubDate><atom:updated>2011-09-03T22:28:24.898-07:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">gene trees</category><category domain="http://www.blogger.com/atom/ns#">bio</category><title>Alas poor Kamilah;  an erroneous  Ensembl BACE1</title><description>&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;a href="http://3.bp.blogspot.com/-Pl0m4lfC7y0/TlwKJchHecI/AAAAAAAAALU/S9DPCoF4ShA/s1600/Kamilah.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="200px" src="http://3.bp.blogspot.com/-Pl0m4lfC7y0/TlwKJchHecI/AAAAAAAAALU/S9DPCoF4ShA/s320/Kamilah.png" width="150px" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;This lady, a Western lowland gorilla called Kamilah, has had her genome sequenced by the Sanger Centre and has just undergone an updated &lt;a href="http://www.ensembl.org/Gorilla_gorilla/Info/Index?db"&gt;Ensembl release.&lt;/a&gt; You can inspect her BACE1 gene via the &lt;a href="http://www.ensembl.org/Gorilla_gorilla/Gene/Compara_Tree?db=core;g=ENSGGOG00000014376;r=11:115443447-115468786;t=ENSGGOT00000027809%20http://www.ensembl.org/Gorilla_gorilla/Gene/Summary?db=core;g=ENSGGOG00000014376;r=11:115443447-115468786;t=ENSGGOT00000027809"&gt;Ensembl Compara GeneTree &lt;/a&gt;&amp;nbsp;display&amp;nbsp; (not to be confused with &lt;a href="http://www.treefam.org/cgi-bin/TFinfo.pl?ac=TF329595"&gt;TreeFam&lt;/a&gt; or &lt;a href="http://taxonomy.zoology.gla.ac.uk/rod/treeview.html"&gt;TreeView&lt;/a&gt;).&amp;nbsp; If you are used to looking at this stuff it takes only a few seconds to recognize that “her” BACE1 Ensembl sequence is suspiciously outgrouped, as opposed to the expected outgrouping for the one amphibian (frog/toad).&amp;nbsp;  &lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-uptXHJRiKI8/TlwNVrw6mpI/AAAAAAAAALg/DXRTOJvLejs/s1600/Tree_Gorilla_BACE1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="243px" src="http://2.bp.blogspot.com/-uptXHJRiKI8/TlwNVrw6mpI/AAAAAAAAALg/DXRTOJvLejs/s640/Tree_Gorilla_BACE1.png" width="640px" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
Given that her BACE2 is positioned exactly as expected in the paralogous tree branch (see below),&amp;nbsp; my parsimonious assumption, based on past experience, is to suspect the Ensembl&amp;nbsp; gene build rather than a dramatic primate evolutionary adaptation or a genetic defect in the lady herself.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-X1drW8duYtA/TlwkAsiAsYI/AAAAAAAAALk/bZpiLZXnTXI/s1600/Tree_Gorilla_BACE2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="350px" src="http://4.bp.blogspot.com/-X1drW8duYtA/TlwkAsiAsYI/AAAAAAAAALk/bZpiLZXnTXI/s400/Tree_Gorilla_BACE2.png" width="400px" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;In the gene entry we have two transcripts,&amp;nbsp; with the full-length one being four residues short of the 501 aa of all mammalian orthologues. &lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-Ka5N7cEJ-dA/Tlwkjw85X3I/AAAAAAAAALo/B_umpjZ3cmI/s1600/Ensembl_Gor_BACE1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="388px" src="http://4.bp.blogspot.com/-Ka5N7cEJ-dA/Tlwkjw85X3I/AAAAAAAAALo/B_umpjZ3cmI/s640/Ensembl_Gor_BACE1.png" width="640px" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
&lt;br /&gt;
Even the InterPro mark-up tells you there is something wrong in the middle of the protein, so the next step is a quick &lt;a href="http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&amp;amp;BLAST_PROGRAMS=blastp&amp;amp;PAGE_TYPE=BlastSearch&amp;amp;SHOW_DEFAULTS=on&amp;amp;LINK_LOC=blasthome"&gt;BLAST&lt;/a&gt;. &lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-qBpdYNdd0hc/TlwmAIUcKcI/AAAAAAAAALs/-KQMkjPO_JU/s1600/BLASTP_result.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="163px" src="http://3.bp.blogspot.com/-qBpdYNdd0hc/TlwmAIUcKcI/AAAAAAAAALs/-KQMkjPO_JU/s640/BLASTP_result.png" width="640px" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;
&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;The result confirms what looks like a big frameshift starting at exon 4 that would be the cause of the tree outgrouping.&lt;br /&gt;
&lt;br /&gt;
But a&amp;nbsp; &lt;a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+LibInfo+-id+1i9Ho1eOgYG+-lib+DOTTUPN"&gt;DOTTUPN&lt;/a&gt; run&amp;nbsp;&amp;nbsp; (I’m on the &lt;a href="http://emboss.open-bio.org/wiki/Advisory_Board"&gt;EMBOSS observer board&lt;/a&gt; after all)&amp;nbsp;&amp;nbsp; makes it clear we are not actually dealing with a frameshift but an (in frame) insert, obviously a genomic assembly one rather than a chimeric experimental clone.&amp;nbsp; &lt;/span&gt;&lt;br /&gt;
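&lt;span style="font-family: verdana; font-size: 130%;"&gt;The frameshift-versus-insert distinction above can be sketched in a few lines of Python. This is only a toy illustration: the sequence, the inserts and the helper names are invented and nothing here is taken from the gorilla gene build. An insert whose length is a multiple of three leaves the downstream codons as they were, while any other length shifts the frame.&lt;/span&gt;&lt;br /&gt;
&lt;pre&gt;
```python
def codons(seq):
    """Split a nucleotide string into complete codons, dropping any trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def downstream_frame_kept(original, insert, position):
    """Return True if inserting `insert` at `position` leaves the last two codons unchanged."""
    mutated = original[:position] + insert + original[position:]
    tail = codons(original)[-2:]  # last two codons of the unmodified sequence
    return codons(mutated)[-2:] == tail


# Hypothetical 18 nt coding sequence, invented for illustration
toy_cds = "ATGGCTGACCTGAAATAA"

print(downstream_frame_kept(toy_cds, "GGGCCC", 6))  # 6 nt insert, a multiple of three
print(downstream_frame_kept(toy_cds, "GGGC", 6))    # 4 nt insert, not a multiple of three
```
&lt;/pre&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;The first call prints True (the frame is kept) and the second prints False (the frame is shifted). A dotplot shows the same thing graphically: an in-frame insert displaces the diagonal but the protein match resumes after it.&lt;/span&gt;&lt;br /&gt;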
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-vFEC0feQvnk/TlwnNn1QVJI/AAAAAAAAAL0/r1IGJXXuwKM/s1600/Dotplot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="441px" src="http://2.bp.blogspot.com/-vFEC0feQvnk/TlwnNn1QVJI/AAAAAAAAAL0/r1IGJXXuwKM/s640/Dotplot.png" width="640px" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;So what is the bit in the middle?&amp;nbsp; A BLAST of the insert back against transcript and genomic data, cross-checked against the Gorilla WGS, gives a peculiar result: a rogue insert of about 140 nts (no identical matches to anything) plus an extended piece of BACE1 chromosome 11 sequence.&lt;/span&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&amp;nbsp;&lt;/span&gt;&lt;a href="http://3.bp.blogspot.com/-2t2HV7Ybvkc/TlwoInLmIBI/AAAAAAAAAL4/FkXyjO3HsSM/s1600/BLASTN.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="158px" src="http://3.bp.blogspot.com/-2t2HV7Ybvkc/TlwoInLmIBI/AAAAAAAAAL4/FkXyjO3HsSM/s640/BLASTN.png" width="640px" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;While it remains a bit of puzzle as to how the gene build pipeline managed this, Kamilah looks well on it. We can thus await the next assembly update and see&amp;nbsp;if her BACE1&amp;nbsp;gets fixed &lt;i&gt;in silico.&lt;/i&gt; I'll post here when it happens.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155351992730855318-4141103712784021460?l=cdsouthan.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://cdsouthan.blogspot.com/2011/08/alas-poor-kamilah-erroneous-ensembl.html</link><author>noreply@blogger.com (Chris Southan)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-Pl0m4lfC7y0/TlwKJchHecI/AAAAAAAAALU/S9DPCoF4ShA/s72-c/Kamilah.png" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2155351992730855318.post-5818111007502075292</guid><pubDate>Wed, 17 Aug 2011 21:25:00 +0000</pubDate><atom:updated>2011-08-20T00:58:54.251-07:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">chem</category><category domain="http://www.blogger.com/atom/ns#">glyptins</category><title>Glyptins are go</title><description>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
Looking at the&amp;nbsp; DrugBank collection of DPPIV inhibitors for a &lt;a href="http://cdsouthan.blogspot.com/2011/08/compound-to-target-mappings-part-i.html"&gt;previous post&lt;/a&gt; involved checking&amp;nbsp; the three approved&amp;nbsp; “&lt;a href="http://qjmed.oxfordjournals.org/content/100/11/671.full"&gt;gliptins&lt;/a&gt;”. This relatively new class of protease inhibitor drugs for diabetes seems to be crossing or coming up to the FDA finish line thick and fast (although it is ranked 25th of all &lt;a href="http://www.jcheminf.com/content/3/1/14/table/T3"&gt;research targets by compound numbers&lt;/a&gt;).&amp;nbsp; I was therefore&amp;nbsp; intrigued to notice yet a fourth crop up in the ChEMBL blog of the &lt;a href="http://chembl.blogspot.com/2011/08/usan-watch-august-2011.html"&gt;latest INNs&lt;/a&gt;.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-eRT5BS9YWHo/Tkw19HW--GI/AAAAAAAAAKo/yY-9k-iSBJo/s1600/Glyp1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="124px" src="http://2.bp.blogspot.com/-eRT5BS9YWHo/Tkw19HW--GI/AAAAAAAAAKo/yY-9k-iSBJo/s640/Glyp1.png" width="640px" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
So what was this new one ? The first pops against PubChem Compound were negative on both name and code name.&amp;nbsp; Next pop was Google that surprisingly also suggested gliptin number 5.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-TxsAR0tuWVs/Tkw2q3uKMsI/AAAAAAAAAKw/Vgm4SO1cAe4/s1600/Glyp2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="222px" src="http://3.bp.blogspot.com/-TxsAR0tuWVs/Tkw2q3uKMsI/AAAAAAAAAKw/Vgm4SO1cAe4/s400/Glyp2.png" width="400px" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
This got me wondering how many “declared”&amp;nbsp; gliptins there were.&amp;nbsp; Going back to the ChEMBL link opened up the &lt;a href="http://www.ama-assn.org/resources/doc/usan/trelagliptin.pdf"&gt;USAN&lt;/a&gt; rather than the WHO INN. While there were no live links in the PDF there was a structure rendering and an IUPAC name. I just copied and pasted that string into &lt;a href="http://opsin.ch.cam.ac.uk/"&gt;OPSIN&lt;/a&gt; and it did the business.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-CgwaStfyJ90/Tkw3UKK8n3I/AAAAAAAAAK0/VhcNtqV-qmU/s1600/Glyp3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="395px" src="http://1.bp.blogspot.com/-CgwaStfyJ90/Tkw3UKK8n3I/AAAAAAAAAK0/VhcNtqV-qmU/s400/Glyp3.png" width="400px" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
&lt;span id="goog_1535971250"&gt;&lt;/span&gt;&lt;span id="goog_1535971251"&gt;&lt;/span&gt;&lt;br /&gt;
Reassuringly, I confirmed&amp;nbsp; the same rendering as the USAN PDF and popping the SMILES into the PubChem structure search gives the same structure,&amp;nbsp; &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=15983988"&gt;CID 15983988&lt;/a&gt; but there is also a “flat”&amp;nbsp; version (no stereochemistry) as &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=11360300"&gt;CID 11360300&lt;/a&gt;.&amp;nbsp; Both of these had Thomson Pharma as primary SID sources from 2006, suggestive of patent extraction.&amp;nbsp; This was confirmed via SureChem patent links to &lt;a href="http://www.chemspider.com/Chemical-Structure.13115365.htm"&gt;ChemSpider 13115365&lt;/a&gt;.&amp;nbsp; There were seven US granted patents counted but you can only link out to the first three for free. One of these seems to be in the patent family of the Takeda first-filing and it was rather neat to put the Mw 357 from the PubChem entry in “find” and whack (flat) trelagliptin as&amp;nbsp; “example 34” (Mw in green) below.&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-kFQ5tcwSdWk/Tkw4Zl2iMiI/AAAAAAAAAK4/Yveg2DqHM3A/s1600/Glyp4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="138px" src="http://2.bp.blogspot.com/-kFQ5tcwSdWk/Tkw4Zl2iMiI/AAAAAAAAAK4/Yveg2DqHM3A/s640/Glyp4.png" width="640px" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
&amp;nbsp;&amp;nbsp; &lt;br /&gt;
Having closed that circle I moved on to Teneligliptin.&amp;nbsp; This hits straight through to &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=11949652"&gt;CID 11949652&lt;/a&gt;, but the INN in this case was picked up by KEGG not MeSH.&amp;nbsp; Here again&amp;nbsp; &lt;a href="http://www.chemspider.com/Chemical-Structure.10123963.html"&gt;ChemSpider 10123963&lt;/a&gt;&amp;nbsp;&lt;/span&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt; links to a SureChem &lt;/span&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;patent from Mitsubishi.&amp;nbsp; There were no Mw values in the examples this time but you can spot the lead in the claims.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-d6BEDRF4Qb8/TkxMS5kkUuI/AAAAAAAAAK8/Rv69jf5aW64/s1600/Glyp5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="116px" src="http://2.bp.blogspot.com/-d6BEDRF4Qb8/TkxMS5kkUuI/AAAAAAAAAK8/Rv69jf5aW64/s640/Glyp5.png" width="640px" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
&lt;br /&gt;
From a cheminformatics and SAR point of view, finding&amp;nbsp;the filings for a single structure from the free links shown above might not have particular utility in itself, but you&amp;nbsp;may see the complete set of exemplars, often with quantitative SAR data across the series.&amp;nbsp; The ChemSpider open &lt;a href="http://www.surechem.org/"&gt;SureChem&lt;/a&gt; links are effectively teasers but the individual&amp;nbsp;sign up&amp;nbsp;does not seem too bad for 11 million extracted&amp;nbsp;structures&amp;nbsp;including Medline.&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
Continuing by Googling Teneligliptin&amp;nbsp; came up&amp;nbsp; with the same structure at&amp;nbsp; &lt;a href="http://www.chemblink.com/products/760937-92-6.htm"&gt;Chemblink&lt;/a&gt;, which I had never heard of, but it also confirmed the compound as having&amp;nbsp; code name MP513 from the&amp;nbsp; &lt;a href="http://www.mt-pharma.co.jp/e/"&gt;Mitsubishi portfolio. &lt;/a&gt;Having got five gliptin structures&amp;nbsp; and wondering if that was the lot,&amp;nbsp; I suddenly realized I’d missed the glaringly obvious, in this case the MeSH “Pharmacological Action” ; "Dipeptidyl-Peptidase IV Inhibitors".&amp;nbsp;&amp;nbsp; Bingo!&amp;nbsp; This gave no less than &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pccompound&amp;amp;DbFrom=mesh&amp;amp;Cmd=Link&amp;amp;LinkName=mesh_pccompound&amp;amp;IdsFromResult=68054873"&gt;17 CIDs&lt;/a&gt;.&amp;nbsp; Along with some salt forms and research compounds this included&amp;nbsp; vildagliptin&amp;nbsp;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=24848920"&gt;CID 24848920&lt;/a&gt;, alogliptin&amp;nbsp;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=11450633"&gt;CID 11450633&lt;/a&gt;, sitagliptin &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=4369359"&gt;CID 4369359&lt;/a&gt;, linagliptin&amp;nbsp;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=10096344"&gt;CID 10096344&lt;/a&gt; and saxagliptin &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=11243969"&gt;CID 11243969&lt;/a&gt;.&amp;nbsp;&amp;nbsp;&amp;nbsp; With the two above that makes seven INNs-to-structures,&amp;nbsp; not a bad haul for one blog post.&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155351992730855318-5818111007502075292?l=cdsouthan.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</description><link>http://cdsouthan.blogspot.com/2011/08/glyptins-to-go.html</link><author>noreply@blogger.com (Chris Southan)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-eRT5BS9YWHo/Tkw19HW--GI/AAAAAAAAAKo/yY-9k-iSBJo/s72-c/Glyp1.png" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2155351992730855318.post-6109391749220124339</guid><pubDate>Mon, 18 Jul 2011 23:12:00 +0000</pubDate><atom:updated>2011-08-21T07:24:00.350-07:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">protein homology</category><category domain="http://www.blogger.com/atom/ns#">bio</category><title>Ortholgues: Less Divergent in Sequence but more Divergent in Function?</title><description>&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
I’d like to make a few comments on what might be one of the most significant bioinformatics papers for drug R&amp;amp;D in a long time: &lt;a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002073"&gt;“Testing the Orthologue Conjecture with Comparative Functional Genomic Data from Mammals”&lt;/a&gt;.  It has implications for the interpretation of drug effects in animal models or cell systems that echo all the way up from pharmacological proof of concept to safety assessment and the productivity crisis associated with Phase II failure rates.  It’s a good paper that I can leave you to read but the main conclusion is that paralogs are better predictors of function (in the annotation transfer sense) than are orthologs. This implies that the most important factor in the evolution of function is not amino acid sequence, but rather the cellular context in which proteins act.  To me the latter makes a lot of sense even if the former seems counterintuitive. I have myself engaged with various aspects of protein phylogeny, both from the display angle (see: &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/17511660"&gt;Exploiting new genome data and web resources for the phylogenetic analysis of proteases, substrates and inhibitors&lt;/a&gt;) and some deeper aspects of &lt;a href="http://www.cdsouthan.info/Pubs/ProtEvol_NEHG_2003.pdf"&gt;protease evolution&lt;/a&gt;,&amp;nbsp; which makes this paper particularly interesting for me. It so happens this theme came up back at the ranch (the one with the Elks) as a &lt;a href="https://www.yammer.com/about/product"&gt;Yammer&lt;/a&gt;&amp;nbsp; posting a month or so before this paper.   Without giving away any details, it concerned what potential workflows could be used to display some of the more complex target family phylogenies necessary to support model organism choices.  
I pitched in by suggesting the TreeFam resource because I find these pre-cooked protein alignments and displays not only particularly easy to read but also of high quality (but unfortunately the updating since 2009 has lapsed).  I have included &lt;a href="http://www.treefam.org/cgi-bin/TFinfo.pl?ac=TF329595"&gt;TF329595&lt;/a&gt;,&amp;nbsp; for &lt;a href="http://www.uniprot.org/uniprot/P56817"&gt;BACE1&lt;/a&gt;&amp;nbsp; and &lt;a href="http://www.uniprot.org/uniprot/Q9Y5Z0"&gt;BACE2&lt;/a&gt;&amp;nbsp; below which illustrates some of the points in the paper. &lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;/span&gt;&lt;a href="http://2.bp.blogspot.com/-6LAqBLR30HE/Tk2tpGu-ZWI/AAAAAAAAALI/MPcAQBJt76U/s1600/treeview_BACE.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="446" src="http://2.bp.blogspot.com/-6LAqBLR30HE/Tk2tpGu-ZWI/AAAAAAAAALI/MPcAQBJt76U/s640/treeview_BACE.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;As you can see this is a “classic” paralogous pair that split in the fish lineage (except the Danio BACE1 has been dropped for some reason; it is there, I have checked).  The provocative conclusion from the paper is that BACE1 and 2 (as paralogues) with ~ 50% identity may be more similar in function than human and mouse BACE1 (or BACE2) as orthologues with ~ 95% identity.  I do have a problem with including cases like BACE because the paralogous gene duplication split (~ 350 million yr.) is so much older than the orthologous human/mouse speciation split (~ 40 million yr.).  I checked out the expression patterns (via the GNF Affy data) at&amp;nbsp;&lt;a href="http://biogps.org/#goto=welcome"&gt;BioGPS&lt;/a&gt; and have posted these below.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
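&lt;span style="font-family: verdana; font-size: 130%;"&gt;For anyone wanting to reproduce identity figures of the kind quoted above, here is a minimal Python sketch. It assumes you already have a pairwise alignment, and it uses one common convention (identical residues divided by the number of columns where neither sequence has a gap); other denominators give slightly different numbers. The two fragments are invented for illustration and are not real BACE sequences.&lt;/span&gt;&lt;br /&gt;
&lt;pre&gt;
```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns in which both sequences carry a residue
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Toy aligned fragments, invented for illustration (not real BACE sequences)
seq_x = "MAQALPWLLL-WMGAG"
seq_y = "MAQALPWLLLLWMGSG"

print(round(percent_identity(seq_x, seq_y), 1))
```
&lt;/pre&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;With these toy fragments there are 15 gap-free columns and 14 identical residues, so the script prints 93.3.&lt;/span&gt;&lt;br /&gt;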
&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-ORxKew6m1YM/Tk2w4T8YVmI/AAAAAAAAALM/9amODleLHVw/s1600/BACE_expression.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="516" src="http://4.bp.blogspot.com/-ORxKew6m1YM/Tk2w4T8YVmI/AAAAAAAAALM/9amODleLHVw/s640/BACE_expression.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
&lt;br /&gt;
Does the pattern above refute or support their findings?  It’s actually not so clear because of all the usual caveats of Affy probe profiling; in particular, neither the tissue axis nor the signal levels are normalized between human and mouse.  However, you could suggest the image supports a more “different” pattern between the human paralogues than between the mouse ones.  The interesting thing here is we have no clear picture of BACE2 function (nor a clear “normal” role for BACE1 in fact) but, as the literature supports, there appears to be little difference in cleavage specificity between the two.  Thus, such function/role differences between the paralogues may be predominantly determined by the spatial and/or temporal expression context rather than catalytic specificity. &lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Consequences of this paper?&lt;/b&gt;  Well, it would be nice if drug development teams who need to carefully compare and select their range of animal models cosied up a bit more with such phylogenetically aware bioinformaticians as they may still have around.  The good news is that rapidly extending species coverage will improve the paralogy/orthology resolution in most cases (e.g. by adding rabbit, pig, guinea pig, codfish and kangaroo) but actually make it more complex to interpret in others. The other good news is that NGS-based transcript profiling will eventually produce better normalized (i.e. more cross-species comparable) and deeper expression signatures than Affy chips, possibly even extending to rodent strains and sub-tissue cell types (a range of mammalian expression atlases?).    None of this will make animals more like humans but at least we will have a clearer and more systematic picture of the differences.   &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155351992730855318-6109391749220124339?l=cdsouthan.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://cdsouthan.blogspot.com/2011/08/ortholgues-less-divergent-in-sequence.html</link><author>noreply@blogger.com (Chris Southan)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-6LAqBLR30HE/Tk2tpGu-ZWI/AAAAAAAAALI/MPcAQBJt76U/s72-c/treeview_BACE.png" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2155351992730855318.post-6570598562120443037</guid><pubDate>Thu, 14 Jul 2011 10:10:00 +0000</pubDate><atom:updated>2011-08-15T14:12:03.696-07:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">target</category><category domain="http://www.blogger.com/atom/ns#">bio</category><title>Compound-to-Target Mappings: (Part I)</title><description>&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
I continue to have a lot to do with this big theme.&amp;nbsp; It is not only the subject of &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/21569515"&gt;our 2011 paper&lt;/a&gt; but I also have a slide set used this April/May for a &lt;a href="http://www.slideshare.net/cdsouthan/compoundstotargets-bioit2011"&gt;presentation&lt;/a&gt; to the PubChem team,&amp;nbsp; a BioIT World chemogenomics workshop and the ChEMBL users group meeting.&amp;nbsp; I’ll point to these slides so as not to reiterate but will go through some new examples in more detail.&amp;nbsp; There are many to choose from so I will divide this into a series of posts, this one picking up on some DrugBank-to-PDB mappings.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
As described in their earlier papers the core of &lt;a href="http://www.drugbank.ca/"&gt;DrugBank&lt;/a&gt;&amp;nbsp;&amp;nbsp; was derived from PDB ligands and remains so three releases on. You can confirm this by determining the intersects in PubChem (MMDB, 17,664, AND DrugBank, 6,370&amp;nbsp; = 3002).&amp;nbsp; The utility is obvious but the consequences can be quirky as you can see on slides 21-25 of the &lt;a href="http://www.slideshare.net/cdsouthan/compoundstotargets-bioit2011"&gt;presentation&lt;/a&gt;.&amp;nbsp; This post will look at some additional ones via the DrugBank target entry for &lt;a href="http://www.drugbank.ca/molecules/952?as=target"&gt;DPPIV&lt;/a&gt;&amp;nbsp; and picking just the lower 12 entries&amp;nbsp; 40 – 51.&amp;nbsp; &lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
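&lt;span style="font-family: verdana; font-size: 130%;"&gt;As an aside, the intersect figures quoted above are easy to put into proportion. The short Python sketch below uses only the three counts given in the post; the toy CID sets at the end are invented, purely to show that the PubChem "AND" is a plain set intersection.&lt;/span&gt;&lt;br /&gt;
&lt;pre&gt;
```python
# Counts quoted in the post for the PubChem sources (2011)
mmdb_total = 17664
drugbank_total = 6370
shared = 3002

# Share of each source that falls inside the intersect, in percent
print(round(100 * shared / drugbank_total, 1))
print(round(100 * shared / mmdb_total, 1))

# The same operation on toy CID sets (invented identifiers, not real CIDs)
toy_mmdb = {101, 102, 103, 104}
toy_drugbank = {103, 104, 105}
print(sorted(toy_mmdb.intersection(toy_drugbank)))
```
&lt;/pre&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;On these figures roughly 47% of the DrugBank entries are also MMDB ligands, while they make up about 17% of MMDB.&lt;/span&gt;&lt;br /&gt;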
&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-jdU9CAe7xx8/TkeiKYG1-jI/AAAAAAAAAKU/3C9OHk-IrGE/s1600/DPPIV.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="369" src="http://4.bp.blogspot.com/-jdU9CAe7xx8/TkeiKYG1-jI/AAAAAAAAAKU/3C9OHk-IrGE/s640/DPPIV.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
It’s immediately clear that the real “drugs” are 49, 50 and 51, but not yet approved by all major authorities, hence approved and investigational.&amp;nbsp; It’s equally clear you would not want to take the first four to treat your diabetes, nor would you expect these to be &lt;i&gt;in vitro&lt;/i&gt; inhibitors, except maybe 42. So let's look closer.&amp;nbsp; The Alpha D Mannose, no. 40&amp;nbsp;&lt;a href="http://www.drugbank.ca/drugs/DB02944"&gt;DB02944&lt;/a&gt;, has 76 “targets” so it’s a common heteroatom not a specific inhibitor.&amp;nbsp; I expected an MMDB entry as an SID to the&amp;nbsp;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6992084"&gt;CID 6992084&lt;/a&gt; but it was not there.&amp;nbsp; The answer was that the DrugBank entry is a beta-L-glucopyranose because the link went to &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=185698"&gt;CID 185698&lt;/a&gt;.&amp;nbsp; This was in fact the alpha D mannose mapping to 758 protein structures, although to be fair I can't guess which is which below - can you?&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-0BChG76fXkE/TkekUfxvFjI/AAAAAAAAAKY/mydTTyxQiTY/s1600/Dmannose.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="141" src="http://2.bp.blogspot.com/-0BChG76fXkE/TkekUfxvFjI/AAAAAAAAAKY/mydTTyxQiTY/s320/Dmannose.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
&lt;br /&gt;
For the next one down, 41 &lt;a href="http://www.drugbank.ca/drugs/DB01076"&gt;DB01076&lt;/a&gt;, as atorvastatin (not a common heteroatom) we see a direct reference&amp;nbsp; “&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/18068977"&gt;Inhibition of dipeptidyl peptidase-IV (DPP-IV) by atorvastatin&lt;/a&gt;”&amp;nbsp; but there is no PDB entry for this combination. The problem here is that the paper refers to &lt;a href="http://www.uniprot.org/uniprot/P22411"&gt;porcine DPPIV&lt;/a&gt;,&amp;nbsp; not the &lt;a href="http://www.uniprot.org/uniprot/P27487"&gt;human version&lt;/a&gt;&amp;nbsp; linked in this record. The next one,&amp;nbsp;&lt;a href="http://www.drugbank.ca/drugs/DB04491"&gt;DB04491&lt;/a&gt;, has a hetero entry for a classical serine protease inhibitor&amp;nbsp;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6327350"&gt;CID 6327350&lt;/a&gt;, a proper ligand in this case. But it links to just one PDB protein, which is not DPPIV.&amp;nbsp; So what gives?&amp;nbsp; Clicking on the DrugBank &lt;a href="http://www.ebi.ac.uk/msd-srv/chempdb/cgi-bin/cgi.pl?FUNCTION=getByCode&amp;amp;CODE=DFP"&gt;DFP&lt;/a&gt; PDBe link&amp;nbsp; provided the answer.&amp;nbsp; The correct DFP structure was in fact&amp;nbsp;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=74162"&gt;CID 74162&lt;/a&gt; in 12 PDB entries.&amp;nbsp; The figures below show the PDB mappings for the two CIDs and the DrugBank rendering for&amp;nbsp;&lt;a href="http://www.drugbank.ca/drugs/DB04491"&gt;DB04491&lt;/a&gt;. Tricky one eh?&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-wLFHn-aIRN0/Tke6NN_U96I/AAAAAAAAAKc/oKOgZrBbsyA/s1600/DNP_dppiv.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="209" src="http://4.bp.blogspot.com/-wLFHn-aIRN0/Tke6NN_U96I/AAAAAAAAAKc/oKOgZrBbsyA/s640/DNP_dppiv.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
&lt;span id="goog_247707312"&gt;&lt;/span&gt;&lt;span id="goog_247707313"&gt;&lt;/span&gt;&lt;br /&gt;
Next up &lt;a href="http://www.drugbank.ca/drugs/DB03965"&gt;DB03965&lt;/a&gt; = fucose,&amp;nbsp; but which one is crystallised with DPPIV?&amp;nbsp; The DrugBank link to&amp;nbsp;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6713579"&gt;CID 6713579&lt;/a&gt;&amp;nbsp; has no MMDB structure links but it does have no less than 41 (PubChem compound &amp;gt; same connectivity)&amp;nbsp; isomers/tautomers, but that's sugars for you.&amp;nbsp; Of these &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=pccompound&amp;amp;cmd=search&amp;amp;term=%28%2313%29%20AND%20%28pccompound_structure[filt]%29&amp;amp;loc=s_frm"&gt;14&lt;/a&gt;&amp;nbsp; split out to 566 protein structures but I can’t&amp;nbsp; think of an easy way to find the DPPIV intersect…..&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
On to Iodo-Phenylalanine in &lt;a href="http://www.drugbank.ca/drugs/DB03660"&gt;DB03660&lt;/a&gt;.&amp;nbsp; Here again the specified link&amp;nbsp;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=134497"&gt;CID 134497&lt;/a&gt; maps to a protein structure that is not DPPIV.&amp;nbsp;&amp;nbsp; This seems analogous to the DFP entry in that the rendered structure and CID appear to be wrong but the heteroatom right.&amp;nbsp; We can follow the &lt;a href="http://www.ebi.ac.uk/msd-srv/chempdb/cgi-bin/cgi.pl?FUNCTION=getByCode&amp;amp;CODE=PHI"&gt;PHI&lt;/a&gt; code and find five ligands according to PDBe.&amp;nbsp; However, to get to a CID I have to pop the code into MMDB, where we finally arrive at&amp;nbsp;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=447706"&gt;CID 447706&lt;/a&gt;&amp;nbsp; plus &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=447705"&gt;CID 447705&lt;/a&gt;. But, as you can see below, the ligand structures extracted by PDBe and MMDB are not the same.&amp;nbsp; Who's right? I dunno, so I’ll give DrugBank the benefit of the doubt on the chemistry …… but not for the target mapping, where here again 10RW is piggy not human DPPIV. &lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-GEWoEEtnK_E/TkgayRK3CFI/AAAAAAAAAKg/9N8xPEUPyD0/s1600/Iodo_Phe.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="396" src="http://2.bp.blogspot.com/-GEWoEEtnK_E/TkgayRK3CFI/AAAAAAAAAKg/9N8xPEUPyD0/s640/Iodo_Phe.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The next four entries are IUPAC names.&amp;nbsp; This means they are likely to be published inhibitors, maybe even lead compounds.&amp;nbsp; Sure enough, the first one,&amp;nbsp;&lt;a href="http://www.drugbank.ca/drugs/DB07328"&gt;DB07328&lt;/a&gt;, turns out to be an &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/16759095"&gt;Abbott compound against DPPIV&lt;/a&gt; but here again we have differences between PDBe and MMDB.&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-naGrzuIDgo4/TkghlgRWOaI/AAAAAAAAAKk/0c2ccYRD9ig/s1600/2g63.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" src="http://4.bp.blogspot.com/-naGrzuIDgo4/TkghlgRWOaI/AAAAAAAAAKk/0c2ccYRD9ig/s400/2g63.png" width="360" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;On the left is the DrugBank PubChem link &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6914640"&gt;CID 6914640&lt;/a&gt;, which looks identical to the PDBe rendering at the bottom of the figure above. On the right is the different MMDB ligand for the same PDB code as &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=11840913"&gt;CID 11840913&lt;/a&gt;. &amp;nbsp; On to the next IUPAC name,&amp;nbsp; &lt;a href="http://www.drugbank.ca/drugs/DB07779"&gt;DB07779&lt;/a&gt;.&amp;nbsp; Here we have concordance (DrugBank &amp;gt; PubChem &amp;gt; MMDB = PDBe)&amp;nbsp; but this time the compound&amp;nbsp; is from &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/16376544"&gt;Santhera Pharmaceuticals&lt;/a&gt;.&amp;nbsp; The third, &lt;a href="http://www.drugbank.ca/drugs/DB08429"&gt;DB08429&lt;/a&gt;, also &lt;/span&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;from Santhera&lt;/span&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;, seems to map OK but has no publication listed in the PDB. There is a link to the &lt;a href="http://www.ebi.ac.uk/citexplore/citationDetails.do?dataSource=MED&amp;amp;externalId=19515557"&gt;publication&lt;/a&gt;,&amp;nbsp; which has four structures linked between CiteXplore and ChEMBL. The fourth one in this set,&amp;nbsp;&lt;a href="http://www.drugbank.ca/drugs/DB06939"&gt;DB06939&lt;/a&gt;, comes from Merck&amp;nbsp; as&amp;nbsp;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=16122596"&gt;CID 16122596&lt;/a&gt;.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;&amp;nbsp;&amp;nbsp; &lt;br /&gt;
The last three drug links work fine and there is nothing to add here except&amp;nbsp; that&amp;nbsp;&amp;nbsp;&lt;a href="http://www.drugbank.ca/drugs/DB01261"&gt;DB01261&lt;/a&gt; = &lt;/span&gt;&lt;b&gt;&lt;/b&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;Sitagliptin actually could have had a heteroatom link as &lt;a href="http://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/715"&gt;715&lt;/a&gt;.&amp;nbsp; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Conclusions&lt;/b&gt;:&amp;nbsp; Despite some quirks and curatorial gaps (i.e. the sugars would not qualify as “experimental” drug entries but the classical&amp;nbsp; DFP inhibitor could) this is a useful DrugBank page for those interested in DPPIV.&amp;nbsp; Navigating the links was particularly instructive and the provision of these is one of the strengths of this database.&amp;nbsp; They also demonstrated for the first time to me (has anyone else come across this?) that the ligand structures in PDBe and MMDB are not identical.&amp;nbsp; No,&amp;nbsp; I have not cross-checked with RCSB PDB ligands but I might follow this up. &lt;br /&gt;
&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155351992730855318-6570598562120443037?l=cdsouthan.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://cdsouthan.blogspot.com/2011/08/compound-to-target-mappings-part-i.html</link><author>noreply@blogger.com (Chris Southan)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-jdU9CAe7xx8/TkeiKYG1-jI/AAAAAAAAAKU/3C9OHk-IrGE/s72-c/DPPIV.png" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2155351992730855318.post-138522429847859798</guid><pubDate>Sun, 01 May 2011 11:48:00 +0000</pubDate><atom:updated>2011-08-14T13:05:08.108-07:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">ORFs</category><category domain="http://www.blogger.com/atom/ns#">IPI</category><category domain="http://www.blogger.com/atom/ns#">bio</category><title>Alas,  Poor International Protein Index  -  I Knew It Well…....</title><description>&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
If you don’t know that&amp;nbsp; much about the &lt;a href="http://www.ebi.ac.uk/IPI/IPIhelp.html"&gt;IPI&lt;/a&gt;&amp;nbsp; you will need to get the gist of it or this post, offering &amp;nbsp;arguments for &amp;nbsp;&lt;u&gt;not&lt;/u&gt; closing it down, will be a bit challenging.&amp;nbsp; I first looked at this database when I was writing a &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/15174140"&gt;review on protein number&lt;/a&gt; &amp;nbsp;in 2003 and had numerous chats with PK about it over the phone. This was not only in order to do it justice &amp;nbsp;in the manuscript but it also had an important bearing on what OGS was doing at the time in their efforts to &lt;a href="http://www.bio-itworld.com/archive/040702/protein-altas"&gt;stamp as many ORFs as possible by mass-spec proteomics&lt;/a&gt;.&amp;nbsp; I have followed IPI with interest ever since. What you can see below are graphical snapshots (with bitmaps squished to match the number axis)&amp;nbsp; of 2003-2005 on the left and 2009-2011 on the right.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://1.bp.blogspot.com/-RKWlX_LPsTE/TgR5PIfa7xI/AAAAAAAAAJw/htVmfAsaiFg/s1600/IPI+history.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="329" src="http://1.bp.blogspot.com/-RKWlX_LPsTE/TgR5PIfa7xI/AAAAAAAAAJw/htVmfAsaiFg/s640/IPI+history.png" width="640" /&gt;&lt;/a&gt;&lt;br /&gt;
&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;There are many features on these plots, some obvious and others not so obvious, &amp;nbsp;but one that PK did explain to me was that the UniProt &amp;nbsp;downward blips were due to Swiss-Prot update cycles subsuming TrEMBL entries.&amp;nbsp; The initial kick-in of Vega is also explicable but the 2009 Ensembl jump (splice form parameterisation changing and/or Vega merging?)&amp;nbsp; as well as the continued RefSeq yo-yoing are mysterious, from the outside at least.&amp;nbsp;&amp;nbsp; Also, according to what it says on the box&amp;nbsp; for a “minimally redundant yet maximally complete sets of proteins”, &amp;nbsp;&amp;nbsp;IPI numbers seem large. &amp;nbsp;&amp;nbsp;After its first release of 33,013 it hit a peak of 67,105 in mid-2002, mainly because of the large deposition of the wonky &lt;i&gt;ab initio&lt;/i&gt; RefSeq XP predictions, but collapsed back to 39,440 in 2003, as you can see on the left.&amp;nbsp; Paradoxically, while Swiss-Prot had “closed” the canonical human proteome at 20,232 by 2008&amp;nbsp;&amp;nbsp; (232 more than I predicted in 2006), IPI had bloated to over 90K.&amp;nbsp; Now, I take PK’s point that the algorithmic details of reciprocal cluster membership, sequence overlaps and matching thresholds mean that IPI totals might well be expected to exceed any hypothetical &amp;nbsp;canonical set. 
One of the reasons is that longer splice forms “cluster out” because of the non-overlapping exons.&amp;nbsp; Notwithstanding, I posit that if the major global protein pipelines feeding IPI were increasing their curation/annotation quality, purging deprecated ORFs, working from an improved reference sequence and otherwise generally getting it together,&amp;nbsp; the numbers should be going south.&amp;nbsp; Oddly, the 5-way consensus count (entries pointing to UniProtKB, RefSeq, Ensembl, H-InvDB and Vega) seems to have been hovering around 13.3 to 13.5 K for a long time.&amp;nbsp; Granted, that’s&amp;nbsp; well over half the proteome but the subset overlaps (2-way, 3-way and 4-way)&amp;nbsp; are all over the place. So what’s going on?&amp;nbsp; Despite accepting the clustering bias towards overcounting, the fact that IPI continues relentlessly northwards&amp;nbsp; (~three-fold in a decade) suggests, to me at least, that the main &lt;a href="http://www.ebi.ac.uk/IPI/Databases.html"&gt;global pipelines for proteome annotation&lt;/a&gt;&amp;nbsp;&amp;nbsp; are not converging but diverging. Evidence for this includes &amp;nbsp;the “orphan” numbers (i.e. ORFs unique to each of the 5):&lt;br /&gt;
&lt;br /&gt;
Number of IPI entries pointing only to RefSeq: 3313&lt;br /&gt;
Number of IPI entries pointing only to Ensembl: 3941&lt;br /&gt;
Number of IPI entries pointing only to H-InvDB: 2925&lt;br /&gt;
Number of IPI entries pointing only to Vega: 6790&lt;br /&gt;
&lt;br /&gt;
For this release the 5-way consensus (UniProtKB/RefSeq/Ensembl/H-InvDB/Vega) was 13738. As we know, some of these orphans are not unexpected, for example &lt;/span&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;UniProtKB/TrEMBL includes &lt;/span&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;all the partial CDS bits and pieces plus a few spurious 3’ UTR ORFs.  Others seem to beg examination and explanation, for example, where/how is Vega picking up 6.7K ORFs that cluster out? &lt;br /&gt;
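The orphan and n-way counts above amount to simple bookkeeping over which source databases each IPI entry points to. A minimal sketch, assuming the cross-references have already been parsed into a mapping from IPI identifier to a set of source names; the identifiers and the toy data below are made up for illustration and are not taken from any IPI release.

```python
from collections import Counter

SOURCES = ("UniProtKB", "RefSeq", "Ensembl", "H-InvDB", "Vega")


def summarise(entries):
    """Count orphans per source and entries per n-way overlap.

    entries: dict mapping a (hypothetical) IPI id to the set of
    source databases that entry points to.
    """
    orphans = Counter()  # source name: entries pointing only to it
    n_way = Counter()    # n: entries pointing to exactly n sources
    for sources in entries.values():
        present = set(sources).intersection(SOURCES)
        n_way[len(present)] += 1
        if len(present) == 1:
            orphans[next(iter(present))] += 1
    return orphans, n_way


# Toy input, invented purely to exercise the function
toy = {
    "IPI_A": {"UniProtKB", "RefSeq", "Ensembl", "H-InvDB", "Vega"},
    "IPI_B": {"Vega"},
    "IPI_C": {"RefSeq", "Ensembl"},
    "IPI_D": {"Vega"},
    "IPI_E": {"H-InvDB"},
}
orphans, n_way = summarise(toy)
print(dict(orphans))
print(dict(n_way))
```

Run release by release, the same tally would give the convergence (or divergence) trend as a plain table of numbers.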
&lt;br /&gt;
On the occasions I’ve mentioned some of this over a beer or wine glass in various venues, I hear back something along the lines of “don’t worry, the CCDS initiative will sort it all out”. Now, that basic good idea, of resolving reference coding transcripts against reference genomic coordinates, has top-notch international credibility, ‘nuff said.  However, I can’t see the entire set of pipeline discordancies that IPI exposes magically disappearing as a consequence.  But (and here’s the key point) when IPI is turned off, this important job of publicly tracking annotated ORF intersects and differentials (perhaps even eventual convergence on the back of CCDS) will not only disappear but its loss will also constitute an inadvertent de facto sweeping of the problem under the carpet.  Now, to keep the record straight, IMCO IPI was, and remains, a very good thing. Given that it is bedded in, and with all this cloud space around, surely it can’t be that resource-hungry to maintain for a few reference genomes?&lt;br /&gt;
&lt;br /&gt;
I might do an update before they throw the switch towards the end of 2011 but there are two final points to touch on. The first, which puzzles me, is that, unless I have missed it, the paper(s) that could be written on a detailed analysis of the causality behind the discordancies have not yet appeared (the data sets are sitting there).   The second is that one of the (main?) objectives of the IPI exercise was that the regularly updated “minimally redundant yet maximally complete sets of proteins” was exactly what the proteomics community needed to use as a definitive (granted without the PTMs and common SNP variants) search space for their MS data scans to hook out peptides and map them back to protein IDs.  So what will they do when IPI disappears, revert back to NCBI non-redundant human? Or just Swiss-Prot?  (Both are problematic for MS hooking, for different reasons.)  This raises an interesting paradox (something I might also catch up on later) in that, on a good day, proteomics is a (the best?) way to illuminate the IPI results.  For example, let’s say the MS practitioners set the orphan-specific tryptics from the lists above into their search space.  Some might get unequivocally and consistently stamped and therefore not only support that individual pipeline “bias” but also become Swiss-Prot (and HPA) candidates.  Those that persistently remain unseen in any experimental setting probably just ain't there and the pipelines concerned could tune them out.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155351992730855318-138522429847859798?l=cdsouthan.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://cdsouthan.blogspot.com/2011/06/alas-poor-international-protein-index-i.html</link><author>noreply@blogger.com (Chris Southan)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-RKWlX_LPsTE/TgR5PIfa7xI/AAAAAAAAAJw/htVmfAsaiFg/s72-c/IPI+history.png" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2155351992730855318.post-447611571194159100</guid><pubDate>Sat, 26 Mar 2011 20:56:00 +0000</pubDate><atom:updated>2011-08-15T14:07:42.538-07:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">chem</category><title>The 2010 FDA drug approvals</title><description>&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
As well as the usual annual &lt;a href="http://www.nature.com/nrd/journal/v10/n2/full/nrd3370.html"&gt;Nature Reviews Drug Discovery&lt;/a&gt; article, you can find background information in the &lt;a href="http://chembl.blogspot.com/search/label/2010%20New%20Drugs"&gt;ChEMBL blog&lt;/a&gt;, including target mappings, so I shall just make a few observations related to the PubChem links per se. These are presented as an alphabetic INN listing (and one trade name) of the 2010 approvals that mapped, as best as I could ascertain, to CIDs:&lt;br /&gt;
&lt;br /&gt;
1. Alcaftadine (Lastacaft) = &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=19371515"&gt;CID: 19371515&lt;/a&gt;.  Compared to the rest of the list it was unusual to find just a single-CID mapping with concordance from all seven sources.  JFTR &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=11948478"&gt;CID: 11948478&lt;/a&gt;, with identical Mw, has the nitrogen moved one step round the ring, presumably feeding in from a patent via Thomson Pharma (as PubChem source name, without the “Reuters”).  It is registered in ChEMBL as an approved drug without assay data (more to come?) but linked to 12 clinical trials against only two via PubChem/ChemID Plus.  ChemSpider links it to an early Janssen patent via SureChem but oddly has no reciprocal PubChem link.   &lt;br /&gt;
&lt;br /&gt;
2. Cabazitaxel (Jevtana) = &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=9854073"&gt;CID: 9854073&lt;/a&gt;. The MeSH name mapping chose only this CID but there are five canonical SMILES matches (“CID &amp;gt; same connectivity”). With 11 stereo centers according to ChemSpider &lt;a href="http://www.chemspider.com/Chemical-Structure.8029779.html"&gt;8029779&lt;/a&gt; that’s hardly surprising. This may have something to do with the peculiarly large similarity cluster of &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?LinkName=pccompound_pccompound&amp;amp;db=pccompound&amp;amp;cmd=Link&amp;amp;from_uid=9854073"&gt;3426 Links&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
3. Carglumic acid (Carbaglu) = &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=121396"&gt;CID: 121396&lt;/a&gt;.  While MeSH and DrugBank select the (2S), the PubChem Bioassay (ChEMBL) data maps to the unresolved &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=3679006"&gt;CID: 3679006&lt;/a&gt;. Four submitters chose the (2R) as &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=1749866"&gt;CID: 1749866&lt;/a&gt; while Discovery Gate and NextBio linked both.&lt;br /&gt;
&lt;br /&gt;
5. Dabigatran (Pradaxa) has six INN-to-CID matches but &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=216210"&gt;CID: 216210&lt;/a&gt; is the majority bet for the active component and the ChEMBL in vitro assays are mapped to this.  The administered prodrug (etexilate mesylate) is &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=11434065"&gt;CID 11434065&lt;/a&gt;  but some ChEMBL in vivo data is mapped to the etexilate &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6445226"&gt;CID: 6445226&lt;/a&gt;.  Should you wish to experiment with a  [13C6] version you can get it from Alsachim via &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=49849310"&gt;CID: 49849310&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
6. Dalfampridine (Ampyra) = &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=1727"&gt;CID 1727&lt;/a&gt;. An early CID from 2005-03-25 with 112 submitter links.  Has &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=pccompound&amp;amp;cmd=Link&amp;amp;LinkName=pccompound_pcassay&amp;amp;from_uid=1727"&gt;659&lt;/a&gt; assay results but is active in only 3.  The tetradeutero was deposited as &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pcsubstance&amp;amp;term=120339102%5buid%5d"&gt;SID 120339102&lt;/a&gt; from Discovery Gate.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;  &lt;br /&gt;
7. Eribulin Mesylate  (Halaven) &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=17755248"&gt;CID 17755248&lt;/a&gt;  includes Eribulin (parent)  = &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=11354606"&gt;CID 11354606&lt;/a&gt;  plus Methanesulphonic acid &lt;a href="cid:%206395"&gt;CID: 6395&lt;/a&gt;   but, oddly, the former has a wonky rendering, for which Thomson Pharma have a correct version  (within the same system)  as &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=16446284"&gt;SID 16446284&lt;/a&gt;.  CID 11354606 has direct Clinical trials links as opposed to via the ChemID Plus link.&lt;br /&gt;
&lt;br /&gt;
8. Fingolimod (Gilenya) = &lt;a href="cid:%20107970"&gt;CID: 107970&lt;/a&gt; is a prodrug. I eventually found the phospho form as &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=9908268"&gt;CID: 9908268&lt;/a&gt; but had to do a substructure search because, guess what, the derivative name was FTY720-phosphate (i.e. a code name not the INN). JFTR the 15-deutero &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=49849425"&gt;CID: 49849425&lt;/a&gt;      makes a striking rendering but in this case the Alsachim &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=104178328"&gt; SID 104178328&lt;/a&gt;  was deadlinked&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-7FZ3oSnIVKw/TgfKk6bbW3I/AAAAAAAAAJ4/EG7TgURf5tw/s1600/1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="200" src="http://2.bp.blogspot.com/-7FZ3oSnIVKw/TgfKk6bbW3I/AAAAAAAAAJ4/EG7TgURf5tw/s200/1.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
9. Liraglutide = &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=44147092"&gt;CID: 44147092&lt;/a&gt;, is the 2nd largest CID here, weighing in at 3751.2 (with a nice peptide rendering) but it has been in MeSH since 2001.&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-sAxKCx5GCqM/ThOHywcoT3I/AAAAAAAAAJ8/ihdW4xMCmEw/s1600/Lira_g.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-sAxKCx5GCqM/ThOHywcoT3I/AAAAAAAAAJ8/ihdW4xMCmEw/s1600/Lira_g.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
10. Lurasidone = &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=213046"&gt;CID: 213046&lt;/a&gt;.  The seven canonical SMILES matches have no direct assay results but include links to Glida GPCR ligand collection via &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=44210114"&gt;CID: 44210114&lt;/a&gt; and ChemBank via &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=24011707"&gt;CID: 24011707&lt;/a&gt; and a di-deutero as &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=25207699"&gt;CID: 25207699&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
11. Natazia = &lt;a href="http://cid%209874560/"&gt;CID 9874560&lt;/a&gt;  = Dienogest = &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=68861"&gt;CID 68861&lt;/a&gt;  + estradiol valerate = CID 13791.  The mixture, actually not mapped to the trade name, is a Thomson Pharma entry &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=14839526"&gt;SID 14839526&lt;/a&gt;  from 2006-10-25, possibly a portfolio pick up &lt;br /&gt;
&lt;br /&gt;
12. Polidocanol (Asclera) = &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=24750"&gt;CID: 24750&lt;/a&gt;  with a mono-deutero, &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=10489712"&gt;CID: 10489712&lt;/a&gt;, a dideutero,  &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=10823501"&gt;CID: 10823501&lt;/a&gt;, and an odd PDB ligand appearance.&lt;br /&gt;
&lt;br /&gt;
13. Tesamorelin = &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=44201342"&gt;CID: 44201342&lt;/a&gt; but the name match was from  &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=85197628"&gt;SID 85197628&lt;/a&gt;, fed in only by Prous Drugs of the Future (now Thomson) with   a nice rendering of the peptide, largest CID in this set, at 5135.7&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-I1CzJ8dU8jg/ThOIboGTxkI/AAAAAAAAAKA/Dp6oCvXa4zE/s1600/tesmo.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="136" src="http://1.bp.blogspot.com/-I1CzJ8dU8jg/ThOIboGTxkI/AAAAAAAAAKA/Dp6oCvXa4zE/s400/tesmo.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-family: verdana; font-size: 130%;"&gt;&lt;br /&gt;
&lt;br /&gt;
14. Ulipristal Acetate (Ella) = &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=130904"&gt;CID: 130904&lt;/a&gt; with five stereoisomer CIDs.   An interesting inclusion of this in a selective progesterone receptor modulator assay from a 2008 paper (via ChEMBL)&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Conclusions:&amp;nbsp; &lt;/b&gt; As observed for the &lt;a href="http://cdsouthan.blogspot.com/2010/05/2009-fda-approvals-in-pubchem.html"&gt;2009 FDA approvals&lt;/a&gt;, considering their importance, the linking seems oddly patchy. DrugBank and ChEMBL seem to be improving their new drug capture but MeSH seems to miss quite a few or to have no pharmacological classification for them.    Assay data coverage seems very patchy and (obviously) depends either on the compounds getting into the NIH screening center collections, which they generally do not (yet?), or on having published data captured via ChEMBL. While the structures are fairly straightforward to navigate via the PubChem rules, some of these opened up unexpected linkage complications related to synonyms, isomers, mixtures, prodrugs and metabolites. These can supply enough detail for their own later postings. &lt;br /&gt;
&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155351992730855318-447611571194159100?l=cdsouthan.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://cdsouthan.blogspot.com/2011/08/approvals.html</link><author>noreply@blogger.com (Chris Southan)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-7FZ3oSnIVKw/TgfKk6bbW3I/AAAAAAAAAJ4/EG7TgURf5tw/s72-c/1.png" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2155351992730855318.post-4039422458042012433</guid><pubDate>Tue, 23 Nov 2010 00:46:00 +0000</pubDate><atom:updated>2011-06-23T12:51:24.514-07:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">chem</category><title>Shrinking PubChem  – Yep, it wuz us</title><description>&lt;div class="MsoNormal"&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;;"&gt;We expect databases to show growth but there are occasions when, just for&amp;nbsp; a change, they shrink.&amp;nbsp; This is the story behind a big dip in PubChem compounds that you can see in this slide from SB's presentation at TACBAC in March 2010. &lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-UzO_UUVQa0c/TgJ9WjFnerI/AAAAAAAAAJs/yQfT_qnu5gw/s1600/shrunk+PubChem.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="223" src="http://1.bp.blogspot.com/-UzO_UUVQa0c/TgJ9WjFnerI/AAAAAAAAAJs/yQfT_qnu5gw/s320/shrunk+PubChem.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;;"&gt;The story starts about a week before the end of July 2009 (but my blog doesn’t go back that far) &amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div class="MsoListParagraphCxSpFirst" style="mso-list: l0 level1 lfo1; text-indent: -18.0pt;"&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;;"&gt;1)&lt;span style="font: small &amp;quot;Times New Roman&amp;quot;;"&gt;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: small;"&gt;I notice the PubChem compound count has jumped by nearly 10 million&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoListParagraphCxSpMiddle" style="text-indent: -18pt;"&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: small;"&gt;2)&lt;span style="-moz-font-feature-settings: normal; -moz-font-language-override: normal; font-family: &amp;quot;Times New Roman&amp;quot;; font-size-adjust: none; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;"&gt;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: small;"&gt;SM says immediately “they must have let virtuals in” &lt;/span&gt;&lt;/div&gt;&lt;div class="MsoListParagraphCxSpMiddle" style="text-indent: -18pt;"&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: small;"&gt;3)&lt;span style="-moz-font-feature-settings: normal; -moz-font-language-override: normal; font-family: &amp;quot;Times New Roman&amp;quot;; font-size-adjust: none; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;"&gt;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: small;"&gt;I check and establish that &lt;a 
href="http://zinc.docking.org/"&gt;ZINC&lt;/a&gt; is the source of the jump &lt;/span&gt;&lt;/div&gt;&lt;div class="MsoListParagraphCxSpMiddle" style="text-indent: -18pt;"&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: small;"&gt;4)&lt;span style="-moz-font-feature-settings: normal; -moz-font-language-override: normal; font-family: &amp;quot;Times New Roman&amp;quot;; font-size-adjust: none; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;"&gt;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: small;"&gt;I phone SB who confirms, as we suspected, they don’t want virtuals &lt;/span&gt;&lt;/div&gt;&lt;div class="MsoListParagraphCxSpMiddle" style="text-indent: -18pt;"&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: small;"&gt;5)&lt;span style="-moz-font-feature-settings: normal; -moz-font-language-override: normal; font-family: &amp;quot;Times New Roman&amp;quot;; font-size-adjust: none; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;"&gt;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: small;"&gt;I send an e-mail to BS, who acknowledges they are the source and he refers me to his colleague JI&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoListParagraphCxSpMiddle" style="text-indent: -18pt;"&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: small;"&gt;6)&lt;span style="-moz-font-feature-settings: normal; -moz-font-language-override: normal; font-family: &amp;quot;Times New Roman&amp;quot;; font-size-adjust: none; font-stretch: normal; font-style: 
normal; font-variant: normal; font-weight: normal; line-height: normal;"&gt;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: small;"&gt;JI confirms that over 11 M ZINC compounds are virtuals from the Enamine make-on-demand catalog and adds that they have a high success rate in making them.&amp;nbsp; &lt;/span&gt;&lt;/div&gt;&lt;div class="MsoListParagraphCxSpMiddle" style="text-indent: -18pt;"&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: small;"&gt;7)&lt;span style="-moz-font-feature-settings: normal; -moz-font-language-override: normal; font-family: &amp;quot;Times New Roman&amp;quot;; font-size-adjust: none; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;"&gt;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: small;"&gt;We all exchange a few mails with SB and EB at PubChem.&amp;nbsp; &lt;/span&gt;&lt;/div&gt;&lt;div class="MsoListParagraphCxSpMiddle" style="text-indent: -18pt;"&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: small;"&gt;8)&lt;span style="-moz-font-feature-settings: normal; -moz-font-language-override: normal; font-family: &amp;quot;Times New Roman&amp;quot;; font-size-adjust: none; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;"&gt;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: small;"&gt;After consideration of some kind of new Entrez tag to specify virtual compounds EB suggests it might be simpler just to pull them&amp;nbsp; &lt;/span&gt;&lt;/div&gt;&lt;div 
class="MsoListParagraphCxSpMiddle" style="text-indent: -18pt;"&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: small;"&gt;9)&lt;span style="-moz-font-feature-settings: normal; -moz-font-language-override: normal; font-family: &amp;quot;Times New Roman&amp;quot;; font-size-adjust: none; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;"&gt;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: small;"&gt;This is supported by everyone, the virtuals are duly revoked and - voila - PubChem shrinks &lt;/span&gt;&lt;/div&gt;&lt;div class="MsoListParagraphCxSpMiddle" style="text-indent: -18pt;"&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: small;"&gt;10)&lt;/span&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: small;"&gt;Everyone is happy and the &lt;a href="http://www.enamine.net/index.php?option=com_content&amp;amp;task=view&amp;amp;id=8"&gt;Enamine REAL&lt;/a&gt; enumerated mod’s continue to sell from the catalogue &lt;/span&gt;&lt;/div&gt;&lt;div class="MsoListParagraphCxSpLast"&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span lang="EN-GB" style="font-family: &amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155351992730855318-4039422458042012433?l=cdsouthan.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://cdsouthan.blogspot.com/2011/06/shrinking-pubchem-aug-2009-yep-it-wuz.html</link><author>noreply@blogger.com (Chris Southan)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-UzO_UUVQa0c/TgJ9WjFnerI/AAAAAAAAAJs/yQfT_qnu5gw/s72-c/shrunk+PubChem.png" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2155351992730855318.post-7420493131066909308</guid><pubDate>Wed, 22 Sep 2010 21:25:00 +0000</pubDate><atom:updated>2011-06-23T13:48:38.332-07:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">bio</category><title>So is it or isn’t it - a protein?</title><description>&lt;div style="font-family: Verdana,sans-serif;"&gt;&lt;span style="font-size: small;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="font-family: Verdana,sans-serif;"&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;Subsequent to my&amp;nbsp; 2004&amp;nbsp; &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/15174140?dopt=Abstract"&gt;publication&lt;/a&gt; and a &lt;a href="http://www.slideshare.net/cdsouthan/the-yoyo-has-stoppedreviewing-the-evidence-for-a-low-basal-human-protein-number"&gt;presentation&lt;/a&gt; at the Swiss-Prot 2006 conference I maintain&amp;nbsp; an abiding interest in what, at that time, I&amp;nbsp; termed the basal (i.e. unspliced) protein number.&amp;nbsp; In deference to the Swiss-Prot philosophy and curation rules I would now term this the &lt;a href="http://www.uniprot.org/faq/30"&gt;“canonical”&lt;/a&gt; number of primary ORFs to which sequence variants from gene loci would be mapped.&amp;nbsp; My attention was therefore duly piqued by the publication &lt;b&gt;&lt;span style="color: red;"&gt;"A human-specific de novo protein-coding gene associated with human brain functions."&lt;/span&gt;&lt;/b&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; You can find this and the other links I explore here from the SwissProt entry &lt;a href="http://www.uniprot.org/uniprot/Q8NBC4"&gt;Q8NBC4&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;/div&gt;&lt;div class="MsoNormal" style="font-family: Verdana,sans-serif;"&gt;&lt;span style="font-size: small;"&gt;&lt;a href="http://www.uniprot.org/uniprot/Q8NBC4"&gt;&lt;span lang="SV" style="line-height: 115%;"&gt;&lt;/span&gt;&lt;/a&gt;&lt;a href="http://www.blogger.com/post-edit.g?blogID=2155351992730855318&amp;amp;postID=7420493131066909308" name="_Hlt295769467"&gt;&lt;/a&gt;&lt;a href="http://www.blogger.com/post-edit.g?blogID=2155351992730855318&amp;amp;postID=7420493131066909308" name="_Hlt295769468"&gt;&lt;/a&gt;&lt;a href="http://www.blogger.com/post-edit.g?blogID=2155351992730855318&amp;amp;postID=7420493131066909308" name="_Hlt295048747"&gt;&lt;/a&gt;&lt;a href="http://www.blogger.com/post-edit.g?blogID=2155351992730855318&amp;amp;postID=7420493131066909308" name="_Hlt296256980"&gt;&lt;/a&gt;&lt;a href="http://www.blogger.com/post-edit.g?blogID=2155351992730855318&amp;amp;postID=7420493131066909308" name="_Hlt296256981"&gt;&lt;/a&gt;&lt;a href="http://www.blogger.com/post-edit.g?blogID=2155351992730855318&amp;amp;postID=7420493131066909308" name="_Hlt296276343"&gt;&lt;/a&gt;&lt;a href="http://www.blogger.com/post-edit.g?blogID=2155351992730855318&amp;amp;postID=7420493131066909308" name="_Hlt295048179"&gt;&lt;/a&gt;&lt;a href="http://www.blogger.com/post-edit.g?blogID=2155351992730855318&amp;amp;postID=7420493131066909308" name="_Hlt295048180"&gt;&lt;/a&gt;&lt;a href="http://www.blogger.com/post-edit.g?blogID=2155351992730855318&amp;amp;postID=7420493131066909308" name="_Hlt296207304"&gt;&lt;/a&gt;&lt;a href="http://www.blogger.com/post-edit.g?blogID=2155351992730855318&amp;amp;postID=7420493131066909308" name="_Hlt296207305"&gt;&lt;/a&gt;&lt;a href="http://www.blogger.com/post-edit.g?blogID=2155351992730855318&amp;amp;postID=7420493131066909308" name="_Hlt295048805"&gt;&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="font-family: Verdana,sans-serif;"&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;Hmmm…..so 
is-it-or-isn’t it?&amp;nbsp;&amp;nbsp; The authors provide a broad set of data and arguments but let’s go through these to point out some gaps and weak points.&amp;nbsp; The first thing note is that, as they propose, the evidence is that this gene clearly has its evolutionary origins as non-coding transcript.&amp;nbsp; In fact the Havana/Vega teams had manually annotated it as a “Large intergenic non-coding RNA gene” (lincRNA)&amp;nbsp;&lt;a href="http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000198547;r=20:31220661-31239680;t=ENST00000360785"&gt;ENSG00000198547&lt;/a&gt; in 2009,&amp;nbsp; even though TrEMBL had stamped as an (CDS) &lt;a href="http://www.uniprot.org/uniprot/Q8NBC4.txt?version=1"&gt;ORF&lt;/a&gt;&amp;nbsp; since 2002.&amp;nbsp; &lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="font-family: Verdana,sans-serif;"&gt;&lt;span style="font-size: small;"&gt;&lt;a href="http://www.blogger.com/post-edit.g?blogID=2155351992730855318&amp;amp;postID=7420493131066909308" name="_Hlt296461311"&gt;&lt;/a&gt;&lt;a href="http://www.blogger.com/post-edit.g?blogID=2155351992730855318&amp;amp;postID=7420493131066909308" name="_Hlt296461312"&gt;&lt;/a&gt;&lt;a href="http://www.blogger.com/post-edit.g?blogID=2155351992730855318&amp;amp;postID=7420493131066909308" name="_Hlt296276328"&gt;&lt;/a&gt;&lt;a href="http://www.blogger.com/post-edit.g?blogID=2155351992730855318&amp;amp;postID=7420493131066909308" name="_Hlt296276014"&gt;&lt;/a&gt;&lt;a href="http://www.blogger.com/post-edit.g?blogID=2155351992730855318&amp;amp;postID=7420493131066909308" name="_Hlt296276015"&gt;&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="font-family: Verdana,sans-serif;"&gt;&lt;span style="font-size: small;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="font-family: Verdana,sans-serif;"&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;There are three corollaries to being a lincRNA (see &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/19182780"&gt;“Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals”&lt;/a&gt;)&amp;nbsp;&amp;nbsp; that they can have in common with proteins:&amp;nbsp; a) an intron/exon structure, b) footprints of selective pressure&amp;nbsp;&amp;nbsp; and c) a tissue expression profile.&amp;nbsp;&amp;nbsp; So parsimony suggests this became a human-only ORF by mutational accident and drift.&amp;nbsp;&amp;nbsp; However, while the authors infer that it has the necessary features for ribosomal translation, the experiments in this paper just don’t quite nail this down for me, for the following reasons.&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;/div&gt;&lt;div class="MsoListParagraph" style="font-family: Verdana,sans-serif; text-indent: -18pt;"&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;1.&lt;span style="-moz-font-feature-settings: normal; -moz-font-language-override: normal; font-size-adjust: none; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;The specificity rests entirely on some pretty streaky westerns with no quantitation and some obvious cut-and-paste in Fig 5D.&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/div&gt;&lt;div class="MsoListParagraph" style="font-family: Verdana,sans-serif; text-indent: -18pt;"&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;2.&lt;span style="-moz-font-feature-settings: normal; -moz-font-language-override: normal; font-size-adjust: none; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;Give the extremely basic PI this protein is unlikely to “run true”&amp;nbsp; at 22K&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoListParagraph" style="font-family: Verdana,sans-serif; text-indent: -18pt;"&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;3.&lt;span style="-moz-font-feature-settings: normal; -moz-font-language-override: normal; font-size-adjust: none; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;E.Coli expression or an ORF for an in vitro construct does not prove the in vivo case, and the western confirmation becomes circular&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoListParagraph" 
style="font-family: Verdana,sans-serif; text-indent: -18pt;"&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;4.&lt;span style="-moz-font-feature-settings: normal; -moz-font-language-override: normal; font-size-adjust: none; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;The controls should have been expanded to include human thymus (the maximum EST count on UniGene) and least some of the human-negative tissues from Fig 4.&amp;nbsp; to support concordance between transcript abundance and the Western signal.&amp;nbsp; &lt;/span&gt;&lt;/div&gt;&lt;div class="MsoListParagraph" style="font-family: Verdana,sans-serif; text-indent: -18pt;"&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;5.&lt;span style="-moz-font-feature-settings: normal; -moz-font-language-override: normal; font-size-adjust: none; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;They should have performed reciprocal spike-ins signal titrations (e.g. 
adding the E.Coli lysate into the human extracts and vice-versa).&amp;nbsp; &lt;/span&gt;&lt;/div&gt;&lt;div class="MsoListParagraph" style="font-family: Verdana,sans-serif; text-indent: -18pt;"&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;6.&lt;span style="-moz-font-feature-settings: normal; -moz-font-language-override: normal; font-size-adjust: none; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;If, as worst but by no means impossible case, the antibody is picking up a spurious cross-reactivity signal around 22K then the immunohistochemistry also becomes circular and does not prove the cytoplasmic presence of Q8NBC4. &lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="font-family: Verdana,sans-serif;"&gt;&lt;span style="font-size: small;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="font-family: Verdana,sans-serif;"&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;So what would convince yours truly of &lt;i&gt;in vivo&lt;/i&gt; de-novo expressed proteins?&amp;nbsp; &lt;/span&gt;&lt;/div&gt;&lt;ul style="font-family: Verdana,sans-serif;"&gt;&lt;li&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;&lt;span style="-moz-font-feature-settings: normal; -moz-font-language-override: normal; font-size-adjust: none; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;Gold standard:&amp;nbsp; From the tissue source, purify 100ug in a tube, get a big fat band on a gel,&amp;nbsp; a tryptic peptide map and verify at least 50% peptide coverage by MS/MS, and, as&amp;nbsp; this is small and probably PTM naked, try for an intact mass to boot.&amp;nbsp; Yep, all this is tricky when there is no activity to track.&amp;nbsp; &lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;&lt;span style="-moz-font-feature-settings: normal; -moz-font-language-override: normal; font-size-adjust: none; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;Silver:&amp;nbsp; a) set two predicted tryptic peptides into the scanning window of an MS, b) run an LC/MS/MS on the trypsinised&amp;nbsp; tissue to pick them both up, c) confirm with spike-ins of synthetic versions of both peptides and a negative control from another tissue&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;Bronze:&amp;nbsp; generate two anti-peptide antibodies, purify the IgG&amp;nbsp; from peptide columns,&amp;nbsp; perform a full set of corroborative western&amp;nbsp; controls and try some pull-downs&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;div class="MsoNormal" style="font-family: Verdana,sans-serif;"&gt;&lt;span style="font-size: small;"&gt;&lt;br /&gt;
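For the “Silver” route above, choosing the two predicted tryptic peptides could start from a simple in silico digest. What follows is only a minimal sketch under stated assumptions: the sequence is a made-up placeholder (not Q8NBC4), the function names are my own, and the cleavage rule (after K or R unless the next residue is P, with no missed cleavages) is the usual simplification.&lt;br /&gt;
&lt;pre&gt;
```python
def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1:i + 2]  # empty string at the C-terminus
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start != len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide not ending in K/R
    return peptides


def ms_candidates(peptides, min_len=7, max_len=25):
    """Keep peptides in an assumed length range suited to LC/MS/MS detection."""
    return [p for p in peptides if len(p) in range(min_len, max_len + 1)]


# Hypothetical sequence, for illustration only.
toy_sequence = "MAAKPGGRCCKDDLLSTEEVAR"
peptides = tryptic_digest(toy_sequence)
print(peptides)
print(ms_candidates(peptides))
```
&lt;/pre&gt;
The 7 to 25 residue window is just a rule of thumb I have assumed here; in practice you would also want to check that each candidate peptide is unique to the protein in question.&lt;br /&gt;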
&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="font-family: Verdana,sans-serif;"&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;OK so let’s give the authors the benefit of doubt and assume they have detected some translation.&amp;nbsp; Now, because most&amp;nbsp; proteins have had their structures sculpted by anything up to a couple of billion years&amp;nbsp; they consequently “know”, in the primary sequence and secondary structure sense, how to; &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="font-size: small; line-height: 115%;"&gt; a) start to fold and/or get&amp;nbsp; chaperoned&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: small; line-height: 115%;"&gt; b) be&amp;nbsp; trafficked to somewhere&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: small; line-height: 115%;"&gt; c) do one or more functionally useful&amp;nbsp; things under biologically relevant and evolutionarily selectable conditions.&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;span style="font-size: small; line-height: 115%;"&gt;d) The structure/function selection outside the ORF per se is likely to cause differential expression regulated&amp;nbsp; to some level of tissue specificity (because &lt;i&gt;de novo&lt;/i&gt; proteins are not likely to have a housekeeping role).&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;/div&gt;&lt;div class="MsoNormal" style="font-family: Verdana,sans-serif;"&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;The problem faced by this supposed protein (with no detectable homology or InterPro signature at all) is that it may not be able to do any of these things and could thus be constitutively consigned to the cellular rubbish dump via degradation pathways.&amp;nbsp; Being an extremely basic outlier on the theoretical pI scale at 11.74 might not help stability either (see the yellow spike on the chart below).&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
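As an aside, a theoretical pI of this kind comes from finding the pH at which the summed Henderson-Hasselbalch charges of the ionisable groups cancel. The sketch below is a rough illustration only: the pKa values are one assumed set (published sets differ, so it will not reproduce the 11.74 quoted above), the function names are my own, and the sequences are toy placeholders rather than Q8NBC4.&lt;br /&gt;
&lt;pre&gt;
```python
# Assumed pKa values for ionisable groups; different tools use different
# sets, so the absolute numbers here are illustrative only.
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))   # free N-terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))  # free C-terminus
    for residue in sequence:
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def theoretical_pi(sequence, tolerance=0.001):
    """Bisect between pH 0 and 14 for the pH where the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive, so the pI lies at higher pH
        else:
            high = mid
    return round((low + high) / 2.0, 2)


# Toy sequences, for illustration only.
print(theoretical_pi("KKRRKKRRAG"))  # lysine/arginine rich, so strongly basic
print(theoretical_pi("DDEEDDEEAG"))  # aspartate/glutamate rich, so acidic
```
&lt;/pre&gt;
The point is simply that a sequence dominated by K and R with few acidic residues ends up at the far basic end of the scale.&lt;br /&gt;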
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span style="font-size: small;"&gt;&lt;a href="http://2.bp.blogspot.com/-bmP7vxjfr0I/TgJe3yunjrI/AAAAAAAAAJo/4JIEqv6k_TQ/s1600/Fig+5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="226" src="http://2.bp.blogspot.com/-bmP7vxjfr0I/TgJe3yunjrI/AAAAAAAAAJo/4JIEqv6k_TQ/s320/Fig+5.png" width="320" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div align="center" class="MsoNormal" style="font-family: Verdana,sans-serif; text-align: center;"&gt;&lt;span style="font-size: small;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/div&gt;&lt;div align="center" class="MsoNormal" style="font-family: Verdana,sans-serif; text-align: center;"&gt;&lt;span style="font-size: small;"&gt;Fig 5 from&amp;nbsp;&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/11337469"&gt;PMID 11337469&lt;/a&gt; with thanks&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;/div&gt;&lt;div class="MsoNormal" style="font-family: Verdana,sans-serif;"&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;But surely it must be a &lt;i&gt;bona fide&lt;/i&gt; protein because it passed the Swiss-Prot &lt;a href="http://www.uniprot.org/program/"&gt;annotation criteria&lt;/a&gt; – right?&amp;nbsp; In the vast majority of cases this is of course a good bet; they do a great job after all.&amp;nbsp; But the problem here is likely to be “publication primacy”.&amp;nbsp; Having passed the first hurdle of ORFdom in having &lt;a href="http://www.uniprot.org/faq/25"&gt;multiple CDSs in TrEMBL&lt;/a&gt; the curator is likely to have taken the Li &lt;i&gt;et al&lt;/i&gt;. paper at face value (it’s not their role to critique)&amp;nbsp; and duly promoted&amp;nbsp; this to Swiss-Prot (in fact their &lt;a href="http://www.uniprot.org/faq/45"&gt;evidence triage&lt;/a&gt; is an interesting read), with&amp;nbsp; RefSeq and HGNC presumably falling&amp;nbsp; in line as a consequence. This exposes an interesting quirk of annotation in that this supposed protein is awarded a default GO term “cytoplasm” because of its negative feature of lacking a signal peptide sequence.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;/div&gt;&lt;div class="MsoNormal" style="font-family: Verdana,sans-serif;"&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;But, hang on (you say) can’t you see the PRIDE x-ref in the Swiss-Prot entry – surely that means it &lt;u&gt;has&lt;/u&gt; been existentially nailed down via MS/MS from biological samples?&amp;nbsp; Having checked out the links I don’t think so. The identifications are all one-hit-wonders from &lt;/span&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;plasma and cell lines&lt;/span&gt;&lt;span style="font-size: small; line-height: 115%;"&gt; where the data has hooked &lt;/span&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;out the IPI &lt;/span&gt;&lt;span style="font-size: small; line-height: 115%;"&gt; entry via correlative scores. This is a criticism of neither PRIDE nor the data submitters, and it’s great to be able to drill through to this level of detail - but to me these still look like false-positives (no transcript in liver and no signal to be removed for secretion?).&amp;nbsp; &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;/div&gt;&lt;div class="MsoNormal" style="font-family: Verdana,sans-serif;"&gt;&lt;span style="font-size: small; line-height: 115%;"&gt;So does this leave us with not only a just-maybe protein but also a lincRNA?&amp;nbsp; Well, the other somewhat stretched conclusions from the paper (a nicotine addiction GWAS SNP, AD up-regulation and brain pathogenesis) could be just be related to&amp;nbsp; the transcript.&amp;nbsp;&amp;nbsp; What next ?&amp;nbsp; Well&amp;nbsp; I’ll be keeping an eye on this Swiss-Prot entry just to see what happens ……&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155351992730855318-7420493131066909308?l=cdsouthan.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://cdsouthan.blogspot.com/2011/06/so-is-it-or-isnt-it-protein.html</link><author>noreply@blogger.com (Chris Southan)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-bmP7vxjfr0I/TgJe3yunjrI/AAAAAAAAAJo/4JIEqv6k_TQ/s72-c/Fig+5.png" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2155351992730855318.post-8489457944246827513</guid><pubDate>Sun, 15 Aug 2010 23:48:00 +0000</pubDate><atom:updated>2011-06-22T16:55:23.665-07:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">chem</category><title>Would you like thIs drug prescription regular, Sir - or “Super-Deutero ?”</title><description>It can be difficult to retro-fit a readable narrative&amp;nbsp;to a&amp;nbsp;story but this started when, as part of the day job,&amp;nbsp; I was following a few well established drugs through some internal applications I was documenting last year. 
This included&amp;nbsp;following the public outlinks for synonym and cross-mapping checks.&amp;nbsp; Unsurprisingly, considering this was AZ Mölndal,&amp;nbsp;&amp;nbsp;one of these was Crestor.&amp;nbsp; Popping the name into PubChem hits the hemi-calcium salt as &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6436658&amp;amp;loc=ec_rcs"&gt;CID 6436658&lt;/a&gt;&amp;nbsp; (i.e. with an Mr of 1001).&amp;nbsp; You can split the mix and get the monomer as&amp;nbsp;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6439133&amp;amp;loc=ec_rcs"&gt;CID 6439133&lt;/a&gt;&amp;nbsp; with an Mr of 481.5.&amp;nbsp; Next step is the usual check for CID tautomeric multiplexing by&amp;nbsp;clicking on &amp;nbsp;&lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=pccompound&amp;amp;cmd=Link&amp;amp;LinkName=pccompound_pccompound_sameconnectivity_pulldown&amp;amp;from_uid=6439133"&gt;CID 6439133 same connectivity&lt;/a&gt;&amp;nbsp;&amp;nbsp;and, voila - we see &amp;nbsp;15 CIDs&amp;nbsp; - uh?&amp;nbsp; Now I’ll show a line of these below and point out that the green squares producing the pretty Christmas tree effect are in fact deuterons.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_ShT2TREjNbI/TIlz9B4HEcI/AAAAAAAAAJA/4dpcIL1S6CA/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="113" src="http://3.bp.blogspot.com/_ShT2TREjNbI/TIlz9B4HEcI/AAAAAAAAAJA/4dpcIL1S6CA/s640/Capture.JPG" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
Of the 15 CIDs, 6 are tautomers and 7 are deuterated derivatives with an Mr from 484 up to 509.&amp;nbsp; There was another characteristic of these 7, namely that they all had the doublet of&amp;nbsp; DiscoveryGate and Thomson Pharma as the only submitters.&amp;nbsp; Now this strongly suggests patent extraction (even if you don’t pay any subscriptions!).&amp;nbsp; At the time I just guessed these were for metabolism studies and thought nothing more about it.&amp;nbsp; However, over the following months I kept coming across this same pattern for other marketed drugs, including those in the previous posting&amp;nbsp;&lt;a href="http://cdsouthan.blogspot.com/2010/05/2009-fda-approvals-in-pubchem.html"&gt;2009 FDA approvals&lt;/a&gt; on 2009 launches.&amp;nbsp; Eventually I tried a Google pop on “deuterated drugs” and Bingo!&amp;nbsp; All was revealed in this posting from &lt;a href="http://pipeline.corante.com/archives/2009/05/28/deuterated_drugs_the_pto_says_ok_so_far.php"&gt;In the Pipeline&lt;/a&gt;.&amp;nbsp; I encourage you to read this but I will include this quote from one of the comments:&amp;nbsp; “most of these deuterated compound applications stem from 3 companies, Auspex, Concert and Protia. Protia's patents appear to be a blunderbuss approach, with a mass of US filings 237 published to date but only 11 PCT applications. None of these provide exemplification or any biological description. Concert has 39 PCTs and 26 US applications published, Auspex 57 PCTS and 39 US applications”.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
So there you have it.&amp;nbsp; I realise this is not news of itself, simply that I was not aware of it.&amp;nbsp; However, what might be less well known is the causal chain that I sleuthed down backwards from the PubChem entries.&amp;nbsp; It&amp;nbsp; looks like this:&lt;br /&gt;
&lt;br /&gt;
(I)&amp;nbsp; The cluster of companies above has filed hundreds of patent applications claiming deuterated derivatives of established drugs; this seems to be largely since 2008. &lt;br /&gt;
&lt;br /&gt;
(II)&amp;nbsp; The patent offices do what the assignees pay for&amp;nbsp;by publishing the&amp;nbsp;applications and eventually examining them.&lt;br /&gt;
&lt;br /&gt;
(III) In the meantime Derwent World Patent Index do what they are supposed to after publication by extracting the exemplified structures, including specifying the deuterons.&lt;br /&gt;
&lt;br /&gt;
(IV)&amp;nbsp; Next in the pipe Thomson do what they are supposed to do and feed the Derwent structures through to their Thomson Pharma database product and subsequently&amp;nbsp;deposit them into PubChem as SIDs.&lt;br /&gt;
&lt;br /&gt;
(V)&amp;nbsp; As the penultimate&amp;nbsp; link in the chain&amp;nbsp;the PubChem chemistry rules also do exactly what they are supposed to and classify each of&amp;nbsp;the SID structures as a distinct CID by Mr and&amp;nbsp; the different deuteron positions that produce different isomeric SMILES and InChIs. &lt;br /&gt;
&lt;br /&gt;
(VI) As the last step DiscoveryGate do a cross-check that establishes where they have the identical structure &lt;span lang="EN-US"&gt;(presumably via the same patent extraction feed) and piggy-back on the initial Thomson entry by subsequently dropping in their own SIDs for these CIDs. &lt;br /&gt;
&lt;br /&gt;
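To make step (V) a little more concrete, here is a rough sketch of how deuterated records could be spotted from their isotopic SMILES, and what Mr shift to expect. Everything here is an assumption for illustration: the helper names are my own, the SMILES is a toy example (deuterated methanol, not a rosuvastatin record), counting literal [2H] tokens is a simplification of real SMILES parsing, and the per-atom shift uses rounded average atomic weights.&lt;br /&gt;
&lt;pre&gt;
```python
import re

# Approximate gain in Mr for each hydrogen replaced by deuterium
# (rounded atomic weights: D about 2.0141, H about 1.0080).
H_TO_D_SHIFT = 2.0141 - 1.0080


def count_deuterons(smiles):
    """Count explicit [2H] atoms in an isotopic SMILES string (simplified)."""
    return len(re.findall(r"\[2H\]", smiles))


def expected_mr(parent_mr, n_deuterons):
    """Estimate the Mr of a deuterated analogue from the unlabelled parent."""
    return round(parent_mr + n_deuterons * H_TO_D_SHIFT, 1)


# Toy example: methanol with a fully deuterated methyl group.
toy_smiles = "[2H]C([2H])([2H])O"
n = count_deuterons(toy_smiles)
print(n)

# Parent Mr of 481.5 is the monomer value quoted earlier in this post.
print(expected_mr(481.5, n))
```
&lt;/pre&gt;
Each distinct count and placement of [2H] gives a different isomeric SMILES, which is why every claimed deuteration pattern ends up as its own CID.&lt;br /&gt;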
You can verify the scale of operation, not only at the first stage by inspecting the patent applications from &lt;a href="http://www.faqs.org/patents/asn/15058"&gt;Protia&lt;/a&gt;&amp;nbsp;&amp;nbsp; for example but also the end of the pipe by going to the&amp;nbsp;&lt;a href="http://en.wikipedia.org/wiki/List_of_bestselling_drugs"&gt;Wikipedia best selling drugs&lt;/a&gt;&amp;nbsp; and checking the PubChem links.&amp;nbsp; Top of the list, lipitor &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=60823"&gt;CID 60823&lt;/a&gt;, shows the same pattern, but even larger than Crestor (Related Compounds: Same, Connectivity: 38 Links, Same, Stereochemistry: 13 Links Same, Isotopes: 6 Links)&amp;nbsp; with no less than 32 deuterated CIDs &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=pccompound&amp;amp;cmd=Link&amp;amp;LinkName=pccompound_pccompound_sameconnectivity_pulldown&amp;amp;from_uid=60823"&gt;(CID 60823 same connectivity)&amp;nbsp;&lt;/a&gt; with the few I checked also showing the diagnostic of&amp;nbsp; a DiscoveryGate /Thomson Pharma submission pair.&amp;nbsp; You can walk down the list and see the same thing for all the best-sellers.&lt;br /&gt;
&lt;br /&gt;
I have nothing to add on the subject of&amp;nbsp; the opportunistic claiming of deuterated drugs because that is well covered by the In the Pipeline blog posts and comments from other sources including a 2009 &lt;a href="http://www.nature.com/news/2009/090316/full/458269a.html"&gt;Nature News&lt;/a&gt; mention&amp;nbsp;&amp;nbsp;and a more recent journal article &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/20697607"&gt;PMID 20697607&lt;/a&gt;.&amp;nbsp;&amp;nbsp; However, the chain of events I have outlined had an unintended consequence in that a substantial proportion &amp;nbsp;of&amp;nbsp; the “crown jewels” of drug discovery (and possibly more to come depending on the eventual&amp;nbsp;statistics&amp;nbsp;of granting), namely the most popular approved medicines, now have a slew of extra deuterated PubChem SIDs and CIDs.&amp;nbsp; So who cares ?&amp;nbsp;&amp;nbsp; Well I do – a bit – in that I concur with the widely held opinion that databases should maintain quality, not only via cogent chemical rules but also curation where possible.&amp;nbsp; It is certainly not PubChem's fault that what we have here is contamination (in the cheminformatic sense) by prophetic structures, because few of these deuterated drugs are likely to have&amp;nbsp;actually&amp;nbsp;been synthesised with the defined stoichiometric occupancy implied by their formal representation in PubChem, let alone have any test data.&amp;nbsp; While I haven’t got the time to prove it,&amp;nbsp;I also suspect this form of proliferation can make the interpretation of structure searching and clustering results within the immediate analogue space of these drugs more difficult.&lt;br /&gt;
&lt;br /&gt;
Feel free to offer any solutions via comments here or on your own blogs, but I ran through the obvious thought experiments. Purging isotopes is certainly not an option because of the many authentic submissions with data.&amp;nbsp; Flagging all&amp;nbsp; patent-derived SIDs and/or patent-only CIDs in PubChem ?&amp;nbsp; This would be very useful in its own right but does not actually solve the problem (not to mention being quite something to expect from the two major submitters concerned). You would also certainly not want to use an Entrez filter to remove all patent-derived entries when it’s just the deuterated ones that mess things up (although the opportunistic drug mixture claims coming in via the same route don’t help).&amp;nbsp; Put a tag on all isotopically labeled structures so they could be toggled out of searches ?&amp;nbsp; While ChemSpider already has this option (Disregard Isotopically Labeled Structures) it seems not to be picking up these deuterated drugs for the small number I checked, but I suspect this is an updating issue for the Thomson entries imported from PubChem.&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span lang="EN-US"&gt;&lt;b&gt;Update&lt;/b&gt;:&amp;nbsp; Having the good fortune to be invited to&amp;nbsp; BioIT World in Hannover (Oct 2010)&amp;nbsp; by ML I was able to use the story above as the basis of a presentation&amp;nbsp; "The Unforseen Consequences of Opportunistic Deuterated Drug Claims,&amp;nbsp; Patent Extraction Feeds&amp;nbsp; and the PubChem Chemistry Rules".&amp;nbsp; You can pick these up at &lt;a href="http://www.slideshare.net/cdsouthan/southan-bio-itoct2010"&gt;Slideshare&lt;/a&gt;&amp;nbsp;&amp;nbsp; I found a few tricks to filter and sort these compounds (slides 11 and 12)&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155351992730855318-8489457944246827513?l=cdsouthan.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://cdsouthan.blogspot.com/2010/09/would-you-like-regular-drug-for-that.html</link><author>noreply@blogger.com (Chris Southan)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/_ShT2TREjNbI/TIlz9B4HEcI/AAAAAAAAAJA/4dpcIL1S6CA/s72-c/Capture.JPG" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2155351992730855318.post-2543866823883037768</guid><pubDate>Tue, 20 Jul 2010 21:57:00 +0000</pubDate><atom:updated>2010-08-20T16:04:54.637-07:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">bio</category><title>Ferreting out those therapeutic antibody sequences</title><description>Since therapeutic antibodies are the new pharma rock and roll (not to mention pipeline stuffing) I decided to see what I could find from the 2009 approvals in the same listings that supplied the small-molecule drugs in the previous post. In particular I wanted to track down the exact protein sequences. Why ? 
In the first instance simply because this was a challenge. There are also practical utilities (that I am unlikely to have any involvement with just now) in binding site modeling and selecting proteotypic peptides for mass-spec quantitation. I’m no mab expert but I do know it’s one area where novel sequence composition-of-matter patent claims are still important, and thus are likely to be in extracted listings somewhere. I also discern that some mabs (but why not all?) have full sequences in the excellent WHO INN PDF lists (wouldn’t it be great if the WHO made a database of those including the small-molecule PubChem xrefs?). It’s been some time since I did occasional patent sequence cross-checking as part of what was then target-360 bioinformatic checklisting (and even longer since I was &lt;a href="http://www.cdsouthan.info/CVpatspub.htm"&gt;filing sequences&lt;/a&gt; as co-inventor). I had access to GENESEQ then anyway, so this was an opportunity to see how far I could get in a short space of time with the now much-expanded public resources. &lt;br /&gt;
&lt;br /&gt;
GSK’s&amp;nbsp; &lt;a href="http://en.wikipedia.org/wiki/Ofatumumab"&gt;Ofatumumab&lt;/a&gt; provides a good exemplar and data mining path you could follow if you are so inclined.&amp;nbsp; It's an IgG l,κ high-affinity antibody targeted at the CD20 (a.k.a&amp;nbsp; &lt;a href="http://www.genenames.org/data/hgnc_data.php?hgnc_id=7315"&gt;MS4A1&lt;/a&gt; ) This also has a code name HuMax-CD20™.&amp;nbsp; The logical first port&amp;nbsp; of call after Wikipedia was&amp;nbsp; the &lt;a href="http://imgt.cines.fr/mAb-DB/query"&gt;International Immunogenetics Information System&lt;/a&gt;&amp;nbsp; with mab entries from INNs as primary source (I had the pleasure of meeting Prof. M-PL at an ELIXIR commitee meeting so its nice to be able to put a face to the db).&amp;nbsp;&amp;nbsp; Sure enough Ofatumumab was there with its formula C6480H10022N1742O2020S44 and a CAS number 679818-59-8&amp;nbsp; but no sequence. Now, presenting the formular implies someone knows the full sequence but is not telling ....&amp;nbsp; &lt;br /&gt;
&lt;br /&gt;
Trying to get hold of a novel sequence by direct patent digging (as opposed to having doc nos.) can be arduous and fruitless. In this case, however, a simple query on &lt;a href="http://gb.espacenet.com/search97cgi/s97_cgi.exe?Action=FormGen&amp;amp;Template=gb/en/advanced.hts"&gt;espacenet.com&lt;/a&gt; (CD20 in title AND Genmab as applicant) gave just four Worldwide db hits from Teeling et al. as starting points. For reasons not entirely clear, to me at any rate, there was also a Genentech patent, WO2009086072, that included “….see also the sequences for the variable regions of ofatumumab (HuMax-CD20(TM) (2F2)) that are set forth as SEQ ID NOS: 1-4 of US 2004/0167319……the amino acid sequences of the second heavy-chain variable region and the light-chain variable region are also depicted in Fig. 53 of US 2004/0167319 with their designated CDR regions…...” So, bingo, this was easy to find in the 319 PDF and is nicely laid out below.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_ShT2TREjNbI/TG78SZcEU-I/AAAAAAAAAIw/yOLLznAuR64/s1600/WO2004035607_fig53.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="296" src="http://2.bp.blogspot.com/_ShT2TREjNbI/TG78SZcEU-I/AAAAAAAAAIw/yOLLznAuR64/s400/WO2004035607_fig53.JPG" width="400" /&gt; &lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
If you didn’t know about the &lt;a href="http://en.wikipedia.org/wiki/Complementarity_determining_region"&gt;CDRs&lt;/a&gt; indicated above, they are the meat in the IgG sandwich, not only in determining the antigen binding specificity but also for engineering novel sequence that can be claimed in a patent. After a few minutes spent typing in the sequences they were ready to BLAST against the patent sequence databases for cross-checking. Having worked during 2008 in the office next door to where LW and HM were setting up the new EBI non-redundant patent sequence databases (&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/19884134?dopt=Abstract"&gt;PMID 19884134&lt;/a&gt;), these were my first port of call. This provided circular confirmation of my typing accuracy and the patent links by a 100% match of the 142 residue 2F2 VH to &lt;a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[uniparc-AccNumber:UPI0000F00064]+-e"&gt;UPI0000F00064&lt;/a&gt; and the 127 residue 2F2 VL to &lt;a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[uniparc-AccNumber:UPI0000F00065]+-e"&gt;UPI0000F00065&lt;/a&gt;. &lt;br /&gt;
&lt;br /&gt;
I haven’t got time to explain about &lt;a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+LibInfo+-id+1i9Kb1bk77x+-lib+UNIPARC"&gt;UniParc&lt;/a&gt; (maybe another post) but at 23 million entries it’s the mother-of-all protein databases and it clusters patent proteins in with everything else. It also suggests these two sequences are “novel”, as the VH clusters only with entries from 3 patents and the VL only with entries from 9, all from EPO, JPO and KPO (Korean Patent Office), but not USPTO (one of the VLs may be a false positive, nominally from Kirin Pharma but with broken doc links).&lt;br /&gt;
&lt;br /&gt;
OK, the next step was a cross-check against NCBI patent proteins via BLAST. Surprisingly there were no 100% hits. Why? I guess because USPTO has recently begun to rely on &lt;a href="http://www.patentlens.net/"&gt;PatentLens&lt;/a&gt; to extract sequences. This is an interesting operation that extracts full text and includes a nifty NCBI database reciprocal coupling to extracted sequences, but for this application family they only picked up a few DNA listings. The EPO process, on the other hand, had fed in no fewer than 43 DNA and protein sequences from the Genentech &lt;a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[patabs-ID:WO2009086072]+-e"&gt;WO2009086072&lt;/a&gt;, including 2F2 VL (as sequence 2) and VH (sequence 4). I could have searched the patent no. in the first place to save me the typing but, although the patent is in both EBI patent abstracts and CiteXplore, this won’t get you a link-out to the sequences. You have to go in via SRS in patent proteins with the patent number (it took many minutes to rumble that one). &lt;br /&gt;
&lt;br /&gt;
So, the last cross-check was BLAST against the “normal” protein dbs (i.e. UniProt and nr). This brought up both the expected and the unexpected. The unexpected was a 100% match to an Ofatumumab crystal structure, PDB 3GIZ (chain H); the expected was that the best match anywhere else was only 92% (i.e. the sequence is novel). So, you might ask first, why wasn’t the 100% PDB sequence match in the UniParc cluster? Good question, but it’s the overhang thing: UniParc is strictly non-redundant (checksum based), so sequences of different lengths cluster separately even if they are identical over their in-common length. You might then ask, so the sequence was public after all? Well, yes (it actually didn’t occur to me to check PDB for commercial mabs), but it was deposited in June 2009, many years after the priority dates of the Genmab patents, and it is still not the full chain sequences. Retrospectively I was able to come up with more shortcuts. A Google search for “ofatumumab patent” came up with a full article about Ofatumumab from the journal MAbs, &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/20068404"&gt;PMID 20068404&lt;/a&gt;. As a 2009 open-access text it contained the following: “Four international patents have already been issued on anti-CD20 mAbs. Three, anti-CD20 mAb (WO-2004035607), use of anti-CD20 mAbs for the potential treatment and prevention of B-cell related disease (WO-2005103081) and chronic obstructive pulmonary disease to reduce dyspnea (WO-2008003319), are Genmab patents. The fourth, WO-2006076651, is a Genentech patent claiming use of anti-CD20 at a lower dose in autoimmune disease, such as RA and MS. The earliest expiration date of these patents is 2023”. The irony here is that if I had found this article first it would have provided almost a double shortcut to the sequences (but you may have to unravel the WO’s back to national offices for sequence links). &lt;br /&gt;
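&lt;br /&gt;
The overhang effect is easy to reproduce. What follows is only a minimal Python sketch, not UniParc's actual procedure: the function names, the toy sequences and the use of MD5 as the checksum are all assumptions made for illustration (the text above does not say which checksum UniParc uses). It groups sequences by a checksum of the full-length string, so two identical copies land together while a longer chain containing the same residues ends up in a cluster of its own.&lt;br /&gt;
&lt;pre&gt;
```python
import hashlib


def checksum(seq):
    """Exact-sequence checksum. MD5 is a stand-in chosen for illustration only."""
    return hashlib.md5(seq.upper().encode("ascii")).hexdigest()


def cluster_by_checksum(named_seqs):
    """Group sequence names by the checksum of their full-length sequence."""
    clusters = {}
    for name, seq in named_seqs:
        clusters.setdefault(checksum(seq), []).append(name)
    return sorted(clusters.values())


# Toy sequences, not real antibody data.
patent_seq = "ACDEFGHIKLMNPQRSTVWY"
patent_copy = "ACDEFGHIKLMNPQRSTVWY"        # same sequence from a second patent
longer_chain = patent_seq + "GGSGGS"        # same residues plus an overhang

groups = cluster_by_checksum([
    ("patent_A", patent_seq),
    ("patent_B", patent_copy),
    ("pdb_chain", longer_chain),
])

print(groups)
# The longer chain is identical over the shared length, yet clusters apart.
print(longer_chain.startswith(patent_seq))
```
&lt;/pre&gt;
The design point is that a checksum only answers "is this exactly the same string?", which is why a BLAST search is still needed to pick up sub-sequence matches such as the PDB chain.&lt;br /&gt;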
&lt;br /&gt;
Any more surprises? Well, ofatumumab has no fewer than 3 substance IDs in PubChem. These can’t be linked to CIDs by small-molecule chemistry merging rules, so what use are they, I thought? Well, two of them are useful term-links: to KEGG via &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=96025994&amp;amp;loc=es_rss"&gt;SID 96025994&lt;/a&gt; in the first instance, and to a whole swathe of NIH and FDA resources from ChemIDplus via &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=85152690&amp;amp;loc=es_rss"&gt;SID 85152690&lt;/a&gt; (including all the ClinicalTrials.gov links). &lt;br /&gt;
&lt;br /&gt;
The third substance entry, &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=58874664&amp;amp;loc=es_rss"&gt;SID 58874664&lt;/a&gt;, was a nice quirk. It’s a zinc ion coming through from the ofatumumab crystal structure via &lt;a href="http://www.ncbi.nlm.nih.gov/Structure/mmdb/mmdbsrv.cgi?uid=72916"&gt;MMDB 72916&lt;/a&gt;. Unsurprisingly, this joins 5,848 other zincs from MMDB out of 6,005 zinc SIDs in &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=32051&amp;amp;loc=ec_rcs"&gt;CID 32051&lt;/a&gt;. But don’t overlook the utility here of the name-to-PDB link: it would have immediately alerted you to the existence of a structure that you were not looking for and that would otherwise have been buried in a very long PubMed list. There are many other SID mab name links, for example &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=46507042&amp;amp;loc=es_rss"&gt;SID 46507042&lt;/a&gt; to a DrugBank entry for cetuximab, with the full sequence in this case (but DrugBank has not yet picked up ofatumumab). &lt;br /&gt;
&lt;br /&gt;
There are more quirks that interest me in this mab sequence business but the last one I’ll touch on here is the use of the protein molecular formula, which seems oddly specific as part of mab designations. I’m guessing this usage is a form of checksum adopted by WHO for biological INNs. Going to DrugBank for Cetuximab, &lt;a href="http://www.drugbank.ca/cgi-bin/getCard.cgi?CARD=DB00002.txt"&gt;DB00002&lt;/a&gt; gives C6484H10042N1732O2023S36. I found a sequence-to-formula conversion tool at the &lt;a href="https://www.genscript.com/ssl-bin/site2/peptide_calculation.cgi"&gt;peptide property calculator&lt;/a&gt; website. Given the symmetry of mabs, one heavy plus one light chain should account for ½ the total. Popping in the DrugBank sequences I got C2222H3432N586O678S14 for the heavy and C1020H1592N280O335S4 for the light. Lo and behold, doubling the sum gives 6484 for the carbons and 36 for the sulfurs, so it worked. OK, as we have these mab sequences with no effort, let’s square the circle by checking the patent sequence databases again. This time I hit UniParc direct and got &lt;a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[uniparc-id:UPI000187A9E0]%7C[uniparc-acc:UPI000187A9E0]+-noSession"&gt;UPI000187A9E0&lt;/a&gt; and &lt;a href="http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[uniparc-id:UPI000050AA80]%7C[uniparc-acc:UPI000050AA80]+-noSession"&gt;UPI000050AA80&lt;/a&gt;. These are 100% matches to the patent clusters (although the BLAST % id score is reduced because the low-complexity filter is on by default), but from Pikamab not ImClone (?), and this time a PDB light chain is also in the cluster. PubChem gives no fewer than 12 SID matches to cetuximab but 6 of them are different sugars coming through from PDB 1YY9. &lt;br /&gt;
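&lt;br /&gt;
For anyone wanting to repeat the formula arithmetic, here is a minimal Python sketch. The three formula strings are the ones quoted above; the helper name parse_formula and the assumption that a mab is simply two heavy plus two light chains are mine, so treat it as an illustration rather than a definitive check.&lt;br /&gt;
&lt;pre&gt;
```python
import re
from collections import Counter


def parse_formula(formula):
    """Turn a string such as 'C1020H1592N280O335S4' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


heavy = parse_formula("C2222H3432N586O678S14")     # calculator output, heavy chain
light = parse_formula("C1020H1592N280O335S4")      # calculator output, light chain
total = parse_formula("C6484H10042N1732O2023S36")  # DrugBank formula for cetuximab

# Assumption: the antibody is two heavy chains plus two light chains.
predicted = Counter()
for chain in (heavy, light):
    for element, n in chain.items():
        predicted[element] += 2 * n

# Columns: element, predicted count, DrugBank count, difference.
for element in "CHNOS":
    print(element, predicted[element], total[element],
          predicted[element] - total[element])
```
&lt;/pre&gt;
Run as written, the carbons, nitrogens and sulfurs agree exactly, while the hydrogens and oxygens come out 6 and 3 higher than the DrugBank figure. I have not tracked down the reason; it may simply reflect how the calculator and the INN formula count terminal groups or disulfides, but that is a guess.&lt;br /&gt;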
&lt;br /&gt;
So what if you want to do any of this? Just get stuck in, but remember that hitting Google and NCBI Entrez thoroughly and iteratively first might save you time before diving in to the sources above. So what about that ofatumumab CAS no.? That would be telling, wouldn't it .... but if you do have SciFinder access it also turns out to be a bit of a puzzle….&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155351992730855318-2543866823883037768?l=cdsouthan.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://cdsouthan.blogspot.com/2010/08/ferreting-out-those-therapeutic.html</link><author>noreply@blogger.com (Chris Southan)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/_ShT2TREjNbI/TG78SZcEU-I/AAAAAAAAAIw/yOLLznAuR64/s72-c/WO2004035607_fig53.JPG" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-2155351992730855318.post-6673480746695867933</guid><pubDate>Sun, 27 Jun 2010 16:37:00 +0000</pubDate><atom:updated>2011-06-22T16:54:29.859-07:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">chem</category><title>2009  FDA approvals:  making the PubChem links</title><description>&lt;style&gt;
&lt;/style&gt;  &lt;br /&gt;
&lt;div class="MsoNormal" style="font-family: inherit;"&gt;&lt;div style="font-family: Times,&amp;quot;Times New Roman&amp;quot;,serif;"&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;The interest in this listing frome a drug R&amp;amp;D perspective is self-evident. What you can find here complements the &lt;a href="http://www.nature.com/nrd/journal/v9/n2/full/nrd3101.html"&gt;&lt;i&gt;Nature Reviews Drug Discovery&lt;/i&gt;&lt;/a&gt; article by providing&amp;nbsp; &lt;a href="http://pubchem.ncbi.nlm.nih.gov/"&gt;PubChem&lt;/a&gt; links.&amp;nbsp; I won't make any comments on the small-molecule&amp;nbsp; “class of 2009” because this is well covered already, for example on the &lt;a href="http://chembl.blogspot.com/search/label/2009%20New%20Drugs"&gt;ChEMBL blog&lt;/a&gt;, Google,&amp;nbsp; Wikepedia, &lt;a href="http://dailymed.nlm.nih.gov/dailymed/about.cfm"&gt;DailyMed.&lt;/a&gt; or the&amp;nbsp;&lt;a href="http://www.pharmaprojects.com/therapy_analysis/annual-review-2010-NAS.htm"&gt;Pharma (Projects)  R&amp;amp;D Annual Review&lt;/a&gt;. You can also walk thought the primary source listing yourself at &lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;&lt;a href="http://www.accessdata.fda.gov/scripts/cder/drugsatfda/index.cfm?fuseaction=Reports.ReportsMenu"&gt;Drugs-at-FDA&lt;/a&gt; &lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;by selecting &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;"Original New Drug Approvals (NDAs and BLAs) by Month"&lt;/b&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt; and just stepping through the dates.&amp;nbsp;&lt;a href="http://www.accessdata.fda.gov/scripts/cder/drugsatfda/index.cfm?fuseaction=Reports.ReportsMenu"&gt; &lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: Times,&amp;quot;Times New Roman&amp;quot;,serif;"&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: Times,&amp;quot;Times New Roman&amp;quot;,serif;"&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;What I will do in this post is to pick out a few observations and add some context&amp;nbsp; from the&amp;nbsp; PubChem links, some of which might not only be of interest but also non-obvious.&amp;nbsp;&amp;nbsp; Those of you familliar with PubChem can just click on through and see where you get to but I’ll expand on this after you take a look at the table. I have included those primary molecular target gene/protein IDs that I could find support for via the links but not just via the &lt;i&gt;Nature&lt;/i&gt; RDD list.&lt;/span&gt;&lt;/div&gt;&lt;link href="file:///C:%5CUsers%5CCHRIST%7E1%5CAppData%5CLocal%5CTemp%5Cmsohtml1%5C01%5Cclip_filelist.xml" rel="File-List"&gt;&lt;/link&gt;&lt;span style="font-size: large;"&gt;&lt;o:smarttagtype name="stockticker" namespaceuri="urn:schemas-microsoft-com:office:smarttags"&gt;&lt;/o:smarttagtype&gt;&lt;o:smarttagtype name="place" namespaceuri="urn:schemas-microsoft-com:office:smarttags"&gt;&lt;/o:smarttagtype&gt;&lt;/span&gt;&lt;style&gt;
&lt;/style&gt;  &lt;br /&gt;
&lt;div class="MsoNormal"&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/div&gt;&lt;table border="1" cellpadding="0" cellspacing="0" class="MsoTableColorful1" style="border-collapse: collapse; border: medium none;"&gt;&lt;tbody&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% black; border-color: teal -moz-use-text-color aqua teal; border-style: solid none solid solid; border-width: 1.5pt medium 1pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;st1:place&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;INN&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/st1:place&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt; (trade name)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% black; border-color: teal -moz-use-text-color aqua; border-style: solid none; border-width: 1.5pt medium 1pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;st1:stockticker&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;CID&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/st1:stockticker&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt; (estimated total   CIDs)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% black; border-color: teal teal aqua -moz-use-text-color; border-style: solid solid solid none; border-width: 1.5pt 1.5pt 1pt medium; padding: 0cm 5.4pt; width: 252pt;" valign="top" width="336"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;Primary targets&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% rgb(53, 28, 117); border-color: -moz-use-text-color -moz-use-text-color aqua teal; border-style: none none solid solid; border-width: medium medium 1pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;milnacipran (Savella)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color -moz-use-text-color aqua; border-style: none none solid; border-width: medium medium 1pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=163701&amp;amp;loc=ec_rcs"&gt;CID:   163701&lt;/a&gt;(6)&lt;/span&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color teal aqua -moz-use-text-color; border-style: none solid solid none; border-width: medium 1.5pt 1pt medium; padding: 0cm 5.4pt; width: 252pt;" valign="top" width="336"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://www.genenames.org/data/hgnc_data.php?hgnc_id=HGNC:11048"&gt;SLC6A2&lt;/a&gt;   and &lt;a href="http://www.genenames.org/data/hgnc_data.php?hgnc_id=HGNC:11048"&gt;SLC6A2&lt;/a&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% rgb(53, 28, 117); border-color: -moz-use-text-color -moz-use-text-color aqua teal; border-style: none none solid solid; border-width: medium medium 1pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;febuxostat (Uloric)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color -moz-use-text-color aqua; border-style: none none solid; border-width: medium medium 1pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=134018&amp;amp;loc=ec_rcs"&gt;CID:   134018&lt;/a&gt;(1)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color teal aqua -moz-use-text-color; border-style: none solid solid none; border-width: medium 1.5pt 1pt medium; padding: 0cm 5.4pt; width: 252pt;" valign="top" width="336"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;&lt;a href="http://www.genenames.org/data/hgnc_data.php?hgnc_id=12805"&gt;XDH&lt;/a&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% rgb(53, 28, 117); border-color: -moz-use-text-color -moz-use-text-color aqua teal; border-style: none none solid solid; border-width: medium medium 1pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;everolimus (Afinitor)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color -moz-use-text-color aqua; border-style: none none solid; border-width: medium medium 1pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6442177&amp;amp;loc=ec_rcs"&gt;CID:   6442177&lt;/a&gt; (10)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color teal aqua -moz-use-text-color; border-style: none solid solid none; border-width: medium 1.5pt 1pt medium; padding: 0cm 5.4pt; width: 252pt;" valign="top" width="336"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;&lt;a href="http://www.genenames.org/data/hgnc_data.php?hgnc_id=3942"&gt;MTOR&lt;/a&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% rgb(53, 28, 117); border-color: -moz-use-text-color -moz-use-text-color aqua teal; border-style: none none solid solid; border-width: medium medium 1pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;artemether + lumefantrine (Coartem)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color -moz-use-text-color aqua; border-style: none none solid; border-width: medium medium 1pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6450800&amp;amp;loc=ec_rcs"&gt;&lt;span lang="EN-GB"&gt;CID: 6450800&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt; (both)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6437380&amp;amp;loc=ec_rcs"&gt;&lt;span lang="EN-GB"&gt;CID: 6437380&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt; (lumefantrine) &lt;/span&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=104888&amp;amp;loc=ec_rcs"&gt;&lt;span lang="EN-GB"&gt;CID: 104888&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;&amp;nbsp; (artemether)&lt;/span&gt;&lt;span lang="EN-GB" 
style="color: white; font-size: large;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color teal aqua -moz-use-text-color; border-style: none solid solid none; border-width: medium 1.5pt 1pt medium; padding: 0cm 5.4pt; width: 252pt;" valign="top" width="336"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;unknown&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% navy; border-color: -moz-use-text-color -moz-use-text-color aqua teal; border-style: none none solid solid; border-width: medium medium 1pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;benzyl alcohol   (Ulesfia)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color -moz-use-text-color aqua; border-style: none none solid; border-width: medium medium 1pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=244&amp;amp;loc=ec_rcs"&gt;CID:   244&lt;/a&gt;(3)&lt;/span&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color teal aqua -moz-use-text-color; border-style: none solid solid none; border-width: medium 1.5pt 1pt medium; padding: 0cm 5.4pt; width: 252pt;" valign="top" width="336"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;n/a&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% navy; border-color: -moz-use-text-color -moz-use-text-color aqua teal; border-style: none none solid solid; border-width: medium medium 1pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;iloperidone (Fanapt)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color -moz-use-text-color aqua; border-style: none none solid; border-width: medium medium 1pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=71360&amp;amp;loc=ec_rcs"&gt;CID:   71360&lt;/a&gt; (1)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color teal aqua -moz-use-text-color; border-style: none solid solid none; border-width: medium 1.5pt 1pt medium; padding: 0cm 5.4pt; width: 252pt;" valign="top" width="336"&gt;&lt;div class="MsoNormal"&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;Binding data in&amp;nbsp; &lt;/span&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/8997630?dopt=AbstractPlus"&gt;&lt;span lang="EN-GB"&gt;PMID: 8997630&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt; and full panel via   ChEBI at &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/7707315"&gt;PMID 7707315&lt;/a&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% navy; border-color: -moz-use-text-color -moz-use-text-color aqua teal; border-style: none none solid solid; border-width: medium medium 1pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;tolvaptan (Samsca)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color -moz-use-text-color aqua; border-style: none none solid; border-width: medium medium 1pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;&amp;nbsp;&lt;/span&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=216237&amp;amp;loc=ec_rcs"&gt;CID:   216237&lt;/a&gt; (3)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color teal aqua -moz-use-text-color; border-style: none solid solid none; border-width: medium 1.5pt 1pt medium; padding: 0cm 5.4pt; width: 252pt;" valign="top" width="336"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;&lt;a href="http://www.genenames.org/data/hgnc_data.php?hgnc_id=897"&gt;AVPR2&lt;/a&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% navy; border-color: -moz-use-text-color -moz-use-text-color aqua teal; border-style: none none solid solid; border-width: medium medium 1pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;besifloxacin   (Besivance)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color -moz-use-text-color aqua; border-style: none none solid; border-width: medium medium 1pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=10224595&amp;amp;loc=ec_rcs"&gt;CID:   10224595&lt;/a&gt; (1)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color teal aqua -moz-use-text-color; border-style: none solid solid none; border-width: medium 1.5pt 1pt medium; padding: 0cm 5.4pt; width: 252pt;" valign="top" width="336"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;st1:stockticker&gt;&lt;span lang="EN-GB" style="color: white;"&gt;DNA&lt;/span&gt;&lt;/st1:stockticker&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt; gyrase according to &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/19147516"&gt;PMID 19147516&lt;/a&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% navy; border-color: -moz-use-text-color -moz-use-text-color aqua teal; border-style: none none solid solid; border-width: medium medium 1pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;dronedarone (Multaq)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color -moz-use-text-color aqua; border-style: none none solid; border-width: medium medium 1pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=208898&amp;amp;loc=ec_rcs"&gt;CID:   208898&lt;/a&gt; (2)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color teal aqua -moz-use-text-color; border-style: none solid solid none; border-width: medium 1.5pt 1pt medium; padding: 0cm 5.4pt; width: 252pt;" valign="top" width="336"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;L-type calcium channel blocker (KEGG)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% navy; border-color: -moz-use-text-color -moz-use-text-color aqua teal; border-style: none none solid solid; border-width: medium medium 1pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;prasugrel (Effient)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color -moz-use-text-color aqua; border-style: none none solid; border-width: medium medium 1pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6918456&amp;amp;loc=ec_rcs"&gt;CID:   6918456&lt;/a&gt; (2)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color teal aqua -moz-use-text-color; border-style: none solid solid none; border-width: medium 1.5pt 1pt medium; padding: 0cm 5.4pt; text-align: center; width: 252pt;" valign="top" width="336"&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://www.genenames.org/data/hgnc_data.php?hgnc_id=18124"&gt;P2Y12&lt;/a&gt;&lt;/span&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% navy; border-color: -moz-use-text-color -moz-use-text-color aqua teal; border-style: none none solid solid; border-width: medium medium 1pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;saxagliptin (Onglyza)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color -moz-use-text-color aqua; border-style: none none solid; border-width: medium medium 1pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=44590597&amp;amp;loc=ec_rcs"&gt;CID:   44590597&lt;/a&gt; (6)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color teal aqua -moz-use-text-color; border-style: none solid solid none; border-width: medium 1.5pt 1pt medium; padding: 0cm 5.4pt; width: 252pt;" valign="top" width="336"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&amp;nbsp;&lt;/span&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://www.genenames.org/data/hgnc_data.php?hgnc_id=3009"&gt;DPP4&lt;/a&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% navy; border-color: -moz-use-text-color -moz-use-text-color aqua teal; border-style: none none solid solid; border-width: medium medium 1pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;pitavastatin (Livalo)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color -moz-use-text-color aqua; border-style: none none solid; border-width: medium medium 1pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=5282452&amp;amp;loc=ec_rcs"&gt;CID:   5282452&lt;/a&gt; (6)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color teal aqua -moz-use-text-color; border-style: none solid solid none; border-width: medium 1.5pt 1pt medium; padding: 0cm 5.4pt; width: 252pt;" valign="top" width="336"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;&lt;a href="http://www.genenames.org/data/hgnc_data.php?hgnc_id=5006"&gt;HMGCR&lt;/a&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% navy; border-color: -moz-use-text-color -moz-use-text-color aqua teal; border-style: none none solid solid; border-width: medium medium 1pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;asenapine (Saphris)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color -moz-use-text-color aqua; border-style: none none solid; border-width: medium medium 1pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=163091&amp;amp;loc=ec_rcs"&gt;CID:   163091&lt;/a&gt; (4)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color teal aqua -moz-use-text-color; border-style: none solid solid none; border-width: medium 1.5pt 1pt medium; padding: 0cm 5.4pt; width: 252pt;" valign="top" width="336"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;&lt;a href="http://www.genenames.org/data/hgnc_data.php?hgnc_id=5293"&gt;HTR2A&lt;/a&gt; and   &lt;a href="http://www.genenames.org/data/hgnc_data.php?hgnc_id=3023"&gt;DRD2&lt;/a&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% navy; border-color: -moz-use-text-color -moz-use-text-color aqua teal; border-style: none none solid solid; border-width: medium medium 1pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;vigabatrin (Sabril)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color -moz-use-text-color aqua; border-style: none none solid; border-width: medium medium 1pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=5665&amp;amp;loc=ec_rcs"&gt;CID:   5665&lt;/a&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color teal aqua -moz-use-text-color; border-style: none solid solid none; border-width: medium 1.5pt 1pt medium; padding: 0cm 5.4pt; width: 252pt;" valign="top" width="336"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;&lt;a href="http://www.genenames.org/data/hgnc_data.php?hgnc_id=23"&gt;ABAT&lt;/a&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% navy; border-color: -moz-use-text-color -moz-use-text-color aqua teal; border-style: none none solid solid; border-width: medium medium 1pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;bepotastine (Bepreve)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color -moz-use-text-color aqua; border-style: none none solid; border-width: medium medium 1pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=2350&amp;amp;loc=ec_rcs"&gt;CID:   2350&lt;/a&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color teal aqua -moz-use-text-color; border-style: none solid solid none; border-width: medium 1.5pt 1pt medium; padding: 0cm 5.4pt; width: 252pt;" valign="top" width="336"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;&lt;a href="http://www.genenames.org/data/hgnc_data.php?hgnc_id=5182"&gt;HRH1&lt;/a&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% navy; border-color: -moz-use-text-color -moz-use-text-color aqua teal; border-style: none none solid solid; border-width: medium medium 1pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;telavancin (Vibativ)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color -moz-use-text-color aqua; border-style: none none solid; border-width: medium medium 1pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6918702&amp;amp;loc=ec_rcs"&gt;CID:   6918702&lt;/a&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color teal aqua -moz-use-text-color; border-style: none solid solid none; border-width: medium 1.5pt 1pt medium; padding: 0cm 5.4pt; width: 252pt;" valign="top" width="336"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;Interaction with &lt;/span&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=25245038&amp;amp;loc=ec_rcs"&gt;&lt;span lang="EN-GB"&gt;CID: 25245038&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;&amp;nbsp; and &lt;/span&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a 
href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=5242513&amp;amp;loc=ec_rcs"&gt;&lt;span lang="EN-GB"&gt;CID: 5242513&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% navy; border-color: -moz-use-text-color -moz-use-text-color aqua teal; border-style: none none solid solid; border-width: medium medium 1pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;pralatrexate (Folotyn)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color -moz-use-text-color aqua; border-style: none none solid; border-width: medium medium 1pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=148121&amp;amp;loc=ec_rcs"&gt;CID:   148121&lt;/a&gt; (3)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color teal aqua -moz-use-text-color; border-style: none solid solid none; border-width: medium 1.5pt 1pt medium; padding: 0cm 5.4pt; width: 252pt;" valign="top" width="336"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://www.genenames.org/data/hgnc_data.php?hgnc_id=2861"&gt;DHFR&lt;/a&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% navy; border-color: -moz-use-text-color -moz-use-text-color aqua teal; border-style: none none solid solid; border-width: medium medium 1pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;pazopanib (Votrient)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color -moz-use-text-color aqua; border-style: none none solid; border-width: medium medium 1pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=10113978&amp;amp;loc=ec_rcs"&gt;CID:   10113978&lt;/a&gt; (2)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color teal aqua -moz-use-text-color; border-style: none solid solid none; border-width: medium 1.5pt 1pt medium; padding: 0cm 5.4pt; width: 252pt;" valign="top" width="336"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://www.genenames.org/data/hgnc_data.php?hgnc_id=6307"&gt;KDR&lt;/a&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;&amp;nbsp; but pan-VEGFR&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;tr&gt;   &lt;td style="background: none repeat scroll 0% 0% navy; border-color: -moz-use-text-color -moz-use-text-color teal teal; border-style: none none solid solid; border-width: medium medium 1.5pt 1.5pt; padding: 0cm 5.4pt; width: 212.4pt;" valign="top" width="283"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;i&gt;&lt;span lang="EN-GB" style="color: white;"&gt;romidepsin (Istodax)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color -moz-use-text-color teal; border-style: none none solid; border-width: medium medium 1.5pt; padding: 0cm 5.4pt; width: 180pt;" valign="top" width="240"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="color: white; font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=5352062&amp;amp;loc=ec_rcs"&gt;CID:   5352062&lt;/a&gt; (2)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="background: none repeat scroll 0% 0% teal; border-color: -moz-use-text-color teal teal -moz-use-text-color; border-style: none solid solid none; border-width: medium 1.5pt 1.5pt medium; padding: 0cm 5.4pt; width: 252pt;" valign="top" width="336"&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span lang="EN-GB" style="color: white; font-size: large;"&gt;&lt;a href="http://www.genenames.org/data/hgnc_data.php?hgnc_id=4852"&gt;HDAC1&lt;/a&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;&lt;span style="font-size: large;"&gt;&amp;nbsp;  &lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/div&gt;&lt;div class="Section1" style="font-family: inherit;"&gt;&lt;div class="MsoNormal"&gt;&lt;style&gt;
&lt;!--
a:link, span.&lt;span style="background: none repeat scroll 0% 0% yellow;" class="goog-spellcheck-word"&gt;MsoHyperlink&lt;/span&gt;
	{color:blue;
	text-decoration:underline;
	text-underline:single;}
a:visited, span.&lt;span style="background: none repeat scroll 0% 0% yellow;" class="goog-spellcheck-word"&gt;MsoHyperlinkFollowed&lt;/span&gt;
	{color:purple;
	text-decoration:underline;
	text-underline:single;}
@page Section1
	{size:612.0pt 792.0pt;
	margin:72.0pt 90.0pt 72.0pt 90.0pt;
	&lt;span style="background: none repeat scroll 0% 0% yellow;" class="goog-spellcheck-word"&gt;mso&lt;/span&gt;-header-margin:36.0pt;
	&lt;span style="background: none repeat scroll 0% 0% yellow;" class="goog-spellcheck-word"&gt;mso&lt;/span&gt;-footer-margin:36.0pt;
	&lt;span style="background: none repeat scroll 0% 0% yellow;" class="goog-spellcheck-word"&gt;mso&lt;/span&gt;-paper-source:0;}
div.Section1
	{page:Section1;}
 /* List Definitions */
 @list l0
	{&lt;span style="background: none repeat scroll 0% 0% yellow;" class="goog-spellcheck-word"&gt;mso&lt;/span&gt;-list-id:1884950290;
	&lt;span style="background: none repeat scroll 0% 0% yellow;" class="goog-spellcheck-word"&gt;mso&lt;/span&gt;-list-type:hybrid;
	&lt;span style="background: none repeat scroll 0% 0% yellow;" class="goog-spellcheck-word"&gt;mso&lt;/span&gt;-list-template-ids:1132226934 69009409 69009411 69009413 69009409 69009411 69009413 69009409 69009411 69009413;}
@list l0:level1
	{&lt;span style="background: none repeat scroll 0% 0% yellow;" class="goog-spellcheck-word"&gt;mso&lt;/span&gt;-level-number-format:bullet;
	&lt;span style="background: none repeat scroll 0% 0% yellow;" class="goog-spellcheck-word"&gt;mso&lt;/span&gt;-level-text:;
	&lt;span style="background: none repeat scroll 0% 0% yellow;" class="goog-spellcheck-word"&gt;mso&lt;/span&gt;-level-tab-stop:36.0pt;
	&lt;span style="background: none repeat scroll 0% 0% yellow;" class="goog-spellcheck-word"&gt;mso&lt;/span&gt;-level-number-position:left;
	text-indent:-18.0pt;
	font-family:Symbol;}
&lt;span style="background: none repeat scroll 0% 0% yellow;" class="goog-spellcheck-word"&gt;ol&lt;/span&gt;
	{margin-bottom:0cm;}
&lt;span style="background: none repeat scroll 0% 0% yellow;" class="goog-spellcheck-word"&gt;ul&lt;/span&gt;
	{margin-bottom:0cm;}
--&gt;
&lt;/style&gt;  &lt;br /&gt;
&lt;div class="MsoNormal"&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;I had thought of doing a full matrix indicating which had DrugBank, ClinicalTrials.gov, ChEMBL, BioAssay links, etc.&amp;nbsp; However, the multiple &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;st1:stockticker&gt;&lt;span lang="EN-GB"&gt;CID&lt;/span&gt;&lt;/st1:stockticker&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;s make this difficult because of the consequent out-link spaghetti, so I shall just pick out selected examples.&amp;nbsp; As ever, ferreting around produced a host of interesting connectivities and a sprinkling of anomalies, but I can only show you a small cut. &lt;/span&gt;&lt;/div&gt;&lt;ul&gt;&lt;li class="MsoNormal"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;span lang="EN-GB"&gt;The multi-mappers&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;:&amp;nbsp; I won’t go into the PubChem chemistry rules that are explained on the site, but one important consequence is that any form of submitter representational heterogeneity, usually tautomeric, spawns multiple &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;st1:stockticker&gt;&lt;span lang="EN-GB"&gt;CID&lt;/span&gt;&lt;/st1:stockticker&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;s.&amp;nbsp; I have only included the direct&amp;nbsp;&lt;a href="http://www.who.int/medicines/services/inn/innquidance/en/index.html"&gt;International Nonproprietary Names&lt;/a&gt; &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;st1:place&gt;&lt;span lang="EN-GB"&gt;(INN&lt;/span&gt;&lt;/st1:place&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;) matches in my &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;st1:stockticker&gt;&lt;span lang="EN-GB"&gt;CID&lt;/span&gt;&lt;/st1:stockticker&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt; counts above 
(discounting some false-positives na&lt;span style="font-family: Times,&amp;quot;Times New Roman&amp;quot;,serif;"&gt;med “saxagliptin analogues”). These don’t necessarily hit all the &lt;/span&gt;&lt;/span&gt;&lt;span style="font-family: Times,&amp;quot;Times New Roman&amp;quot;,serif; font-size: large;"&gt;&lt;st1:stockticker&gt;&lt;span lang="EN-GB"&gt;CID&lt;/span&gt;&lt;/st1:stockticker&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;&lt;span style="font-family: Times,&amp;quot;Times New Roman&amp;quot;,serif;"&gt;s (they can&lt;/span&gt; be named something else), so if you want the true numbers you have to click on “Related Compounds: Same, Connectivity”; you may also have to split the mixtures to “parent”.&amp;nbsp; If submitter numbers reflect popularity then&amp;nbsp;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=5282452&amp;amp;loc=ec_rcs"&gt;pitavastatin&lt;/a&gt;&amp;nbsp;does well at 70 submitter links. Topping the &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;st1:stockticker&gt;&lt;span lang="EN-GB"&gt;CID&lt;/span&gt;&lt;/st1:stockticker&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt; multiplexers was&amp;nbsp;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=104888&amp;amp;loc=ec_rcs"&gt;artemether&lt;/a&gt; with 26 CIDs. Second was&amp;nbsp;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=44590597&amp;amp;loc=ec_rcs"&gt;saxagliptin&lt;/a&gt; with 13, but 7 are isotope forms.&amp;nbsp; The deuterated derivatives are picked up by Thomson Pharma, probably from patents. &lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;
&lt;ul&gt;&lt;li class="MsoNormal"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;span lang="EN-GB"&gt;In the mix:&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;&amp;nbsp; Several from the list are specified as salts (e.g. chlorides) and there was a &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;st1:stockticker&gt;&lt;span lang="EN-GB"&gt;mixture CID&lt;/span&gt;&lt;/st1:stockticker&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt; in each case.&amp;nbsp; &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=2350&amp;amp;loc=ec_rcs"&gt;Bepotastine (besilate)&lt;/a&gt; has 23 &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pcsubstance&amp;amp;term=164522%5bccid%5d"&gt;mixtures&lt;/a&gt; at the SID level (oddly, one of them is not a mixture: &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=58108348&amp;amp;loc=es_rss"&gt;&lt;span lang="EN-GB"&gt;SID: 58108348&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;) in addition to just the besylic acid &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=7371&amp;amp;loc=ec_rcs"&gt;&lt;span lang="EN-GB"&gt;CID: 7371&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;.&amp;nbsp; Lots of these mixtures come from Thomson Pharma, presumably also feeding in via patents.&amp;nbsp; Only the &lt;i&gt;&lt;b style="color: black;"&gt;besylic acid&lt;/b&gt;&lt;/i&gt; (a.k.a. 
&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=7371&amp;amp;loc=ec_rcs"&gt;benzenesulfonic acid&lt;/a&gt;) made it into the &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;st1:stockticker&gt;&lt;span lang="EN-GB"&gt;MLS&lt;/span&gt;&lt;/st1:stockticker&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;CN screening collection but has no hits.&amp;nbsp; &lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt; &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;
&lt;ul&gt;&lt;li class="MsoNormal"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;span lang="EN-GB"&gt;Oldies but goldies&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;: Three- or four-digit CIDs mean the compounds have been in PubChem since fairly early on, maybe 2004/2005. You can check in the "CID create date" field. The oldest on this list (in &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;st1:stockticker&gt;&lt;span lang="EN-GB"&gt;CID&lt;/span&gt;&lt;/st1:stockticker&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt; “age”, with three digits) is, not surprisingly, &lt;i&gt;&lt;b style="color: black;"&gt;ulesfia&lt;/b&gt;&lt;/i&gt;, a.k.a. &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=244&amp;amp;loc=ec_rcs"&gt;benzyl alcohol&lt;/a&gt;, ‘nuff said.&amp;nbsp; Next oldest may be&amp;nbsp;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=5665&amp;amp;loc=ec_rcs"&gt;vigabatrin&lt;/a&gt;, which back-maps via the MeSH term (i.e. 
the &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;st1:place&gt;&lt;span lang="EN-GB"&gt;INN&lt;/span&gt;&lt;/st1:place&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt; retro-mapped to the IUPAC) to 1977 (&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/263549"&gt;PMID 263549&lt;/a&gt;).&amp;nbsp; Then we get&amp;nbsp;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=2350&amp;amp;loc=ec_rcs"&gt;bepotastine&lt;/a&gt; with the &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;st1:place&gt;&lt;span lang="EN-GB"&gt;INN&lt;/span&gt;&lt;/st1:place&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt; mapped back to 1997 (&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/9285833"&gt;PMID 9285833&lt;/a&gt;) and DrugBank (&lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://www.drugbank.ca/cgi-bin/getCard.cgi?CARD=DB04890.txt"&gt;&lt;span lang="EN-GB"&gt;DB04890&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;) reporting its approval in &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;st1:country-region&gt;&lt;st1:place&gt;&lt;span lang="EN-GB"&gt;Japan&lt;/span&gt;&lt;/st1:place&gt;&lt;/st1:country-region&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt; in 2000.&amp;nbsp; &lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;The converse (high numbers = recent) does not hold: &lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;for reasons again related to the chemistry rules, high-numbered CIDs cannot be assumed to be “new” structures. 
A good example is &lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6450800&amp;amp;loc=ec_rcs"&gt;coartem&lt;/a&gt;, where the “literature” links in the CID records map back to Chinese research from the early 1980s.&amp;nbsp; Remember that these two compounds (and maybe others in the list) or their close analogues may be in the literature beyond the MeSH back-map reach.&amp;nbsp; &lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;ul&gt;&lt;li class="MsoNormal"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;span lang="EN-GB"&gt;Patent whacks&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;:&amp;nbsp; While the commercial submissions from Thomson Pharma and Discovery Gate may link back to patents via subscriber access, there are some direct hits for these drugs via the SureChem link in ChemSpider.&amp;nbsp; While these are “teasers” for SureChemPro, they can provide a match between automated name-to-structure extraction and a patent number.&amp;nbsp; Heading the list is well over 1000 hits for &lt;i&gt;&lt;b&gt;&lt;span style="color: black;"&gt;milnacipran&lt;/span&gt;&lt;/b&gt;&lt;/i&gt; (via &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=65833&amp;amp;loc=ec_rcs"&gt;&lt;span lang="EN-GB"&gt;CID: 65833&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt; to ChemSpider &lt;a href="http://www.chemspider.com/Chemical-Structure.59245.html"&gt;59245&lt;/a&gt;). However, &lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;note that such big hit lists are usually caused by frequent mentions of &lt;/span&gt;&lt;span style="font-size: large;"&gt;an established INN, either&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt; in non-claim contexts or in "shotgun" combination claims. 
&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;The modest five BMS hits for &lt;i&gt;&lt;b style="color: black;"&gt;saxagliptin&lt;/b&gt;&lt;/i&gt; (via &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=11243969&amp;amp;loc=ec_rcs"&gt;&lt;span lang="EN-GB"&gt;CID: 11243969&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt; to ChemSpider &lt;a href="http://www.chemspider.com/Chemical-Structure.9419005.html"&gt;9419005&lt;/a&gt;) are more likely to be genuine IUPAC-to-structure conversions&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt; from claimed examples. I'm not going to pore over the&amp;nbsp;&lt;a href="http://www.freepatentsonline.com/"&gt;Freepatentsonline&lt;/a&gt; entries to find out, but the titles and years suggest these &lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;could be the original filings.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt; &lt;/span&gt;&lt;br /&gt;
&lt;ul&gt;&lt;li class="MsoNormal"&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;&lt;b&gt;Antimalarial data: &lt;/b&gt;Given the recent activity in public antimalarial screening, it’s odd that these two exact structures &lt;/span&gt;&lt;style&gt;
&lt;/style&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6437380&amp;amp;loc=ec_rcs"&gt;&lt;span lang="EN-GB"&gt;CID: 6437380&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;span style="font-size: large;"&gt; &lt;span lang="EN-GB"&gt;(&lt;i&gt;&lt;b style="color: black;"&gt;lumefantrine&lt;/b&gt;&lt;/i&gt;) and &lt;/span&gt;&lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=104888&amp;amp;loc=ec_rcs"&gt;CID: 104888&lt;/a&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt; (&lt;i&gt;&lt;b style="color: black;"&gt;artemether&lt;/b&gt;&lt;/i&gt;) did not make it into the &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?term=%22GlaxoSmithKline%20%28GSK%29%22[sourcename]&amp;amp;cmd=search&amp;amp;db=pcsubstance"&gt;GSK 13533 set&lt;/a&gt; linked to the direct PubChem BioAssay results (although analogues might have), but the first one is in the GNF set (from the Novartis institute, b.t.w., 
not Novartis Pharma).&amp;nbsp; I might blog about this separately, but you can actually track to the GNF screening results from &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/18579783"&gt;PMID 18579783&lt;/a&gt;, via the ChEBI outlinks and thence to the ChEMBL entry &lt;a href="http://www.ebi.ac.uk/chembldb/index.php/compound/inspect/156095"&gt;156095&lt;/a&gt; for &lt;i&gt;&lt;b style="color: black;"&gt;lumefantrine&lt;/b&gt;&lt;/i&gt;.&amp;nbsp; The &lt;i&gt;&lt;b style="color: black;"&gt;artemether&lt;/b&gt;&lt;/i&gt; outlinks include &lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;&lt;a href="http://www.ebi.ac.uk/chembldb/index.php/compound/inspect/353595"&gt;ChEMBL 353595&lt;/a&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;, which links to BioAssay &lt;a href="http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=158019"&gt;AID 131169&lt;/a&gt; and the results extracted for &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=104888&amp;amp;loc=ec_rcs"&gt;CID: 104888&lt;/a&gt; &lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;with four analogues from an earlier paper, &lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/10639291"&gt;PMID 10639291&lt;/a&gt;.&amp;nbsp; &lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;In fact, the extracted data is matrixed, so four ChEMBL &lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;literature assay &lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;AIDs were spawned from this paper.&amp;nbsp; It’s great to be able to explore data unions at this level of detail, but it’s not for the faint-hearted.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;ul&gt;&lt;li class="MsoNormal"&gt;&lt;style&gt;
&lt;/style&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;span lang="EN-GB"&gt;Pro-drugs and metabolites&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;.&amp;nbsp; Linking these often presents a challenge for databases, and so it proves for two examples here.&amp;nbsp; Amongst other publications for&amp;nbsp;&lt;span style="color: red;"&gt;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6918456&amp;amp;loc=ec_rcs"&gt;prasugrel&lt;/a&gt;,&amp;nbsp;&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/17403916"&gt;PMID 17403916&lt;/a&gt;&lt;/span&gt; &lt;/span&gt;&lt;style&gt;
&lt;/style&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;shows about 17 metabolites including what the authors label the active drug as R-138727 and inactive metabolites including R-119251 and R-106583.&amp;nbsp; While you can Google and &lt;a href="http://www.tlcpharmachem.com/tlc_item.php?upc=P-396&amp;amp;li=&amp;amp;sub="&gt;buy&lt;/a&gt; some of these&amp;nbsp;&amp;nbsp;&amp;nbsp; I could not hit those three R-numbers in PubChem.&amp;nbsp; However,&amp;nbsp; via downloading the catalogue .jpg for R-138727&amp;nbsp; I popped it into OSRA &lt;a href="http://cactus.nci.nih.gov/cgi-bin/osra/index.cgi"&gt;Optical Structure Recognition&lt;/a&gt;,&amp;nbsp; added the missing S to get&amp;nbsp; O=C(O)/C=C\3CN(C(C(=O)C1CC1)c2ccccc2F)CCC3S and then, bingo, hit &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=10405534[uid]"&gt;CID 10405534&lt;/a&gt; (parent) and&amp;nbsp;&lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=11155476[uid]"&gt;CID 11155476&lt;/a&gt; as the triflouracetate.&amp;nbsp; The parent &amp;nbsp;had been submitted as R-99224 (??) from a different chemical supplier.&amp;nbsp;&amp;nbsp; OK, &amp;nbsp;so back to PubMed where R-99224 hits&amp;nbsp;&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/11156560"&gt;PMID 11156560&lt;/a&gt; and maps between&amp;nbsp; the Sankyo code name CS-747 aka LY640315 from Lilly and R-99224. This fits with the fact that, via Chem Spider&amp;nbsp; &lt;a href="http://www.chemspider.com/Chemical-Structure.8580972.html"&gt;8580972&lt;/a&gt;,&amp;nbsp; there are two Sankyo patent hits in SureChem.&amp;nbsp; So are the pro-drug, drug and metabolites linked ?&amp;nbsp; Nominally - yes, via name mappings in PubMed, but not as structure pointers in PubChem (from a quick pop I was unable to make a substructure search join either). &amp;nbsp;So, does R-138727 = R-99224 ? - maybe but the catalogue CAS nos are different ….. 
I gave up at this point (not having “SciFinder home edition”….)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;span style="font-size: large;"&gt;.&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt; The case with&amp;nbsp;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=104888&amp;amp;loc=ec_rcs"&gt;artemether&lt;/a&gt; seems equally complex because of the many analogues and tautomers. A PubMed search indicates the prodrug-to-drug of &lt;i&gt;&lt;b style="color: black;"&gt;artemether&lt;/b&gt;&lt;/i&gt; to dihydroartemisinin, with the latter name-mapping to &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&amp;amp;db=pccompound&amp;amp;term=107770[uid]"&gt;CID 10777.&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;ul&gt;&lt;li class="MsoNormal"&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;&amp;nbsp;&lt;/span&gt;&lt;link href="file:///C:%5CUsers%5CCHRIST%7E1%5CAppData%5CLocal%5CTemp%5Cmsohtml1%5C01%5Cclip_filelist.xml" rel="File-List"&gt;&lt;/link&gt;&lt;style&gt;
&lt;/style&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;span lang="EN-GB"&gt;Asssays and other bioactive links.&amp;nbsp; &lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;link href="file:///C:%5CUsers%5CCHRIST%7E1%5CAppData%5CLocal%5CTemp%5Cmsohtml1%5C01%5Cclip_filelist.xml" rel="File-List"&gt;&lt;/link&gt;&lt;style&gt;
&lt;/style&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;&amp;nbsp;Only&amp;nbsp;&lt;a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=163701&amp;amp;loc=ec_rcs"&gt;milnacipran&lt;/a&gt; has made it into the&amp;nbsp;&lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?term=%22MLSMR%22[sourcename]&amp;amp;cmd=search&amp;amp;db=pcsubstance"&gt;Molecular Libraries Small Molecule Repository&lt;/a&gt; &amp;nbsp; &lt;/span&gt;&lt;link href="file:///C:%5CUsers%5CCHRIST%7E1%5CAppData%5CLocal%5CTemp%5Cmsohtml1%5C01%5Cclip_filelist.xml" rel="File-List"&gt;&lt;/link&gt;&lt;style&gt;
&lt;/style&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;and therefore spawned cross-screening data from the NIH network. In general new approvals are unlikely to be in the collection&amp;nbsp; but this is a new indication &lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;(a nice repurposing example?) &lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;for an older compound already available from chemical suppliers.&amp;nbsp; The substance went in (by SID date) in 2007 and again in 2008 (replenishment?).&amp;nbsp; Over these three years it accumulated 238 assay results (which seem to be mostly direct rather than literature extracted) but only one “hit” from&amp;nbsp; &lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=pccompound&amp;amp;cmd=Link&amp;amp;LinkName=pccompound_pcassay_active&amp;amp;from_uid=163701"&gt;AID  2830&lt;/a&gt;&amp;nbsp;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt; an assay for&amp;nbsp; &lt;/span&gt;&lt;link href="file:///C:%5CUsers%5CCHRIST%7E1%5CAppData%5CLocal%5CTemp%5Cmsohtml1%5C01%5Cclip_filelist.xml" rel="File-List"&gt;&lt;/link&gt;&lt;style&gt;
&lt;/style&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;“Identification of small molecules that induce b-cell replication in the MIN-6 cell line”.&amp;nbsp; The significance is unclear because this only in a primary screen with a 2% hit rate but maybe someone at Forest Labs has followed-up. &amp;nbsp;&lt;/span&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;span lang="EN-GB"&gt; &lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;&lt;a href="http://www.ebi.ac.uk/chembldb/index.php/compound/inspect/353595"&gt;  &lt;/a&gt;&lt;/span&gt;&lt;br /&gt;
&lt;ul&gt;&lt;li class="MsoNormal"&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;span lang="EN-GB"&gt;Pain in the …&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt; Could &lt;a href="http://chem.sis.nlm.nih.gov/chemidplus/chemidheavy.jsp"&gt;ChemIDPLus&lt;/a&gt;      please fix the state problem so we are not link-locked when we get there      from PubChem ?&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;ul&gt;&lt;li class="MsoNormal"&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;&amp;nbsp;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;&lt;b&gt;Overall&amp;nbsp; Impressions&lt;/b&gt;.&amp;nbsp; The utility and value of checking&amp;nbsp; new drugs in this way is clear.&amp;nbsp; However, they present a complex “linkage spaghetti” that&amp;nbsp; also seems oddly patchy,&amp;nbsp; with some connections we might not have expected while others we would have predicted have no links (yet).&amp;nbsp; This is likely to be more to do with variable capture by the sources and the fact that, while they crossed the finishing-line within the same year, they had very different starting times.&amp;nbsp;&amp;nbsp; I’ll check for changes in this set when I run the class of 2010. &lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;I have stepped throught &lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;&lt;a href="http://www.accessdata.fda.gov/scripts/cder/drugsatfda/index.cfm?fuseaction=Reports.ReportsMenu"&gt;Drugs-at-FDA&lt;/a&gt;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt; for the first half of 2010 but its looking thin so far on  the chemistry and targets side.&amp;nbsp;&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: large;"&gt;&amp;nbsp;&amp;nbsp; &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;&lt;div style="font-family: inherit;"&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: inherit;"&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: inherit;"&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;
&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;
&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155351992730855318-6673480746695867933?l=cdsouthan.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://cdsouthan.blogspot.com/2010/05/2009-fda-approvals-in-pubchem.html</link><author>noreply@blogger.com (Chris Southan)</author><thr:total>0</thr:total></item></channel></rss>

