It seems like everyone is looking at structural variant detection these days.
We recently had a visit from Ben Raphael, a friend of the genome center whom we tried to recruit years ago when he was a postdoc. Now he heads a group at Brown University, where (by his own admission) they basically tap into some of the large datasets out there (like TSP and TCGA) and develop and apply their own algorithms. Ben gave a talk on structural variation in human and cancer genomes, in which he presented some of the work that he and colleagues have pioneered in End Sequence Profiling (ESP).
Who is this guy?
Ben’s main background is in mathematics and computer science. The cancer research came later, when (in 2003) a group at UCSF approached him with a cancer genome sequence that had seen massive rearrangement. They developed a way to reconstruct the tumor genome architecture and published the results in Bioinformatics in 2004. Incidentally, this work of Ben’s was profiled when he was named one of Tomorrow’s PIs by Genome Technology. The GT article came out in late 2006, a time when I was very interested in SV, and I remember thinking “who is this guy?”
End Sequence Profiling in Cancer
When I think of ESP, I tend to think of the Tuzun et al. 2004 paper, as many people in the field do. There was, however, a study published a year earlier (in 2003) on ESP as an approach for sequence-based analysis of rearranged genomes. The idea is to sequence 500 bp at each end of large clones (100-250 kbp in size), map those end sequences to the reference genome, and then apply a geometric clustering algorithm to the pairs whose mapped positions or orientations don't fit the expected clone size, since those are the ones that point to rearrangements. Ben Raphael’s group applied this method to BRCA cell lines as well as primary tumors (breast, prostate, ovarian, and brain cancers). The principal goal was to identify fusion genes (like BCR-ABL, the product of the widely known Philadelphia chromosome). In studies published this year, Ben’s group did find rearrangements that created fusion genes, though none appeared to be transcribed.
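For readers who think better in code, here is a rough Python sketch of the ESP idea: flag clone end pairs that map discordantly, then group the ones that could support the same rearrangement. To be clear, this is my own simplification and not the geometric clustering algorithm from Ben's papers. The `EndPair` record, the function names, the "+ then -" convention for a concordant pair, the clustering window, and the example coordinates are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Expected clone size range from the text (100-250 kbp).
MIN_INSERT = 100_000
MAX_INSERT = 250_000


@dataclass(frozen=True)
class EndPair:
    """Mapped positions of the two end sequences of one clone (hypothetical record)."""
    clone: str
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str


def normalize(p):
    """Order the two ends so that (chrom1, pos1) sorts before (chrom2, pos2)."""
    a = (p.chrom1, p.pos1, p.strand1)
    b = (p.chrom2, p.pos2, p.strand2)
    if b < a:
        a, b = b, a
    return EndPair(p.clone, *a, *b)


def is_discordant(p):
    """True if the pair cannot come from an unrearranged clone.

    Assumed convention: a concordant pair maps to one chromosome, left end
    on '+', right end on '-', with a span inside the expected size range.
    """
    p = normalize(p)
    if p.chrom1 != p.chrom2:
        return True
    if (p.strand1, p.strand2) != ("+", "-"):
        return True
    return not (MIN_INSERT <= p.pos2 - p.pos1 <= MAX_INSERT)


def same_event(a, b, window=MAX_INSERT):
    """Crude stand-in for the geometric test: same chromosomes and
    orientations, and both ends within one clone length of each other."""
    return (
        (a.chrom1, a.strand1, a.chrom2, a.strand2)
        == (b.chrom1, b.strand1, b.chrom2, b.strand2)
        and abs(a.pos1 - b.pos1) <= window
        and abs(a.pos2 - b.pos2) <= window
    )


def cluster_discordant(pairs):
    """Single-linkage grouping of discordant pairs into candidate rearrangements."""
    clusters = []
    for p in (normalize(q) for q in pairs if is_discordant(q)):
        hits, rest = [], []
        for c in clusters:
            (hits if any(same_event(p, q) for q in c) else rest).append(c)
        clusters = rest + [[p] + [q for c in hits for q in c]]
    return clusters


# Made-up coordinates, for illustration only.
pairs = [
    EndPair("c1", "chr1", 1_000_000, "+", "chr1", 1_150_000, "-"),  # concordant
    EndPair("c2", "chr9", 133_600_000, "+", "chr22", 23_290_000, "+"),
    EndPair("c3", "chr9", 133_650_000, "+", "chr22", 23_300_000, "+"),
    EndPair("c4", "chr1", 5_000_000, "+", "chr1", 5_900_000, "-"),  # span too long
]
clusters = cluster_discordant(pairs)
for c in clusters:
    print(sorted(q.clone for q in c))
```

In this toy input, the two clones whose ends land on different chromosomes fall into one cluster, which is the kind of multiply supported signal you would want before calling a translocation. A lone discordant clone, like the one with the oversized span, is weaker evidence and could just as easily be a chimeric clone or a mapping error.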
ESP compared to CGH
Ben’s group compared their findings to comparative genomic hybridization (CGH) array results and found a “statistically significant” amount of overlap in rearrangements predicted by both methods (Agilent 244K CGH arrays and 150K ESPs). This past summer, they snagged some of our TCGA glioblastoma data and did the same comparison. In the case of GBM, Ben noted that they found far too many SVs for them to all be somatic; more likely, most of them are germline variants. Somewhere between 5% and 20% of them were known inversion polymorphisms, which also seemed high. Nevertheless, I think the audience was impressed by their methods, and my guess would be that invitations to join in the next round of TCGA analysis may be forthcoming.
Zee says
Do they have any publications or more info on their latest protocol?