
Screen capture of Chad Nusbaum, courtesy of YouTube.
Product launches are exciting things, and the Ion Torrent Proton has officially launched (as of the recent Ion World conference, September 13-14, 2012). For those who were not able to make it to San Francisco for that two-day event, videos of four presentations are now up on the Ion Torrent YouTube channel. For those who have complained about the ‘lack of data from the Ion Proton’, there’s quite a lot of ground to cover, and there are a fair number of interesting items about the PGM too. The Ion World-specific list of videos is linked here on an Ion Torrent Community page for ease of reference.
Joe Boland M.Sc., NCI Core Genotyping Facility, Advanced Technology Center, Division of Cancer Epidemiology and Genetics
Yes, that’s a mouthful, but Joe works in Stephen Chanock’s group. If you are not familiar with Stephen Chanock, you should be; as a matter of fact, his recent editorial on the ENCODE publications is on my reading list. Looking into cancer susceptibility via GWAS, they genotype on the order of 20,000 samples yearly, and with two Illumina HiSeqs, a Roche 454 / FLX, a Fluidigm Access Array system, a RainDance RDT 1000, and a panel of four PGMs, you can say they are also well-equipped to do a lot of next-generation sequencing.
One interesting point (starting around 15 minutes and 30 seconds into a talk entitled “Rapid Innovation and Flexibility of the PGM and Proton Sequencer”) is where their group took 4 PGM 318 runs of whole-exome data (about 5 Gb of data) from a reference HapMap sample that had been deeply sequenced by Complete Genomics to high accuracy, and determined the genotype discordance rate. The same reference sample on the HiSeq 2000, with 5.4 Gb of data, had roughly two to three times the discordance rate (1.9% vs. 5.4% for all variants, and 1.5% vs. 2.7% for known variants). He did not elaborate on why the PGM was more accurate than the HiSeq here (at that point in the presentation he was going pretty quickly), but it is noteworthy.
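For anyone who wants to check how large that gap is, the ratios can be worked out from the four percentages quoted in the talk. This is only a minimal sketch: the dictionary layout and the `fold_difference` helper are my own naming, and the only inputs are the rates above.

```python
# Genotype discordance rates (%) as quoted in the talk.
# The layout and names here are illustrative, not from the presentation.
rates = {
    "all variants": {"PGM": 1.9, "HiSeq 2000": 5.4},
    "known variants": {"PGM": 1.5, "HiSeq 2000": 2.7},
}


def fold_difference(pgm: float, hiseq: float) -> float:
    """Return how many times higher the HiSeq discordance rate is than the PGM rate."""
    return hiseq / pgm


# Round to two decimals purely for display.
ratios = {
    name: round(fold_difference(r["PGM"], r["HiSeq 2000"]), 2)
    for name, r in rates.items()
}

for name, ratio in ratios.items():
    print(f"{name}: HiSeq rate is {ratio}x the PGM rate")
```

So the difference is closer to threefold for all variants and a little under twofold for known variants.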
A few slides later, highlighting their experience with the Custom AmpliSeq (single-tube multiplex PCR making sample prep for targeted resequencing very simple), he noted they had designed 8 panels with a total of 1,121 amplicons (ranging from 21 to 621 per project), and 1,105 of these 1,121 designs worked well across a number of samples, an empirical success rate of about 98.6%. I had not seen such high rates of assay conversion before.
Regarding the Proton, their group received four of them, and he notes that they will be starting to generate data. They also had some trio samples exome-sequenced at Life Tech, which showed very good results: at 101x average target coverage, 90 to 92% of the target was covered at >20x.
Tim Triche M.D. Ph.D., Keck School of Medicine, Univ. Southern California, Children’s Hospital Los Angeles
As an expert on long non-coding RNAs and their implication in cancer, Tim centered his talk on RNA-Seq and how to use Ion Torrent in that context. He did a set of experiments (around the 9 minute mark, in a talk titled “Total RNA Transcript Analysis on the Ion Torrent™ Platform”) comparing non-purified, Poly-A selected (i.e. positive selection for mRNA), and Ribo-Minus selected (i.e. negative selection, depleting ribosomal RNA) samples, and threw in plus or minus size selection on top of these other variables, so there were a lot of items to compare against each other. His results showed that the optimum was Ribo-Minus ribosomal RNA depletion along with size selection.
He presented the Universal Human Reference RNA-Seq dataset produced on the Ion Torrent Proton (this was the same RNA sample used in the MicroArray Quality Control project, published in 2006). With each replicate sample detecting >75% of all RefSeq genes from the UHR mixture of RNAs, he called that percentage ‘spectacular’. The MAQC paper reported a detection of about 9,000 to 12,000 RefSeq genes; the Proton data consistently detected over 16,000.
His other work on a 400kb long ncRNA has very interesting implications, and is worth a look if you are interested in the world of ncRNA.
Donna Muzny M.Sc., Baylor College of Medicine, Director of Operations, Human Genome Sequencing Center
At Baylor College of Medicine, they have been going through an early-access process that has taken some time. (There was a big splash with a press release back in April 2012, and an ‘unboxing video’, but precious little news other than commercial announcements of purchases over the summer.) Donna said that Baylor had done some 257 Proton runs with their system, but didn’t elaborate on the quality of those runs over time. Over the summer months of June through August, there was very little we could share with customers, other than the fact that the internal focus on development activity was intense. We on the commercial side (sales and marketing) basically stayed out of the way.
Donna showed an interesting slide comparing readlength and quality over time, from 200 base-pair reads analyzed with Torrent Server 2.2 software to 300 bp reads under Torrent Server 3.0 (TS 3.0). A tantalizing line in green overlaid preliminary results labeled ‘300 dev’ with TS 3.0, which I suspect is the 400 base-pair improvement on the PGM that is expected late this year or early in 2013.
Another slide worth mentioning uses a Rhodobacter species, a standard platform evaluation tool because Rhodobacter has a 68% G-C nucleotide content. In a stretch where the G-C content rises to >80%, she compares ‘current methods’ 200bp reads against 300bp reads with the OneTouch 2 that ships with the Proton, and at the same 60x read coverage the improvement is obvious. (The OneTouch 2 can process templates usable on the PGM as well as the Proton.)
Baylor’s Proton data centered around a Charcot-Marie-Tooth sample they published on in 2010 (James Lupski is a prominent genetics researcher who has been working on CMT and happens to be personally affected by it, along with several of his family members, and published this NEJM paper using SOLiD WGS). They had 12 Gb of exome data from two Proton runs; 92.1% of the exome was callable (presumably >20x coverage), with only 46 discordant sites (0.17%). The two causative mutations identified in 2010 (in the SH3TC2 gene) were successfully detected.
Chad Nusbaum Ph.D., MIT Broad Institute, Co-Director, Genome Sequencing and Analysis Program
Chad is one of those people you want to listen to when talking next-generation sequencing platforms, because the Broad not only does a tremendous amount of NGS, but also invests the time and resources into optimizing process and understanding system variation, and develops automation, in addition to applying all this to world-class genomic science. (I’d mention here that if you have time for only one of the Ion World presentations, make time for this one, called “Implementing and Applying Ion Torrent™ Technology at the Broad Institute”.)
One interesting slide describing the improvements over time (and Torrent Server software iterations) illustrated the jump in quality from TS 1.5 to TS 2.0, which came not from improving the already-good mismatch error rate, but from simultaneously reducing the insertion and deletion error rates. (All NGS platforms suffer from sensitivity problems in detecting what are known as indels.) And going from TS 2.0 to TS 3.0, the insertion error rate drops to match the already-low mismatch rate, with the deletion error rate not far behind.
Chad makes the comment that he feels good that ‘solving this problem is a software problem’, rather than one whose solution would have a larger impact on his sequencing operations.
He also showed a slide with 400 base-pair readlengths on the PGM in development, with a modal peak above 450 base-pairs and the main distribution running from 350 to 650.
His Proton data comparison was from one run on a 600kb region (in a collaboration between Bill Biggs at Aviir in Palo Alto and Mark DePristo from the Broad) that had a known set of mutations. On this region (from 1000 Genomes Project data of a CEPH trio), the PGM found 149/153 variants, with very good sensitivity and concordance, and no Mendelian violation errors (i.e. called variants in the child that are impossible given the mother’s and father’s genotypes). But on the PGM there were problems in calling indels correctly: there are three known indels in the sample, but 737 were called, and hard filtering reduced that number only to a still-overwhelming 91. Looking into the data further, it is an issue of alignment and calling in homopolymer regions.
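To make the scale of that PGM indel problem concrete, here is a rough back-of-the-envelope sketch using only the three counts quoted above. It assumes, purely for illustration, that all three known indels were among the calls at each stage (the talk did not say so), so the precision figures are best-case upper bounds; the function and variable names are mine.

```python
# Counts quoted from the talk for the PGM data on the 600kb region.
KNOWN_INDELS = 3
CALLS = {"before filtering": 737, "after hard filtering": 91}


def max_precision(true_sites: int, called: int) -> float:
    """Best-case precision, ASSUMING every true indel is among the calls."""
    return min(true_sites, called) / called


for stage, called in CALLS.items():
    # Even in the best case, every call beyond the known indels is a false positive.
    min_false_positives = called - KNOWN_INDELS
    precision = max_precision(KNOWN_INDELS, called)
    print(
        f"{stage}: at least {min_false_positives} false positives, "
        f"precision <= {precision:.1%}"
    )
```

Even under that generous assumption, hard filtering still leaves at least 88 false indel calls against three real ones.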
Moving on to the Proton data, the SNP calls were identical to the 1000GP “truth set”, on a slide entitled “Significant Improvements with Proton Data”. (He points out there were zero false positive and zero false negative calls.) For the indels, all three were found at the correct locations, again with zero false positives and zero false negatives. The three indel mutations were an AAC insertion, a T deletion in a run of 10 T’s, and a T insertion in a run of 14 T’s. For the latter two, the call itself was very close to correct but not exact, and he looks at these at the read level. So while the Proton data with TS 3.0 did not get these two calls fully correct, getting close is good progress.
He also discusses additional ChIP experiments on the PGM. At AGBT last February in Marco Island, there was an intriguing PGM ChIP poster that concluded that 2M PGM reads gave better data than 15M short reads, and resulted in a nice whitepaper published by the Broad (apparently this has been taken down from their website, but it is available from me upon request). He elaborated on this work at Ion World, confirming that the data was better at 2M reads compared to 15M on another platform, but without an explanation for it. (It isn’t just a matter of mappability, nor library / sample preparation, as these are side-by-side comparisons of the same sample; rather, it is a function of something in the sequencing method that results in great sensitivity for peak calling and identification.) Downsampling the HiSeq reads to 2M produced meaningless data, as the signal was lost entirely at that level. Input amounts down to 0.5ng of material were successful. This work was then applied to a melanoma sample, with some nice results.
So there you have it, the first four presentations from Ion World, and likely more to be posted in the coming weeks.