Photo courtesy of user {a href="http://www.flickr.com/photos/vizzzual-dot-com/"}viZZZual.com{/a} via Flickr.
Although the whole genome versus whole exome debate was covered previously, the methods for selecting out the whole exome (also called ‘targeted selection’) have not been discussed, and the wide array of methods, costs, and effort involved can make this a rather complicated affair.
Image of Pacific Biosciences’ sequencing data, courtesy of a {a href="http://investor.pacificbiosciences.com/events.cfm"}PacBio{/a} investor presentation.
In previous posts I covered the basics of next-generation sequencing: library preparation, template preparation, and the sequencing methodology itself, whether by pyrophosphate detection, single base extension with reversible terminators, or probe addition by ligation. Single molecule sequencing’s attractiveness as a technology has also been covered here; in this post I’ll detail how the startup Pacific Biosciences works its magic. (For some additional commentary on the company and its prospects, check here.)
Centaur by {a href="http://www.flickr.com/photos/consciousvision/"}JustMN{/a} via Flickr.
The myth of the complete genome is something that is not commonly known among active observers of genomic technologies. (By ‘active observer’ I mean someone with some degree of background in the biological sciences; the term is in no way an aspersion.) The ‘first draft’ of the human genome was announced at a Clinton-era press conference on June 26, 2000, as a joint announcement agreed between two famously competitive individuals: Francis Collins, representing the public effort (NIH and DOE), and Craig Venter, representing the private one (Celera). This first draft was exactly that, a draft about 90% complete; the finished version was declared in 2003. This is not to discount the seminal publications of this draft in 2001 (the public effort’s in Nature and Celera’s in Science): at the time, the largest previously sequenced genome was 1/25th the size. In other words, the human genome represented a 25-fold leap in size and complexity over anything done to date.
Image from {a href="http://www.nanoporetech.com"}Oxford Nanopore{/a}
Oxford Nanopore, based in Oxford, U.K., made a remarkable announcement that surprised many at February’s AGBT meeting in Marco Island. The company announced the GridION and MinION single-molecule sequencers, promising 15-minute run times, no sample preparation, and (in the case of the MinION) a disposable USB-stick sequencer for $900. With 50kb read lengths (and 100kb promised) at only a 4% error rate, it appears to be a dream come true for many of the research challenges that await.
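To put the announced numbers in perspective, here is a back-of-the-envelope sketch. It assumes errors are independent and spread uniformly along the read, which is a simplification (real nanopore errors are not uniform). The 20-base window is a hypothetical example, loosely standing in for a short exact-match seed an aligner might use; it is not from the announcement.

```python
def expected_errors(read_length_bp: int, error_rate: float) -> float:
    """Expected number of miscalled bases in one read, assuming independent errors."""
    return read_length_bp * error_rate


def p_error_free(window_bp: int, error_rate: float) -> float:
    """Probability that `window_bp` consecutive bases contain no errors (independence assumed)."""
    return (1.0 - error_rate) ** window_bp


if __name__ == "__main__":
    rate = 0.04  # the 4% error rate quoted at the announcement
    for length in (50_000, 100_000):  # 50kb reads now, 100kb promised
        print(f"{length:,} bp read: ~{expected_errors(length, rate):,.0f} expected errors")
    # Hypothetical 20-base window, e.g. a short exact-match seed used by an aligner
    print(f"P(20 consecutive bases error-free) = {p_error_free(20, rate):.2f}")
```

This prints about 2,000 expected errors for a 50kb read, 4,000 for a 100kb read, and roughly a 0.44 chance that any given 20-base stretch is error-free: long reads at that accuracy are very useful for spanning repeats and structure, but individual bases still need consensus or polishing.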
Photo courtesy of {a href="http://www.flickr.com/photos/kyz/"}kyz{/a} via Flickr Creative Commons.
A few days ago I briefly reviewed the history of Helicos Biosciences (HCLS), a company that held out the promise of single molecule sequencing but failed to deliver to the next-generation sequencing market on several fronts. (These included accuracy and throughput per dollar; ease of use and reliability were further factors.)
But why is single molecule sequencing so attractive in the first place? What can it do that other technologies cannot?
Recently I was asked what Roche would purchase, given its public statement that it “would not revisit Illumina, and will pursue smaller takeovers”. I answered that there were a few small development companies out there, but even fewer with something ready to sell. (Roche isn’t known for developing NGS technology itself, as witnessed by its 454 / Curagen acquisition.) Among the firms preparing a launch, including GnuBio and Oxford Nanopore, was a small company called Intelligent Bio-Systems.
The cost of next-generation sequencing continues its downward trajectory, routinely outpacing Moore’s Law by an estimated 3x. The cost-per-megabase curve started to bend significantly downward around 2007, when the Solexa 1G started selling in volume and gave the 454 GS20 (as it was known then) the first competition the massively parallel sequencing market had seen.
Sanger Wellcome Trust - book of the human genome via {a href="http://www.flickr.com/photos/eibar/"}Flickr{/a}.
In every technological revolution, there is a first seminal breakthrough, a burst of commercial activity from many individual companies, and then the eventual maturing of a market, of standards, and the discovery of new uses for the technology in often surprising ways.
Complete Genomics is a startup founded upon a particular idea: that whole genome sequencing of human individuals will become industrialized and commonplace, with enough clinical utility to become the dominant application for next-generation sequencing. (Disclosure: I have no financial interest in this company; I am just an interested observer.)
If you’ve been following thus far, we’ve covered sequencing by pyrophosphate detection and sequencing by reversible terminators, and now we come to sequencing by ligation. Note that I don’t use the term ‘sequencing by synthesis’ here (although Illumina likes to use it, and names some of its reagents ‘SBS’ accordingly), since all three methods determine the sequence by synthesizing a corresponding strand.
Genome Analyzer at the Wellcome Trust via {a href="http://commons.wikimedia.org/wiki/File:GA2.JPG"}Wikimedia. {/a}
As mentioned previously, there are three main methods of sequencing. The first is the pyrophosphate detection approach; here is the second (and most popular) approach, using reversible terminators.
Given a set of amplified template molecules (remember there are millions to many hundreds of millions of these discrete ‘clusters’ of molecules) and a sequencing primer hybridized to one end of the adapter on each molecule, a mixture of modified deoxynucleotides is added. These nucleotides are modified in two ways: each is labeled with a fluorescent dye that identifies the base, and each carries a protecting group on the 3′ hydroxyl end. Just as in the Sanger method, the blocked 3′ end means a nucleotide can extend the strand by only a single base, so only one nucleotide is added per round no matter how many identical bases follow in a row. Thus, if there were a stretch of, say, four A bases in a row, only one A base would be added in each sequencing round.
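The one-base-per-round logic can be sketched as a toy simulation. This is an illustrative model only, not Illumina’s chemistry or software: it assumes perfect chemistry (no phasing, no failed incorporations), treats the template string as written in the order the polymerase reads it, and models the ‘reversible’ step as removal of the dye and 3′ block after each imaging step. The point it shows is that a run of four identical bases is read out over four separate cycles.

```python
# Watson-Crick pairing used to find the incorporated base for each template base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reversible_terminator_reads(template: str, cycles: int) -> str:
    """Toy model of cyclic reversible-terminator sequencing (perfect chemistry assumed).

    Each cycle: all four labeled, 3'-blocked nucleotides are supplied together; the
    polymerase adds exactly one base (the block prevents a second addition); the
    cluster is imaged and the dye color identifies the base; then dye and block are
    removed so the next cycle can extend by one more base.
    """
    read = []
    for position in range(min(cycles, len(template))):
        # Incorporate the single nucleotide complementary to the next template base.
        incorporated = COMPLEMENT[template[position]]
        # "Image" the cluster: record the base identified by its dye.
        read.append(incorporated)
        # Cleaving dye + 3' block is implicit: the loop advances by exactly one base.
    return "".join(read)


if __name__ == "__main__":
    # A template whose complement begins with a run of four A bases.
    template = "TTTTGCA"
    for cycle, base in enumerate(reversible_terminator_reads(template, cycles=7), start=1):
        print(f"cycle {cycle}: {base}")
```

Running it prints A for cycles 1 through 4, then C, G, and T: the homopolymer never produces a bigger signal, just more cycles, which is why read length equals the number of cycles in this approach.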
Image courtesy of {a href="http://commons.wikimedia.org/wiki/File:Pyrophosphate-3D-balls.png"}Wikimedia commons.{/a}
After preparation of the library (and careful quantitation) and preparation of the amplified template comes the main event: the sequencing itself. While several methods are available, they fall into three broad divisions.
The first of the three divisions is Pyrophosphate Release (based on the original patent by Mostafa Ronaghi and others in 1998, when he was a graduate student at the Royal Institute of Technology in Stockholm). In this method, one nucleotide at a time is flowed across all templates and the resulting signal is detected. (Pyrosequencing, now owned by QIAGEN, detects pyrophosphate but is not a ‘next-generation’ sequencer, as it is not massively parallel; however, the Roche / 454 FLX used essentially the same method.) Jonathan Rothberg, who parallelized the FLX pyrosequencing method at Curagen, simply changed the detection method with Ion Torrent, which senses the hydrogen ions released during nucleotide incorporation instead of light.
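The flow-based logic can be sketched as a toy ‘flowgram’ simulation. This is an illustrative model, not 454’s or Ion Torrent’s actual signal processing: it assumes ideal, noise-free signals, an assumed cyclic flow order of T, A, C, G, and works directly on the strand being synthesized. Unlike the one-base-per-cycle approach, a single flow extends through an entire run of identical bases, and the signal grows with the run length; the same flow-and-count logic applies whether the by-product is detected as light or as a pH change.

```python
from typing import List, Tuple

FLOW_ORDER = "TACG"  # assumed cyclic flow order, for illustration only


def flowgram(synthesized: str, n_flows: int, flow_order: str = FLOW_ORDER) -> List[Tuple[str, int]]:
    """Toy pyrophosphate-release model with ideal, noise-free signals.

    `synthesized` is the strand the polymerase will build (complement of the template).
    Each flow supplies one nucleotide type to all templates; the polymerase extends
    through every consecutive matching position, releasing one pyrophosphate per base,
    so the signal equals the homopolymer length (0 if the next base does not match).
    """
    pos = 0
    signals = []
    for i in range(n_flows):
        nucleotide = flow_order[i % len(flow_order)]
        count = 0
        # Unblocked nucleotides: keep incorporating while the next base matches.
        while pos < len(synthesized) and synthesized[pos] == nucleotide:
            count += 1
            pos += 1
        signals.append((nucleotide, count))
    return signals


def call_bases(signals: List[Tuple[str, int]]) -> str:
    """Convert ideal (nucleotide, signal) pairs back into a sequence."""
    return "".join(nucleotide * count for nucleotide, count in signals)


if __name__ == "__main__":
    signals = flowgram("AAAACGT", n_flows=8)
    print(signals)
    print(call_bases(signals))
```

Here the four A bases show up as a single flow with a signal of 4. With real, analog signals, telling a run of 7 from a run of 8 becomes harder as runs lengthen, which is the homopolymer weakness usually associated with flow-based methods.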