Error, alignment, and the myth of the complete genome

Centaur by {a href="http://www.flickr.com/photos/consciousvision/"}JustMN{/a} via Flickr.

The myth of the complete genome is something that is not commonly known to active observers of genomic technologies. (The term ‘active observer’ is from the point of view of one with varying degrees of background in the biological sciences, and is in no way an aspersion.) The ‘first draft’ of the human genome was announced at a Clinton-era press conference on June 26, 2000, and it was an agreement between two famously competitive individuals (Francis Collins and Craig Venter) representing the public effort (NIH and DOE) and the private one (Celera). This first draft was exactly that – about 90% complete – and the completed version was declared in 2003. This is not to discount the first seminal publications of this draft: when these papers were published in 2001 (in Science and Nature), the largest previously sequenced genome was 1/25th the size. In other words, the human genome represented a 25-fold leap in size and complexity over anything done to date.

Read more

Oxford Nanopore, the first nanopore-based sequencing technology

Oxford Nanopore illustration
Image from {a href="http://www.nanoporetech.com"}Oxford Nanopore{/a}

Oxford Nanopore, based in Oxford, U.K., made a remarkable announcement that surprised many at February’s AGBT meeting in Marco Island. The GridION and MinION single-molecule sequencers were announced, promising 15-minute runtimes, no sample preparation, and a disposable USB-stick sequencer for $900 (in the case of the MinION). With 50 kb read lengths (and 100 kb promised) at only a 4% error rate, it appears to be a dream come true for many research challenges that await.

Read more

What’s so special about single molecule sequencing?

Photo courtesy of {a href="http://www.flickr.com/photos/kyz/"}kyz{/a} via Flickr Creative Commons.

A few days ago I reviewed in brief the history of Helicos Biosciences (HCLS), a company that held out the promise of single molecule sequencing, but failed to deliver on several fronts to the next-generation sequencing market. (These would include accuracy and throughput per dollar; ease of use and reliability was yet another factor.)

But why is single molecule sequencing so attractive in the first place? What can it do that other technologies cannot?

Read more

QIAGEN and NGS – the Intelligent Bio-Systems Acquisition

Recently I was asked what Roche would purchase when they said publicly that they “would not revisit Illumina, and will pursue smaller takeovers”, and I answered that there were a few small development companies out there, but even fewer with something ready to sell. (Roche isn’t known for development of NGS, as witnessed with their 454 / Curagen acquisition.) Among the firms that were preparing a launch, including GnuBio and Oxford Nanopore, was a small company called Intelligent Bio-Systems.

Read more

Helicos Single Molecule Sequencing – A Pioneer

The cost of next-generation sequencing continues its downward trajectory, routinely outpacing Moore’s Law by an estimated 3x. The cost-per-megabase curve started to bend downward significantly around 2007, when the Solexa 1G started selling in volume and gave the 454 GS20 (as it was known then) the first competition for massively parallel sequencing the market had seen.

Read more

What’s so special about a $1,000 genome?

Sanger Wellcome Trust - book of the human genome via {a href="http://www.flickr.com/photos/eibar/"}Flickr{/a}.

In every technological revolution, there is a first seminal breakthrough, a burst of commercial activity from many individual companies, and then the eventual maturing of a market, of standards, and the discovery of new uses for the technology in often surprising ways.

Read more

Complete Genomics and the Whole Genome Sequencing market

Complete Genomics Logo

Complete Genomics is a startup business founded upon a particular idea – that the whole genome sequencing of human individuals is going to be industrialized, commonplace, and have such clinical utility as to become the dominant application for next-generation sequencing. (Disclosure – I have no financial interest in this company; I am just an interested observer.)

Read more

Next Generation Sequencing – Sequencing by Ligation

Image of the Life Technologies 5500xl

If you’ve been following thus far, we’ve covered sequencing by pyrophosphate detection, sequencing by reversible terminators, and now we have sequencing by ligation. Note that the term ‘sequencing by synthesis’ is not used here (although Illumina likes to use the term and names some of their reagents ‘SBS’ accordingly) as all three methods use synthesis of a corresponding strand to determine the sequence of bases.

Read more

Next Generation Sequencing – Sequencing by Reversible Terminators

Genome Analyzer at the Wellcome Trust via {a href="http://commons.wikimedia.org/wiki/File:GA2.JPG"}Wikimedia{/a}.

As mentioned previously, there are three main methods of sequencing, the first being the pyrophosphate detection approach, and here is the second (and most popular) approach, using reversible terminators.

Given a set of amplified template molecules (remember there are millions to many hundreds of millions of these discrete ‘clusters’ of molecules) and a sequencing primer hybridized to one end of the adapter on each molecule, a mixture of modified deoxynucleotides is added. These nucleotides can only extend a single base, just like the Sanger method, and are modified in that each nucleotide is labeled with a fluorescent molecule. Since there is a protecting group on the 3′ hydroxyl end, only a single nucleotide is incorporated per round, regardless of how many identical bases occur in a row. Thus if there were a stretch of, say, four A bases in a row, only one A base would be added in each sequencing round.
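As a rough illustration of the one-base-per-round behavior described above, here is a minimal Python sketch. It is a toy model under stated assumptions, not a representation of any vendor's chemistry or software: the function name, the `cycles` parameter, and the convention that the template is written in the order the polymerase encounters its bases are all hypothetical choices made for this example.

```python
# Toy model of sequencing by reversible terminators (illustrative only).
# Assumption: `template` lists the template bases in the order the
# polymerase encounters them, so each cycle reads the next position.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_reversible_terminators(template, cycles):
    """Return the bases called after `cycles` rounds of sequencing.

    Each round incorporates exactly one labeled, 3'-blocked nucleotide,
    so a homopolymer stretch takes one round per base.
    """
    calls = []
    for position in range(min(cycles, len(template))):
        # The blocked nucleotide complementary to the template base is
        # incorporated; the 3' block prevents any further extension.
        incorporated = COMPLEMENT[template[position]]
        # The fluorescent label is imaged and recorded as the base call.
        calls.append(incorporated)
        # Label and block are then removed, freeing the 3' end for the
        # next round (nothing to model here beyond moving on).
    return "".join(calls)


if __name__ == "__main__":
    # Four T's in the template need four separate rounds, one A each.
    print(sequence_by_reversible_terminators("TTTTGCA", 7))  # AAAACGT
```

Note that the read length in this model is simply the number of rounds run, which is why the homopolymer stretch costs four rounds rather than one.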

Read more

Next Generation Sequencing – Sequencing by Pyrophosphate Release

Image courtesy of {a href="http://commons.wikimedia.org/wiki/File:Pyrophosphate-3D-balls.png"}Wikimedia commons.{/a}

After preparation of the library (and careful quantitation) and preparation of the amplified template comes the main event: the sequencing itself. While there are several methods available, the methods can be divided into three broad divisions.

The first of the three divisions is Pyrophosphate Release (named for the original patent by Mostafa Ronaghi and others in 1998 when he was a graduate student at the Royal Institute of Technology in Stockholm); this is the method that flows individual nucleotides across all templates, and then detects the signal. (Pyrosequencing – now owned by QIAGEN – detects pyrophosphate, but is not a ‘next-generation’ sequencer as it is not massively parallel; however, Roche / 454 FLX used essentially the same method.) Jonathan Rothberg, who parallelized the FLX pyrosequencing method at Curagen, simply changed the detection method with Ion Torrent.

Read more

Next Generation Sequencing – Template Preparation

Image courtesy {a href="http://www.flickr.com/photos/homard/"}homard{/a} via Flickr.

After a library is properly prepared (remember it can be from many sources – randomly sheared genomic DNA, cDNA from a small RNA sample, an immunoprecipitated sample), the library molecules need to be amplified in some manner before the sequencing takes place. Thus there is a critical need for accurate quantitation of the library DNA, the importance of which can be overlooked.

Read more

Next Generation Sequencing – Library Preparation

Image via Flickr courtesy {a href="http://www.flickr.com/photos/ccacnorthlib/"}CCAC North Library{/a}

Looking at sequencing from one perspective, library preparation is straightforward. Sequencing a genome (whether bacterial on the order of 5 million bases or a human at 3 billion bases) is a shotgun-based affair with tens of millions to tens of billions of reads that overlap multiple times across the genome (known as ‘fold coverage’). (Thus a 30x human genome coverage would require some 90 billion bases, or a 15x coverage of each haploid copy.) With multiple random start points and 30-fold coverage across the entire genomic sample, one takes a gDNA sample, randomly shears it, attaches synthetic adapters, and off you go, following the manufacturer’s protocol for getting sequence data out, whether by Roche / 454, Illumina GAIIx or HiSeq 2000, Life Technologies SOLiD or 5500xl, Pacific Biosciences RS, Illumina MiSeq, Life Technologies Ion Torrent PGM…
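The fold-coverage arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope helper: the function names are made up for this example, and the 100-base read length is an assumption chosen for illustration rather than a figure from any particular platform.

```python
# Back-of-the-envelope fold-coverage arithmetic (illustrative sketch).

def bases_required(genome_size_bp, fold_coverage):
    """Total sequenced bases needed for the requested fold coverage."""
    return genome_size_bp * fold_coverage


def reads_required(genome_size_bp, fold_coverage, read_length_bp):
    """Number of reads needed, rounded up to a whole read."""
    total = bases_required(genome_size_bp, fold_coverage)
    return -(-total // read_length_bp)  # ceiling division on integers


def per_haploid_coverage(fold_coverage, ploidy=2):
    """Average coverage of each haploid copy in a diploid sample."""
    return fold_coverage / ploidy


if __name__ == "__main__":
    human_bp = 3_000_000_000      # roughly 3 billion bases
    bacterial_bp = 5_000_000      # on the order of 5 million bases
    assumed_read_length = 100     # hypothetical read length in bases

    print(bases_required(human_bp, 30))                       # 90000000000
    print(per_haploid_coverage(30))                           # 15.0
    print(reads_required(human_bp, 30, assumed_read_length))  # 900000000
    print(reads_required(bacterial_bp, 30, assumed_read_length))  # 1500000
```

Under that assumed read length, the same 30x target needs 1.5 million reads for the bacterial genome and 900 million for the human one.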

Read more