Error, alignment, and the myth of the complete genome

Centaur by {a href="http://www.flickr.com/photos/consciousvision/"}JustMN{/a} via Flickr.

The myth of the complete genome is not widely known, even among active observers of genomic technologies. (By ‘active observer’ I mean someone with some degree of background in the biological sciences; the term is in no way an aspersion.) The ‘first draft’ of the human genome was announced at a Clinton-era press conference on June 26, 2000, and it reflected an agreement between two famously competitive individuals, Francis Collins and Craig Venter, representing the public effort (NIH and DOE) and the private one (Celera) respectively. This first draft was exactly that: about 90% complete, with the completed version declared in 2003. This is not to discount the seminal first publications of the draft. When those papers were published in 2001 (the public effort’s in Nature and Celera’s in Science), the largest previously sequenced genome was 1/25th the size. In other words, the human genome represented a 25-fold leap in size and complexity over anything done to date.

Oxford Nanopore, the first nanopore-based sequencing technology

Oxford Nanopore illustration
Image from {a href="http://www.nanoporetech.com"}Oxford Nanopore{/a}

Oxford Nanopore, based in Oxford, U.K., made a remarkable announcement that surprised many at February’s AGBT meeting in Marco Island. The company announced two single-molecule sequencers, the GridION and the MinION, promising 15-minute runtimes, no sample preparation, and (in the case of the MinION) a disposable USB-stick sequencer for $900. With 50 kb read lengths (and 100 kb promised) at only a 4% error rate, it appears to be a dream come true for many of the research challenges that await.

QIAGEN and NGS – the Intelligent Bio-Systems Acquisition

Recently I was asked what Roche would purchase after they said publicly that they “would not revisit Illumina, and will pursue smaller takeovers”. I answered that there were a few small development companies out there, but even fewer with something ready to sell. (Roche isn’t known for developing NGS itself, as witnessed by their 454 / Curagen acquisition.) Among the firms that were preparing a launch, including GnuBio and Oxford Nanopore, was a small company called Intelligent Bio-Systems.

Helicos Single Molecule Sequencing – A Pioneer

The cost of next-generation sequencing continues its downward trajectory, routinely outpacing Moore’s Law by an estimated 3x. The cost-per-megabase curve started to bend significantly downward around 2007, when the Solexa 1G started selling in volume and gave the 454 GS20 (as it was known then) the first competition for massively parallel sequencing the market had seen.

What’s so special about a $1,000 genome?

Sanger Wellcome Trust - book of the human genome via {a href="http://www.flickr.com/photos/eibar/"}Flickr{/a}.

In every technological revolution, there is a first seminal breakthrough, a burst of commercial activity from many individual companies, and then the eventual maturing of a market, of standards, and the discovery of new uses for the technology in often surprising ways.

Complete Genomics and the Whole Genome Sequencing market

Complete Genomics is a startup founded upon a particular idea: that whole genome sequencing of human individuals is going to be industrialized and commonplace, and will have such clinical utility as to become the dominant application for next-generation sequencing. (Disclosure: I have no financial interest in this company; I am just an interested observer.)

Next Generation Sequencing – Sequencing by Pyrophosphate Release

Image courtesy of {a href="http://commons.wikimedia.org/wiki/File:Pyrophosphate-3D-balls.png"}Wikimedia commons.{/a}

After preparation of the library (and careful quantitation) and preparation of the amplified template comes the main event: the sequencing itself. While several methods are available, they fall into three broad divisions.

The first of the three divisions is Pyrophosphate Release (named for the original patent by Mostafa Ronaghi and others in 1998, when he was a graduate student at the Royal Institute of Technology in Stockholm); this is the method that flows individual nucleotides across all templates and then detects the signal. (Pyrosequencing, now owned by QIAGEN, detects pyrophosphate but is not a ‘next-generation’ sequencer, as it is not massively parallel; however, Roche / 454 FLX used essentially the same method.) Jonathan Rothberg, who parallelized the FLX pyrosequencing method at Curagen, simply changed the detection method with Ion Torrent.

Next Generation Sequencing – Template Preparation

Image courtesy {a href="http://www.flickr.com/photos/homard/"}homard{/a} via Flickr.

After a library is properly prepared (remember it can be from many sources: randomly sheared genomic DNA, cDNA from a small RNA sample, an immunoprecipitated sample), the library molecules need to be amplified in some manner before the sequencing takes place. Thus there is a critical need for accurate quantitation of the library DNA, a step whose importance is easily overlooked.

Next Generation Sequencing – Library Preparation

Image via Flickr courtesy {a href="http://www.flickr.com/photos/ccacnorthlib/"}CCAC North Library{/a}

Looking at sequencing from one perspective, library preparation is straightforward. Sequencing a genome (whether bacterial, on the order of 5 million bases, or human, at 3 billion bases) is a shotgun-based affair with tens of millions to tens of billions of reads that overlap multiple times across the genome (known as ‘fold coverage’). (Thus 30x coverage of a human genome would require some 90 billion bases, or 15x coverage of each haploid allele.) With multiple random start points and 30-fold coverage across the entire genomic sample in mind, one takes a gDNA sample, randomly shears it, attaches synthetic adapters, and off you go, following the manufacturer’s protocol for getting sequence data out, whether by Roche / 454, Illumina GAIIx or HiSeq 2000, Life Technologies SOLiD or 5500xl, Pacific BioSciences RS, Illumina MiSeq, Life Technologies Ion Torrent PGM…
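
The fold-coverage arithmetic above is simple enough to sketch in a few lines of Python. This is only a back-of-the-envelope illustration, not anything a particular platform ships: the function names are made up for this example, and the 100-base read length is an assumed round number rather than a figure from this post.

```python
# Back-of-the-envelope fold-coverage arithmetic.
# Function names and the 100-base read length are illustrative assumptions.

HUMAN_GENOME_BP = 3_000_000_000   # ~3 billion bases, as quoted in the text
BACTERIAL_GENOME_BP = 5_000_000   # ~5 million bases, as quoted in the text


def bases_required(genome_size_bp: int, fold_coverage: int) -> int:
    """Total sequenced bases needed to reach the given fold coverage."""
    return genome_size_bp * fold_coverage


def reads_required(genome_size_bp: int, fold_coverage: int, read_length_bp: int) -> int:
    """Number of reads of a fixed length needed, rounded up to a whole read."""
    total = bases_required(genome_size_bp, fold_coverage)
    return (total + read_length_bp - 1) // read_length_bp


def per_haploid_coverage(fold_coverage: float) -> float:
    """Coverage of each of the two haploid copies in a diploid sample."""
    return fold_coverage / 2


human_bases = bases_required(HUMAN_GENOME_BP, 30)         # 90 billion bases
human_reads = reads_required(HUMAN_GENOME_BP, 30, 100)    # assumes 100-base reads
bacterial_bases = bases_required(BACTERIAL_GENOME_BP, 30)
haploid = per_haploid_coverage(30)                        # 15x per haploid copy
```

Swapping in a different read length changes only the read count; the 90 billion bases needed for a 30x human genome stays the same whatever the platform.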

Next Generation Sequencing – A Few Fundamental Concepts

Image courtesy of {a href="http://www.flickr.com/photos/pacoseoaneperez/"}Paco Seone{/a} via Flickr

As I mentioned in my prior post, Sanger capillary sequencing is not going away anytime soon. Yet next-generation sequencing has made a huge mark in the world, growing from zero in 2005 to a USD 1 billion market in 2012. Various sources estimate that it will grow 20 to 25% every year for the next five years, which compounds to roughly 2.5 to 3 times its current size.

Next-Generation Sequencing – its historical context

Photo of J. Craig Venter Inst. circa 2005 by {a href="http://www.flickr.com/photos/jurvetson/"}jurvetson{/a} via Flickr.

Even though the history of next-generation sequencing is short (the 454 GS20 came out in 2005, the Solexa 1G in 2007, and the SOLiD 2 in 2008), there is a robust genomic revolution going on, and a fierce battle in the marketplace with plummeting costs and soaring throughput. Whether Moore’s Law is being beaten by some 2.5-fold or by even more, there is no question that we are in the middle of burgeoning growth and remarkable discovery, with new insights arriving just about every day.

The whole-exome vs. whole-genome sequencing debate

By Sarah Kusala via {a href="http://commons.wikimedia.org/wiki/File:In_solution_capture.png"}Wikimedia Commons{/a}

An enterprising salesperson from Complete Genomics used this newfangled social media thing called LinkedIn to make her mark on the world (perhaps) by posing a discussion question. (It was over at the ‘Genome Interpretation’ group, in case you were wondering.) In a post entitled “The last days of exome sequencing”, she asked whether exome sequencing’s days were numbered.
