According to one thought-leader, it was the highlight of Advances in Genome Biology and Technology 2019 in Marco Island, FL.
Way back in 1997 Craig Venter and Daniel Cohen (co-founder of Millennium Pharmaceuticals, among other things) wrote an essay entitled ‘The Century of Biology’. Here we are a little under twenty years into this new century, a century of genetics and genomics and truly remarkable technology. As the essay’s opening paragraph put it:
If the 20th century was the century of physics, the 21st century will be the century of biology. While combustion, electricity and nuclear power defined scientific advance in the last century, the new biology of genome research – which will provide the complete genetic blueprint of a species, including the human species – will define the next.
From ‘The Century of Biology’, New Perspectives Quarterly
When I first joined the Life Technologies NGS specialist team in 2010, a friend who was conducting the training said something memorable about the pace of technological change from providers of genetic analysis equipment: “Where there is a financial incentive, miracles happen.” During Illumina’s lunchtime workshop on the last day of AGBT, Dr. Gary Schroth (Illumina VP) presented the following slide illustrating the amount of sequencing capacity shipped since 2014, calculated as the capacity of each instrument multiplied by its sales volume, summed across the different instruments. (Note: it isn’t clear whether the Y-axis is sequencing capacity per year or per quarter.)
A notable detail is that the chart only starts in the first quarter of 2014 and runs through the last quarter of 2018. If you read the first quarter of 2014 at about 8,000,000 Gigabases of output (or 8 Petabases), the last quarter of 2018 is all the way up to 118,000,000 Gigabases of output (or 118 Petabases). That is a 14.75-fold increase in four years, or a CAGR (compound annual growth rate) of 96%. In other words, the sequencing capacity being shipped doubles roughly every twelve months, not counting the capacity of any existing sequencing equipment that may already be in use.
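For anyone who wants to check the growth figure, here is a minimal sketch of the CAGR calculation. The 8 and 118 Petabase values are my approximate readings from the slide, and the four-year span is the one used above; the function and variable names are just for illustration.

```python
import math


def cagr(start, end, years):
    """Compound annual growth rate as a fraction (0.96 means 96%)."""
    return (end / start) ** (1.0 / years) - 1.0


start_pb = 8.0    # approximate Q1 2014 reading from the slide, in Petabases
end_pb = 118.0    # approximate Q4 2018 reading from the slide, in Petabases
years = 4         # span used in the text above (an assumption, see note below)

fold = end_pb / start_pb
rate = cagr(start_pb, end_pb, years)
# Time for capacity to double at this growth rate
doubling_years = math.log(2) / math.log(1 + rate)

print(f"fold increase: {fold:.2f}")
print(f"CAGR: {rate:.0%}")
print(f"doubling time: {doubling_years:.2f} years")
```

The result is sensitive to the span chosen: counting Q1 2014 to Q4 2018 as 4.75 years instead of four, the same function gives roughly 76% per year.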
The BGI Revolocity at ICG-10 in 2015
Way back in 2015 when I was with SeraCare launching NGS-based reference standard materials, I had the privilege of traveling to Shenzhen, China (BGI Headquarters) for their annual conference, the International Conference on Genomics. One of the many things that stand out in my memory four years later was meeting several inventors of genetic and genomic analysis technologies, including Charles Cantor (co-founder of Sequenom) and Rade Drmanac (co-founder of Complete Genomics). It was there that I learned first-hand details about the Revolocity, a $12M system designed to compete against the HiSeq X Ten, with a throughput of 10,000 whole genomes per year, or about 200 samples per week.
Having sold a handful of systems and started to install them (the Revolocity required 1500 square feet of space in addition to a full-time, on-site engineer from the company), BGI decided only a month later, in November 2015, to cancel the sale, marketing and production of this high-end sequencing system, as well as to lay off a substantial part of the Complete Genomics workforce. And less than four years later, remarkably, a new BGI subsidiary, MGI, announced the first details of the T7, the Revolocity’s successor.
The MGISEQ-T7 announced October 2018 with a few details
At the ICG-12 in October 2018 (only three years after I attended the ICG-10 event with the Revolocity) the MGISEQ-T7 was announced. Using the same nanoball technology as the prior system, it has twice the capacity of the Revolocity: 20,000 whole genomes per year, or 1 to 6 Terabases per day. Its four flowcells run independently, so each can run a different application and be started and stopped independently of the others.
At the January JP Morgan Healthcare conference MGI said that not only have they reached a milestone of selling 1,000 NGS instruments (the original BGISEQ-50 and BGISEQ-500 have been modernized and upgraded to the MGISEQ-200 and MGISEQ-2000 respectively), but that they sold their first MGISEQ-T7 to their first customer, a direct-to-consumer company called WeGene based in Shenzhen, China, for the Chinese market. (GenomeWeb Premium has a piece about WeGene here, subscription required.)
The surprising MGISEQ-T7 at #AGBT19
Dr. Roy Tan, the General Manager of BGI Americas, likened the availability of the MGISEQ-T7 to ‘genomic broadband for all’, similar to the shift to 5G cellular networks that will take place worldwide over the next several years, increasing wireless bandwidth about 20-fold. Of course there is a bit of irony in this analogy, given the existing political turmoil over the Huawei 5G ban in the United States, the arrest and extradition of a top executive for banned financial relationships with Iran, and concern over security.
Nonetheless the headline specifications impress: a full 6 Terabases per 2×150 Paired End (PE) run, in less than 24 hours. For those figuring out how you get from here to 20,000 whole genomes per year, the math goes like this: the haploid human genome is 3.1 Gigabases, and 30 times that (for 30x coverage) is 93 Gigabases per individual human sample. 6,000 Gigabases per 24-hour run is 64.5 samples per day, and 64.5 samples times 312 run days per year (six days per week, allowing one day per week for maintenance and downtime) is 20,124 human genomes.
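The same arithmetic as a short sketch, using only the figures quoted above (3.1 Gb haploid genome, 30x coverage, 6 Tb per daily run, six run days per week). The variable names are mine, and the unrounded result differs by a few genomes from the hand calculation, which rounds to 64.5 samples per day first.

```python
HAPLOID_GENOME_GB = 3.1      # Gigabases, figure used in the text
COVERAGE = 30                # 30x whole-genome coverage
RUN_OUTPUT_GB = 6000         # 6 Terabases per run of under 24 hours
RUN_DAYS_PER_YEAR = 6 * 52   # six run days per week, one day for maintenance

gb_per_genome = HAPLOID_GENOME_GB * COVERAGE
genomes_per_day = RUN_OUTPUT_GB / gb_per_genome
genomes_per_year = genomes_per_day * RUN_DAYS_PER_YEAR

print(f"Gb per 30x genome: {gb_per_genome:.0f}")
print(f"genomes per day:   {genomes_per_day:.1f}")
print(f"genomes per year:  {genomes_per_year:,.0f}")
```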
Dr. Tan showed the above slide, which has some interesting details, namely the total number of reads (5,000 M or 5 Billion). Taking this to mean 5 Billion reads in PE format, 2×150 would be 5 Billion × 300 bp = 1.5 Trillion bases, or 1.5 Terabases (Tb). And as the MGISEQ-T7 runs four of these flowcells, 4 × 1.5 Tb = 6 Tb.
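Here is that read-count arithmetic as a quick sketch. It rests on my interpretation that each of the 5 Billion reads on the slide yields 2×150 bases; if the slide counts each end separately, the totals would be halved.

```python
reads_per_flowcell = 5_000_000_000   # "5,000 M" total reads from the slide
bases_per_read = 2 * 150             # assumes each counted read is a 2x150 pair
flowcells_per_run = 4                # the MGISEQ-T7 runs four flowcells

bases_per_flowcell = reads_per_flowcell * bases_per_read
bases_per_run = bases_per_flowcell * flowcells_per_run

# Express the totals in Terabases (1 Tb = 1e12 bases)
print(f"per flowcell: {bases_per_flowcell / 1e12:.1f} Tb")
print(f"per run:      {bases_per_run / 1e12:.1f} Tb")
```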
A friend saw the MGISEQ-2000 flowcells firsthand and was very impressed with their engineering, marveling at the thinness. I was told that the width of the flowcell was a paltry 50 microns. This friend expressed some concern about the manufacturability of the MGISEQ-T7, even though Dr. Tan said there were three in operation and another four undergoing final testing. In my own opinion, having visited Shenzhen and the BGI headquarters, I consider that low-risk given the amount of skilled manufacturing available in China.
Dr. Tan pointed out that the DNBSeq™ Technology uses linear rolling circle amplification to produce the DNA NanoBalls (where the ‘DNB’ comes from), as used before in the Complete Genomics in-house service business. Since the same original DNA template is copied again and again in linear amplification, rather than exponentially as in PCR (where a copy serves as the template for additional copies), errors introduced during amplification do not propagate the way they do in PCR. I hadn’t considered this before. This is good messaging from MGI; someone please congratulate the MGI marketing team for me!
The reason the throughput is so high is the incredible density. MGI has technology to bind only one single DNB to each spot on a patterned array, and the flowcell slide indicates a pitch of only 700 nm. It is difficult to communicate how small submicron features are; suffice it to say a single E. coli bacterium is about 0.5 µm wide by 2 µm long. Think about it: MGI has technology to manufacture a feature that binds a single DNA nanoball, and only one nanoball, per feature on a plate with 2.5 billion of these features.
Accuracy of the MGISEQ-T7 promises to be high
The claimed accuracy was >99.9% Precision and >99.9% Sensitivity for SNVs, and >99% Precision and >99% Sensitivity for InDels (few details were given, understandable as they are just running their first sample data). Duplicate rate is less than 2%, and index hopping, an acknowledged problem with the Illumina platforms with patterned flowcells, is ‘0 to 0.00001%’. (James Hadfield has a nice review and write-up from 2017 here.)
And now for something cool: CoolNGS
At this point in the presentation I was expecting Dr. Tan to go into the derivation of the SNV error and InDel error rates, or perhaps into some kind of liquid handling automation such as this automated sample preparation system. Instead he introduced a new method of detection of single base extension using antibodies.
The prior BGISEQ-50 and BGISEQ-500 had a sequencing-by-synthesis scheme that was very similar in concept to Illumina’s, which is a reversible terminator with a cleavable dye component. Thus a modified polymerase incorporates a highly modified nucleotide with a fluor on it, and only one base extends because the 3’-OH group is blocked. A laser interrogates the fluor and determines which of four colors it is, then chemicals cleave off the fluor and unblock the 3’-OH group in preparation for the next round.
CoolNGS uses 3’-OH blocked nucleotides, but they are unlabeled. Then four different fluorescently-labelled antibodies, each recognizing a different epitope comprising both the nucleotide base and the ‘extension block’, interrogate which of the four bases has just been added. A laser interrogates the fluor attached to the antibody, then the 3’-OH group gets unblocked and the antibody gets removed, in preparation for the next round of sequencing.
Dr. Tan said they do not see signal suppression, which gives the method the potential to extend to longer read lengths, even 600bp PE reads or 800bp SE (single end) reads.
If you look at the dot-plot on the right, you want to see four clusters clearly separated from each other. From this plot, you see the dots at the origin, cleanly separated from the dots along each axis, and the cloud of dots along the diagonal. You can surmise from this chart that they are using a two-color system, similar to the method used when the Illumina NextSeq was introduced: the four color states are color 1, color 2, color 1&2, and neither color 1 nor 2.
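To make the two-color idea concrete, here is a toy sketch of how four bases can be read out from two channels. The base-to-color assignment and the fixed threshold are purely hypothetical (neither was given in the presentation), and real base callers fit the clusters rather than applying a simple cutoff.

```python
from typing import Dict, Tuple

# Hypothetical assignment of (color 1 on, color 2 on) states to bases.
# The actual assignment used by MGI was not shown.
STATE_TO_BASE: Dict[Tuple[bool, bool], str] = {
    (True, False): "A",    # color 1 only: dots along one axis
    (False, True): "C",    # color 2 only: dots along the other axis
    (True, True): "T",     # both colors: the cloud along the diagonal
    (False, False): "G",   # no signal: the dots at the origin
}


def call_base(ch1: float, ch2: float, threshold: float = 0.5) -> str:
    """Call a base from two normalized channel intensities (toy threshold)."""
    return STATE_TO_BASE[(ch1 >= threshold, ch2 >= threshold)]


# Four made-up spots, one per cluster in the dot-plot
spots = [(0.90, 0.05), (0.10, 0.80), (0.85, 0.90), (0.02, 0.03)]
calls = "".join(call_base(c1, c2) for c1, c2 in spots)
print(calls)
```

The appeal of the scheme is that only two images are needed per cycle instead of four, which is why the four clusters need to stay cleanly separated.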
PCR-free library data, a revised LFR method and automation
He finished with a busy slide with 99% mapping rates for both conventional and PCR-free libraries, higher unique mapping rates compared side-by-side with the NovaSeq (for PCR-free, 94% vs. 90% respectively), much lower duplicate rates (for PCR-free, 1.5% vs 6.4% respectively), and similar overall >20x coverage rate (for PCR-free, 92.5% vs. 96.4%).
One interesting development from the library methods side was a new LFR method they call stLFR, for ‘single-tube Long Fragment Read’. From one histogram it appeared the mean ‘synthetic fragment’ produced by the MGISEQ-T7 was 50kb. This bioRxiv paper details Dr. Tan’s claim that reads of up to 300kb can be obtained by this method.
Dr. Tan concluded with optional automation. The MGISP-960 automates 8 to 96 DNA purifications from plasma, saliva, and FFPE, although it isn’t clear whether it does library preparation or not. A second instrument, the MGIDL, is a required piece of equipment for loading the flowcells. The MGISEQ-T7 itself is completely automated. Lastly, on the informatics front, one slide showed an aligner called MegaBOLT, with a claimed 20-fold increase in speed for ‘WGS, WES bioinformatics analysis (vs. GATK)’. His closing thought was the following: with 1,000 laboratories with 5 T7’s each sequencing 20,000 samples per year, they could finish sequencing every human on earth (7 billion people) in 50 years. The math roughly works out: 1,000 × 5 × 20,000 is 100 million genomes per year, which by my arithmetic is closer to 70 years for 7 billion people.
The MGISEQ-T7 will cost $1M and be ‘available worldwide before the end of 2019’. The consumables are expected to cost $5 per Gb, keeping to the ‘5G’ theme mentioned at the beginning. For context, the NovaSeq 6000 costs $985,000, and the least expensive list price for reagents on that platform is $10.34 per Gb for the NovaSeq S4 300.
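To put those per-Gb prices in per-genome terms, here is a small sketch that reuses the 93 Gb per 30x genome figure from the throughput math earlier. It covers sequencing reagents only, at the prices quoted above; library prep, labor and instrument depreciation are left out.

```python
GB_PER_30X_GENOME = 93.0   # 3.1 Gb haploid genome x 30x coverage, from earlier

price_per_gb = {
    "MGISEQ-T7": 5.00,         # expected consumables price quoted above
    "NovaSeq S4 300": 10.34,   # least expensive list price quoted above
}

# Reagent cost for one 30x human genome on each platform
cost_per_genome = {
    name: price * GB_PER_30X_GENOME for name, price in price_per_gb.items()
}

for name, cost in cost_per_genome.items():
    print(f"{name}: ${cost:,.2f} per 30x genome")
```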
Some final thoughts on the MGISEQ-T7 from #AGBT19
AGBT is quite a stretch, running from a Wednesday noon ‘Spatial Summit’ sponsored by NanoString all the way through the Saturday evening closing party. When I got up early to catch the 5am shuttle bus for the one-hour trip to the airport, there were several friends on board also headed back early.
One thought-leader told me the MGI presentation was the highlight of their AGBT this year. A very high-throughput customer told me they were happy for competition in the marketplace, an opinion shared by two other people (also from very high-throughput facilities in very different contexts).
It was telling that Eric Green (the Director of the NHGRI) asked the first question: whether the T7 would be sold by Amazon.
Here’s a nice promotional video of the MGISEQ-T7 Dr. Tan showed toward the end of his presentation, and here’s the T7 product webpage and a product brochure (PDF).
Since I’ve written way too much already on this post on my way back home, I’ll conclude with a quote from the Century of Biology essay.
By the end of this century, the human genome project could be judged as the Manhattan Project of our time and us scientists as tinkering Frankensteins who couldn’t leave well enough alone. Or, mapping the human genome could be judged as the greatest advance in the history of our species since we stood up on two legs.
Everything depends on the prudent application of the accumulated wisdom of human experience to the stunning new scientific discoveries of our age. Cognizant of both the great possibilities and risks knowledge of the human genetic code brings, our hope is that future generations will never have to ask, with T.S. Eliot, “Where is the wisdom we have lost in knowledge?”
From ‘The Century of Biology’, New Perspectives Quarterly