Catalog, Seagate Partner to Advance DNA-Based Data Storage, Computing

NEW YORK – Catalog, a startup pursuing DNA-based data storage and computing technology, has entered into a collaboration with hard drive maker Seagate Technology.

Under the terms of the collaboration, the firms will work to combine Catalog's molecule designs for storing data in DNA and performing computation across a library of molecules with Seagate's silicon-based lab-on-a-chip technology. Both companies bring proprietary DNA synthesis chemistries to the table, and Seagate has expertise in structuring data for storage applications.

If all goes well, it would be an important step toward creating a commercial DNA-based data storage platform. "This work with Seagate is essential to eventually lowering costs and reducing the complexity of storage systems," Catalog CEO and Founder Hyunjun Park said.

"There is no financial component in this multi-year engagement," he added. "However, we expect to achieve significant milestones on a quarterly basis." 

The deal brings together a heavyweight in the digital data storage market — Seagate has a market cap of approximately $13.57 billion and hard disk market share of over 40 percent — and a startup that has gone a step further than most others in the DNA-based data storage space with its quest to develop DNA-based computing.

"[Catalog] might have the most mature, complete read-write DNA data storage system today," said Ed Gage, VP of Seagate Research Group. "And their focus on DNA for computing is a real strength. … If you could add compute right to that massive amount of data, that's a huge savings of time and energy."

Early work will focus on making Catalog's chemistries work in smaller volumes, Park said, with the goal of reaching femtoliter volumes. "This will provide insights into how DNA-based storage and computation systems can be made smaller with greater levels of automation," he said. "The analogy in today's electronic world is that chips have consistently been made smaller, and more transistors have been put on them."

Gage added that the collaboration will "bring some solid effort" to addressing the DNA writing and reading speeds needed to take the idea from a demonstration to a product.

The collaboration is another vote of confidence for Catalog, which raised $35 million in Series B financing in October 2021, led by Hanwha Impact Partners. The firm also raised $10 million in Series A financing in 2020. Based in Boston, it has grown to approximately 20 employees and is hoping to add 10 more by the end of the year, Park said.

Catalog plans to use the funding to help it develop a DNA-based computing platform, where data is manipulated before being read out. "Over the last year, we've done as much as we could in that area while improving our processes for reading and writing DNA," Park said.

The Seagate collaboration shows that data storage is still an important project for Catalog. The firm has been in contact with Seagate for approximately four years, since Park met some Seagate executives at a Library of Congress meeting on digital archiving. Both companies are members of the DNA Data Storage Alliance.

Seagate is not the only company Catalog is talking to, Park said, but it is one the firm has worked with before. Phase one of the partnership, which lasted "a couple of years," ended last summer. In addition to testing actual chemistry, it was a test of the chemistry between the technical staff on both sides.

For its part, Seagate's interest in DNA predates data storage, as it launched a next-generation sequencing project more than five years ago, Gage said. The method's error rate and speed were "not satisfactory," Gage said, so the company shelved the project; however, it provided an entry into working with DNA. The firm has since developed a DNA synthesis method that relies on a library of building blocks. Pairing that with an in-house lab-on-a-chip technology, it has entered the DNA-based data storage field.

"One of the real keys is to shrink the volume of a droplet so you don't need a swimming pool filled with chemicals to do a data center," Gage said. Seagate's chips work with femtoliter-scale droplets, a necessity to be commercially viable, he noted.

 "We know we can move [droplets] quickly," he said. "We can mix them, split them, take them out, and put them back. The next stage is to really shrink it down and solve the traffic problem of how to route them on the chip."

Getting Catalog's molecules to work at that level of miniaturization is important, but it is not the only challenge, as building a DNA data storage system is different from simply writing information into DNA.

"On a hard drive, we store data in a sector," Gage said. But user data — pictures, PDFs, program files — are not the only data encoded. "We also have timing info, error correction, position info of where you are in the sequence," he said. "All that needs to be put into a gene to make it useful in a data storage application."

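As a rough illustration of the point about sectors, the Python sketch below wraps a user payload with position information and a checksum before mapping the bytes to bases. The field sizes, the two-bits-per-base mapping, and the use of CRC-32 are all assumptions made for illustration, not a format described by Seagate or Catalog. CRC-32 only detects errors, so it stands in here for a real error-correcting code, and timing information is not modeled at all.

```python
import struct
import zlib

# Hypothetical mapping: each pair of bits becomes one nucleotide.
BASES = "ACGT"

def bytes_to_bases(data: bytes) -> str:
    """Encode bytes as a base string, four bases per byte (assumed mapping)."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def bases_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_bases."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

def build_frame(address: int, payload: bytes) -> bytes:
    """Hypothetical frame: 4-byte address, 2-byte length, payload, 4-byte CRC-32."""
    header = struct.pack(">IH", address, len(payload))
    checksum = struct.pack(">I", zlib.crc32(header + payload))
    return header + payload + checksum

def parse_frame(frame: bytes):
    """Return (address, payload, checksum_ok) for a frame built by build_frame."""
    address, length = struct.unpack(">IH", frame[:6])
    payload = frame[6:6 + length]
    (stored,) = struct.unpack(">I", frame[6 + length:10 + length])
    checksum_ok = stored == zlib.crc32(frame[:6 + length])
    return address, payload, checksum_ok

frame = build_frame(7, b"hello")
strand = bytes_to_bases(frame)
```

In this sketch the 5-byte payload grows to a 15-byte frame, so most of the resulting strand is bookkeeping rather than user data, which is the overhead Gage describes.
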
Another issue is speed. Synthesizing DNA one base at a time is not feasible, Gage said. "You'll never get to required data rates." Seagate's approach is to start with a prebuilt library of oligos that contain some of the information needed to store and access data. 
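
One way to see why a prebuilt library can help with speed is to count the bits encoded per assembly step. The Python sketch below assumes a hypothetical scheme in which a molecule is assembled from a fixed number of slots, each filled with one of several premade oligos. The slot and variant counts are invented for illustration, and the article does not describe how Seagate's library is actually organized.

```python
import math

# Hypothetical library: 4 slots, each filled with one of 16 prebuilt oligos.
SLOTS = 4
VARIANTS_PER_SLOT = 16

def bits_per_molecule(slots: int = SLOTS, variants: int = VARIANTS_PER_SLOT) -> float:
    """Bits carried by one assembled molecule in this assumed scheme."""
    return slots * math.log2(variants)

def choose_components(value: int, slots: int = SLOTS,
                      variants: int = VARIANTS_PER_SLOT) -> list:
    """Pick one library variant per slot so the picks encode `value`."""
    if not 0 <= value < variants ** slots:
        raise ValueError("value does not fit in one molecule")
    picks = []
    for _ in range(slots):
        value, digit = divmod(value, variants)
        picks.append(digit)
    return picks[::-1]  # most significant slot first

def components_to_value(picks: list, variants: int = VARIANTS_PER_SLOT) -> int:
    """Recover the encoded value from the per-slot picks."""
    value = 0
    for digit in picks:
        value = value * variants + digit
    return value
```

Under these assumed numbers, each of the four assembly steps contributes four bits, whereas adding a single base contributes at most two.
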

How long a molecule needs to be is yet to be determined. "It changes all the time," he said. "A hard drive sector size is 4 kilobytes. That would be a nice number. A challenge, but a nice number."
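
For a sense of scale, the short Python calculation below estimates how many bases a 4-kilobyte sector would need. It assumes 4 kilobytes means 4,096 bytes and uses two bits per base, the theoretical ceiling for a four-letter alphabet. Practical encodings carry fewer bits per base and add overhead, so this is a lower bound and not a figure from either company.

```python
SECTOR_BYTES = 4 * 1024   # assumed: 4 kilobytes = 4,096 bytes
BITS_PER_BASE = 2         # assumed: upper bound for A, C, G, T with no overhead

def bases_for_payload(num_bytes: int, bits_per_base: int = BITS_PER_BASE) -> int:
    """Minimum number of bases needed to hold num_bytes of data."""
    total_bits = num_bytes * 8
    return -(-total_bits // bits_per_base)  # ceiling division

sector_bases = bases_for_payload(SECTOR_BYTES)
print(sector_bases)  # 16384
```
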

Sequencing is another issue, one that isn't necessarily a focus of this collaboration, but which needs to be solved long term. Current technologies are likely not fast enough. "Oxford Nanopore Technologies was one we were quite interested in," Gage said. "It looked the closest to what we thought a DNA sequencing system would look like for data storage. But I still think they're many orders of magnitude behind what we need for a product."

On the plus side, sequencing for data storage doesn’t need to be as accurate as for biology, with an error rate as high as one in 100 being potentially tolerable.
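
The article does not say how such errors would be handled. One common reason a high raw error rate can be acceptable in storage is redundancy, and the Python sketch below works through that idea under two assumptions: each position is read several times, and read errors are independent. It computes the chance that more than half of the reads are wrong at a position, which for an odd number of reads is an upper bound on a simple majority vote failing.

```python
from math import comb

def majority_wrong_probability(error_rate: float, reads: int) -> float:
    """Probability that more than half of `reads` independent reads are wrong."""
    threshold = reads // 2 + 1
    return sum(
        comb(reads, k) * error_rate**k * (1 - error_rate) ** (reads - k)
        for k in range(threshold, reads + 1)
    )

single_read = majority_wrong_probability(0.01, 1)
five_reads = majority_wrong_probability(0.01, 5)
```

With a one-in-100 raw error rate, five reads bring the per-position failure chance in this model to roughly one in 100,000.
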

Even the performance metrics for DNA-based data storage and computing have yet to be settled. "We're not building something that neatly replaces an existing technology," Park said, so simply grabbing metrics from silicon-based data storage, such as recording speed, doesn't make sense. "I could give you very attractive numbers in terms of information density or the ability to copy information very quickly, but I don't know if they really capture the essence of what we want to address with computational storage."

But the market won't have to wait long to hear Catalog's take on which metrics are important. "We'll be ready, I think, to talk about that later in the year," Park said.
