Decoding the Potential of DNA as a Digital Data Storage System
Many great scientists have argued throughout history about what living things are. Until not so long ago, minerals were considered the third kingdom of life, as they grew before the eyes of the experimenter from tiny particles into perfectly ordered crystals. Growth, and the reproduction of the characteristics contained in a body, was therefore taken to be a fundamental basis of living things. Of course, mysticism gradually gave way to scientific evidence during the 19th and 20th centuries. Eventually, researchers found in the nucleus of cells a slightly acidic substance that seemed to be important for maintaining the cytoplasm, since cellular functions soon ceased when it was removed. It was not until the work of Rosalind Franklin, James Watson, and Francis Crick that DNA was understood as a chemically stable molecule capable of encoding all the biological information that makes up an organism.
These strands are also accessible to the proteins they encode, which perform impressive molecular functions: duplicating the stored information, maintaining it and protecting it against mutations, and regulating gene expression in response to environmental stimuli. This invention of evolution is so well suited to storing information that, if we think about its origin, we soon realize that since the appearance of the first modern DNA molecules on Earth, no physical or chemical agent has wiped out the information they contained. It has been perpetuating, propagating and updating itself automatically for millions of years. We ourselves are heirs to this marvel of natural engineering, to the first self-replicating strands.
In parallel with the development of molecular biology, humans have also developed a system for perpetuating information, albeit of an abiotic nature: computing. As these two disciplines progress, scientists and engineers inevitably turn their gaze to the point where the two paths converge. In the search for more efficient and compact storage technology, it seems that engineers cannot, for now, compete with the evolution of primordial chemistry.
Small size and stability are not the only advantages. The most obvious and tempting one is the encoding capacity, which would move from a binary system of zeros and ones to a quaternary system built on the four nucleotides adenine, cytosine, guanine, and thymine. Moreover, molecular engineering allows us to be imaginative here, since we can devise a system that uses more than the two natural pairs of nucleotides. The chemical structure of DNA can also be modified to suit our interests; a good example is the manufacture of morpholinos, synthetic molecules that are based on the structure of DNA without having the same composition.
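As a minimal sketch of the quaternary idea, each nucleotide can stand for two bits, so one byte becomes four bases. The mapping below (00 to A, 01 to C, 10 to G, 11 to T) and the function names encode and decode are assumptions chosen for illustration, not a scheme taken from any published method.

```python
# Hypothetical 2-bit mapping: any fixed assignment of bit pairs to bases would work.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a nucleotide string, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): read each base back as two bits, then regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # eight bases for two bytes
print(decode(strand))  # the original bytes again
```

With this mapping the two bytes of "Hi" become the eight-base strand CAGACGGC. Practical schemes typically add constraints on top of such a mapping, for example avoiding long runs of the same base, along with error correction.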
However, there are still some negative aspects that should be taken into account. Encoding the information in DNA is relatively easy: you just need an encoding scheme that says how the nucleotides are to be read in a particular direction. Things start to get complicated when it comes to reading this “codex”.
State-of-the-art DNA sequencing technologies, such as Illumina or Oxford Nanopore, cannot read entire DNA molecules, only more or less short fragments. If you think that it is enough to put together the pieces that have been sequenced, you are not entirely wrong, but it is more complicated than that. DNA cannot be read from a single molecule, because the sequencing reaction needs a sufficient concentration of molecules. There is therefore always a prior step: amplifying the encoded DNA by the now famous PCR (polymerase chain reaction). As you can imagine, sequencing then becomes a rather intricate jigsaw puzzle. It should also be noted that these processes, including sequence reading, have an error rate, so the code can be compromised, and it could take several weeks before we get the information back.
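One simple way to see why many copies of each fragment help against read errors is a per-position majority vote. The sketch below assumes the reads are already aligned, have equal length, and contain only substitution errors; real pipelines also have to deal with insertions, deletions and fragment assembly. The function name consensus and the sample reads are hypothetical.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Majority vote per position across equal-length, pre-aligned reads."""
    if len({len(read) for read in reads}) != 1:
        raise ValueError("reads must be non-empty and share one length")
    # zip(*reads) yields one column of bases per position in the fragment.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


# Four noisy reads of the same fragment; two of them carry one substitution each.
reads = ["CAGACGGC", "CAGTCGGC", "CAGACGGC", "CTGACGGC"]
print(consensus(reads))
```

Here the two isolated errors are outvoted and the consensus is CAGACGGC. The more reads cover each position, the less likely it is that an error survives the vote.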
Despite these serious drawbacks, on March 3, 2017, Yaniv Erlich and Dina Zielinski published DNA Fountain, a reliable method that avoids errors in this encoding and reading. With it they were able to “store a full computer operating system, movie, and other files with a total of 2.14 × 10^6 bytes in DNA oligonucleotides and perfectly retrieve the information from a sequencing coverage equivalent to a single tile of Illumina sequencing”. Since then, many private initiatives have been seeking to perfect this process.
One of the most promising companies is Catalog, founded by two MIT scientists, which aims to be the first to commercialize this type of storage. Another interesting start-up is Evonetix, which has focused on enhancing the read length of DNA strands. In synthetic DNA manufacturing, Kilobaser is making DNA “printers” for around $9k. There is also much interest in DNA storage in vivo, as the cellular medium is ideal for maintaining it.
The tendency to optimize the processes occurring in nature is inevitable. The same is true of human industrial processes. The future looks bright for DNA storage, at least in the short and medium term. We cannot be sure, however, that this system is here to stay and prevail: the upper limit for DNA information encoding has been calculated by weight (4.606 × 10^20 bytes/g) and by volume (4.606 × 10^17 bytes/mm^3), and although that is quite a lot, perhaps a better way of compressing information without redundancy will be found. What is certain about the future of information is that the demand for its storage will grow, since it is natural to want to preserve it: the very essence of nature tells us so.
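As a rough check on the per-gram limit mentioned above, a back-of-the-envelope calculation gives a figure of the same order. It assumes two bits per nucleotide and an average mass of about 327 g/mol per nucleotide of single-stranded DNA; that mass is an assumption introduced here, not a value from the calculation the article cites, which may have been derived differently.

```python
AVOGADRO = 6.022e23        # molecules per mole
NT_MASS_G_PER_MOL = 327.0  # assumed average mass of one nucleotide (single strand)
BITS_PER_NT = 2            # four possible bases -> log2(4) = 2 bits

nucleotides_per_gram = AVOGADRO / NT_MASS_G_PER_MOL
bytes_per_gram = nucleotides_per_gram * BITS_PER_NT / 8

print(f"{bytes_per_gram:.2e} bytes per gram")
```

Under these assumptions the result is roughly 4.6 × 10^20 bytes per gram, several hundred exabytes in a single gram of DNA.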