I could fit all movies ever made inside of this tube. If you can't see it, that's kind of the point.
Before we understand how this is possible, it's important to understand the value of this feat. So much of what we think and do these days, through photos and videos, even our fitness activities, is stored as digital data. Aside from running out of space on our phones, we rarely think about our digital footprint. But humanity has collectively generated more data in the last few years than in all of preceding human history.
Big data has become a big problem. Digital storage is really expensive, and none of the devices we have really stands the test of time. There's a nonprofit website called the Internet Archive. In addition to free books and movies, you can access web pages as far back as 1996. Now, it's very tempting to just browse, but I decided to go back and look at the TED website's very humble beginnings. As you can see, it's changed quite a bit since then. So this led me to the first-ever TED, back in 1984, and it just so happened to feature a Sony executive explaining how a compact disc works.
Now, it's really incredible to be able to go back in time and access this moment. It's also really fascinating that 30 years after that first TED, we're still talking about digital storage.
Now, if we go back another 30 years or so, IBM released the first-ever hard drive in 1956. Here it is being loaded for shipping in front of a small audience. It held the equivalent of one MP3 song and weighed over a ton. At $10,000 a megabyte, I don't think anyone in this room would be interested in buying this thing, except maybe as a collector's item. But it was the best we could do at the time.
We've come such a long way in data storage. Devices have evolved dramatically. But all media eventually wear out or become obsolete. If someone handed you a floppy disk today to back up your presentation, you'd probably look at them kind of strangely, maybe laugh, but you'd have no way to use the damn thing. These devices can no longer meet our storage needs, although some of them can be repurposed. All technology eventually dies or is lost, along with our data, all of our memories. There's this illusion that the storage problem has been solved, but really, we've all just externalized it. We don't worry about storing our emails and our photos. They're just in the cloud.
But behind the scenes, storage is problematic. After all, the cloud is just a lot of hard drives. Now, most digital data, we could argue, is not really critical. Surely, we could just delete it. But how can we really know what's important today? We've learned so much about human history from drawings and writings in caves, from stone tablets. We deciphered Egyptian hieroglyphs thanks to the Rosetta Stone. You know, we'll never really have the whole story, though. Our data is our story, even more so today. Our records won't be carved into stone tablets. But we don't have to choose what is important now. There's a way to store it all. It turns out that there's a solution that's been around for a few billion years, and it's actually in this tube.
DNA is nature's oldest storage device. After all, it contains all the information necessary to build and maintain a human being. But what makes DNA so great? Well, let's take our own genome as an example. If we were to print out all three billion A's, T's, C's and G's in a standard font and format, and then stack all of those pages, the stack would be about 130 meters high, somewhere between the Statue of Liberty and the Washington Monument. Now, if we converted all those A's, T's, C's and G's to digital data, to zeroes and ones, it would total a few gigs. And that's in nearly every cell of our body. We have more than 30 trillion cells. You get the idea: DNA can store a ton of information in a minuscule space.
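The "few gigs" figure can be checked with back-of-the-envelope arithmetic. This sketch assumes the round figure of three billion bases for one copy of the genome and compares two encodings: one byte per letter, as in a plain-text file, and a packed form that uses two bits per base, since four possible letters need only log2(4) = 2 bits.

```python
# Back-of-the-envelope genome size, assuming ~3 billion bases per genome copy.
BASES = 3_000_000_000

# Plain text: one byte (8 bits) per letter, like the printout saved as a file.
text_bytes = BASES * 1

# Packed: four possible letters need only 2 bits each, so 4 bases fit in a byte.
packed_bytes = BASES * 2 // 8

print(f"as text:   {text_bytes / 1e9:.2f} GB")    # as text:   3.00 GB
print(f"as 2 bits: {packed_bytes / 1e9:.2f} GB")  # as 2 bits: 0.75 GB
```

Either way the answer lands in the range of a gigabyte or a few, all held in a cell nucleus. Most body cells actually carry two copies of the genome, which roughly doubles these numbers.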
DNA is also very durable, and it doesn't even require electricity to store it. We know this because scientists have recovered DNA from ancient humans who lived hundreds of thousands of years ago. One of the best-known examples is Ötzi the Iceman, who lived more than 5,000 years ago. Turns out, he's Austrian.
He was found well preserved, high in the mountains between Italy and Austria, and he has living genetic relatives here in Austria today. So one of you could be a cousin of Ötzi.
The point is that we have a better chance of recovering information from an ancient human than from an old phone. It's also much less likely that we'll lose the ability to read DNA than the ability to read any single man-made device. Every new storage format requires a new way to read it. We'll always be able to read DNA. If we can no longer sequence DNA, we have bigger problems than data storage.
Storing data in DNA is not new. Nature's been doing it for several billion years. In fact, every living thing is a DNA storage device. But how do we store our own data in DNA? This is Photo 51. It's one of the earliest images of DNA, taken more than 60 years ago, around the same time that IBM released that hard drive. So really, digital storage and our understanding of DNA have coevolved. We first learned to sequence, or read, DNA, and very soon after, how to write it, or synthesize it. This is much like how we learn a new language. And now we have the ability to read, write and copy DNA. We do it in the lab all the time. So anything, really anything, that can be stored as zeroes and ones can be stored in DNA.
To store something digitally, like this photo, we convert it to bits, or binary digits. Each pixel in a black-and-white photo is simply a zero or a one. And we can write DNA much like an inkjet printer prints letters on a page. We just have to convert our data, all of those zeroes and ones, to A's, T's, C's and G's, and then send this to a synthesis company. So we write it, we store it, and when we want to recover our data, we just sequence it.
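The conversion from zeroes and ones to A's, T's, C's and G's can be sketched in a few lines of Python. This is a minimal illustration assuming the simplest possible mapping, two bits per nucleotide (00→A, 01→C, 10→G, 11→T); the function names are hypothetical, and practical schemes also avoid long runs of the same base and extreme GC content, which this sketch ignores.

```python
# Hypothetical 2-bits-per-base mapping; real systems add sequence constraints
# and error correction on top of a mapping like this.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode bytes as a DNA string, two bits per nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(seq: str) -> bytes:
    """Decode a DNA string produced by bytes_to_dna back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = b"Hi"
strand = bytes_to_dna(message)
print(strand)  # CAGACGGC
assert dna_to_bytes(strand) == message  # reading it back recovers the bytes
```

Each byte becomes four bases, so "H" (01001000) is written as CAGA. Sequencing plays the role of `dna_to_bytes`: read the letters, turn them back into bits.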
Now, the fun part of all of this is deciding which files to include. We're serious scientists, so we had to include a manuscript, for posterity. We also included a $50 Amazon gift card (don't get too excited, it's already been spent; someone decoded it), as well as an operating system, one of the first movies ever made and a Pioneer plaque. Some of you might have seen this. It has a depiction of a typical (apparently) man and woman, and our approximate location in the Solar System, in case the Pioneer spacecraft ever encounters extraterrestrials.
So once we'd decided which files we wanted to encode, we packaged up the data, converted those zeroes and ones to A's, T's, C's and G's, and sent the file off to a synthesis company. And this is what we got back. Our files were in this tube. All we had to do was sequence it. This all sounds pretty straightforward, but the difference between a really cool, fun idea and something we can actually use is overcoming the practical challenges.
Now, while DNA is more robust than any man-made device, it's not perfect. It does have some weaknesses. We recover our message by sequencing the DNA, and every time data is retrieved, the DNA that was sequenced is used up. That's just part of the sequencing process. We don't want to run out of data, but luckily, there's a way to copy the DNA that's even cheaper and easier than synthesizing it. We actually tested a way to make 200 trillion copies of our files, and we recovered all the data without error. Sequencing also introduces errors, mistakes in the A's, T's, C's and G's we read back. Nature has a way to deal with this in our cells. But our data is stored in synthetic DNA in a tube, so we had to find our own way to overcome this problem. We decided to use an algorithm used for streaming video. When you're streaming a video, you're essentially trying to recover the original video, the original file. When we're trying to recover our original files, we're sequencing. But really, both of these processes are about recovering enough zeroes and ones to put our data back together. And so, because of our coding strategy, we were able to package up all of our data in a way that allowed us to make millions, even trillions, of copies and still recover all of our files.
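The algorithm isn't named here, but a family of codes built for exactly this streaming setting is fountain codes, such as Luby transform (LT) codes. The Python below is a toy LT-style sketch, not the method used in this work: the file is cut into equal-size chunks, each "droplet" is the XOR of a random subset of chunks picked from a seed stored alongside it, and a peeling decoder rebuilds the file from any large enough collection of droplets, whichever ones happen to survive. The degree distribution, chunk size and all function names are assumptions chosen for illustration.

```python
import random
from functools import reduce
from typing import Optional

# Toy degree distribution (an assumption; real LT codes use a robust soliton distribution).
DEGREES = [1, 2, 2, 3, 4]


def split(data: bytes, chunk_size: int) -> list[bytes]:
    """Zero-pad the data and cut it into equal-size chunks."""
    padded = data + b"\x00" * (-len(data) % chunk_size)
    return [padded[i:i + chunk_size] for i in range(0, len(padded), chunk_size)]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def chunk_indices(seed: int, n_chunks: int) -> set[int]:
    """Rebuild, from the seed alone, which chunks a droplet combines."""
    rng = random.Random(seed)
    degree = min(n_chunks, rng.choice(DEGREES))
    return set(rng.sample(range(n_chunks), degree))


def make_droplet(chunks: list[bytes], seed: int) -> tuple[int, bytes]:
    """A droplet is its seed plus the XOR of the chunks that seed selects."""
    picked = sorted(chunk_indices(seed, len(chunks)))
    return seed, reduce(xor, (chunks[i] for i in picked))


def decode(droplets: list[tuple[int, bytes]], n_chunks: int) -> Optional[bytes]:
    """Peeling decoder: a droplet with exactly one unknown chunk reveals that chunk."""
    pending = [(chunk_indices(seed, n_chunks), payload) for seed, payload in droplets]
    solved: dict[int, bytes] = {}
    progress = True
    while progress and len(solved) < n_chunks:
        progress = False
        for indices, payload in pending:
            unknown = {i for i in indices if i not in solved}
            if len(unknown) == 1:
                # XOR out the chunks we already know; what remains is the unknown one.
                value = payload
                for i in indices - unknown:
                    value = xor(value, solved[i])
                solved[unknown.pop()] = value
                progress = True
    if len(solved) < n_chunks:
        return None  # not enough droplets yet; keep reading
    return b"".join(solved[i] for i in range(n_chunks))


data = b"Our data is our story."
chunks = split(data, chunk_size=4)

# Read droplets one at a time (only every third seed, as if the rest were lost)
# until the decoder can rebuild every chunk.
droplets: list[tuple[int, bytes]] = []
recovered: Optional[bytes] = None
for seed in range(0, 100_000, 3):
    droplets.append(make_droplet(chunks, seed))
    recovered = decode(droplets, len(chunks))
    if recovered is not None:
        break

# The original length must be stored separately so the padding can be stripped.
print(recovered[: len(data)])
```

Because droplets are interchangeable, it doesn't matter which ones go missing; the reader simply keeps collecting until there are enough. A fountain code handles missing pieces rather than corrupted ones, so a practical system would also need a per-droplet check, such as a checksum or an error-correcting code, to detect and discard droplets garbled by sequencing errors. This sketch leaves that step out.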
This is the movie we encoded. It's one of the first movies ever made, and now the first to be copied more than 200 trillion times on DNA.
Soon after our work was published, we participated in an "Ask Me Anything" on the website Reddit. If you're a fellow nerd, you're very familiar with this website. Most questions were thoughtful. Some were comical. For example, one user wanted to know when we would have a literal thumb drive. Now, the thing is, our DNA already stores everything needed to make us who we are. It's a lot safer to store data in synthetic DNA in a tube.
Writing and reading data in DNA is obviously a lot more time-consuming than just saving all your files on a hard drive, for now. So initially, we should focus on long-term storage. Most data are ephemeral. It's really hard to grasp what's important today, or what will be important for future generations. But the point is, we don't have to decide today. There's a great UNESCO program called "Memory of the World." It was created to preserve historical materials that are considered of value to all of humanity. Items are nominated to be added to the collection, including the film that we encoded. It's a wonderful way to preserve human heritage, but it doesn't have to be a choice. Instead of asking the current generation, us, what might be important in the future, we could store everything in DNA.
Storage is not just about how many bytes, but about how well we can actually store the data and recover it. There's always been a tension between how much data we can generate, how much we can store and how much we can recover. Every advance in writing data has required a new way to read it, and many old media can no longer be read. How many of you even have a disc drive in your laptop, never mind a floppy drive? This will never be the case with DNA. As long as we're around, DNA is around, and we'll find a way to sequence it.
Archiving the world around us is part of human nature. This is the progress we've made in digital storage over 60 years, starting at a time when we were only beginning to understand DNA. We've made similar progress with DNA sequencers in half that time, and as long as we're around, DNA will never be obsolete.
Thank you.