The talk

Databases: A History of Places to Put Your Stuff

Delivered 2 times · 2023–2024

Slides (33 slides)

We rely on context a lot as software engineers, passing state around so one function knows what the last did, and how it was impacted by the one before that, and on and on. Without passing around context, a lot of things would be much more difficult to build, or at least much more wordy. People need context too. Sure, we’re smarter than a function that only knows how to do one thing, and we can figure it out eventually, but it’s easier to learn a new tool or concept if someone gives us the historical context.

Databases have been around for a long time! Longer than you might think, in fact. Having the context of the evolution and history of how data has been stored, accessed, and operated helps us understand why things might work (or not work!) the way that they do. In this talk, Matty and Kat will take you on a journey through the history of databases and how they’ve changed and evolved - which should help give you some insight into the technical hurdles and pain points that led to where we are today. You’ll come away with the historical context of databases!

Databases & Data

Transcript · 5,948 words · ~30 min read

Lightly edited for readability from the video’s captions.

All right, so welcome to the second part of our meetup. Our talk is "Databases: A History of Places to Put Your Stuff," which is like way more interesting than it sounds. We're making a lot of promises here, so it is interesting. I had fun writing this talk.

This is going to get turned into a blog post, and we'll both give it individually later. But this is how developer relations work — multi-repurposed content, yes. So this is a photo of me and Matthew. We're very attractive, intelligent, and authoritative developer advocates who take our jobs very seriously, as you can see from this photo and my chins.

So I work at Dell — I run developer relations there. Which seems confusing — why would Dell have developer advocates? But surprise, we do. My job is not to sell you Dell's products; it is to fix your problems whether it makes Dell money or not. And that's probably the extent of how much we're going to talk about our employers. I cannot fix the problems with drivers for anything on your Dell Linux Sputnik laptop. Not my problem.

Anyway, let's rock into this talk. So most people outside of this room generally do not give databases a whole ton of thought beyond arguing about whether they should be using whichever two databases are jockeying for most trendy of the day, or whatever the gumball machine at AWS spits out for you, because I can't keep track of all the databases they offer.

And that's fine, there's nothing wrong with that. Realistically, for the overwhelming majority of applications, you are actually fine just rolling Postgres and calling it a day. And that's not to say that there isn't an advantage to choosing the database that is the most specifically useful for your application, for your data, for the way you need to query it. But what I am saying is that if you just roll Postgres you will get by until you absolutely have to switch to something else. At first you are fine just rolling Postgres, and that's okay.

When you look at the mess of options out there, it is hard to choose the correct thing. And in part I think that's because it is all marketed the exact same way to you. The technical marketing copy for every database, regardless of the way the data is structured or the way the data is queried, is all sold to you like this: "high throughput and scalability with low latency." Congratulations, you can now be a CMO of a database company. I didn't make that up. This is actual copy, ripped directly from like five different databases when I was writing some content for something else eight months ago, and it still irritates me. It's like one person wrote the absolute flappiest piece of database copy known to man 30 years ago and everybody has just been rehashing it over and over and over again since then. They're all the best.

It really doesn't matter what the database says — they are all going to be sold to you that way. It drives me absolutely nuts.

These are all of the different types of databases listed in the Wikipedia article for databases. I don't know what quite a few of these are. I can venture a guess based on just the naming but couldn't tell you specifically. What I can guarantee you is that they are all being sold to you in the exact same way — they're all saying "oh we've got high throughput, scalability, low latency" — every single one of them regardless. And they are not necessarily telling the truth for your specific application; they're telling the truth for their specific application.

I don't think that they're sold to you this way because of actually lazy copywriting. I don't think it's people ripping each other off for decades — maybe a little bit of that. I know spatial is actually stuff like the way Google Maps works and temporal is time series databases.

The actual answer to why these things are sold to you this way is because — if you saw my keynote this morning, I said that for this talk, and it is true for all of the talks I give — it's because we have been solving the same problem for 70 years. We are always trying to scale more, we are always wanting higher throughput, we are always wanting lower latency, we always want to do more stuff faster. This is the problem we have always been trying to solve since the dawn of computing, so that's why we're still selling it to you this way. These are the keywords we've landed on for databases specifically because these are the things we need.

And we're going to tell this story to you with the assistance of a TV show that I have not seen, but Matt assures me that it is very funny. The story through the decades, a journey through the decades with WandaVision.

Anyway, the history of databases goes much further back than you would expect. It actually starts in the 1950s when we first had actual computers. The first computer was the ENIAC machine which was released in 1945 — we're not going to talk about the ENIAC, we're going to talk about data storage.

To understand how data storage and organization has evolved over time, we need to understand what it was like at the dawn of computing — and it was that we didn't have it. It was not a thing. At the very beginning of the computing era we didn't have a way to store data because the machines were not intelligent in any way; they were functionally just really, really, really big calculators.

Storing data, really storing a program, meant just lugging around huge boxes of punch cards. If you've never seen a punch card before, this is one from an early IBM series. They didn't all look exactly like this but generally that's the vibe. And for context: if you think this punch card shows a program, this punch card and probably about 800 more are a program. So you had to have them all — you were carrying literal cases of these things. There is a very fun photo from the Apollo mission that shows a woman standing next to a stack of punch cards that is taller than her, and that was just handling the calculations for the inventorying system for the Saturn V, which we'll talk about later.

The way you used a computer back then was you showed up with your stack of punch cards that was one program, for your appointment at the computer — which seems crazy today. You put it in and stood there waiting, and then took your literally printed output. This was before the invention of the dot matrix printer, so the printing was not as fast or easy as you even imagined for like old dot matrix.

So in the early 50s things started to change. This is the UNIVAC I, which was our first example of magnetic tape storage. This was a pretty big deal because it did allow us to write things much faster than with the punch cards, but reading was still a huge problem because everything was stored sequentially so you had to seek backwards across the tape, and these were pretty fragile.

Unfortunately, magnetic tape storage is still used today. I spent a decade working in off-site data backup and I find it super unfortunate that magnetic tape backup is still a thing, but if you want to pay Iron Mountain for it they will absolutely back up and store your data on tape drives. Do not leave one in your car in the summer — it will absolutely melt. And there's no way to verify data integrity on a tape drive, by the way. You can verify integrity on a magnetic platter but you cannot do it on a magnetic tape.

We didn't have to deal with that for very long as the standard, because just a few years later IBM, the literal decades-long king of computers, introduced disk storage with the 305 RAMAC. I was going to bring one of these disks with me as a prop because I do have one, but I forgot it, and they're huge. They're about this big. They are very, very large. You've probably seen them in old movies, you've probably seen old pictures of them. They're big, scary, gold-looking plates.

Unlike magnetic tape, the data on disks could be accessed randomly, which sped up both reads and writes — very cool. However, we did not have a system to organize that data, and so that did not actually make accessing that data any easier. It was faster, but we couldn't really get at it. We had only been accessing data and executing programs sequentially because we hadn't figured out concurrency or multi-programming yet, so as a concept this was a pretty huge leap for people, and it didn't really go anywhere immediately.
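To make the tape-versus-disk difference concrete, here is a rough Python sketch. It is a toy model, not a description of how the UNIVAC I or the 305 RAMAC actually worked: the `Tape` and `Disk` classes and the "steps" cost counter are made up for illustration. The tape has to wind past every record between where the head is and where the data lives, while the disk pays one seek per read.

```python
# Toy cost model (hypothetical): count how much "movement" each medium needs.

class Tape:
    """Sequential medium: the head passes over every record in between."""

    def __init__(self, records):
        self.records = list(records)
        self.head = 0    # current head position
        self.steps = 0   # records the head has had to move past

    def read(self, index):
        self.steps += abs(index - self.head)  # wind forward or rewind
        self.head = index
        return self.records[index]


class Disk:
    """Random-access medium: any record is one seek away."""

    def __init__(self, records):
        self.records = list(records)
        self.steps = 0

    def read(self, index):
        self.steps += 1  # one seek, wherever the record lives
        return self.records[index]


records = [f"record-{i}" for i in range(1000)]
wanted = [900, 5, 750, 10]  # records requested in a scattered order

tape, disk = Tape(records), Disk(records)
for i in wanted:
    assert tape.read(i) == disk.read(i)  # same data, very different cost

print("tape steps:", tape.steps)  # 3280
print("disk steps:", disk.steps)  # 4
```

Both media hand back the same records. The only thing that changes is how far the hardware has to travel to reach them.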

Until this guy, who looks like everybody's cutest grandpa to me, and that bow tie is really doing it for me. This is Charles Bachman, and he wrote the first DBMS when he was working at General Electric, one of the "Seven Dwarfs" in "IBM and the Seven Dwarfs." This was called the Integrated Data Store, or IDS, and this opened the door to a ton of new technology for us. It was architecturally pretty damn near perfect. IDS-type databases still exist today: not IDS itself, but databases directly influenced by it, databases that mimic its architecture. And while they are very, very difficult to use, they are basically unmatched in performance, and you still mostly see them used in the telecom industry. They're still a thing.

A few years later, with some other general-purpose databases popping up but not really a standard way to interact with them, Bachman also decided to form CODASYL, which standardized programming languages pretty much, and this is how we got COBOL.

To bring things back to Kubernetes: if you would like to run COBOL programs inside Kubernetes, you can do that, and JJ Asghar from IBM has written blogs and projects about how to do that. There is actually a fair amount of COBOL still in use today. If you have a bank account, you're probably using COBOL.

Space.

All right, so in the 60s we got another navigational database that blew everybody's socks off. This one you're more likely to have heard of if you are a huge dweeb or you ever had a hyperfixation on space travel. We got IBM's IMS — the Information Management System — and this was developed and released for IBM System/360, which I also talked about this morning. One of the most important computers in history. The IBM System/360 and IBM's Information Management System is what sent us to the Moon.

This database system was designed specifically to handle calculations for the inventory for the absolute thickest rocket ship anyone in human history has ever built. This is a photo of every single Saturn V rocket that has ever launched, and it is so cool. We can't build this rocket anymore, and that is emotionally devastating. But if you ever get a chance to go to one of the various space centers that have a Saturn V and go look at it, you should, because it's a very cool piece of human engineering history both from a physical engineering and software engineering perspective.

It is a super heavy-lift rocket. I think the Falcon is the closest heavy-lift we have managed to build, and this thing was originally built in the 60s and we're just now able to match it. That's impressive. And that was originally possible because of the IBM System/360 and IBM's Information Management System — a database that was exclusively built to handle this thing's bill of materials.

That's a lot of parts. I built the Lego of it — there's a lot of parts.

All right, so now we come to the 70s. The collars get wider and the databases get relational.

We kind of think about these early existing data systems — navigational databases, very sequential. The thing about that was like everything was fundamentally a linked list, right? So search? That's crazy talk.

So enter this cat. This is Edgar Codd. I like to call him the Top Gun of tables. Codd in 1970 wrote a number of papers that outlined this new approach to database construction, which ultimately culminated in this groundbreaking work with the sick title "A Relational Model of Data for Large Shared Data Banks." Clearly, Edgar's pretty dope, but titling things was not his thing.

But in this paper he talked about a different idea. Instead of records being stored in one sort of linked list like we'd been using, what if we organize the data as a number of tables? Each table is used for a different kind of entity, and it has a fixed number of columns containing that entity's attributes. If you're not a database person, think about an Excel spreadsheet. Sounds maybe not revolutionary, but it was. One or more of the columns are designated as a primary key, so the rows of the table can be uniquely identified, and records cross-reference each other using those primary keys rather than their disk addresses, which is how it would work in a navigational one. That lets us do queries that join those tables based on those relationships, etc.
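A minimal sketch of that idea, using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their contents are invented for illustration. Each table has a primary key, the `orders` rows point at a customer by key rather than by disk address, and a join follows that relationship.

```python
import sqlite3

# In-memory database; the schema and data are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,   -- uniquely identifies each row
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        item        TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Wanda'), (2, 'Vision');
    INSERT INTO orders VALUES
        (10, 1, 'toaster'),
        (11, 1, 'television'),
        (12, 2, 'bow tie');
""")

# Join the two tables on the key they share.
rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

print(rows)
# [('Wanda', 'toaster'), ('Wanda', 'television'), ('Vision', 'bow tie')]
```

Nothing in the query says where a row physically lives. The database works that out from the keys.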

Codd used mathematical terms to define the model, so he talked about relations, tuples, and domains rather than tables, rows, and columns like we might be used to today. Later on he got actually kind of pissed that the practical implementation of relational databases talked about tables and columns and rows instead of the mathematical foundations. He looks like a guy who'd be mad about that.

Before this it was a linked list — that was fundamental, that's all it was, and that's why you couldn't do searches. This is why IMS — IBM says that IMS was a hierarchical database, but fundamentally it was a navigational database and you were just pointing at an address that had some chunk of whatever in it.
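Here is a rough sketch of what "everything is a linked list" means for search. This is a toy model, not IMS or IDS code: the `disk` dictionary, the addresses, and the `find` helper are all hypothetical. Each record holds a payload plus the address of the next record, so finding anything means walking the chain from the start.

```python
# Hypothetical "disk": address -> (payload, address_of_next_record).
disk = {
    100: ("engine", 340),
    340: ("fuel tank", 215),
    215: ("fin", None),
}


def find(disk, start_address, wanted):
    """Follow pointers from the first record until the payload matches.

    Returns (address, hops); address is None if the payload was never found.
    """
    address, hops = start_address, 0
    while address is not None:
        payload, next_address = disk[address]
        hops += 1
        if payload == wanted:
            return address, hops
        address = next_address
    return None, hops


print(find(disk, 100, "fin"))        # (215, 3): walked the entire chain
print(find(disk, 100, "nose cone"))  # (None, 3): walked it all and found nothing
```

With three records that is fine. With a bill of materials the size of a rocket, every lookup that is not already holding the right address becomes a long walk.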

So in the early 70s IBM started working on a prototype that was based on Codd's paper, and this prototype was called System R. They kind of fussed with it for a few years from like 1974 to 1979, and it kind of became clear that there was a demand for a production-ready version of this. So they created a production-ready version of System R which was known as Database 2, or DB2, of which you may have heard.

Around the same time we enter our friend Larry Ellison. The Oracle database — or Oracle as it's more commonly known now — started from a different chain also based upon IBM's papers and System R. The Oracle V1 implementations were completed around 1978, but Oracle version 2 shipped in 1979, which beat IBM's DB2 to the market. Relational Database through Functional Software Systems was the original name of Larry Ellison's company before just naming it after Oracle. Super villain with an island, so maybe that's the trick.

All right, so in 1973, Codd's paper was also picked up by two people at Berkeley: Eugene Wong and Michael Stonebraker, who's this cat up here. So they started this project known as Ingres, and they were using funding that had been allocated for a geographical database project. Starting in 1973, Ingres delivered its first test products, which were used pretty widely until about 1979 or so. Ingres was really similar to System R; it included the idea of using a language for data access that was known as QUEL (Q-U-E-L), and over time Ingres moved to the SQL standard that we're familiar with today.

About 20 years later, Michael Stonebraker created Postgres, also now known as PostgreSQL, based on what was learned from Ingres. So Postgres has been around for quite some time, but we aren't in the 90s yet, so hold on — first we have to make it through the 80s.

Until the 80s or so, the evolution of databases and computers in general had mostly been driven by the changing needs of enterprises — big businesses — because you could not really buy these machines; they were very big, they were very expensive. You generally leased them from IBM or Honeywell or GE or whoever. You did not just go out and buy a System/360 — you were paying a monthly lease for that thing.

Until the 80s, computers didn't really exist in a form factor — physical size — that was accessible to most users. And I'm not just talking about home computers; I'm talking about having desktop workstations, which was not a thing until the 80s. The computers were just too big and the graphical interfaces often weren't there — a lot of times there was not a graphical interface.

So then the 1980s showed up, and with the most fashionable decade in human history we also got some pretty sweet hacker movies that kind of helped to popularize the whole desktop computer situation. This is a screengrab from WarGames, which is excellent if you haven't seen it. But computers were no longer a thing that took up an entire room and required a justifiable recurring line item in a corporation's outgoing payments. I mean, like Ferris Bueller: a high school student could have one. Granted, a rich kid from the North Shore of Chicago, but still. It was still only like a rich people thing, but universities had them more generally. You started seeing computer labs at universities and you started seeing desktops in offices.

So initially a bunch of different lightweight databases were kind of jockeying for dominance in the market back then. We did have computer games, but largely these were still productivity tools. The champion however in general was dBASE II. Fun fact: there is no dBASE I. Originally dBASE was released as Vulcan, and it was not available for PCs — it was for mainframe machines.

But when IBM was planning the release of their DOS line of PCs, which were also immediately dominant in the market, they commissioned a PC port of Vulcan. And the people at Vulcan decided that they were going to call the PC port dBASE II because the "II" implied a second and thus less buggy release. Marketing.

Also dBASE was ultimately killed by a buggy release — it was dBASE IV. So that didn't play out long term. But dBASE was immediately the dominant player in the market upon the release of IBM's DOS-based PCs, just because it was one of the very few pieces of professional software that was immediately available upon shipping of these PCs — not necessarily because it was the best, but because it was the one that was most available.

It immediately dominated the market and it remained one of the most popular-selling pieces of software through like the early 90s until dBASE IV sucked and everybody was like, "nah dog, we can do something else."

What was significant about dBASE II was that it abstracted away a lot of the kind of crappy parts of interacting with the database that weren't really relevant to what you were trying to achieve, but are still necessary to interact with it. So when you're using dBASE II, you didn't have to worry about things like opening and closing files — it abstracted that away for you.

The fact that this was so easy to use relative to its predecessors — it was not easy to use, to be clear, it still sucked, but relative to its predecessors it was easy to use — meant that immediately nobody wanted to do anything else. A whole industry sprung up around dBASE II, so people were building other databases on top of it, whole companies existed just to provide services built on top of dBASE II. The same way an entire ecosystem sprung up around Kubernetes, that happened with dBASE II. It was really, really dominant. You still kind of see it with legacy applications but it's becoming pretty rare today.

And the 90s arrived. I think the music is better in the 90s, the fashion is radically worse.

In the 1990s things start to change a little bit more radically and in a very different direction, because the way we think about software engineering starts to change fundamentally now. Object-oriented programming had been a thing theoretically for a very long time — for decades. The 90s is absolutely not the first mention of object-oriented programming; it had been a discussion since about the 50s.

However, this is when it became dominant, thanks to a man named Grady Booch releasing a book called Object-Oriented Analysis and Design. I talk about him a lot and it's probably starting to get weird. He's very active on Twitter, so he's gonna hear about it anyway.

Object-oriented programming — you are all probably familiar with it. We start fundamentally thinking about our code and the data we're interacting with as objects, rather than as disparate chunks of whatever, as tables. So we need a different way to interact with the data that we're pulling out of the database that starts looking like an object, with attributes. We're not talking about just a lump of thing — we need an object that has attributes and we need to be able to interact with our data in that way.

This is how we get ORMs — Object Relational Mapping tools. These are pretty indispensable now. If you're a programmer — if you are a back-end programmer at all — you have had to interact with an ORM. It creates a sort of virtual object database within the context of your program so that you can interact with and query data in a way that feels more natural to an object-oriented programming environment.
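As a rough illustration of what an ORM does for you, here is a tiny hand-rolled mapper in Python. It is a sketch under stated assumptions, not how any real ORM is implemented: the `Talk` class, the `TalkMapper` class, and the `talks` table are all hypothetical. The point is only that the program works with objects and attributes while the mapper translates to rows and columns.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Talk:
    """The object the rest of the program works with."""
    id: int
    title: str
    slides: int


class TalkMapper:
    """Hypothetical mapper between Talk objects and rows in a 'talks' table."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS talks "
            "(id INTEGER PRIMARY KEY, title TEXT, slides INTEGER)"
        )

    def save(self, talk):
        # Object attributes become column values.
        self.conn.execute(
            "INSERT OR REPLACE INTO talks (id, title, slides) VALUES (?, ?, ?)",
            (talk.id, talk.title, talk.slides),
        )

    def get(self, talk_id):
        # A row comes back out as an object, or None if there is no such row.
        row = self.conn.execute(
            "SELECT id, title, slides FROM talks WHERE id = ?", (talk_id,)
        ).fetchone()
        return Talk(*row) if row else None


conn = sqlite3.connect(":memory:")
mapper = TalkMapper(conn)
mapper.save(Talk(id=1, title="Places to Put Your Stuff", slides=33))

talk = mapper.get(1)
print(talk.title, talk.slides)  # Places to Put Your Stuff 33
```

Real ORMs add a lot on top of this (relationships, query builders, change tracking), but the core job is the same translation in both directions.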

This is largely Grady Booch's fault — but it's a good kind of fault. He gave us a lot with that book. He also gave us the fundamentals of what we now think of today as continuous integration — the term was coined in his 1991 edition of that book.

So to answer the needs of object-oriented programming, Microsoft acquired an xBase database — xBase meaning a database based on dBASE II. The one they acquired is FoxPro. And they subsequently built Visual FoxPro out of it with support for some object-oriented design features. This was immediately super popular and then very quickly not at all popular outside of a pretty small, close-knit group of developers that relied on it very heavily. As a result of the needs of that group of developers, Microsoft's extended support for Visual FoxPro actually didn't end until like 2015.

The more important thing Microsoft got out of the acquisition of FoxPro was something that they used for Access — it was FoxPro's query optimization routines. They took those and built them into Microsoft Access, which in part killed Visual FoxPro, because it almost immediately made Microsoft Access overwhelmingly the most popular database solution for Windows environments.

That is where I cut my teeth on Microsoft Access actually, so I don't touch it anymore. You'd have to pay me a lot of money to touch it now. Originally Microsoft Access was sold separately, but in the mid-90s, like '95, they started including it as part of the Microsoft Office Suite and from there it was kind of just a done deal.

So we've now entered the modern era, also known as the era of web scale. When we think about web scale and coming into the 2000s and the 2010s and beyond — so I'm not going to play this video. This is the "MongoDB is web scale" video. I'm not playing it for a couple of reasons, not the least of which being time. One of them being there are some parts of it that might not be, you know, friendly. But if you can go to that link you can watch it yourself.

What this video was was kind of a rip on people adopting whatever was the latest trendy thing and just sort of spewing out things that they had read. There's two people in the video, one saying "why are you using this?" And "well, MongoDB is web scale." This is not a dig at MongoDB — it's kind of saying "I don't know, it's fine." One of my favorite parts is he says, "Look, you'd probably just have as much luck just piping all your data to /dev/null," and he says "well, if piping to /dev/null is web scale, then I'm gonna do that." But when we think about what does web scale mean — this was a lot of what was happening, and while it sounds buzzwordy, it's just the scale at which we were working.

There's actually an interesting connection back to the history of infrastructure code: if you think about why Chef exists, Chef exists because Puppet didn't work. The folks who created Chef were working as consultants for a lot of valley-based web scale large web companies using Puppet, and Puppet was not built to handle that scale. So they had to build a system that could do that. So this is where we were at: going from the era of — coming from an ops background — managing a few hundred servers in my data center, now we're talking about thousands upon thousands of machines. We need to be able to scale horizontally, everything's kind of changing.

So this term "NoSQL," however we want to say that: the term was originally coined by Carlo Strozzi in 1998 because he had a project called the Strozzi NoSQL open source relational database. People really have trouble naming things sometimes. The idea of Strozzi's NoSQL database was that it didn't expose the standard Structured Query Language, SQL, but it was still relational. It stored all its data as ASCII files and used shell scripts instead of SQL to access the data.

This didn't have really anything in common with when we talk about NoSQL today. So Strozzi — kind of taking a hint from our boy the Top Gun of Tables — gets pissed off about how people talk about things. His suggestion is: "well, because the current NoSQL movement departs from the relational model, they shouldn't call it NoSQL, they should call it 'NoRel' as in non-relational." We're still salty about that.

So then when we think about the term NoSQL as we usually think about it, we're thinking about the model from Johan Oskarsson, who was a developer at Last.fm. He put together an event in 2009 to talk about open source distributed non-relational databases. It wasn't really a conference; it was kind of a big meetup. They were wanting to talk about this increasing number of non-relational distributed data stores: open source versions or clones of BigTable, MapReduce, and Amazon's Dynamo. So he organized this meeting in San Francisco to talk about these things, and they were like, "well, we have to think about what's a brief term we could use as a Twitter hashtag while we're talking about this," and Eric Evans from Rackspace came up with NoSQL.

It's actually so online that we have an entire movement. And this is what DevOps is — DevOps is called DevOps for the same reason. Andrew Clay Shafer was watching a talk from John Allspaw and Paul Hammond at Velocity, and he was tweeting about it, and he hashtagged #devops. That's why it's called DevOps. The other reason it's called DevOps is "Agile System Infrastructure" was too long of a name for a conference so they called it DevOps Days.

The idea is that this hashtag itself was just sort of meant for that one meeting — it was just kind of a random thing. And it turned out it spread worldwide and now it's the de facto name for a whole structure of data products.

So usually when people think about a NoSQL database, they're thinking about the document model — a lot of times you think about CouchDB. But there are other models that are NoSQL, like Redis as a key-value store, column databases like Cassandra, different graph models, etc. But there are a couple of things that are generally in common about these NoSQL data platforms.

One of them is they're not relational, so joins — yeah. Mostly they're open source, although Berkeley DB is practically open source — we just need to throw some digs at Larry. They're cluster-friendly. And they're schema-less — which, they actually have a little bit of a schema, so that's a bit fussy, that's marketing. But they're mostly not as schema-dependent. And then they emerged again out of web scale needs.

What we mean by that — when we think about how we scale horizontally versus vertically. When we think about systems like a Postgres server, or building a SQL Server or something like that, and you need more horsepower: what do you do? You throw more CPU in it, you give it more memory, you're making a bigger box. But then when we think about departing from data for a second — in the beginnings of the web scale world, we were like "okay, we're going to scale our front ends," so you scale them horizontally, you make more of them. Well, then it's the same model coming to our data stores. And again, the idea of scaling horizontally should come naturally to a bunch of people at a Kubernetes conference.
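One simple way to picture horizontal scaling for data is to spread keys across several small nodes instead of growing one big box. The sketch below is an assumption-laden toy, not how any particular database does it: the node names and the `node_for` helper are made up, and it uses plain hash-modulo placement.

```python
import hashlib


def node_for(key, nodes):
    """Pick a node for a key by hashing the key (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


nodes = ["node-a", "node-b", "node-c"]  # scale out by adding more entries
keys = ["user:1", "user:2", "user:3", "user:4"]

placement = {key: node_for(key, nodes) for key in keys}
for key, node in placement.items():
    print(key, "->", node)
```

Every client that hashes the same key lands on the same node, so no single machine has to hold everything. One known weakness of plain modulo placement is that changing the number of nodes moves most keys around, which is why real systems tend to use schemes such as consistent hashing instead.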

Another way to think about the difference between the relational databases and these NoSQL ones is the difference between ACID and the CAP theorem. So relational databases are kind of based upon the idea around ACID. The characteristics of ACID are:

Atomicity: each transaction is performed in its entirety or it doesn't happen at all. Consistency: we don't leave our database in a halfway complete state, so if an error occurs the transaction is rolled back and our data stays consistent. Isolation: transactions occur independently, so no transaction can see another transaction's in-progress work. And Durability: the changes made to the database through these transactions, once completed, are committed to the database and don't get lost. That's ACID.
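The all-or-nothing part is the easiest one to see in code. Here is a minimal sketch with Python's built-in `sqlite3` module. The `accounts` table, the names, and the `transfer` helper are invented for illustration. A transfer is two updates inside one transaction; if the second update breaks a rule, the first one is rolled back too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # no overdrafts allowed
)
conn.execute("INSERT INTO accounts VALUES ('matty', 100), ('kat', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts as one transaction (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the credit to dst was undone along with the failed debit


ok = transfer(conn, "matty", "kat", 30)       # fits within the balance
failed = transfer(conn, "matty", "kat", 500)  # would overdraw, so nothing happens

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)  # True False {'matty': 70, 'kat': 80}
```

The second transfer credits one account before the debit fails, and the rollback removes that credit, so the database never ends up in the halfway state.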

When we think about the web scale database, the non-relational, NoSQL kind, we think more about the CAP theorem. The CAP theorem has three pieces. C is Consistency, not to be confused with the consistency in ACID. It's similar, but the difference is that the user should see the same data no matter which node or machine in the cluster they connect to. So no matter which one of maybe hundreds or thousands of nodes I hit, I'm going to get the same data, which means that if data has been written to one node it needs to be replicated to all the replicas. That's the C in CAP. A is Availability: every request from the user should get a response. Whether the user wants to read or write, they should get a response, even if the operation was unsuccessful. And finally P is Partition Tolerance. A partition is when a node loses connectivity and can't receive messages from another node, so it's been partitioned off. It could be because of all sorts of things: a server crash, a network failure, all kinds of reasons. Partition tolerance means the system keeps working even if there's a partition in it. The CAP theorem basically says you get two. It's sort of like "fast, cheap, good: pick two."
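
Here is a toy sketch in Python of what "you get two" looks like once a partition happens. It is not from the talk and it is not how any real database is implemented: the `Replica` and `Cluster` classes and the "CP" and "AP" modes are hypothetical, and the sketch assumes a two-node cluster where refusing a write counts as giving up availability.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}

class Cluster:
    """Two replicas with a network link that can be partitioned."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, key, value):
        peer = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Keep consistency: refuse the write instead of letting
                # the two replicas disagree.
                raise RuntimeError("unavailable: peer replica unreachable")
            # Keep availability: accept the write locally and let the
            # peer go stale until the partition heals.
            replica.data[key] = value
            return
        # No partition: replicate to both nodes.
        replica.data[key] = value
        peer.data[key] = value

    def read(self, replica, key):
        return replica.data.get(key)

# CP cluster: the write during the partition is refused.
cp = Cluster("CP")
cp.write(cp.a, "x", 1)
cp.partitioned = True
try:
    cp.write(cp.a, "x", 2)
    cp_refused = False
except RuntimeError:
    cp_refused = True

# AP cluster: the write succeeds, but the two nodes now disagree.
ap = Cluster("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
ap.write(ap.a, "x", 2)
```

In the CP run both replicas still agree on the old value, at the cost of a refused request. In the AP run every request gets answered, but a read from replica `b` returns stale data until the partition heals.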

If you're interested in understanding more about the CAP theorem, Kyle Kingsbury, aka aphyr, of the Jepsen reports basically goes through all of these different database systems, tests them, and evaluates them against the CAP theorem. If you really want to get nerdy about database fit, you should read everything Kyle ever writes.

This is an interesting one: when we think about different kinds of data stores as they've evolved, one of them is the hypermedia database. A hypermedia database is one where any word or piece of text representing an object can be hyperlinked to another object. Think about an online encyclopedia: Wikipedia is technically a hypertext database. And the World Wide Web is really the largest distributed hypertext database there is, if you want to be technically correct, which is, as we all know, the best kind of correct.

There's also one other type of database system you can use. I would not recommend it — there are some thought leaders on Twitter who will tell you this is a fine way. Technically you can use Amazon Route 53 as a database. But DNS is a hierarchical database, so there you go.

So as we've gone through our journey from the 50s all the way through to now, what are some of the things that we have learned? One thing we've learned is that we look really good in Devo power domes, but that's neither here nor there.

A couple of things we see as we look through this: everything we do is iterative. It's not original, there's nothing new under the sun, we're standing on the shoulders of giants, and we keep coming back to Kat's point about everything being about improving latency, improving speed, improving scale. In other words, everything old is new again. These ideas still connect back.

One thing I thought was interesting: one of my colleagues, who's been programming since 1981, pointed out that if we look across the history, relational databases really came in relatively late. It doesn't seem like it, but in context they arrived a little bit late. It feels like they've been there forever, but they really only held dominance for a relatively short amount of time. We still use operational data stores like Postgres, MySQL, SQL Server, and Oracle. They still exist. But to meet the needs of the way we do business now, and that continual evolution, I think it goes back to Kat's point: there's not one thing to rule them all. They all solve different problems, and they solve them in concert.

So this is where you can find us on the internet across various places. Speaking.MatStratton.com has the slides, and for this talk it also has a list of the references, and we like to provide attribution for photos too. We had to lift a lot of images. It turns out it's actually super hard to find photos of some of these really old machines. One of our colleagues asked whether we could show screenshots of what some of these systems looked like, and it was like: no, we actually cannot. The Atlas is arguably the most important computer ever built, and it's basically undocumented and there are like five photos of it.

So thank you very much for hanging out with us and hopefully it was interesting.