“Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won’t usually need your code; it’ll be obvious.” (Eric Raymond, in The Cathedral and the Bazaar, 1997)
Linguistic innovation
The fundamental task of programming is telling a computer how to do something. Because of this, much of the innovation in software development has been linguistic innovation; that is, innovation in the ease and effectiveness with which a programmer can instruct a computer system.
While machines operate in binary, we do not talk to them that way. Each decade has introduced higher-level programming languages, and with each one an improvement in programmers' ability to express themselves. These improvements cover how we express data structures as well as how we express algorithms.
Object-relational impedance mismatch
Almost all modern programming languages support OO, and when we model entities in our code, we usually model them using a composition of primitive types (ints, strings, etc.), arrays, and objects.
While each language may handle the details differently, the idea of nested object structures has become our universal language for describing ‘things’.
The data structures we use to persist data have not evolved at the same rate. For the last 30 years the primary data structure for persistent data has been the table: a set of rows made up of columns containing scalar values (ints, strings, etc.). This is the world of the relational database, popularized in the 1980s because of its transactionality, speedy queries, space efficiency compared with other contemporary database systems, and a huge number of ORCL salespeople.
The difference between the way we model things in code, via objects, and the way they are represented in persistent storage, via tables, has been the source of much difficulty for programmers. Millennia of man-effort have been put against solving the problem of changing the shape of data from the object form to the relational form and back.
Tools called Object-Relational Mapping systems (ORMs) exist for practically every object-oriented language, and even with these tools, almost any programmer will complain that doing O/R mapping in any meaningful way is a time-consuming chore.
Ted Neward hit it spot on when he said:
“Object-Relational mapping is the Vietnam of our industry”
There were attempts made at object databases in the 90s, but there was no technology that ever became a real alternative to the relational database. The document database, and in particular MongoDB, is the first successful Web-era object store, and because of that, represents the first big linguistic innovation in persistent data structures in a very long time. Instead of flat, two-dimensional tables of records, we have collections of rich, recursive, N-dimensional objects (a.k.a. documents) for records.
An Example: the Blog Post
Consider the blog post. Most likely you would have a class / object structure for modeling blog posts in your code, but if you are using a relational database to store your blog data, each entry would be spread across a handful of tables.
As a developer, you need to know how to convert each ‘BlogPost’ object to and from the set of tables that house it in the relational model.
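To make that conversion concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module as a stand-in relational database. The class names, table layout, column names, and the `save_post` / `load_post` helpers are all assumptions made for illustration; a real blog schema would have more tables and more columns.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Comment:
    commenter: str
    body: str


@dataclass
class BlogPost:
    post_id: int
    author_name: str
    body: str
    comments: list[Comment] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)


# Hypothetical schema: one logical blog post is spread over three tables.
SCHEMA = """
CREATE TABLE posts    (id INTEGER PRIMARY KEY, author_name TEXT, body TEXT);
CREATE TABLE comments (post_id INTEGER, seq INTEGER, commenter TEXT, body TEXT);
CREATE TABLE tags     (post_id INTEGER, seq INTEGER, tag TEXT);
"""


def save_post(conn, post):
    """Object -> rows: flatten one object into rows in three tables."""
    conn.execute("INSERT INTO posts VALUES (?, ?, ?)",
                 (post.post_id, post.author_name, post.body))
    conn.executemany(
        "INSERT INTO comments VALUES (?, ?, ?, ?)",
        [(post.post_id, i, c.commenter, c.body)
         for i, c in enumerate(post.comments)])
    conn.executemany(
        "INSERT INTO tags VALUES (?, ?, ?)",
        [(post.post_id, i, t) for i, t in enumerate(post.tags)])


def load_post(conn, post_id):
    """Rows -> object: three queries are needed to rebuild one object."""
    pid, author_name, body = conn.execute(
        "SELECT id, author_name, body FROM posts WHERE id = ?",
        (post_id,)).fetchone()
    comments = [Comment(who, text) for who, text in conn.execute(
        "SELECT commenter, body FROM comments WHERE post_id = ? ORDER BY seq",
        (post_id,))]
    tags = [tag for (tag,) in conn.execute(
        "SELECT tag FROM tags WHERE post_id = ? ORDER BY seq", (post_id,))]
    return BlogPost(pid, author_name, body, comments, tags)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

original = BlogPost(
    1234, "Bob Davis", "In these troubled times ...",
    comments=[Comment("reader-1", "Great point! I agree")],  # placeholder user id
    tags=["Politics", "Virginia"])

save_post(conn, original)
restored = load_post(conn, 1234)
```

Even in this toy version, every new nested field on the object means another table or column plus matching changes in both helpers, which is the chore described above.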
A different approach
Using MongoDB, your blog posts can be stored in a single collection, with each entry looking like this:
{
  _id: 1234,
  author: { name: "Bob Davis", email: "[email protected]" },
  post: "In these troubled times I like to …",
  date: { $date: "2010-07-12 13:23UTC" },
  location: [ -121.2322, 42.1223222 ],
  rating: 2.2,
  comments: [
    { user: "[email protected]", upVotes: 22, downVotes: 14, text: "Great point! I agree" },
    { user: "[email protected]", upVotes: 421, downVotes: 22, text: "You are a moron" }
  ],
  tags: [ "Politics", "Virginia" ]
}
With a document database your data is stored almost exactly as it is represented in your program. There is no complex mapping exercise (although one often chooses to bind objects to instances of particular classes in code).
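As a rough illustration of how thin that binding layer can be, the sketch below models a subset of the blog post fields as Python dataclasses and converts them to and from a nested dict of the same shape as the stored document. The class names, the `to_document` / `from_document` helpers, and the `example.com` addresses are assumptions made for this example, and a JSON round trip stands in for the database so the code runs without a server.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Author:
    name: str
    email: str


@dataclass
class Comment:
    user: str
    upVotes: int
    downVotes: int
    text: str


@dataclass
class BlogPost:
    id: int
    author: Author
    post: str
    rating: float
    comments: list[Comment] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)


def to_document(post):
    """Object -> document: asdict() recurses into nested dataclasses and lists."""
    doc = asdict(post)
    doc["_id"] = doc.pop("id")  # the stored document keys its identifier as _id
    return doc


def from_document(doc):
    """Document -> object: bind each nested dict to its class."""
    return BlogPost(
        id=doc["_id"],
        author=Author(**doc["author"]),
        post=doc["post"],
        rating=doc["rating"],
        comments=[Comment(**c) for c in doc["comments"]],
        tags=list(doc["tags"]),
    )


original = BlogPost(
    id=1234,
    author=Author("Bob Davis", "bob@example.com"),  # placeholder address
    post="In these troubled times I like to ...",
    rating=2.2,
    comments=[Comment("reader@example.com", 22, 14, "Great point! I agree")],
    tags=["Politics", "Virginia"],
)

document = to_document(original)
# Serialising to JSON and back stands in for a trip through the database.
restored = from_document(json.loads(json.dumps(document)))
```

With a driver such as PyMongo, a dict shaped like `document` is what one would hand to `insert_one`; the point here is only that the conversion is a near one-to-one copy rather than a decomposition into tables.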
What’s MongoDB good for?
MongoDB is great for modeling many of the entities that back most modern web-apps, either consumer or enterprise:
- Account and user profiles: can store arrays of addresses with ease
- CMS: the flexible schema of MongoDB is great for heterogeneous collections of content types
- Form data: MongoDB makes it easy to evolve structure of form data over time
- Blogs / user-generated content: can keep data with complex relationships together in one object
- Messaging: vary message meta-data easily per message or message type without needing to maintain separate collections or schemas
- System configuration: just a nice object graph of configuration values, which is very natural in MongoDB
- Log data of any kind: structured log data is the future
- Graphs: just objects and pointers – a perfect fit
- Location based data: MongoDB understands geo-spatial coordinates and natively supports geo-spatial indexing
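The CMS and messaging items above both rest on the same property: documents in one collection do not have to share a shape. The sketch below imitates that with a plain Python list of dicts; the `messages` data and the `find` helper are hypothetical stand-ins written for this example, not MongoDB APIs.

```python
# An in-memory stand-in for a single collection holding differently shaped
# message documents. All field names and values are invented for illustration.
messages = [
    {"_id": 1, "type": "email", "subject": "Welcome", "to": ["user-a", "user-b"]},
    {"_id": 2, "type": "sms", "phone": "+1-555-0100", "body": "Your code is 1234"},
    {"_id": 3, "type": "email", "subject": "Invoice", "to": ["user-c"],
     "attachments": [{"name": "invoice.pdf", "bytes": 48213}]},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal every given criterion.

    A document that lacks a field simply does not match (unless the
    criterion value is None).
    """
    return [doc for doc in collection
            if all(doc.get(key) == value for key, value in criteria.items())]


emails = find(messages, type="email")
sms = find(messages, type="sms")

# Per-message metadata: only some documents carry an "attachments" field,
# and no schema change was needed to add it.
with_attachments = [doc["_id"] for doc in messages if "attachments" in doc]
```

In a relational design, the SMS-only and attachment-only fields would typically call for nullable columns or extra tables; here they just appear on the documents that need them.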
Looking forward: the data is the interface
There is a famous quote by Eric Raymond, in The Cathedral and the Bazaar (rephrasing an earlier quote by Fred Brooks from the famous The Mythical Man-Month):
“Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won’t usually need your code; it’ll be obvious.”
Data structures embody the essence of our programs and our ideas. Therefore, as programmers, we are constantly inviting innovation in the ease with which we can define expressive data structures to model our application domain.
People often ask me why MongoDB is so wildly popular. I tell them it’s a data structure thing.
While MongoDB may have ridden onto the scene under the banner of scalability with the rest of the NoSQL database technologies, the disproportionate success of MongoDB is largely based on its innovation as a data structure store that lets us more easily and expressively model the ‘things’ at the heart of our applications. For this reason MongoDB, or something very like it, will become the dominant database paradigm for operational data storage, with relational databases filling the role of a specialized tool.
Having the same basic data model in our code and in the database is the superior method for most use-cases, as it dramatically simplifies the task of application development, and eliminates the layers of complex mapping code that are otherwise required. While a JSON-based document database may in retrospect seem obvious (if it doesn’t yet, it will), doing it right, as the folks at 10gen have, represents a major innovation.