Добро пожаловать в нашу серию блогов, в которых освещаются различия между разработкой приложения с MongoDB и RDBMS / SQL. На прошлой неделе мы начали освещать 14-й день и добавили список приложений для запуска, организованных по регионам. Мы также добавили данные, предоставленные нам службой внешних прав. На этой неделе мы продолжим наше обсуждение 14-го дня, углубившись в вопросы.
Прежде чем мы начнем, давайте освежим наши воспоминания структурой, которую мы использовали для постановки нашего обсуждения.
- Мы используем Java
- Предположим, у нас есть слой доступа к данным между нашим приложением и MongoDB
- Что касается даты, которая учитывается при рассмотрении примеров, рассматривайте их как индикаторы относительного прогресса, а не фактическое время, необходимое для выполнения задачи.
- Мы не будем вдаваться в исключения или обработку ошибок. Мы не будем путать код с шаблонной или персисторной логикой, которая не меняется со дня на день. Мы не будем входить в соединение с базой данных или другие ресурсы установки. Основное внимание будет уделено основному коду обработки данных.
MongoDB против SQL: день 14 — Запросы
На этом этапе мы рассмотрели, как обращаться с широкими запросами, которые позволяют нам получить все или одну вещь из коллекции. Но мы знаем, что отличительной чертой SQL является его широкие возможности запросов. К счастью для нас, MongoDB также обладает богатыми возможностями запросов.
The big difference between SQL and MongoDB’s query language is that the latter is not a single string “sentence.” It doesn’t require white spaces in between words, or commas, or parentheses, or quoted characters. Instead, MongoDB uses a “cascade” of operator/operand structures, typically in name:value pairs. In other words, the operand in one structure can be another operator/operand structure.
This makes things very exciting for developers because the same techniques we use in our code to manipulate rich shapes of data going into and coming out of MongoDB — whether it’s Java, Python, JavaScript, etc. — can be used to construct, manipulate, and “parse” our query expressions. In fact, no parser is necessary. It is trivially easy to walk the cascade with standard programming techniques and find fields, values, etc. Because the query expression is a structured cascade and not a single string, this also means that it is easy to incrementally add subexpressions to the query without first having to break it apart into component pieces.
MongoDB vs SQL: Query Examples
Now let’s look at some compare and contrast, side-by-side code examples for queries.
First, we see one of the popular command line interpreters for SQL which is going to fetch us some contacts and phones. This will yield a rectangle, as we saw in earlier posts.
Next, in the MongoDB command line interpreter (“CLI”), we set up the equivalent query. Notice that we can use “dotpath” syntax to address subfields within fields in the rich shape. It is also worth noting that the “equals” operator is so common that as a shortcut, MongoDB interprets name:value as “name = value”, without having to explicitly supply the “$eq” operator.
Third, we see the equivalent query in Java / JDBC. Note that although the “sentence” is similar, we start to bump into irritants like escaping quotes.
Lastly, we see the equivalent in MongoDB via the Java driver.
We can see that the overall semantics of queries in MongoDB and SQL are the same and follow the common pattern of query setup, issuance of query, and iteration over a cursor.
Let’s look at some more complicated queries now.
In this query, we’re looking for contacts who either have at least one work phone OR have been hired after a specific date. Again, we can see that the equivalent in MongoDB is pretty straightforward. Note the use of dollar signs in the operator names ($or, $gt) as syntactic sugar. Also note that in both examples it’s important to use an actual date in our comparison, not a string.
The equivalent query in Java / JDBC will look largely the same as before, with a few more escaped quotes.
However, in practice it isn’t as complicated as it appears — and it actually offers more flexibility than SQL:
- First of all, it’s really the same two or three lines just repeated over and over again with different field:value pairs. This makes it easy to cut-and-paste these expressions as you build up your query.
- Second, it is simple to dynamically construct filters and queries without worrying about where we are in the predicate path. We don’t have to worry about white space, commas, or parentheses. We don’t have to worry about splicing in a SORT statement or dynamically adjusting the names of returned fields sandwiched between SELECT and WHERE. Parameter substitution is very straightforward and easily coded, especially when dynamic logical AND and OR statements come into the picture.
If we extrapolate beyond this small code example, it’s evident how easy it is to add more expressions into the $or statement, or to call out to another function that independently crafts a small filtering fragment that we can add to our overall query. As dynamic queries become more complex in SQL, however, the syntactic sugar that makes SQL “human readable” in the CLI begins to work against you in the programmatic construction of a query.
We’ve used the very basic Java query APIs to illustrate the operator/operand nature of the language. We also deliberately chose standard Java HashMap objects to further reduce coupling until the last moment — when we constructed the BasicDBObject to pass to the find() method. For greater convenience, a Builder pattern set of APIs exist as well, but in the end it is still building a cascade of operator/operand structures.
More MongoDB Query Capabilities
MongoDB offers capabilities you come to expect in a full-featured query language including:
- Arbitrary sorting on one or more fields, ascending or descending.
- Projection, i.e. retrieving only specified fields from each document in the cursor, not the entire document.
- Cursor skip() and limit() to easily implement pagination if desired.
- explain(), which returns a wealth of information including full details on query path analysis, document and index counts, and estimated vs. actual processing time.
Perhaps the most important feature is the Aggregation Framework (“agg”), MongoDB’s answer to SQL’s “GROUP BY” clause. Exploring the power of agg is an entire blog in its own right; please see Asya Kamsky’s posts to get an idea of the agg’s power and programmability. For now, here’s an example to get you thinking about it. Suppose we want to count all the different kinds of cell phones owned by our contacts hired before June 1, 2013. Let’s make it a bit more interesting — let’s capture the names of the people who have these phones and only emit those types where more than 1 person has it. Expressed in the MongoDB CLI, we’d try this:
x2 = db.contact.aggregate([ { $match: { "hiredate": {"$lt": new ISODate("20130601") }}}, { $unwind: "$phones"}, { $group: { "_id": "$phones.type", "n": {$sum: 1}, "who": {$push: “$name”}, }}, { $match: { “n”: {“$gt”: 1}} }, { $sort: { "n": -1, "_id": 1} } ]); x2.forEach(function(r) { printjson(r); });
This might yield:
{ "_id" : "mobile", "n" : 3, who: [ “buzz”, “sam”, “dan” ] } { "_id" : "work", "n" : 2, who: [ “sam”, “kay” ] }
The important concepts to grasp with agg are:
- Data is “flowed” through a pipeline of operations (e.g. $match and $unwind). Output from each stage is passed to the next. In the example above, we use the $match operator twice: once to filter the input set, then a second to filter the group set.
- The $unwind operator turns arrays of things into “virtual” documents, one for each element of the array, to simplify further processing.
- The $group operator is extremely powerful and can even create brand new fields based on numeric and string operations of other fields. In particular, as you group data and aggregate on a scalar (e.g. the count of types), you can use the $push operator to capture other information related to that aggregation. The output cursor contains very clear, usable rich shapes.
- The agg pipeline in the Java driver (in fact, most of the drivers) is simply a List of operators, very similar to what we saw earlier with find(). Thus, the same power and flexibility in terms of parameter substitution and subclause conditional inclusion applies to pipeline construction.
Of course, all this functionality is performed efficiently at the engine, not in the client, so millions (or billions) of documents do not have to be dragged across the network.
Next week, we’ll dive into RAD (Rapid Agile Development) and switch over to some Python examples. If you’d like to learn about migrating to MongoDB, download our RDBMS to MongoDB Migration Guide.