Continuing my series of articles on MongoDB and Python, this article introduces the Python MongoDB toolkit Ming and what it can do to simplify your MongoDB code and make it easier to maintain. If you're just getting started with MongoDB, you may want to read the earlier articles in the series first:
- Getting Started with MongoDB and Python
- Moving Along with PyMongo
- GridFS: The MongoDB Filesystem
- Aggregation in MongoDB (Part 1)
- MongoDB's New Aggregation Framework
And now that you're all caught up, let's jump right in with Ming…
Why Ming?
If you've come to MongoDB from the world of relational databases, you were probably struck by just how simple everything is: no need for a big object/relational mapper, no new query language to learn (well, maybe a little, but we'll gloss over that for now), everything is just Python dictionaries, and it's so very fast! While all of that is true to some extent, one of the important things you give up with MongoDB is structure.
MongoDB is sometimes referred to as a schema-free database. (This is not technically true; I find it more useful to think of MongoDB as having dynamically typed documents. The collection doesn’t tell you anything about the type of documents it contains, but each individual document can be inspected.) While this can be nice, as it’s easy to evolve your schema quickly in development, it’s easy to get yourself in trouble the first time your application tries to query by a field that only exists in some of your documents.
The fact of the matter is that even if the database cares nothing about your schema, your application does, and if you play too fast and loose with document structure, it will come back to haunt you in the end. The main reason Ming was created at SourceForge was to deal with just this problem. We wanted a (thin) layer on top of pymongo that would do a couple of things for us:
- Make sure that we don’t put malformed data into the database
- Try to ‘fix’ malformed data coming back from the database
So, without belaboring the point of its existence, let’s jump into Ming.
Defining your schema
When using Ming, the first thing you need to do is to tell it what your documents look like. For this, Ming provides the collection function.
```python
from datetime import datetime

from ming import collection, Field, Session
from ming import schema as S

session = Session()

MyDoc = collection(
    'user', session,
    Field('_id', S.ObjectId),
    Field('username', str),
    Field('created', datetime, if_missing=datetime.utcnow),
    ...)
```
There are a few things to note above:
- The MongoDB collection name is passed as the first argument to collection
- The Session object is used to abstract away the pymongo connection. We will see how to configure it below.
- Each field in our schema gets its own Field definition. Fields contain a name, a schema item (S.ObjectId, str, and datetime in this example), and optional arguments that affect the field.
- The special if_missing keyword argument allows you to supply default arguments which will be ‘filled in’ by Ming. If you pass a function, as above, the function will be called to generate a default value.
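For example, a default can be either a constant value or a callable; both of the following styles appear in the schemas used later in this article:

```python
...
# Constant default: Ming fills in None when the field is absent
Field('client_id', S.ObjectId, if_missing=None),
# Callable default: datetime.utcnow is called each time a default is
# needed, so every document gets its own timestamp rather than a value
# frozen at import time
Field('created', datetime, if_missing=datetime.utcnow),
...
```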
Schema items bear a bit more explanation. Ming internally always works with objects from the ming.schema module, but it also provides shortcuts to ease schema definitions. The translation between shortcut and ming.schema.SchemaItem appears below:
| shorthand | SchemaItem | Notes |
| --- | --- | --- |
| None | Anything | |
| int | Int | |
| str | String | Unicode |
| float | Float | |
| bool | Bool | |
| datetime | DateTime | |
| [] | Array(Anything()) | Any valid array |
| [int] | Array(Int()) | |
| {str: None} | Object({str: None}) | Any valid object |
| {'a': int} | Object({'a': int}) | Embedded schema |
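To make the mapping concrete, here is a short sketch of a couple of shorthand field definitions alongside their (roughly) equivalent explicit ming.schema items; the field names are hypothetical, and the equivalences simply follow the table above:

```python
from ming import Field
from ming import schema as S

# Shorthand forms, converted by Ming according to the table above
Field('tags', [str])                  # array of strings
Field('author', {'username': str})    # embedded object

# Roughly equivalent explicit schema items
Field('tags', S.Array(S.String()))
Field('author', S.Object({'username': str}))
```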
Note above that we can create complex schemas using Ming. A blog post might have the following definition, for example:
```python
BlogPost = collection(
    'blog.post', session,
    Field('_id', S.ObjectId),
    Field('posted', datetime, if_missing=datetime.utcnow),
    Field('title', str),
    Field('author', dict(
        username=str,
        display_name=str)),
    Field('text', str),
    Field('comments', [ dict(
        author=dict(
            username=str,
            display_name=str),
        posted=S.DateTime(if_missing=datetime.utcnow),
        text=str) ]))
```
Note in the schema above that author is an embedded document, and comments is an embedded array of documents.
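As a quick sketch of how such a document might be built, here is a hypothetical post with a single embedded comment, constructed with the same MyDoc(dict(...)).m.insert() pattern used later in this article (all the values are made up):

```python
post = BlogPost(dict(
    title='Introducing Ming',
    author=dict(username='rick', display_name='Rick'),
    text='Ming is a thin schema layer on top of pymongo...',
    comments=[
        dict(author=dict(username='jenny', display_name='Jenny'),
             text='Nice post!')]))
# 'posted' and the comment's 'posted' are filled in via their if_missing defaults
post.m.insert()
```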
Indexing
If we expected to do a lot of queries on user.username, we could add an index simply by updating the code above to read:
```python
...
Field('username', str, index=True),
...
```
Creating the indexes in the schema like this has the nice property that Ming will ensure that those indexes exist the first time it touches the database. We can also set a unique index on a field by using the unique optional argument:
```python
...
Field('username', str, unique=True),
...
```
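Under the hood, a unique index declared this way corresponds to a pymongo call along these lines (shown here only for comparison; Ming issues it for you when it first touches the database):

```python
db.user.ensure_index([('username', 1)], unique=True)
```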
Ming also supports specifying compound indexes by using the Index object in the collection definition. Suppose we wished to keep a separate list of users, scoped by client_id. In this case, the schema might look more like the following:
```python
from datetime import datetime

from ming import collection, Field, Index, Session
from ming import schema as S

session = Session()

MyDoc = collection(
    'user', session,
    Field('_id', S.ObjectId),
    Field('client_id', S.ObjectId, if_missing=None),
    Field('username', str),
    Field('created', datetime, if_missing=datetime.utcnow),
    Index('client_id', 'username', unique=True),
    ...)
```
In the example above, the index would be created as follows:
```python
db.user.ensure_index([('client_id', 1), ('username', 1)], unique=True)
```
By default, each key in an index created by Ming is sorted in ascending order. If you want to change this, you can explicitly specify the sort order for the index:
```python
...
Index(('client_id', -1), ('username', 1), unique=True),
...
```
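Following the same pattern as before, this declaration would produce an index roughly equivalent to:

```python
db.user.ensure_index([('client_id', -1), ('username', 1)], unique=True)
```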
Connection and configuration
Once we’ve defined our schema, we can use it by binding the session to the appropriate MongoDB database using ming.datastore:
```python
from ming import datastore

session.bind = datastore.DataStore(
    'mongodb://localhost:27017',
    database='test')
```
More typically, we will create our session as a named session and bind it somewhere else in our application (perhaps in our startup script):
```python
session = ming.Session.by_name('test')

...

ming.config.configure_from_nested_dict(dict(
    test=dict(
        master='mongodb://localhost:27017',
        database='test')))
```
By using named schemas, you can decouple your schema definition code from the actual configuration of your database connection. This is often useful when you will be reading connection information from a configuration file, for instance.
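As a hedged sketch (the session names, database names, and URIs below are hypothetical), the schema module and the startup script might look like this:

```python
import ming
import ming.config

# In the schema modules: only the session names are needed
main_session = ming.Session.by_name('main')
log_session = ming.Session.by_name('logs')

# In the startup script: supply the actual connection details
ming.config.configure_from_nested_dict(dict(
    main=dict(master='mongodb://localhost:27017', database='myapp'),
    logs=dict(master='mongodb://localhost:27017', database='myapp_logs')))
```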
Querying and updating
To show how Ming supports querying and updating, let’s go back to our simple User schema above:
```python
from datetime import datetime

from ming import collection, Field, Index, Session
from ming import schema as S

session = Session()

MyDoc = collection(
    'user', session,
    Field('_id', S.ObjectId),
    Field('client_id', S.ObjectId, if_missing=None),
    Field('username', str),
    Field('created', datetime, if_missing=datetime.utcnow),
    Index('client_id', 'username', unique=True),
    ...)
```
Now let’s insert some data:
```python
>>> import pymongo
>>> conn = pymongo.Connection()
>>> db = conn.test
>>> db.user.insert([
...     dict(username='rick'),
...     dict(username='jenny'),
...     dict(username='mark')])
[ObjectId('4fd24c96fb72f08265000000'), ObjectId('4fd24c96fb72f08265000001'), ObjectId('4fd24c96fb72f08265000002')]
```
To get the data back out, we simply use the collection’s manager property m:
```python
>>> MyDoc.m.find().all()
[{'username': u'rick', '_id': ObjectId('4fd24c96fb72f08265000000'), 'client_id': None, 'created': datetime.datetime(2012, 6, 8, 19, 8, 28, 522073)},
 {'username': u'jenny', '_id': ObjectId('4fd24c96fb72f08265000001'), 'client_id': None, 'created': datetime.datetime(2012, 6, 8, 19, 8, 28, 522195)},
 {'username': u'mark', '_id': ObjectId('4fd24c96fb72f08265000002'), 'client_id': None, 'created': datetime.datetime(2012, 6, 8, 19, 8, 28, 522315)}]
```
Notice how Ming has filled in the values we omitted when creating the user documents. In this case, it’s actually filling them in as they are returned from the database. We can drop down to the pymongo layer to see this by using the m.collection property on MyDoc:
```python
>>> list(MyDoc.m.collection.find())
[{u'username': u'rick', u'_id': ObjectId('4fd24c96fb72f08265000000')},
 {u'username': u'jenny', u'_id': ObjectId('4fd24c96fb72f08265000001')},
 {u'username': u'mark', u'_id': ObjectId('4fd24c96fb72f08265000002')}]
```
Now let’s remove the documents we created and create some using Ming:
```python
>>> MyDoc.m.remove()
>>>
>>> MyDoc(dict(username='rick')).m.insert()
>>> MyDoc(dict(username='jenny')).m.insert()
>>> MyDoc(dict(username='mark')).m.insert()
>>>
>>> MyDoc.m.collection.find_one()
{u'username': u'rick', u'_id': ObjectId('4fd24f95fb72f08265000003'), u'client_id': None, u'created': datetime.datetime(2012, 6, 8, 19, 16, 37, 565000)}
```
Note that when we created the documents using Ming, we see the default values stored in the database.
Another thing to note above is that when we inserted the new documents, we didn't have to specify the collection. Ming documents are actually dict subclasses, but they "remember" where they came from. To update a document, all we need to do is to call .m.save() on the document:
```python
>>> doc = MyDoc.m.get(username='rick')
>>> import bson
>>> doc.client_id = bson.ObjectId()
>>> doc.username
u'rick'
>>> doc.client_id
ObjectId('4fd250bdfb72f08265000006')
>>> doc.m.save()
```
If you’d prefer to use MongoDB’s atomic updates, you can use the manager method update_partial:
```python
>>> MyDoc.m.update_partial(
...     dict(username='rick'),
...     {'$set': {'client_id': None}})
{u'updatedExisting': True, u'connectionId': 232, u'ok': 1.0, u'err': None, u'n': 1}
```
More to come
There’s a lot more to Ming, which I’ll cover in future articles, including data polymorphism, eager and lazy data migration, GridFS support, and an object-document mapper providing object-relational-style capabilities.
So what do you think? Is Ming something that you would use for your projects? Have you chosen one of the other MongoDB mappers? Please let me know in the comments below.
Other announcements
If you’re looking for MongoDB and Python training classes, please sign up to hear about it when I start offering them, and to get a 25% discount on registration. And if you happen to be attending the SouthEast LinuxFest, I’d love it if you’d drop by my talk on building your first MongoDB application on Saturday morning at 11:30.