MongoDB Performance:
We're talking about performance. The performance of computer systems is driven by a variety of factors, including the performance of the underlying hardware: the CPU, the disk, and the memory.
Once we've chosen a hardware configuration, it's going to be our algorithms that determine performance, and for a database-backed application, it's going to be the algorithms used to satisfy our queries.
There are two ways we can impact the latency and throughput of database queries.
One is to add indexes to collections, which lets us find documents faster.
And the second is to distribute the load across multiple servers using sharding. A minimal sketch of both approaches follows.
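As a quick illustration of both approaches, here is a hedged mongo shell sketch; the collection name orders, the database name mydb, and the shard key userId are assumptions for the example, not values from this post:

// 1. add an index so queries on userId no longer scan the whole collection
db.orders.createIndex({ userId: 1 })
db.orders.find({ userId: 42 })                      // can now use the index

// 2. distribute the collection across shards (these helpers are run against a mongos)
sh.enableSharding("mydb")                           // enable sharding for the database
sh.shardCollection("mydb.orders", { userId: 1 })    // shard on the indexed key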
In MongoDB 3.0, there is a new pluggable storage engine called WiredTiger. The storage engine is the software that controls how the data is stored on disk, and it affects such things as what form the data files take and what the in-memory model of the data looks like. We will talk about how the choice of storage engine impacts performance.
What is a storage engine inside the database?
It is the interface between the database and the hardware on which the database is running. Basically, the storage engine is what the database uses to implement create, read, update, and delete operations.
The storage engine doesn't:
– Change how you perform queries, neither in the shell nor in the driver.
– Change behavior at the cluster level.
The storage engine does affect how data is written to disk, and this applies whether we're talking about an insert or an update.
It affects how data is deleted or removed from disk.
And it affects how data is read from disk.
Finally, it affects the data structures used to store the data, such as how BSON is represented on disk.
Since 3.0, we can choose which storage engine we're using, which allows us to optimize for our use case, our hardware, et cetera.
Two choices (since March 2015):
- MMAPv1
- WiredTiger
MMAPv1:
This is the classic MongoDB storage engine. It maps the data files directly into virtual memory, allowing the operating system to do most of the work of the storage engine. It uses the mmap system call to map the data files on disk directly into the virtual memory space. If the data files are not already in memory, a page fault will pull them into RAM, and if they're in memory and MongoDB updates them, an fsync will propagate the changes back to disk.
How to use MMAPv1?
It is the default storage engine in MongoDB 3.0, but we can also select it explicitly:
mongod --storageEngine mmapv1
We can find the storage engine in use by looking at the log (by default on CentOS under /var/log/mongodb) or by connecting with the shell:
repset:PRIMARY> db.serverStatus()
{
    ...
    "repl" : {
        "setName" : "repset",
        "setVersion" : 14,
        "ismaster" : true,
        "secondary" : false,
        "hosts" : [
            "base.deb.com:27017",
            "dn1.deb.com:27017",
            "dn2.deb.com:27017",
            "dn3.deb.com:27017"
        ],
        "primary" : "base.deb.com:27017",
        "me" : "base.deb.com:27017",
        "electionId" : ObjectId("56a03d997d543ae69374a151"),
        "rbid" : 2048639483
    },
    "storageEngine" : { "name" : "mmapv1" },
    ...
}
MMAPv1 comes with collection-level locking in MongoDB 3.0; prior to that, from 2.2 to 2.6, locking was database-level. The granularity of locking may or may not affect performance; it depends on our setup.
The first thing: the shared resource where locking matters most is the data itself. If two processes attempt to write to the same region on disk at the same time, corruption can occur. For this reason, MongoDB uses a multiple-readers, single-writer lock: we can have many readers, and they will lock out any writers. As soon as one writer comes in, however, it locks out not only all readers but all other writers as well. If data were the only issue, we might have document-level locking, but the reality is that there is other information, specifically metadata, where conflicts can occur. For example, two documents located in different places on disk might share a single index, so an update to one document and an update to the other might both involve an update to that same index, causing a conflict even if the documents are widely separated on disk.
Another example of something that might cause a conflict is the journal, which we’ll talk about shortly.
We'll note that prior to MongoDB 3.0 we had database-level locking, so if we wanted to distribute the load on the server among multiple databases, we could have simultaneous writes without any conflicts. With MongoDB 3.0 we have collection-level locking, which allows for more flexibility: even with multiple collections in the same database, we can still write to them simultaneously, up until we're fully utilizing our resources.
The next thing is the journal, which is a write-ahead log.
Why do we have this? To ensure consistency in the event of a disk failure during an fsync; without it, some bits might get updated and others not. With the journal, we write down what we're about to do, then we do it. So if a disk failure occurs while we're writing to the journal, that's fine: we simply don't perform the update. When the disk comes back up, MongoDB notes the state of the database, sees that there was a partial update logged in the journal but not completed, and ignores that log entry, and our database comes up in a consistent state. If a disk failure occurs later, while we're syncing our data to disk, that's fine too: we've got the complete entry in the journal, and when our system comes back up and there is an incomplete update to a document, we look at the journal, complete the update, and the database is back in a consistent state.
So journaling is what ensures consistency of the data on disk in the event of a failure, and MongoDB strongly recommends it for all production systems.
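Journaling is on by default for 64-bit builds; a minimal sketch of how it might be controlled explicitly, using only the mongod flags shown in the help output later in this post (the dbpath is an assumption):

mongod --dbpath /var/lib/mongo --journal                   # enable the write-ahead log
mongod --dbpath /var/lib/mongo --journalCommitInterval 50  # group/batch commit every 50 ms
# running with --nojournal is possible, but not recommended for production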
Finally, in MMAPv1, the data on disk is BSON, raw BSON. The bits are mapped from disk to virtual memory directly, and this has implications for data modeling.
MMAPv1: Documents and Data Files:
Documents and data files in MMAPv1 are things that are going to affect performance. First off, what do the data files look like?
Starting MongoDB, once again MMAPv1 is the default. If we connect to MongoDB with the shell, by default we're connected to the database called test.
With show dbs, we see the local database. It is used by replication and it also contains the startup log, so there's going to be one entry for each time we start up MongoDB; it's not a very big document.
We can see the data files here:
[hduser@dn1 mongo]$ ls -alh
total 1.2G
drwxr-xr-x   4 mongod mongod 4.0K Jan 22 23:30 .
drwxr-xr-x. 68 root   root   4.0K Jan 21 03:24 ..
-rw-------   1 mongod mongod  16M Jan 22 23:30 blog.0
-rw-------   1 mongod mongod  16M Jan 22 23:30 blog.ns
drwxr-xr-x   2 mongod mongod 4.0K Jan 22 23:30 journal
-rw-------   1 mongod mongod  16M Jan 22 23:30 local.0
-rw-------   1 mongod mongod 512M Jan 22 23:30 local.1
-rw-------   1 mongod mongod 512M Jan 22 23:30 local.2
-rw-------   1 mongod mongod  16M Jan 22 23:30 local.ns
-rw-------   1 mongod mongod  16M Jan 22 23:30 m101.0
-rw-------   1 mongod mongod  16M Jan 22 23:30 m101.ns
-rw-r--r--   1 mongod mongod    5 Jan 22 23:30 mongod.lock
-rw-------   1 mongod mongod  16M Jan 22 23:30 pcat.0
-rw-------   1 mongod mongod  16M Jan 22 23:30 pcat.ns
-rw-r--r--   1 mongod mongod   69 Jan 22 23:30 storage.bson
-rw-------   1 mongod mongod  16M Jan 22 23:30 students.0
-rw-------   1 mongod mongod  16M Jan 22 23:30 students.ns
-rw-------   1 mongod mongod  16M Jan 22 23:30 test.0
-rw-------   1 mongod mongod  16M Jan 22 23:30 test.ns
I've got a little lock file (mongod.lock) that lets other mongod processes know not to come in. We also have a namespace file for each database, such as local.ns, which contains metadata, and the first data file is local.0 at 16 MB on this replica set.
var imax = 32;
var jmax = 32;
var kmax = 1000;

function setValues(doc, i, j, k) {
    doc._id = jmax * kmax * i + kmax * j + k;
    doc.a = i;
    doc.b = j;
    doc.c = k;
}

var emptyString = 'asdf';
emptyString = emptyString.pad(1000); // make it bigger.

// make one thousand copies of our document in an array.
listOfDocs = []
for (i = 0; i < kmax; i++) {
    listOfDocs.push({ _id: 0, a: 0, b: 0, c: 0, d: emptyString });
};
// one_thousand_docs is now built.

db.dropDatabase(); // start with a clean slate.
db.createCollection("foo", {noPadding: true}) // for this first run, padding is disabled.

for (i = 0; i < imax; i++) {
    for (j = 0; j < jmax; j++) {
        for (k = 0; k < 1000; k++) {
            setValues(listOfDocs[k], i, j, k)
        };
        db.foo.insert(listOfDocs) // breaks up if larger than 1000.
    }
}
See the source here: https://university.mongodb.com/static/MongoDB_2016_M102_January/handouts/loadDatabase.623a5313a2dc.js
This little program will show us how MongoDB creates data files with MMAPv1. Let's run it and look at the data files directory:
[hduser@dn1 ~]$ mongo --host 192.168.56.72 --quiet loadDatabase.623a5313a2dc.js
MongoDB shell version: 3.0.6
connecting to: base:27017/test
This program is going to add 32 times 32 times 1000 documents. It dropped the database, then it explicitly created the collection with the noPadding: true option, so none of the documents are going to be padded. What happens in this case?
See below the files of the test database:
[hduser@dn1 mongo]$ ls -alh
total 3.7G
drwxr-xr-x   4 mongod mongod 4.0K Jan 23 07:36 .
drwxr-xr-x. 66 root   root   4.0K Jan 23 01:23 ..
-rw-------   1 mongod mongod  16M Jan 22 23:30 blog.0
-rw-------   1 mongod mongod  16M Jan 22 23:30 blog.ns
drwxr-xr-x   2 mongod mongod 4.0K Jan 23 07:34 journal
-rw-------   1 mongod mongod  16M Jan 23 00:47 local.0
-rw-------   1 mongod mongod 512M Jan 23 07:38 local.1
-rw-------   1 mongod mongod 512M Jan 23 07:38 local.2
-rw-------   1 mongod mongod  16M Jan 23 07:38 local.ns
-rw-------   1 mongod mongod  16M Jan 22 23:30 m101.0
-rw-------   1 mongod mongod  16M Jan 22 23:30 m101.ns
-rw-r--r--   1 mongod mongod    6 Jan 23 00:47 mongod.lock
-rw-------   1 mongod mongod  16M Jan 22 23:30 pcat.0
-rw-------   1 mongod mongod  16M Jan 22 23:30 pcat.ns
-rw-r--r--   1 mongod mongod   69 Jan 22 23:30 storage.bson
-rw-------   1 mongod mongod  16M Jan 22 23:30 students.0
-rw-------   1 mongod mongod  16M Jan 22 23:30 students.ns
-rw-------   1 mongod mongod  16M Jan 23 07:30 test.0
-rw-------   1 mongod mongod  32M Jan 23 07:29 test.1
-rw-------   1 mongod mongod  64M Jan 23 07:38 test.2
-rw-------   1 mongod mongod 128M Jan 23 07:30 test.3
-rw-------   1 mongod mongod 256M Jan 23 07:32 test.4
-rw-------   1 mongod mongod 512M Jan 23 07:37 test.5
-rw-------   1 mongod mongod 512M Jan 23 07:33 test.6
-rw-------   1 mongod mongod 512M Jan 23 07:38 test.7
-rw-------   1 mongod mongod 512M Jan 23 07:38 test.8
-rw-------   1 mongod mongod  16M Jan 23 07:38 test.ns
My first test data file was 16 MB, then I got 32 MB, 64 MB, 128 MB, and 256 MB. This keeps going until the insertion is done, doubling every time; once a file reaches 2 GB it stops doubling and MongoDB continues to allocate 2 GB files. We're not going to get there today because in my configuration I chose the smallFiles option (see https://docs.mongodb.org/manual/reference/configuration-options/#storage.smallFiles), which caps the file size.
We can see my test database has maxed out at 512 MB per file.
Now, look at the statistics:
[hduser@dn1 ~]$ mongo --host 192.168.56.72 --quiet
repset:PRIMARY> db.foo.stats();
{
    "ns" : "test.foo",
    "count" : 1024000,
    "size" : 1085440000,
    "avgObjSize" : 1060,
    "numExtents" : 18,
    "storageSize" : 1164976128,
    "lastExtentSize" : 307535872,
    "paddingFactor" : 1,
    "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
    "userFlags" : 2,
    "capped" : false,
    "nindexes" : 1,
    "totalIndexSize" : 28583296,
    "indexSizes" : {
        "_id_" : 28583296
    },
    "ok" : 1
}
We have about a million documents, with an average object size of 1060 bytes. The padding factor is actually deprecated.
What we want to look at is the number of extents. Here we have 18 extents. So, what's an extent?
Within the data file, several extents are going to be defined, and these extents are address spaces where we can put documents. Now, as it happens, we've set things up so that these documents are going to be right next to each other: it'll put document number one at D1, document number two at D2, and so on. In this run my documents are all the same size, but we can imagine a scenario where document three is really small, document four is really big, and so on. And as we're aware, MongoDB is schemaless: maybe we want to add a field to document number two, or push some objects onto an array.
So what ends up happening to document number two? It wants to grow, but of course it's right next to document three (a small document), so it's got no room to grow, and what ends up happening is that it moves. Now, the move itself isn't free, but there's actually another issue: indexes. The indexes need to point to the document, and in the case of MMAPv1 they point to the address where the document starts.
So when that document moves, not only do we have to delete it from D2 where it was and recreate it after D4 in our example, but we also need to update every index that points to that document so that it points to the new location. In a typical MongoDB deployment we're going to end up with several indexes, and this can get quite expensive, especially if we're moving a lot of documents.
Now, let's suppose that another document needs to be inserted. If it's a big document, it will have to go after the new location of D2, beyond D4. If it's a small document, it might fit next to D3, because in that extent we have a small document and there is free space. And suppose we need to update document six: that's fine, we just take some of that extra space, grow into it, and all is well.
Suppose we want to give everything some space to grow; after all, most things are going to grow sooner or later.
Question: how much space do we want to give it?
Let's find out what the default is for MongoDB 3.0. Let's go back to our load database program; this time we remove the noPadding: true option (it is commented out below) and execute it against our primary:
var imax = 32;
var jmax = 32;
var kmax = 1000;

function setValues(doc, i, j, k) {
    doc._id = jmax * kmax * i + kmax * j + k;
    doc.a = i;
    doc.b = j;
    doc.c = k;
}

var emptyString = 'asdf';
emptyString = emptyString.pad(1000); // make it bigger.

// make one thousand copies of our document in an array.
listOfDocs = []
for (i = 0; i < kmax; i++) {
    listOfDocs.push({ _id: 0, a: 0, b: 0, c: 0, d: emptyString });
};
// one_thousand_docs is now built.

db.dropDatabase(); // start with a clean slate.
// db.createCollection("foo", {noPadding: true})

for (i = 0; i < imax; i++) {
    for (j = 0; j < jmax; j++) {
        for (k = 0; k < 1000; k++) {
            setValues(listOfDocs[k], i, j, k)
        };
        db.foo.insert(listOfDocs) // breaks up if larger than 1000.
    }
}
We can see we got the same data file layout as with the noPadding: true run, where the largest file size is 512 MB:
[hduser@dn1 mongo]$ ls -alh
total 3.7G
drwxr-xr-x   4 mongod mongod 4.0K Jan 23 08:46 .
drwxr-xr-x. 66 root   root   4.0K Jan 23 01:23 ..
-rw-------   1 mongod mongod  16M Jan 22 23:30 blog.0
-rw-------   1 mongod mongod  16M Jan 22 23:30 blog.ns
drwxr-xr-x   2 mongod mongod 4.0K Jan 23 08:44 journal
-rw-------   1 mongod mongod  16M Jan 23 00:47 local.0
-rw-------   1 mongod mongod 512M Jan 23 08:48 local.1
-rw-------   1 mongod mongod 512M Jan 23 08:48 local.2
-rw-------   1 mongod mongod  16M Jan 23 08:48 local.ns
-rw-------   1 mongod mongod  16M Jan 22 23:30 m101.0
-rw-------   1 mongod mongod  16M Jan 22 23:30 m101.ns
-rw-r--r--   1 mongod mongod    6 Jan 23 00:47 mongod.lock
-rw-------   1 mongod mongod  16M Jan 22 23:30 pcat.0
-rw-------   1 mongod mongod  16M Jan 22 23:30 pcat.ns
-rw-r--r--   1 mongod mongod   69 Jan 22 23:30 storage.bson
-rw-------   1 mongod mongod  16M Jan 22 23:30 students.0
-rw-------   1 mongod mongod  16M Jan 22 23:30 students.ns
-rw-------   1 mongod mongod  16M Jan 23 08:39 test.0
-rw-------   1 mongod mongod  32M Jan 23 08:38 test.1
-rw-------   1 mongod mongod  64M Jan 23 08:48 test.2
-rw-------   1 mongod mongod 128M Jan 23 08:39 test.3
-rw-------   1 mongod mongod 256M Jan 23 08:42 test.4
-rw-------   1 mongod mongod 512M Jan 23 08:47 test.5
-rw-------   1 mongod mongod 512M Jan 23 08:44 test.6
-rw-------   1 mongod mongod 512M Jan 23 08:48 test.7
-rw-------   1 mongod mongod 512M Jan 23 08:48 test.8
-rw-------   1 mongod mongod  16M Jan 23 08:48 test.ns
So even though our documents have padding this time, it wasn't enough to require another data file. The stats shown earlier were from the old database, where we had 18 extents and an average object size of 1060 bytes; now let's look again.
[hduser@dn1 ~]$ mongo --host 192.168.56.72 --quiet
repset:PRIMARY> db.foo.stats()
{
    "ns" : "test.foo",
    "count" : 1024000,
    "size" : 2080768000,
    "avgObjSize" : 2032,
    "numExtents" : 20,
    "storageSize" : 2116751344,
    "lastExtentSize" : 536600560,
    "paddingFactor" : 1,
    "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
    "userFlags" : 1,
    "capped" : false,
    "nindexes" : 1,
    "totalIndexSize" : 28583296,
    "indexSizes" : {
        "_id_" : 28583296
    },
    "ok" : 1
}
Now the average object size is 2032 bytes, padding included, and I have reason to believe the padding is close to 1024 bytes because MongoDB 3.0 uses power-of-two sized allocations. What does that mean?
Suppose we insert a very small document: it's going to allocate 32 bytes; we'll notice that's a power of two. If that document grows larger than 32 bytes, it'll give it 64; or maybe our initial document was 64 bytes, and then it gets 128 if it's bigger still, and so on up to 2 MB (the data files themselves are separately capped, at 512 MB in our case because of the smallFiles option). After that it just adds 2 MB at a time, and at that point we're getting pretty close to the 16 MB document limit and we might want to think about changing our schema.
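A minimal sketch of the idea behind power-of-two allocation; this is only an illustration of the rounding rule, not MongoDB's internal code:

// round a document size (in bytes) up to the next power of two,
// which is the record size MMAPv1 reserves for it
function powerOfTwoRecordSize(docSize) {
    var size = 32;          // smallest allocation
    while (size < docSize) {
        size *= 2;          // 32, 64, 128, ... up to 2 MB
    }
    return size;
}

powerOfTwoRecordSize(1060); // 2048, which matches the roughly 1 KB of padding seen in the stats above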
So this has a few advantages:
1. There’s room to grow for every document.
2. If documents grow big enough to outgrow their record space (record space is document plus padding), they leave a standardized hole that something else can come along and fit into. And since the sizes of these record spaces are standardized, it's very unlikely that we're going to have a hole that's never used again.
If the documents are growing continuously, they're usually not going to have that full amount as padding.
Now, for MongoDB 3.0, power-of-two allocation and no padding are the only options we have.
Power-of-two allocation was also the default in MongoDB 2.6, and in MongoDB 2.4 it wasn't even the default.
What they did in MongoDB 2.4 was attempt to have the database essentially guess how much space we were going to need. It would start out with no padding factor, and then if it saw that we were growing our documents, it would begin adding some: maybe 10% at first, maybe several times the document size, as it tried to guess.
This would leave holes in our data files, spaces we would never end up filling. It also created a situation where, initially, we were going to have a lot of document moves. Power-of-two sized allocation actually does a much better job of allocating efficiently in exactly those situations where we don't know precisely how big our documents are going to grow. That's the great thing about it: we don't have to know much about how big things are going to get.
However, if we do know how big our documents are going to get, we can initially insert a relatively large document, maybe with a string field that gives us the extra space we need, and then immediately unset that field and shrink the document. What ends up happening is that our document doesn't move, and that extra space becomes padding. Since our document didn't move, we don't have to worry about our indexes being updated. And since we, the human beings, are designing our schema, we know exactly what's going to be happening going forward. Another option would be to make the document larger than usual by leaving placeholders for certain fields that we can use later. A sketch of the first approach follows.
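A minimal mongo shell sketch of that pre-padding trick; the collection name, field names, and the 1 KB pad size are assumptions for the example:

// insert the document with a throwaway filler field to reserve extra space on disk
var filler = new Array(1025).join("x");          // roughly 1 KB of padding (assumed size)
db.people.insert({ _id: 1, name: "Ada", filler: filler });

// immediately remove the filler; the record keeps its original size,
// so later growth of this document is unlikely to force a move
db.people.update({ _id: 1 }, { $unset: { filler: "" } });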
WiredTiger:
It has been an option since 2015, and it is not just a storage engine for MongoDB; it's also used in other databases. Like MongoDB, WiredTiger is fully open source. To read more, see the web page: http://www.wiredtiger.com/
In conclusion, the storage engine directly determines the data file format, because different storage engines can implement different types of compression and different ways of storing the BSON for MongoDB, and it determines the format of indexes, since those are controlled by the storage engine. For instance, MMAPv1 uses B-trees for its indexes; with MongoDB 3.0, WiredTiger uses B+ trees, with other formats expected to come in later releases.
It comes with a number of great features:
1. Document level locking
2. Compression
3. It lacks some of the pitfalls of MMAPv1
4. It gives some big performance gains.
WiredTiger is:
1. Built separately from MongoDB
2. Actually used by other databases
3. An open source project
How does it work?
Here we have the different storage engine options in MongoDB 3.0 and later:
[hduser@dn1 ~]$ mongod --help
Storage options:
  --storageEngine arg (=mmapv1)               what storage engine to use
  --dbpath arg                                directory for datafiles - defaults to /data/db
  --directoryperdb                            each database will be stored in a separate directory
  --noprealloc                                disable data file preallocation - will often hurt performance
  --nssize arg (=16)                          .ns file size (in MB) for new databases
  --quota                                     limits each database to a certain number of files (8 default)
  --quotaFiles arg                            number of files allowed per db, implies --quota
  --smallfiles                                use a smaller default file size
  --syncdelay arg (=60)                       seconds between disk syncs (0=never, but not recommended)
  --upgrade                                   upgrade db if needed
  --repair                                    run repair on all dbs
  --repairpath arg                            root directory for repair files - defaults to dbpath
  --journal                                   enable journaling
  --nojournal                                 disable journaling (journaling is on by default for 64 bit)
  --journalOptions arg                        journal diagnostic options
  --journalCommitInterval arg                 how often to group/batch commit (ms)

WiredTiger options:
  --wiredTigerCacheSizeGB arg                 maximum amount of memory to allocate for cache; defaults to 1/2 of physical RAM
  --wiredTigerStatisticsLogDelaySecs arg (=0) seconds to wait between each write to a statistics file in the dbpath; 0 means do not log statistics
  --wiredTigerJournalCompressor arg (=snappy) use a compressor for log records [none|snappy|zlib]
  --wiredTigerDirectoryForIndexes             Put indexes and data in different directories
  --wiredTigerCollectionBlockCompressor arg (=snappy) block compression algorithm for collection data [none|snappy|zlib]
  --wiredTigerIndexPrefixCompression arg (=1) use prefix compression on row-store leaf pages
Now we're going to try running with the WiredTiger storage engine.
The first thing we do is stop the mongod process:
sudo service mongod stop
Then delete all the data files belonging to MMAPv1; otherwise we'll get an error:
cd /var/lib/mongo
sudo rm -Rf *

## Add this entry to your MongoDB configuration file (add the new storage engine wiredTiger):
storageEngine = wiredTiger

## or start it from the command line instead:
mongod --storageEngine wiredTiger --logpath /var/log/mongodb/mongod.log --fork

sudo service mongod start
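The storageEngine= line above is the older INI-style configuration syntax; a minimal sketch of the equivalent YAML configuration file (the paths are the ones used elsewhere in this post):

# /etc/mongod.conf (YAML format)
storage:
  dbPath: /var/lib/mongo
  engine: wiredTiger          # switch from the default mmapv1
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true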
Below we have the data files for WiredTiger:
[hduser@base mongo]$ ls -ltr
total 149936
-rw-r--r-- 1 mongod mongod        21 Jan 23 12:08 WiredTiger.lock
-rw-r--r-- 1 mongod mongod        46 Jan 23 12:08 WiredTiger
-rw-r--r-- 1 mongod mongod       533 Jan 23 12:08 WiredTiger.basecfg
-rw-r--r-- 1 mongod mongod        95 Jan 23 12:08 storage.bson
-rw-r--r-- 1 mongod mongod     16384 Jan 23 12:08 index-10--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     16384 Jan 23 12:08 index-12--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     16384 Jan 23 12:08 index-14--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     16384 Jan 23 12:08 index-16--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     16384 Jan 23 12:08 index-18--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     16384 Jan 23 12:08 index-20--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     16384 Jan 23 12:08 index-22--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     20480 Jan 23 12:08 index-24--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod  12066816 Jan 24 17:07 index-26--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     16384 Jan 24 17:07 index-8--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     16384 Jan 24 17:07 index-5--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     32768 Jan 24 18:38 index-1--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     36864 Jan 24 19:33 _mdb_catalog.wt
-rw-r--r-- 1 mongod mongod     16384 Jan 24 19:33 collection-9--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     32768 Jan 24 19:33 collection-0--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     16384 Jan 24 19:33 collection-6--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     32768 Jan 24 19:33 collection-7--2152637442347840544.wt
drwxr-xr-x 2 mongod mongod      4096 Jan 24 19:33 journal
-rw-r--r-- 1 mongod mongod     16384 Jan 24 19:33 collection-4--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     16384 Jan 24 19:33 collection-11--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     16384 Jan 24 19:33 collection-13--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     16384 Jan 24 19:33 collection-15--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     16384 Jan 24 19:33 collection-17--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     16384 Jan 24 19:33 collection-19--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     16384 Jan 24 19:33 collection-21--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     32768 Jan 24 19:33 collection-23--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod 140713984 Jan 24 19:33 collection-25--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod         5 Jan 24 19:33 mongod.lock
-rw-r--r-- 1 mongod mongod     36864 Jan 24 19:34 index-3--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     36864 Jan 24 19:34 collection-2--2152637442347840544.wt
-rw-r--r-- 1 mongod mongod     36864 Jan 24 19:35 sizeStorer.wt
-rw-r--r-- 1 mongod mongod    102400 Jan 24 19:35 WiredTiger.wt
-rw-r--r-- 1 mongod mongod       882 Jan 24 19:35 WiredTiger.turtle
We see collection and index files already. Let's go into the shell and look at the startup_log collection in the local database; we will see the storage engine:
"_id" : "dn2.deb.com-1451924833869", "hostname" : "dn2.deb.com", "startTime" : ISODate("2016-01-04T16:27:13Z"), "startTimeLocal" : "Mon Jan 4 11:27:13.869", "cmdLine" : { "config" : "/etc/mongod.conf", "net" : { "bindIp" : "192.168.56.73" }, "processManagement" : { "fork" : true, "pidFilePath" : "/var/run/mongodb/mongod.pid" }, "replication" : { "replSet" : "repset" }, "storage" : { "dbPath" : "/var/lib/mongo", "engine" : "wiredTiger", "mmapv1" : { "smallFiles" : true } },
Let's create a new document in the test database:
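The original insert isn't reproduced here; any small document would do, for example (the document itself is a hypothetical placeholder):

repset:PRIMARY> db.foo.insert({ a: 1, b: "hello" })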
It will create another collection file along with an index file. WiredTiger works at the collection level, and the index created here is the _id index.
WiredTiger Internals:
First, it stores data on disk in B-trees, similar to the B-trees MMAPv1 uses for indexes, but here used for the data as well. New writes are initially written to unused regions of the files and then incorporated in with the rest of the data in the background later. During an update, it writes a new version of the document rather than overwriting existing data the way MMAPv1 does in many cases. We might think this sounds like a lot of work, but WiredTiger handles it very well. So with WiredTiger we don't have to worry about document movement, or about a padding factor.
In fact, WiredTiger doesn't even provide padding. Next, it has two caches of memory: the first is the WiredTiger cache, which is half of our RAM by default (but we can tune that), and the second is the file system cache.
How does our data get from the WiredTiger cache to the file system cache, and then onto the drive?
This happens periodically, at what's called a checkpoint. During a checkpoint, our data goes from the WiredTiger cache into the file system cache, and from there it gets flushed to disk. Each checkpoint is a consistent snapshot of our data.
WiredTiger initiates a new checkpoint 60 seconds after the end of the last checkpoint, so roughly every minute, maybe a little bit more. If for some reason we were running MongoDB with WiredTiger with no journaling and no replication going on, we would need to go back to the last snapshot for a consistent view of our data, so up to about a minute of writes could be lost. Because a checkpoint is a consistent snapshot, if we are using journaling, the journal is truncated at this point.
If too much of our WiredTiger cache gets dirty, it will begin flushing data to the file system cache, and from there to disk.
One thing to note here is that since a checkpoint is a valid, consistent snapshot of our data, we technically don't need to run WiredTiger with journaling enabled: we're guaranteed to have a consistent state of our data from the last checkpoint, and the second-to-last checkpoint doesn't get deleted until the next checkpoint is fully written. Of course, journaling does get our data onto disk in a much more timely fashion.
WiredTiger – Document level locking:
Technically, WiredTiger doesn't have locks, but it has good concurrency protocols; the effect is the equivalent of document-level locking. With WiredTiger, our writes should scale with the number of threads. Some caveats: don't try to use a lot more threads than we have cores, and obviously our hardware is still a potential limitation.
Next, since WiredTiger has its own cache, and since the data in the WiredTiger cache doesn't have to be the same as the data in the file system cache or as it's represented on the drive (actually, they're quite different), WiredTiger is able to introduce compression.
WiredTiger – Compression:
There are three compression options: technically two compression algorithms, or we can run with no compression.
The default is Snappy: it's fast and prioritizes speed, but it also tries to get as much compression as it quickly can.
Zlib: prioritizes compression a bit more, potentially at the cost of some speed.
Contrast this with MMAPv1, where the BSON needs to be written out on disk just as it is in memory, so no compression is possible.
Some options we can use:
Index prefix compression, for example, can save some space at the cost of some processor usage. We can also select the option to put indexes in a separate directory, we can tune our cache size to maximize performance, and finally we can turn on statistics logging. A configuration sketch follows.
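A minimal sketch of how these WiredTiger options might look in the YAML configuration file; the specific values chosen here are assumptions for illustration, not recommendations:

storage:
  engine: wiredTiger
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4                  # tune the WiredTiger cache (default is about half of RAM)
      directoryForIndexes: true       # keep indexes and collection data in separate directories
      statisticsLogDelaySecs: 60      # write statistics to a file every 60 seconds (0 = off)
    collectionConfig:
      blockCompressor: snappy         # none | snappy | zlib
    indexConfig:
      prefixCompression: true         # prefix-compress index pages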
INDEXES:
An index is a way of finding our documents very quickly. This is the command to create one:
repset:PRIMARY> use test
switched to db test
repset:PRIMARY> db.foo.createIndex({a:1});
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}
The index specification uses the same syntax as a sort; this index, for example, can satisfy either of these sorts:
db.foo.find().sort({a:1});
or I can sort in descending order:
db.foo.find().sort({a:-1});
We can specify multiple fields when we create an index:
repset:PRIMARY> db.foo.createIndex({a:1,b:1})
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 2,
    "numIndexesAfter" : 3,
    "ok" : 1
}
And much like with a sort, the order is important: an {a:1, b:1} index is not at all the same as a {b:1, a:1} index. I can use my index to sort on {a:1, b:1} like this:
db.foo.find().sort({a:1,b:1});
or to sort with both fields descending:
db.foo.find().sort({a:-1,b:-1});
Both are acceptable, by the way. When we have an index on two or more fields, it's called a compound index. And I should note that, for sorting, such an index only works in two directions, (1, 1) or (-1, -1); a sort on (-1, 1) or (1, -1) would not use the index. An illustrative explain is sketched below.
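A small sketch of how one might verify that with explain(); the exact plan output will vary, but the winning plan is the thing to look at:

// both keys in the same direction as the index (or both reversed): the a_1_b_1 index can be used
db.foo.find().sort({a:1, b:1}).explain("queryPlanner").queryPlanner.winningPlan
// mixed directions cannot be satisfied by a_1_b_1, so expect a blocking SORT stage in the plan instead
db.foo.find().sort({a:1, b:-1}).explain("queryPlanner").queryPlanner.winningPlan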
How to know which indexes are in my collection?
repset:PRIMARY> db.foo.getIndexes();
[
    {
        "v" : 1,
        "key" : { "_id" : 1 },
        "name" : "_id_",
        "ns" : "test.foo"
    },
    {
        "v" : 1,
        "key" : { "a" : 1 },
        "name" : "a_1",
        "ns" : "test.foo"
    },
    {
        "v" : 1,
        "key" : { "a" : 1, "b" : 1 },
        "name" : "a_1_b_1",
        "ns" : "test.foo"
    }
]
How to drop an index in my collection ?:
repset:PRIMARY> db.foo.dropIndex({a:1});
{ "nIndexesWas" : 3, "ok" : 1 }
Collection Scans:
Why do we create these indexes? If we have a collection with, say, a billion, or even a million or a thousand documents, and we want to find a specific document very quickly, then without an index the database will do a collection scan, walking sequentially through the whole collection looking for matches, and that will be slow if the collection is large. However, if we have an index (and we will generally have one on _id, as the _id index is automatically created for a collection), then to look for an _id value we can just descend the index until we find it, and there will be a pointer to the right record or records. Duplicate keys are allowed in MongoDB indexes, but not in the _id index, because of its unique key constraint. The sketch below contrasts the two cases.
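A minimal sketch contrasting the two cases with explain(); the field c is one of the fields loaded by the earlier script, and the plans mentioned in the comments are what one would typically expect:

// no index on c: the winning plan is a COLLSCAN (a sequential scan of every document)
db.foo.find({c: 500}).explain("queryPlanner").queryPlanner.winningPlan

db.foo.createIndex({c: 1})

// with the index in place, the winning plan is a FETCH over an IXSCAN on c_1
db.foo.find({c: 500}).explain("queryPlanner").queryPlanner.winningPlan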
Index Notes:
Unique Indexes:
Sparse Indexes:
TTL Indexes:
Geospatial Indexes:
Text Indexes:
Background Index Creation:
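The headings above are left as an outline in the original notes; minimal sketches of each index type follow (the collection and field names are hypothetical):

db.users.createIndex({email: 1}, {unique: true})                      // unique index: duplicate emails are rejected
db.users.createIndex({nickname: 1}, {sparse: true})                   // sparse index: only documents that have the field are indexed
db.sessions.createIndex({lastSeen: 1}, {expireAfterSeconds: 3600})    // TTL index: documents expire about an hour after lastSeen
db.places.createIndex({location: "2dsphere"})                         // geospatial index on GeoJSON points
db.articles.createIndex({body: "text"})                               // text index for $text search
db.logs.createIndex({ts: 1}, {background: true})                      // build in the background, without blocking the database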
Explain Plans:
repset:PRIMARY> use pcat switched to db pcat repset:PRIMARY> show collections products repset:PRIMARY> db.products.find().pretty(); { "_id" : "ac3", "available" : true, "brand" : "ACME", "name" : "AC3 Phone", "price" : 200, "type" : "phone", "warranty_years" : 1 } { "_id" : "ac7", "available" : false, "brand" : "ACME", "name" : "AC7 Phone", "price" : 320, "type" : "phone", "warranty_years" : 1 } { "_id" : "ac9", "available" : true, "brand" : "ACME", "name" : "AC9 Phone", "price" : 333, "type" : "phone", "warranty_years" : 0.25 } { "_id" : ObjectId("507d95d5719dbef170f15bf9"), "for" : [ "ac3", "ac7", "ac9" ], "name" : "AC3 Series Charger", "price" : 19, "type" : [ "accessory", "charger" ], "warranty_years" : 0.25 } { "_id" : ObjectId("507d95d5719dbef170f15bfa"), "color" : "green", "name" : "AC3 Case Green", "price" : 12, "type" : [ "accessory", "case" ], "warranty_years" : 0 } { "_id" : ObjectId("507d95d5719dbef170f15bfb"), "for" : [ "ac3", "ac7", "ac9", "qp7", "qp8", "qp9" ], "name" : "Phone Extended Warranty", "price" : 38, "type" : "warranty", "warranty_years" : 2 } { "_id" : ObjectId("507d95d5719dbef170f15bfc"), "available" : false, "color" : "black", "for" : "ac3", "name" : "AC3 Case Black", "price" : 12.5, "type" : [ "accessory", "case" ], "warranty_years" : 0.25 } { "_id" : ObjectId("507d95d5719dbef170f15bfd"), "available" : true, "color" : "red", "for" : "ac3", "name" : "AC3 Case Red", "price" : 12, "type" : [ "accessory", "case" ], "warranty_years" : 0.25 } { "_id" : ObjectId("507d95d5719dbef170f15bfe"), "limits" : { "data" : { "n" : 20, "over_rate" : 1, "units" : "gigabytes" }, "sms" : { "n" : 100, "over_rate" : 0.001, "units" : "texts sent" }, "voice" : { "n" : 400, "over_rate" : 0.05, "units" : "minutes" } }, "monthly_price" : 40, "name" : "Phone Service Basic Plan", "term_years" : 2, "type" : "service" } { "_id" : ObjectId("507d95d5719dbef170f15bff"), "limits" : { "data" : { "n" : "unlimited", "over_rate" : 0 }, "sms" : { "n" : "unlimited", "over_rate" : 0 }, "voice" : { "n" : 1000, "over_rate" : 0.05, "units" : "minutes" } }, "monthly_price" : 60, "name" : "Phone Service Core Plan", "term_years" : 1, "type" : "service" } { "_id" : ObjectId("507d95d5719dbef170f15c00"), "limits" : { "data" : { "n" : "unlimited", "over_rate" : 0 }, "sms" : { "n" : "unlimited", "over_rate" : 0.01 }, "voice" : { "n" : 1200, "over_rate" : 0.05, "units" : "minutes" } }, "monthly_price" : 90, "name" : "Phone Service Family Plan", "sales_tax" : true, "term_years" : 3, "type" : "service" } { "_id" : ObjectId("507d95d5719dbef170f15c01"), "additional_tarriffs" : [ { "amount" : { "percent_of_service" : 0.06 }, "kind" : "federal tarriff" }, { "amount" : 2.25, "kind" : "misc tarriff" } ], "cancel_penalty" : 25, "monthly_price" : 50, "name" : "Cable TV Basic Service Package", "sales_tax" : true, "term_years" : 2, "type" : "tv" } repset:PRIMARY> db.products.getIndexes(); [ { "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "pcat.products" }, { "v" : 1, "key" : { "for" : 1 }, "name" : "for_1", "ns" : "pcat.products" } ] repset:PRIMARY> db.products.find({"for":"ac3"}).explain("executionStats"); { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "pcat.products", "indexFilterSet" : false, "parsedQuery" : { "for" : { "$eq" : "ac3" } }, "winningPlan" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "for" : 1 }, "indexName" : "for_1", "isMultiKey" : true, "direction" : "forward", "indexBounds" : { "for" : [ "[\"ac3\", \"ac3\"]" ] } } }, "rejectedPlans" : [ ] }, 
"executionStats" : { "executionSuccess" : true, "nReturned" : 4, "executionTimeMillis" : 0, "totalKeysExamined" : 4, "totalDocsExamined" : 4, "executionStages" : { "stage" : "FETCH", "nReturned" : 4, "executionTimeMillisEstimate" : 0, "works" : 5, "advanced" : 4, "needTime" : 0, "needFetch" : 0, "saveState" : 0, "restoreState" : 0, "isEOF" : 1, "invalidates" : 0, "docsExamined" : 4, "alreadyHasObj" : 0, "inputStage" : { "stage" : "IXSCAN", "nReturned" : 4, "executionTimeMillisEstimate" : 0, "works" : 5, "advanced" : 4, "needTime" : 0, "needFetch" : 0, "saveState" : 0, "restoreState" : 0, "isEOF" : 1, "invalidates" : 0, "keyPattern" : { "for" : 1 }, "indexName" : "for_1", "isMultiKey" : true, "direction" : "forward", "indexBounds" : { "for" : [ "[\"ac3\", \"ac3\"]" ] }, "keysExamined" : 4, "dupsTested" : 4, "dupsDropped" : 0, "seenInvalidated" : 0, "matchTested" : 0 } } }, "serverInfo" : { "host" : "base.deb.com", "port" : 27017, "version" : "3.0.8", "gitVersion" : "83d8cc25e00e42856924d84e220fbe4a839e605d" }, "ok" : 1 }
Covered Queries:
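This heading has no body in the original notes; a minimal sketch of the idea, assuming the compound index {a:1, b:1} from the earlier section is still in place on test.foo:

// a query is "covered" when the index alone can answer it, so no documents need to be fetched:
// filter on an indexed field and project only indexed fields, excluding _id
db.foo.find({a: 5}, {a: 1, b: 1, _id: 0}).explain("executionStats")
// expect a plan with no FETCH stage and totalDocsExamined equal to 0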
Read & Write Recap:
db.currentOp() & db.killOp() Revisited:
In MongoDB it's possible to see what operations are currently running on a given mongod or mongos instance, and also to kill them. We're going to take a quick look at that using db.currentOp() and db.killOp(): we run db.currentOp() to see what is going on and db.killOp() to kill a given operation.
If we're looking for problems with database performance, a good place to look when we run db.currentOp() is "secs_running" (look for long-running operations); it may then make sense to kill that operation with db.killOp().
Let's try it by creating some activity on our server; the result is shown below.
We get an inprog field (in progress) containing an array of all operations in progress. It would be very common to see a fairly long list if we have a lot of clients connected to the database; maybe we have 1000 connections and 40 in-progress operations. Here we have just a couple: the replication getmore on the oplog and the long-running update we want to kill.
repset:PRIMARY> db.currentOp();
{
    "inprog" : [
        {
            "desc" : "conn369",
            "threadId" : "0x6e9cd80",
            "connectionId" : 369,
            "opid" : 799732,
            "active" : true,
            "secs_running" : 3,
            "microsecs_running" : NumberLong(3303195),
            "op" : "getmore",
            "ns" : "local.oplog.rs",
            "query" : { "ts" : { "$gte" : Timestamp(1453703749, 34) } },
            "client" : "192.168.56.72:45089",
            "numYields" : 0,
            "locks" : { },
            "waitingForLock" : false,
            "lockStats" : {
                "Global" : { "acquireCount" : { "r" : NumberLong(8) } },
                "Database" : { "acquireCount" : { "r" : NumberLong(4) } },
                "oplog" : { "acquireCount" : { "r" : NumberLong(4) } }
            }
        },
        {
            "desc" : "conn3046",
            "threadId" : "0x36eb5a0",
            "connectionId" : 3046,
            "opid" : 798240,
            "active" : true,
            "secs_running" : 800,
            "microsecs_running" : NumberLong(800320312),
            "op" : "update",
            "ns" : "performance.sensor_readings",
            "query" : { "$where" : "function(){sleep(500);return false;}" },
            "client" : "192.168.56.71:40703",
            "numYields" : 1597,
            "locks" : { "Global" : "w", "Database" : "w", "Collection" : "w" },
            "waitingForLock" : false,
            "lockStats" : {
                "Global" : { "acquireCount" : { "r" : NumberLong(1600), "w" : NumberLong(1598) } },
                "Database" : { "acquireCount" : { "r" : NumberLong(1), "w" : NumberLong(1598) } },
                "Collection" : { "acquireCount" : { "r" : NumberLong(1), "w" : NumberLong(1598) } }
            }
        }
    ]
}
repset:PRIMARY> db.setProfilingLevel(2);
{ "was" : 0, "slowms" : 100, "ok" : 1 }
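setProfilingLevel(2) turns on the database profiler for all operations (level 1 would profile only operations slower than slowms). Profiled operations are written to the system.profile collection of the current database; a minimal way to inspect them might be:

db.system.profile.find().sort({ts: -1}).limit(5).pretty()   // the five most recent profiled operations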
2016-01-25T18:18:28.882-0500 I WRITE [conn3212] update performance.sensor_readings query: { $where: "function(){sleep(500);return false;}" } update: { $set: { name: "fran" } } keyUpdates:0 writeConflicts:0 exception: JavaScript execution terminated code:16712 numYields:294 locks:{ Global: { acquireCount: { r: 297, w: 295 } }, Database: { acquireCount: { r: 1, w: 295 } }, Collection: { acquireCount: { r: 1, w: 295 } } } 147791ms
2016-01-25T18:18:28.883-0500 I COMMAND [conn3212] command performance.$cmd command: update { update: "sensor_readings", updates: [ { q: { $where: "function(){sleep(500);return false;}" }, u: { $set: { name: "fran" } }, multi: true, upsert: false } ], ordered: true } keyUpdates:0 writeConflicts:0 numYields:0 reslen:186 locks:{ Global: { acquireCount: { r: 298, w: 296 } }, Database: { acquireCount: { r: 1, w: 296 } }, Collection: { acquireCount: { r: 1, w: 296 } }, Metadata: { acquireCount: { W: 1 } } } 147792ms
repset:PRIMARY> db.killOp(798240)
{ "info" : "attempting to kill op" }
Below is the source code for the script create_score, pers.js:
use students;
db.grades.drop();
for (i = 0; i < 10000000; i++) {
    for (j = 0; j < 4; j++) {
        assess = ['exam', 'quiz', 'homework', 'homework'];
        record = {'student_id': i, 'type': assess[j], 'score': Math.random() * 100};
        db.grades.insert(record);
    }
}
Below is the source code for the script stress_students.js:
import pymongo

# establish a connection to the database
connection = pymongo.MongoClient("mongodb://localhost")

# get a handle to the school database
db = connection.school
foo = db.students

for j in range(1, 10):
    for i in range(400000, 500000):
        doc = foo.find_one({'student_id': i})
        # print "first score for student ", doc['student_id'], "is ", doc['scores'][0]['score']
        if (i % 1000 == 0):
            print "Did 1000 Searches"