Tutorial: MongoDB

This tutorial is about installing a MongoDB cluster on Debian/Ubuntu.

There are different ways to run a MongoDB cluster. My preferred one is the replica set.

For a working replica set you need at least three servers:

  1. One running the master
  2. One running the slave
  3. One running the arbiter

The reason is the election among the replica set members. Each member gives its vote to one of the servers, and the one with the majority of votes becomes the master. That is why you want an odd number of members.
An arbiter is part of the replica set but does not hold any data; it only votes. You need about 5 MB of free RAM to run an arbiter.
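
To make the "odd number" argument concrete, here is a small sketch in plain JavaScript (run with Node.js). The function names majority and faultTolerance are my own and not part of MongoDB; the sketch only assumes that a master needs more than half of the votes.

```javascript
// Sketch: votes needed to elect a master, and how many voting
// members may fail before no election can succeed any more.
// Both function names are illustrative, not MongoDB APIs.
function majority(votingMembers) {
  // strictly more than half of all votes
  return Math.floor(votingMembers / 2) + 1;
}

function faultTolerance(votingMembers) {
  // members that may be down while a majority is still reachable
  return votingMembers - majority(votingMembers);
}

[2, 3, 4, 5].forEach(function (n) {
  console.log(n + " members: majority " + majority(n) +
              ", tolerated failures " + faultTolerance(n));
});
```

Under this assumption a fourth member does not buy you anything over three: both setups survive the loss of exactly one member.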

I don't want to talk about the pros and cons of MongoDB or NoSQL.

So back to the installation:

  1. Adding apt key

    sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
    
  2. Adding the repo

    sudo nano /etc/apt/sources.list
    Add this line for Ubuntu:
    deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen
    Add this line for Debian:
    deb http://downloads-distro.mongodb.org/repo/debian-sysvinit dist 10gen
    
  3. Install MongoDB

    sudo apt-get update
    sudo apt-get install mongodb-10gen
    
  4. Configuration of MongoDB
    For me it was easier to create new directories for the data and for logging:

    sudo mkdir /mongodb
    sudo mkdir /mongodb/log
    sudo mkdir /mongodb/journal
    sudo chown -R mongodb:mongodb /mongodb
    

    Now we can edit the mongodb.conf

    sudo nano /etc/mongodb.conf
    
    #Path for the db files
    dbpath=/mongodb
    #Path for the log file
    logpath=/mongodb/log/mongodb.log
    logappend=true
    
    #For cluster mode MongoDB has to listen on a public ip
    #Enter a public ip (if you have more than one)
    #bind_ip = 127.0.0.1
    port = 27017
    
    journal=true
    noauth = true
    #auth = true
    
    #quota = true
    
    nohttpinterface = true
    rest = true
    
    #Sets the default size of db files
    #smallfiles reduces the initial size for data files and limits them to 512 megabytes
    #smallfiles also reduces the size of each journal file from 1 gigabyte to 128 megabytes
    smallfiles = true
    
    #shared secret - authentication information for replica set members
    keyFile = /etc/keymongodb
    #name of replica set
    replSet = myreplica
    

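    To generate a shared secret for the replica set, run the following commands (the key file must be readable only by the mongodb user, otherwise mongod refuses to start):

    openssl rand -base64 80 | sudo tee /etc/keymongodb > /dev/null
    sudo chown mongodb:mongodb /etc/keymongodb && sudo chmod 600 /etc/keymongodb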
    

    You have to copy this key file to all members of the replica set.

  5. Setup iptables rules
    You should limit the access to your MongoDB instances (this has to be done on all replica set members; $device is your network interface):

    #MongoDB
    iptables -A INPUT -s ip-of-master -i $device -m state --state NEW -p tcp --dport 27017 -j ACCEPT
    iptables -A OUTPUT -d ip-of-master -m state --state NEW -p tcp --dport 27017 -j ACCEPT
    iptables -A INPUT -s ip-of-slave -i $device -m state --state NEW -p tcp --dport 27017 -j ACCEPT
    iptables -A OUTPUT -d ip-of-slave -m state --state NEW -p tcp --dport 27017 -j ACCEPT
    iptables -A INPUT -s ip-of-arbiter -i $device -m state --state NEW -p tcp --dport 27017 -j ACCEPT
    iptables -A OUTPUT -d ip-of-arbiter -m state --state NEW -p tcp --dport 27017 -j ACCEPT
    
  6. Restart the instances

    sudo service mongodb restart
    
  7. Setup the replicaset
    Start the mongo shell "mongo". Run the following on a single member only, because the configuration is synced automatically between the replica set members.

    mongo
    rs.initiate()
    cfg = rs.conf()
    cfg.members[0].priority = 10
    cfg.members[0].host = "ip-of-master:27017"
    rs.reconfig(cfg)
    rs.add("ip-of-slave:27017")
    rs.addArb("ip-of-arbiter:27017")
    cfg = rs.conf()
    rs.reconfig(cfg)
    

    What we do:

    • initiate the replica set
    • load the config
    • set the priority of the master to 10 (to ensure that the first election results in the master we want)
    • set the host of the master to its public ip (MongoDB uses the hostname, which often does not resolve to the public ip)
    • add the slave to the replica set
    • add the arbiter to the replica set
    • reload the config (check that every ip and port is correct)
    • save the config
  8. After some minutes the members of the replica set hold an election and afterwards start to sync each collection.
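
As an alternative to editing the config step by step, the whole replica set can be described in one config document. The following is only a sketch in plain JavaScript (run with Node.js): buildReplicaConfig is a helper name I made up, and the hosts are the same placeholders as above. The document layout (_id, members, host, priority, arbiterOnly) is the one rs.conf() shows you.

```javascript
// Sketch: build the replica set config as a single document.
// buildReplicaConfig is an illustrative helper, not a MongoDB function.
function buildReplicaConfig(setName, masterHost, slaveHost, arbiterHost) {
  return {
    _id: setName, // must match replSet in mongodb.conf
    members: [
      { _id: 0, host: masterHost, priority: 10 },      // preferred master
      { _id: 1, host: slaveHost },                     // data-bearing slave
      { _id: 2, host: arbiterHost, arbiterOnly: true } // votes only, no data
    ]
  };
}

var cfg = buildReplicaConfig(
  "myreplica",
  "ip-of-master:27017",
  "ip-of-slave:27017",
  "ip-of-arbiter:27017"
);

console.log(JSON.stringify(cfg, null, 2));
```

In the mongo shell you would pass such a document to rs.initiate(cfg) on the member that should become the master, instead of calling rs.initiate() without arguments (and use printjson instead of console.log).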

We are done. The cluster is running.

To test it:

Connect to the master and run the following commands:

mongo
use testdata
doc1 = { name: "test1", value: 10}
doc2 = { name: "test2", value: 15}
db.simple.insert( doc1 )
db.simple.insert( doc2 )

show collections

db.simple.find()

We switch to the database "testdata". If it does not exist, it is created automatically on the first insert.

We create two JSON documents, "doc1" and "doc2".

We insert them into the collection "simple".

Afterwards we list all available collections and query all documents in "simple".

The output should look like this:

PRIMARY> show collections
simple
system.indexes
system.users
PRIMARY> db.simple.find()
{ "_id" : ObjectId("520f2728c3633ec65806eadc"), "name" : "test1", "value" : 10 }
{ "_id" : ObjectId("520f272cc3633ec65806eadd"), "name" : "test2", "value" : 15 }

Now we connect to the slave:

mongo
use testdata
rs.slaveOk()
show collections
db.simple.find()

rs.slaveOk() tells the shell that queries against the slave are ok for this connection.

Output should be:

mongo
MongoDB shell version: 2.4.5
connecting to: test
> use testdata
switched to db testdata
> db.auth('******','******');
1
> show collections
Sat Aug 17 03:35:40.095 JavaScript execution failed: error: { "$err" : "not master and slaveOk=false", "code" : 13435 } at src/mongo/shell/query.js:L128
> rs.slaveOk()
> show collections
simple
system.indexes
system.users
> db.simple.find()
{ "_id" : ObjectId("520f2728c3633ec65806eadc"), "name" : "test1", "value" : 10 }
{ "_id" : ObjectId("520f272cc3633ec65806eadd"), "name" : "test2", "value" : 15 }

So the replication is working.

Let's look at the slaveOk setting again.

MongoDB uses an election to decide which member becomes the master. Priority, how up to date a member's data is, and whether it can reach a majority of the members all influence the result.

The master handles all writes and, by default, all queries. The slaves pull the data from the master.

If you want to do something like load balancing, you can tell your MongoDB client to query the slaves too (slaveOk, or a read preference in newer drivers). The ReplicaSetClient is able to handle a list of ips. The first thing it does is find out which member is the master, to ensure that the inserts go to the right member.
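
Most drivers also accept the member list and the read setting as a connection string. The following sketch in plain JavaScript (run with Node.js) just assembles such a string. replicaSetUri is a helper name I made up, the hosts are the placeholders from this tutorial, and whether your driver version honours the readPreference option in the URI is something you should check in its documentation.

```javascript
// Sketch: assemble a replica set connection string.
// replicaSetUri is an illustrative helper, not part of any driver.
function replicaSetUri(hosts, database, setName, readPreference) {
  return "mongodb://" + hosts.join(",") + "/" + database +
         "?replicaSet=" + encodeURIComponent(setName) +
         "&readPreference=" + encodeURIComponent(readPreference);
}

// The arbiter holds no data, so only the data-bearing members are listed.
var uri = replicaSetUri(
  ["ip-of-master:27017", "ip-of-slave:27017"],
  "testdata",
  "myreplica",
  "secondaryPreferred" // read from a slave if one is available
);

console.log(uri);
```

With "secondaryPreferred" the reads go to a slave when possible, while the writes still end up on the master.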

The next topic is security. Remember this config setting:

noauth = true
#auth = true

If you know user rights from MySQL/Oracle you might think that "auth=true" is a must. But with the simple db.addUser(name, password) form used here, MongoDB only knows users per database.

So you either have access to a database or you don't. Every user added this way is able to do everything in that database.

If you want to use this feature to separate web applications (as you can see in my last output log), you first have to create an admin user:

mongo
use admin
db.addUser("admin", "your-super-password")
db.auth('admin','your-super-password');

You can use any user name, because MongoDB does not have any naming conventions for users.

After you have added that user, you can switch the config settings (comment out "noauth = true", enable "auth = true") and restart each node. Users are replicated too.

The next time you connect to your MongoDB you have to run:

mongo
use admin
db.auth('admin','your-super-password');

Or you will see this error message:

MongoDB shell version: 2.0.4
connecting to: test
> show collections
Sat Aug 17 10:48:36 uncaught exception: error: {
        "$err" : "unauthorized db:test lock type:-1 client:127.0.0.1",
        "code" : 10057
}

After authenticating you can add additional users:

use servers
db.addUser("servers", "super-password-2")

After adding the user the database "servers" is created automatically.

The last topic is the schemaless nature of MongoDB collections. A collection is just a list of documents of the same kind, but they don't have to have the same attributes:

PRIMARY> use testdata
switched to db testdata
PRIMARY> doc3 = { name: "test3", value: 10, isactive: false}
{ "name" : "test3", "value" : 10, "isactive" : false }
PRIMARY> db.simple.insert( doc3 )
PRIMARY> db.simple.find()
{ "_id" : ObjectId("520f2728c3633ec65806eadc"), "name" : "test1", "value" : 10 }
{ "_id" : ObjectId("520f272cc3633ec65806eadd"), "name" : "test2", "value" : 15 }
{ "_id" : ObjectId("520f2b94c75fcbd13a79119b"), "name" : "test3", "value" : 10, "isactive" : false }

But no schema also means no constraints.

An index can give you some of them back, and it is easy to add:

db.events.ensureIndex( { "username" : 1, "timestamp" : -1 } )

This would speed up queries of events sorted by username (asc) and timestamp (desc).
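
If the 1 and -1 in the index definition look cryptic, this sketch in plain JavaScript (run with Node.js) sorts a few made-up events in the same order the index describes. It only illustrates the ordering; the sample data and the comparator name are my own, and this is not how MongoDB stores the index internally.

```javascript
// Sketch: the order described by { "username" : 1, "timestamp" : -1 }.
// Sample events are made up for illustration.
var events = [
  { username: "bob",   timestamp: 100 },
  { username: "alice", timestamp: 100 },
  { username: "alice", timestamp: 300 },
  { username: "bob",   timestamp: 200 }
];

function byUsernameAscTimestampDesc(a, b) {
  if (a.username !== b.username) {
    return a.username < b.username ? -1 : 1; // 1 = ascending
  }
  return b.timestamp - a.timestamp;          // -1 = descending
}

// slice() copies the array so the original order stays untouched
var ordered = events.slice().sort(byUsernameAscTimestampDesc);

ordered.forEach(function (e) {
  console.log(e.username + " " + e.timestamp);
});
```

So for each username the newest event comes first, which is exactly the case the index above is meant to speed up.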

You can also use an index to ensure that some values are unique:

db.logins.ensureIndex( { "user_id": 1 }, { unique: true } )

By default, unique is false on MongoDB indexes - so you have to set this option.
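
To show what the unique option means for your inserts, here is a small simulation in plain JavaScript (run with Node.js). UniqueCollection is a made-up class that only mimics the effect; it is not MongoDB code. The real server rejects the second insert with a duplicate key error in a similar way.

```javascript
// Simulation only: mimics the effect of a unique index on one field.
// UniqueCollection is an illustrative class, not a MongoDB API.
function UniqueCollection(field) {
  this.field = field;
  this.docs = [];
  this.seen = {};
}

UniqueCollection.prototype.insert = function (doc) {
  // JSON.stringify keeps 1 and "1" apart, like different value types
  var key = JSON.stringify(doc[this.field]);
  if (Object.prototype.hasOwnProperty.call(this.seen, key)) {
    throw new Error("duplicate key: " + this.field + " = " + key);
  }
  this.seen[key] = true;
  this.docs.push(doc);
};

var logins = new UniqueCollection("user_id");
logins.insert({ user_id: 1, last_login: "2013-08-17" });

var rejected = false;
try {
  // same user_id again: rejected by the simulated unique index
  logins.insert({ user_id: 1, last_login: "2013-08-18" });
} catch (e) {
  rejected = true;
  console.log(e.message);
}

console.log("documents stored: " + logins.docs.length);
```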

If you have a lot of documents in one collection you should set the option "{background: true}" to ensure that the index is built in the background and therefore does not block.

That's it.

Select your favorite MongoDB driver (you will find a lot: http://docs.mongodb....system/drivers/) and start using your MongoDB.