9. Administering MongoDB
Shakuntala Gupta Edward
and Navin Sabharwal
9.1 Administration Tools
Before you dive into the administration
tasks, here’s a quick overview of the tools. Since MongoDB does
not have a GUI-style administrative interface, most of the
administrative tasks are done using the command line mongo shell.
However, some UIs are available as separate community projects.
9.1.1 mongo
9.1.2 Third-Party Administration Tools
A list of third-party administration tools that support MongoDB
is maintained by 10gen (now MongoDB, Inc.) on the MongoDB web
site at
https://docs.mongodb.org/ecosystem/tools/administration-interfaces/.
9.2 Backup and Recovery
Backup is one of the most important
administrative tasks. It ensures that the data is safe and can be
restored in an emergency.
A backup that cannot be restored is
useless. So, after taking a backup, the administrator needs to
verify that it’s in a usable format and has captured the data in
a consistent state.
9.2.1 Data File Backup
All of the MongoDB data is stored in a
data directory, which by default is C:\data\db (on Windows) or
/data/db (on Linux). The default path can be changed to a
different directory using the --dbpath option when starting
mongod.
The data directory content is a complete
picture of the data that is stored in the MongoDB database. Hence
taking a MongoDB backup is simply copying the entire contents of
the data directory folder.
Generally, it is not safe to copy the
data directory content when MongoDB is running. One option is to
shut down the MongoDB server before copying the data directory
content.
If the server is shut down properly, the
content of the data directory represents a safe snapshot of the
MongoDB data, so it can be copied before the server is restarted
again.
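The cold-copy approach described above can be sketched as a small shell function. This is a minimal sketch, assuming mongod has already been shut down cleanly; the function name cold_backup and the backup location are hypothetical, not part of MongoDB.

```shell
# Copy a stopped mongod's data directory into a timestamped folder.
# Assumption: mongod was shut down cleanly before this is called.
cold_backup() {
  data_dir="$1"      # e.g. /data/db
  backup_root="$2"   # hypothetical backup location, e.g. /backups
  target="$backup_root/db-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$target" || return 1
  # -R copies recursively; -p preserves timestamps and permissions.
  cp -Rp "$data_dir"/. "$target"/ || return 1
  # Print where the copy landed so the caller can record it.
  echo "$target"
}

# Usage (after stopping the server):
#   cold_backup /data/db /backups
```

Restart the server only after the copy has finished.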
9.2.2 mongodump and mongorestore
mongodump is the MongoDB backup utility
that is supplied as part of the MongoDB distribution. It works as
a regular client by querying a MongoDB instance and writing all
the documents it reads to disk.
Let’s perform a backup and then restore
it to validate that the backup is in a usable and consistent
format.
The following code snippets are from
running the utilities on a Windows platform. The MongoDB server is
running on the localhost instance.
2015-07-15T22:26:47.288-0700 I CONTROL [initandlisten] MongoDB starting : pid=3820 port=27017 dbpath=c:\data\db\ 64-bit host=ANOC9
2015-07-15T22:28:23.563-0700 I NETWORK [websvr] admin web console waiting for connections on port 28017
Running mongodump with no options dumps
every database into a dump folder under the current directory
(here, the bin folder itself), as shown in Figure 9-1.
The mongodump utility by default connects
to the localhost interface of the database on the default port.
Next, it reads each database and writes
every collection’s documents into a predefined folder
structure, which defaults to
./dump/[databasename]/[collectionname].bson.
The data is saved in .bson format, which
is similar to the format used by MongoDB for storing its data
internally.
If content is already in the directory,
it remains untouched unless the dump contains a file with the
same name. For example, if the dump contains the files c1.bson
and c2.bson, and the output directory has the files c3.bson and
c1.bson, then mongodump replaces the existing c1.bson with its
own c1.bson and adds c2.bson, but it won’t remove or change
the c3.bson file.
You should make sure that the directory
is empty before using it for mongodump unless you have a
requirement of overlaying the data in your backups.
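One way to guarantee an empty output directory is to dump into a freshly created, timestamped folder each time. The sketch below assumes mongodump is on the PATH and the server listens on localhost:27017; the function name dump_all and the folder naming are illustrative choices.

```shell
# Dump all databases into a new, empty, timestamped directory.
dump_all() {
  out_root="$1"   # hypothetical parent folder for backups
  out_dir="$out_root/dump-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$out_dir" || return 1
  # Refuse to overlay an earlier dump that landed in the same folder.
  if [ -n "$(ls -A "$out_dir")" ]; then
    echo "refusing to dump into non-empty $out_dir" >&2
    return 1
  fi
  mongodump --host localhost --port 27017 --out "$out_dir"
}

# Usage:
#   dump_all /backups
```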
9.2.2.1 Single Database Backup
In the above example, you executed
mongodump with the default setting, which dumps all of the
databases on the MongoDB database server.
In a real-life scenario, you will have
multiple application databases running on a single server, each
with its own backup requirements. The -d option restricts
mongodump to a single database.
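A single-database dump might look like the following sketch. The wrapper name backup_database is hypothetical; mydbproc is used as the example database name because it appears in this chapter's restore output.

```shell
# Dump only one database instead of everything on the server.
backup_database() {
  db_name="$1"   # database to back up
  out_dir="$2"   # folder that will receive <db_name>/<collection>.bson
  mongodump --db "$db_name" --out "$out_dir"
}

# Usage:
#   backup_database mydbproc ./dump
```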
9.2.2.2 Collection Level Backup
There are two types of data in every
database. Some data changes rarely, such as configuration data
where you maintain the users, their roles, and any
application-related configurations. Other data changes
frequently, such as events data (in a monitoring application)
or posts data (in a blog application).
As a result, the backup requirements are
different. For instance, the complete database can be backed up
once a week whereas the rapidly changing collection needs to be
backed up every hour.
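The split schedule described above can be sketched with one function per backup scope. The collection name events and the cron timings in the comments are assumptions for illustration only.

```shell
# Dump a single collection of a single database.
backup_collection() {
  db_name="$1"
  coll_name="$2"
  out_dir="$3"
  mongodump --db "$db_name" --collection "$coll_name" --out "$out_dir"
}

# Usage:
#   backup_collection mydbproc events /backups/hourly
#
# Hypothetical crontab entries, assuming wrapper scripts exist:
#   0 2 * * 0   weekly-full-dump.sh      # whole database, Sundays 02:00
#   0 * * * *   hourly-events-dump.sh    # fast-changing collection
```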
9.2.2.4 mongorestore
As mentioned, it is mandatory for the
administrators to ensure that the backups are happening in a
consistent and usable format. So the next step is to restore the
data dump back using mongorestore.
This utility restores the database to
the state it was in when the dump was taken. Prior to version
3.0, mongorestore could be run without even starting the
mongod/mongos. Starting from version 3.0, it must connect to a
running mongod/mongos; if none is running, the command fails
with a connection error. Once the server is started, its log
shows the usual startup messages:
2015-07-15T22:43:25.765-0700 I CONTROL [initandlisten] MongoDB starting : pid=3820 port=27017 dbpath=c:\data\db\ 64-bit host=ANOC9
2015-07-15T22:43:25.865-0700 I NETWORK [websvr] admin web console waiting for connections on port 28017
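With the server running, a full restore of a dump folder can be sketched as follows. The function name restore_all is hypothetical, and the host and port are the defaults assumed throughout this chapter.

```shell
# Restore every database found in a mongodump output folder.
restore_all() {
  dump_dir="$1"
  if [ ! -d "$dump_dir" ]; then
    echo "dump folder not found: $dump_dir" >&2
    return 1
  fi
  # mongorestore inserts documents; add --drop if each collection
  # should be dropped before it is restored.
  mongorestore --host localhost --port 27017 "$dump_dir"
}

# Usage:
#   restore_all ./dump
```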
9.2.2.5 Restoring a Single Database
As you saw in the backup section, the
backup strategies can be specified at individual database level.
You ran mongodump to take a backup of a single database by
using the -d option. mongorestore accepts the same -d option to
restore a single database.
2015-07-14T22:47:01.155-0700 building a list of collections to restore from C:\practicalmongodb\bin\dump\mydbproc dir
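A single-database restore consistent with the log line above might be sketched like this. The wrapper name restore_database is hypothetical; it assumes the dump folder follows the default ./dump/[databasename] layout. Newer tool versions may prefer namespace filters over this form.

```shell
# Restore one database from its sub-folder inside a dump.
restore_database() {
  db_name="$1"     # e.g. mydbproc
  dump_root="$2"   # e.g. ./dump
  mongorestore --db "$db_name" "$dump_root/$db_name"
}

# Usage:
#   restore_database mydbproc ./dump
```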
9.2.2.6 Restoring a Single Collection
As with mongodump, where you can use the
-c option to specify collection-level backups, you can also
restore individual collections by using the -c option with the
mongorestore utility.
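A collection-level restore points mongorestore at one .bson file. In this sketch the function name restore_collection and the collection name users are assumptions; the path follows the default dump layout.

```shell
# Restore a single collection from its .bson file.
restore_collection() {
  db_name="$1"
  coll_name="$2"
  dump_root="$3"
  mongorestore --db "$db_name" --collection "$coll_name" \
    "$dump_root/$db_name/$coll_name.bson"
}

# Usage:
#   restore_collection mydbproc users ./dump
```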
9.2.2.7 mongorestore --help
mongorestore also has multiple
options, which can be viewed using the --help option. See also
the following web site:
http://docs.mongodb.org/manual/core/backups/
9.2.3 fsync and Lock
Although mongodump and mongorestore
enable you to take a database backup without any downtime, they
don’t provide a point-in-time view of the data.
You saw how to copy the data files to
take the backups, but this requires shutting down the server
before copying the data, which is not feasible in a production
environment.
MongoDB’s fsync command lets you copy
the content of the data directory while MongoDB is running,
without the data changing underneath the copy.
The fsync command forces all pending
writes to be flushed to the disk. Optionally, it holds a lock
that prevents further writes until the server is unlocked. It is
this lock that makes the fsync command usable for backups; from
the mongo shell, db.fsyncLock() flushes and locks in one step.
Once the lock is held, the server is
locked for any writes, ensuring that the data directory
represents a consistent, point-in-time snapshot of the data. The
data directory contents can be safely copied to be used as the
database backup.
You must unlock the database once the
backup has completed. To do so, run db.fsyncUnlock() from the
mongo shell.
The fsync command lets you take a backup
without downtime and without sacrificing the backup’s
point-in-time nature. However, there is a momentary blocking of
the writes (also called a momentary write downtime).
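The lock, copy, and unlock sequence can be sketched as one function. It assumes the legacy mongo shell is on the PATH, the server runs on the default host and port, and the storage engine is MMAPv1; the function name locked_snapshot is hypothetical.

```shell
# Take a point-in-time copy of the data directory under fsync lock.
locked_snapshot() {
  data_dir="$1"   # e.g. /data/db
  target="$2"     # folder that will receive the copy
  # Flush pending writes and block new ones.
  mongo admin --quiet --eval 'printjson(db.fsyncLock())' || return 1
  mkdir -p "$target" && cp -Rp "$data_dir"/. "$target"/
  copy_status=$?
  # Always unlock, even if the copy failed, so writes can resume.
  mongo admin --quiet --eval 'printjson(db.fsyncUnlock())' || return 1
  return "$copy_status"
}

# Usage:
#   locked_snapshot /data/db /backups/snapshot-1
```

Saving the copy status before unlocking keeps a failed copy from leaving the server locked.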
9.2.4 Slave Backups
Slave backups are the recommended way for
data backups in MongoDB. The slave (a secondary, in replica set
terms) always stores a data copy that is nearly in sync with the
master, and taking it offline or slowing it down for a backup
matters far less than doing the same to the master. You can
apply any of the techniques discussed earlier on the slave
rather than the master: shutting down, fsync with lock, or dump
and restore.
9.3 Importing and Exporting
When you are trying to migrate your
application from one environment to another, you often need to
import or export data.
9.3.1 mongoimport
MongoDB provides the mongoimport utility,
which reads from a file and bulk loads the data directly into a
collection of the database.
For example, mongoimport can load the
data from a CSV file into the testimport collection on the
localhost when given the file type, the target database and
collection, and the file name.
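A CSV import into testimport might be sketched as follows. The function name import_csv, the database name, and the file name in the usage line are assumptions; --headerline tells mongoimport to take field names from the first row of the file.

```shell
# Bulk load a CSV file into a collection on localhost.
import_csv() {
  db_name="$1"
  coll_name="$2"
  csv_file="$3"
  if [ ! -f "$csv_file" ]; then
    echo "file not found: $csv_file" >&2
    return 1
  fi
  mongoimport --host localhost --db "$db_name" --collection "$coll_name" \
    --type csv --headerline --file "$csv_file"
}

# Usage (hypothetical database and file names):
#   import_csv mydbproc testimport data.csv
```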
9.3.2 mongoexport
Similar to the mongoimport utility,
MongoDB provides a mongoexport utility that lets you export data
from the MongoDB database. As the name suggests, this utility
exports data from existing MongoDB collections to files.
Using --help shows the options available
with the mongoexport utility. The following options are the ones
you will end up using most:
- -q (--query): Specifies the query
that selects the records to export. It takes the same form as
the criteria you pass to db.CollectionName.find() when
retrieving matching records. If no query is specified, all the
documents are exported.
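An export filtered with the query option could be sketched like this. The wrapper name export_matching, the collection, the query fields, and the output file in the usage line are all hypothetical.

```shell
# Export only the documents that match a query to a JSON file.
export_matching() {
  db_name="$1"
  coll_name="$2"
  query="$3"      # same JSON criteria you would pass to find()
  out_file="$4"
  mongoexport --db "$db_name" --collection "$coll_name" \
    --query "$query" --out "$out_file"
}

# Usage (hypothetical collection and field):
#   export_matching mydbproc users '{"Gender":"F"}' females.json
```

Quote the query with single quotes so the shell passes the JSON through unchanged.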
9.4 Managing the Server
In this section, you will look at the
various options that you need to be aware of as an administrator of
the system.
9.4.1 Starting a Server
This section covers how to start the
server. Previously, you started the server by running mongod.exe
from the command prompt.
The MongoDB server can be started manually
by opening a command prompt (run as administrator) in Windows or
a terminal window on Linux systems and running mongod,
optionally with the --dbpath option to point at a non-default
data directory.
This window will display all the
connections that are being made to the mongod. It also displays
information that can be used to monitor the server.
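A manual start might be sketched as below. The function name start_mongod is hypothetical; the defaults mirror the data directory and port mentioned in this chapter.

```shell
# Start mongod in the foreground with an explicit data path and port.
start_mongod() {
  db_path="${1:-/data/db}"   # falls back to the default data directory
  port="${2:-27017}"         # falls back to the default port
  mkdir -p "$db_path" || return 1
  mongod --dbpath "$db_path" --port "$port"
}

# Usage:
#   start_mongod              # defaults
#   start_mongod /srv/mongo 27018
```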
9.4.2 Stopping a Server
The server can be shut down by pressing
CTRL+C in the mongod console itself. Otherwise, you can run
db.shutdownServer() against the admin database from the mongo
console.
2015-07-14T22:57:21.413-0700 I NETWORK 127.0.0.1:27017 failed couldn't connect to server 127.0.0.1:27017
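The console-based shutdown can also be issued non-interactively. This sketch assumes the legacy mongo shell and a server on localhost; the function name stop_mongod is hypothetical.

```shell
# Ask a running mongod to shut down cleanly.
stop_mongod() {
  port="${1:-27017}"
  # shutdownServer must be run against the admin database. The shell
  # may report a dropped connection afterwards, which is expected.
  mongo --port "$port" admin --eval 'db.shutdownServer()'
}

# Usage:
#   stop_mongod 27017
```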
9.4.3 Viewing Log Files
9.4.4 Server Status
db.serverStatus() is a simple method
provided by MongoDB for checking the server status, such as the
number of connections, uptime, and so on. The output of the
server status command depends upon the operating system platform,
MongoDB version, storage engine used, and type of configuration
(standalone, replica set, or sharded cluster).
Starting from version 3.0, the following
sections are removed from the output: workingSet, indexCounters,
and recordStats.
In order to check the status of a server
using the MMAPv1 storage engine, connect to the mongo console,
switch to the admin db, and issue the db.serverStatus() command.
With MMAPv1 as the storage engine, the
serverStatus output also has a “backgroundFlushing” section,
which reports on the process MongoDB uses to flush data to disk.
The "opcounters" and "asserts"
sections provide useful information that can be analyzed to
diagnose a problem.
The “opcounters” section shows the
number of operations of each type. In order to find out if there’s
any problem, you should have a baseline of these operations. If the
counters start deviating from the baseline, this indicates a
problem and will require taking action to bring it back to the
normal state.
The “asserts” section shows the
number of client and server warnings or exceptions that have
occurred. If you find a rise in such exceptions and warnings, you
need to take a good look at the log files to identify whether a
problem is developing. A rise in the number of asserts may also
indicate a problem with the data, and in such scenarios MongoDB’s
validate() method should be used to check that the data is
undamaged.
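To build the baseline mentioned above, the relevant sections can be pulled out of serverStatus on a schedule. This is a sketch assuming the legacy mongo shell; the function name show_counters is hypothetical.

```shell
# Print the serverStatus sections most useful for a baseline.
show_counters() {
  mongo admin --quiet --eval '
    var s = db.serverStatus();
    printjson({
      uptime: s.uptime,
      connections: s.connections,
      opcounters: s.opcounters,
      asserts: s.asserts
    });'
}

# Usage: append a sample to a log for later comparison
#   show_counters >> /var/log/mongo-baseline.log
```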
2015-07-14T22:51:05.965-0700 I CONTROL Hotfix KB2731284 or later update is installed, no need to zero-out data files
2015-07-29T22:51:05.965-0700 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0)
9.4.5 Identifying and Repairing MongoDB
The first thing you need to do before you
can start the repair is to take the server offline if it’s not
already. You can use either option mentioned above. In this
example, type ^C in the mongod console. This will shut down the
server.
2015-07-14T22:58:31.171-0700 I CONTROL Hotfix KB2731284 or later update is installed, no need to zero-out data files
2015-07-14T22:58:31.173-0700 I CONTROL [initandlisten] MongoDB starting : pid=3996 port=27017 dbpath=c:\data\db\ 64-bit host=ANOC9
Starting mongod with the --repair option
repairs the data files. If you look at the output, you’ll find
the various discrepancies that the utility is repairing. Once
the repair process is over, it exits.
After completion of the repair process,
the server can be started as normal and then the latest database
backups can be used to restore missing data.
At times, you may notice that the drive is
running out of disk space when a large database is under repair.
This is because MongoDB needs to create a temporary copy of the
files on the same drive as the data files. To overcome this
issue, use the --repairpath parameter when repairing a database
to specify the drive where the temporary files can be created
during the repair process.
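A repair run that places its temporary files on another drive can be sketched as follows. The function name repair_mongod and the paths are hypothetical, the server must already be stopped, and --repairpath is assumed to be available as in the MongoDB version this chapter covers; newer releases may not accept it.

```shell
# Repair the data files, writing temporary copies to a separate drive.
repair_mongod() {
  db_path="$1"       # data directory of the stopped server
  repair_path="$2"   # folder on a drive with enough free space
  mkdir -p "$repair_path" || return 1
  mongod --dbpath "$db_path" --repair --repairpath "$repair_path"
}

# Usage:
#   repair_mongod /data/db /mnt/spare/repair-tmp
```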
9.4.6 Identifying and Repairing Collection Level Data
Sometimes you might want to validate that
the collection holds valid data and has valid indexes. For such
cases, MongoDB provides a validate() method that validates the
content of the specified collection.
Both the data files and the associated
indexes are checked by default by the validate() method. The
collection statistics are provided to help in identifying whether
there’s any problem with the data files or the indexes.
If running validate() indicates that the
indexes are damaged, reIndex() can be used to rebuild them. This
drops and rebuilds all the indexes of the collection.
If the collection’s data files are
corrupt, then running mongod with the --repair option is the best
way to repair all of the data files.
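Validation and re-indexing can be driven from the command line as in this sketch. It assumes the legacy mongo shell of the MongoDB version covered here; the function names and the users collection are hypothetical.

```shell
# Run a full validation of one collection and print the report.
validate_collection() {
  db_name="$1"
  coll_name="$2"
  mongo "$db_name" --quiet --eval \
    "printjson(db.getCollection('$coll_name').validate(true))"
}

# Drop and rebuild all indexes of one collection.
reindex_collection() {
  db_name="$1"
  coll_name="$2"
  mongo "$db_name" --quiet --eval \
    "printjson(db.getCollection('$coll_name').reIndex())"
}

# Usage:
#   validate_collection mydbproc users
#   reindex_collection mydbproc users
```

Re-indexing blocks the collection while it runs, so it is best reserved for a maintenance window.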