6. Using MongoDB Shell
6.1 Basic Querying
This section will briefly discuss the CRUD
operations (Create, Read, Update, and Delete). Using basic
examples and exercises, you will learn how these operations are
performed in MongoDB. Also, you will understand how queries are
executed in MongoDB.
In contrast to traditional SQL, which is
used for querying, MongoDB uses its own JSON-like
query language to retrieve information from the stored data.
After the successful installation of
MongoDB, as explained in Chapter 5,
you will navigate to the directory C:\practicalmongodb\bin\. This
folder has all of the executables for running MongoDB.
The first step is always to start the
database server. Open the command prompt (by running it as
administrator) and issue the command CD \.
Next, run the command
C:\practicalmongodb\bin\mongod.exe. (If the installation is in some
other folder, the path will change accordingly. For the examples in
this chapter, the installation is in the C:\practicalmongodb
folder.) This will start the database server .
2015-07-06T02:29:24.501-0700 I CONTROL
Hotfix KB2731284 or later update is insalled, no need to zero-out
data files
2015-07-06T02:29:24.575-0700 I CONTROL
[initandlisten] MongoDB starting : pid=384 port=27017
dbpath=c:\data\db\ 64-bit host=ANC09
2015-07-06T02:29:24.575-0700 I CONTROL
[initandlisten] targetMinOS: Windows 7/windows Server 2008 R2
2015-07-06T02:29:24.575-0700 I CONTROL
[initandlisten] OpenSSL version: OpenSSL1.0.1j-fips 19 Mar 2015
2015-07-06T02:29:24.575-0700 I CONTROL
[initandlisten] build info: windows sys getwindowsversion(major=6,
minor=1, build=7601, platform=2, service_pack='Service Pack 1')
BOOST_LIB_VERSION=1_49
Now that the database server is started,
you can start issuing commands to the server using the mongo shell.
Before you look at the mongo shell, let’s
briefly look at how to use the
import/export tool to import and export data in and out of the
MongoDB database.
Next, import the data from the MongoDB
database to a new collection in order to look at how the import
tool works.
Open the command prompt by running it as an
administrator. The following command is used to get help on the
import command :
Issue the
following command to import the data from the file exporteg.csv to
a new collection called importeg in the MyDB database :
C:\>c:\practicalmongodb\bin\mongoimport.exe
--host localhost --db mydb --collection importeg --type csv --file
c:\exporteg.csv --headerline
In order to validate whether the collection
is created and the data is imported, you connect to the database
(which is localhost in this case) using mongo shell, and you issue
commands to validate whether the collection exists or not.
To start the mongo shell, run command
prompt as administrator and issue the command
C:\PracticalMongoDB\bin\mongo.exe (the path will vary based on the
installation folder; in this example, the folder is
C:\PracticalMongoDB\), and press Enter.
{ "_id" :
ObjectId("5450af58c770b7161eefd31d"), "Name" :
"S1", "Gender" : "M", "Class"
: "C1", "Score" : 95, "Age" : 25 }
{ "_id" :
ObjectId("5450af59c770b7161eefd31e"), "Name" :
"S2", "Gender" : "M", "Class"
: "C1", "Score" : 85, "Age" : 18 }
1.Connecting to
the mongo shell
2.Switching to
your database, which is MyDB in this case
3.Checking for
the collections that exist in the MyDB database using show
collections.
4.Checking the
count of the collection that you imported using the import tool.
5.Finally,
executing the find() command to check for the data in the new
collection.
As you can see in Figure 6-1,
by default the database test is used for context.
At any point in time, executing the db
command will show the current database to which the shell is
connected:
To display all the database names, you can
run the show dbs command. Executing this command will list all of
the databases for the connected server.
As shown above, if you need help on any of
the methods of db or collection, you can use db.help() or
db.<CollectionName>.help() . For example, if you need help on
the db command, execute db.help().
Until now you have been using the default
test db. The command use <newdbname> can be used to switch to
a new database.
Before you start your exploration, let’s
first briefly look at MongoDB terminology and concepts
corresponding to SQL terminology and concepts . This is summarized
in Table 6-1.
Let’s start your exploration of the
options for querying in MongoDB. Switch to the MYDBPOC database .
Although the context is switched to
MYDBPOC, the database name will not appear if the show dbs command
is issued because MongoDB doesn’t create a database until data is
inserted into the database. This is in keeping with MongoDB’s
dynamic approach to data facilitating, dynamic namespace
allocation, and a simplified and accelerated development process.
If you issue the show dbs command at this point, it will not list
the MYDBPOC database in the list of databases because the database
is not created until data is inserted into the database.
The following example assumes a
polymorphic collection named users which contains documents of the
following two prototypes:
_id: ObjectID(),
Gender: "M",
}
6.1.1 Create and Insert
You will now look at how databases and
collections are created. As explained earlier, the documents in
MongoDB are in the JSON format.
The first document complies with the first
prototype whereas the second document complies with the second
prototype. You have created two documents named user1 and user2.
You will next add both these documents
(user1 and user2) to the users collection in the following order
of operations:
The above operation will not only insert
the two documents to the users collection but it will also create
the collection as well as the database. The same can be verified
using the show collections and show dbs commands.
Along with the collection users, the
system.indexes collection also gets displayed. This system.indexes
collection is created by default when the database is created. It
manages the information of all the indexes of all collections
within the database.
{ "_id" :
ObjectId("5450c048199484c9a4d26b0a"), "FName"
: "Test", "LName" : "User", "Age"
: 30, "Gender": "M", "Country" :
"US" }
{ "_id" :
ObjectId("5450c05d199484c9a4d26b0b"), "Name" :
"Test", User", "Age" : 45,"Gender"
: "F", "Country" : "US" }
You can see that the two documents you
created are displayed. In addition to the fields you added to the
docu ment, there’s an additional _id field that is generated for
all of the documents.
All documents must have a unique__id
field. If not explicitly specified by you, the same will be
auto-assigned as a unique object ID by MongoDB, as shown in the
example above.
You didn’t explicitly insert an _id
field but when you use the find() command to display the documents
you can see an _id field associated with each document.
6.1.2 Explicitly Creating Collections
6.1.3 Inserting Documents Using Loop
Documents can also be added to the
collection using a for loop. The following code inserts users
using for.
> for(var i=1; i<=20; i++)
db.users.insert({"Name" : "Test User" + i,
"Age": 10+i, "Gender" : "F",
"Country" : "India"})
{ "_id" :
ObjectId("52f48cf474f8fdcfcae84f79"), "FName"
: "Test", "LName" : "User", "Age"
: 30, "Gender" : "M", "Country" :
"US" }
Users appear in the collection. Before
you go any further, let’s understand the “Type “it” for
more” statement.
The find command returns a cursor to the
result set. Instead of displaying all documents(which can be
thousands or millions of results) in one go on the screen, the
cursor displays first 20 documents and waits for the request to
iterate (it) to display the next 20 and so on until all of the
result set is displayed.
6.1.4 Inserting by Explicitly Specifying_id
In the previous examples of insert, the
_id field was not specified, so it was implicitly added. In the
following example, you will see how to explicitly specify the _id
field when inserting the documents within a collection.
While explicitly specifying the _id field, you have to keep in
mind the uniqueness of the field; otherwise the insert will fail.
> db.users.insert({"_id":10, "Name":
"explicit id"})
6.1.5 Update
In this section, you will explore the
update() command, which is used to update the documents in a
collection.
The update() method updates a single
document by default. If you need to update all documents that
match the selection criteria, you can do so by setting the multi
option as true.
Let’s begin by updating the values of
existing columns. The $set operator will be used for updating the
records.
{ "_id" :
ObjectId("52f48eeb74f8fdcfcae84f7b"), "Name" :
"Test User1", "Age" : 11, "Gender" :
"F", "Country" : "India" }
{ "_id" :
ObjectId("52f48eeb74f8fdcfcae84f7c"), "Name" :
"Test User2", "Age" : 12, "Gender" :
"F", "Country" : "India" }
If you check the output, you will see that
only the first document record is updated, which is the default
behavior of update since no multi option was specified.
Issue the find command again to check
whether the country has been updated for all the female employees
or not. Issuing the find command will return the following output:
{ "_id" :
ObjectId("52f48cfb74f8fdcfcae84f7a"), "Name" :
"Test User", "Age" : 45, "Gender" :
"F", "Country" : "UK" }
When working in a real-world application,
you may come across a schema evolution where you might end up
adding or removing fields from the documents. Let’s see how to
perform these alterations in the MongoDB database.
The update() operations can be used at the
document level, which helps in updating either a single document
or set of documents within a collection.
Next, let’s look at how to add new
fields to the documents. In order to add fields to the document,
use the update() command with the $set operator and the multi
option.
If you use a field name with $set , which
is non-existent, then the field will be added to the documents.
The following command will add the field company to all the
documents:
Issuing find command against the user’s
collection, you will find the new field added to all documents.
{ "Age" : 30, "Company"
: "TestComp", "Country" : "US",
"FName" : "Test", "Gender" : "M",
"LName" : "User", "_id" :
ObjectId("52f48cf474f8fdcfcae84f79") }
{ "Age" : 45, "Company"
: "TestComp", "Country" : "UK",
"Gender" : "F", "Name" : "Test
User", "_id" : ObjectId("52f48cfb74f8fdcfcae84f7a")
}
If you execute the update() command with
fields existing in the document, it will update the field’s
value; however, if the field is not present in the document, then
the field will be added to the documents.
You will next see how to use the same
update() command with the $unset operator to remove fields from
the documents.
This can be checked by issuing the find() command against the
Users collection. You can see that the Company field has been
deleted from the documents.
> db.users.find()
6.1.6 Delete
To delete documents in a collection, use
the remove () method . If you specify a selection criterion, only
the documents meeting the criteria will be deleted. If no criteria
is specified, all of the documents will be deleted.
6.1.7 Read
In this part of the chapter, you will
look at various examples illustrating the querying functionality
available as part of MongoDB that enables you to read the stored
data from the database.
In order to start with basic querying,
first create the users collection and insert data following the
insert command.
> for(var i=1; i<=20; i++)
db.users.insert({"Name" : "Test User" + i,
"Age": 10+i, "Gender" : "F",
"Country" : "India"})
{ "_id" :
ObjectId("52f4a823958073ea07e15070"), "FName"
: "Test", "LName" : "User", "Age"
: 30, "Gender" : "M", "Country" :
"US" }
{ "_id" :
ObjectId("52f4a826958073ea07e15071"), "Name" :
"Test User", "Age" : 45, "Gender" :
"F", "Country" : "US" }
{ "_id" :
ObjectId("52f4a83f958073ea07e15083"), "Name" :
"Test User18", "Age" :28, "Gender" :
"F", "Country" : "India" }
Type "it" for more
>
6.1.7.1 Query Documents
A rich query system is provided by MongoDB. Query documents
can be passed as a parameter to the find() method to filter
documents within a collection.
A query document is specified within open “{” and closed
“}” curly braces. A query document is matched against all of
the documents in the collection before returning the result set.
Using the find() command without any query document or an
empty query document such as find({}) returns all the documents
within the collection.
A query document can contain selectors and projectors.
A selector is like a where condition in SQL or a filter that
is used to filter out the results.
A projector is like the select condition or the selection list
that is used to display the data fields.
6.1.7.2 Selector
You will now see how to use the selector. The following
command will return all the female users:
> db.users.find({"Gender":"F"})
{ "_id" : ObjectId("52f4a826958073ea07e15071"),
"Name" : "Test User", "Age" : 45,
"Gender" : "F", "Country" : "US"
}
.............
{ "_id" : ObjectId("52f4a83f958073ea07e15084"),
"Name" : "Test User19", "Age" :29,
"Gender" : "F", "Country" : "India"
}
Type "it" for more
>
Let’s step it up a notch. MongoDB also supports operators
that merge different conditions together in order to refine your
search on the basis of your requirements.
Let’s refine the above query to now look for female users
from India. The following command will return the same:
> db.users.find({"Gender":"F", $or:
[{"Country":"India"}]})
{ "_id" : ObjectId("52f4a83f958073ea07e15072"),
"Name" : "Test User1", "Age" : 11,
"Gender" : "F", "Country" : "India"
}
...........
{ "_id" : ObjectId("52f4a83f958073ea07e15085"),
"Name" : "Test User20", "Age" :30,
"Gender" : "F", "Country" : "India"
}
>
Next, if you want to find all female users who belong to
either India or US, execute the following command:
>db.users.find({"Gender":"F",$or:[{"Country":"India"},{"Country":"US"}]})
{ "_id" : ObjectId("52f4a826958073ea07e15071"),
"Name" : "Test User", "Age" : 45,
"Gender" : "F", "Country" : "US"
}
..............
{ "_id" : ObjectId("52f4a83f958073ea07e15084"),
"Name" : "Test User19", "Age" :29,
"Gender" : "F", "Country" : "India"
}
Type "it" for more
For aggregation requirements, the aggregate functions need to
be used. Next, you’ll learn how to use the count() function for
aggregation.
In the above example, instead of displaying the documents, you
want to find out a count of female users who stay in either India
or the US. So execute the following command:
>db.users.find({"Gender":"F",$or:[{"Country":"India"},
{"Country":"US"}]}).count()
21
>
If you want to find a count of users irrespective of any
selectors, execute the following command:
> db.users.find().count()
22
>
6.1.7.3 Projector
You have seen how to use selectors to filter out documents
within the collection. In the above example, the find() command
returns all fields of the documents matching the selector.
Let’s add a projector to the query document where, in
addition to the selector, you will also mention specific details
or fields that need to be displayed.
Suppose you want to display the first name and age of all
female employees. In this case, along with the selector, a
projector is also used.
Execute the following command to return the desired result
set:
> db.users.find({"Gender":"F"},
{"Name":1,"Age":1})
{ "_id" : ObjectId("52f4a826958073ea07e15071"),
"Name" : "Test User", "Age" : 45 }
..........
Type "it" for more
>
6.1.7.4 sort( )
In MongoDB, the sort order is specified as follows: 1 for
ascending and -1 for descending sort.
If in the above example you want to sort the records by
ascending order of age, you execute the following command:
>db.users.find({"Gender":"F"},
{"Name":1,"Age":1}).sort({"Age":1})
{ "_id" : ObjectId("52f4a83f958073ea07e15072"),
"Name" : "Test User1", "Age" : 11 }
{ "_id" : ObjectId("52f4a83f958073ea07e15073"),
"Name" : "Test User2", "Age" : 12 }
{ "_id" : ObjectId("52f4a83f958073ea07e15074"),
"Name" : "Test User3", "Age" : 13 }
..............
{ "_id" : ObjectId("52f4a83f958073ea07e15085"),
"Name" : "Test User20", "Age" :30 }
Type "it" for more
If you want to display the records in descending order by name
and ascending order by age, you execute the following command:
>db.users.find({"Gender":"F"},{"Name":1,"Age":1}).sort({"Name":-1,"Age":1})
{ "_id" : ObjectId("52f4a83f958073ea07e1507a"),
"Name" : "Test User9", "Age" : 19 }
............
{ "_id" : ObjectId("52f4a83f958073ea07e15072"),
"Name" : "Test User1", "Age" : 11 }
Type "it" for more
6.1.7.5 limit( )
You will now look at how you can limit the records in your
result set. For example, in huge collections with thousands of
documents, if you want to return only five matching documents,
the limit command is used, which enables you to do exactly that.
Returning to your previous query of female users who live in
either India or US, say you want to limit the result set and
return only two users. The following command needs to be
executed:
>db.users.find({"Gender":"F",$or:[{"Country":"India"},{"Country":"US"}]}).limit(2)
{ "_id" : ObjectId("52f4a826958073ea07e15071"),
"Name" : "Test User", "Age" : 45,
"Gender" : "F", "Country" : "US"
}
{ "_id" : ObjectId("52f4a83f958073ea07e15072"),
"Name" : "Test User1", "Age" : 11,
"Gender" : "F", "Country" : "India"
}
6.1.7.6 skip( )
If the requirement is to skip the first two records and return
the third and fourth user, the skip command is used. The
following command needs to be executed:
>db.users.find({"Gender":"F",$or:[{"Country":"India"},
{"Country":"US"}]}).limit(2).skip(2)
{ "_id" : ObjectId("52f4a83f958073ea07e15073"),
"Name" : "Test User2", "Age" : 12,
"Gender" : "F", "Country" : "India"
}
{ "_id" : ObjectId("52f4a83f958073ea07e15074"),
"Name" : "Test User3", "Age" : 13,
"Gender" : "F", "Country" : "India"
}
>
6.1.7.7 findOne( )
Similar to find() is the findOne() command. The findOne()
method can take the same parameters as find(), but rather then
returning a cursor, it returns a single document. Say you want to
return one female user who stays in either India or US. This can
be achieved using the following command:
> db.users.findOne({"Gender":"F"},
{"Name":1,"Age":1})
{
"_id" : ObjectId("52f4a826958073ea07e15071"),
"Name" : "Test User",
"Age" : 45
}
>
Similarly, if you want to return the first record irrespective
of any selector in that case, you can use findOne() and it will
return the first document in the collection.
> db.users.findOne()
{
"_id" : ObjectId("52f4a823958073ea07e15070"),
"FName" : "Test",
"LName" : "User",
"Age" : 30,
"Gender" : "M",
"Country" : "US"}
6.1.7.8 Using Cursor
When the find() method is used, MongoDB returns the results of
the query as a cursor object. In order to display the result, the
mongo shell iterates over the returned cursor.
MongoDB enables the users to work with the Cursor object of
the find method. In the next example, you will see how to store
the cursor object in a variable and manipulate it using a while
loop.
Say you want to return all the users in the US. In order to do
so, you created a variable, assigned the output of find() to the
variable, which is a cursor, and then using the while loop you
iterate and print the output.
The code snippet is as follows:
> var c = db.users.find({"Country":"US"})
> while(c.hasNext()) printjson(c.next())
{
"_id" : ObjectId("52f4a823958073ea07e15070"),
"FName" : "Test",
"LName" : "User",
"Age" : 30,
"Gender" : "M",
"Country" : "US"
}
{
"_id" : ObjectId("52f4a826958073ea07e15071"),
"Name" : "Test User",
"Age" : 45,
"Gender" : "F",
"Country" : "US"
}
>
The next() function returns the next document. The hasNext()
function returns true if a document exists, and printjson()
renders the output in JSON format.
The variable to which the cursor object is assigned can also
be manipulated as an array. If, instead of looping through the
variable, you want to display the document at array index 1, you
can run the following command:
> var c = db.users.find({"Country":"US"})
> printjson(c[1])
{
"_id" : ObjectId("52f4a826958073ea07e15071"),
"Name" : "Test User",
.... "Gender" : "F",
"Country" : "US"}
>
6.1.7.9 explain( )
The explain() function can be used to see what steps the
MongoDB database is running while executing a query. Starting
from version 3.0, the output format of the function and the
parameter that is passed to the function have changed. It takes
an optional parameter called verbose, which determines what the
explain output should look like. The following are the verbosity
modes: allPlansExecution, executionStats , and queryPlanner. The
default verbosity mode is queryPlanner , which means if nothing
is specified, it defaults to queryPlanner.
The following code covers the steps executed when filtering on
the username field:
> db.users.find({"Name":"Test
User"}).explain("allPlansExecution")
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydbproc.users",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [ ]
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"$and" : [ ]
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 20,
"executionTimeMillis" : 0,
"totalKeysExamined" : 0,
"totalDocsExamined" : 20,
"executionStages" : {
"stage" : "COLLSCAN",
"filter" : {
"$and" : [ ]
},
"nReturned" : 20,
"executionTimeMillisEstimate" : 0,
"works" : 22,
"advanced" : 20,
"needTime" : 1,
"needFetch" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 20
},
"allPlansExecution" : [ ]
},
"serverInfo" : {
"host" : " ANOC9",
"port" : 27017,
"version" : "3.0.4",
"gitVersion" :
"534b5a3f9d10f00cd27737fbcd951032248b5952"
},
"ok" : 1
As you can see, the explain() output returns information
regarding queryPlanner, executionStats , and serverInfo. As
highlighted above, the information the output returns depends on
the verbosity mode selected.
You have seen how to perform basic querying, sorting,
limiting, etc. You also saw how to manipulate the result set
using a while loop or as an array. In the next section, you will
take a look at indexes and how you can use them in your queries.
6.1.8 Using Indexes
Indexes are used to provide high performance read operations
for queries that are used frequently. By default, whenever a
collection is created and documents are added to it, an index is
created on the _id field of the document.
In this section, you will look at how different types of
indexes can be created. Let’s begin by inserting 1 million
documents using for loop in a new collection called testindx.
>for(i=0;i<1000000;i++){db.testindx.insert({"Name":"user"+i,"Age":Math.floor(Math.random()*120)})}
Next, issue the find() command to fetch a Name with value of
user101. Run the explain() command to check what steps MongoDB is
executing in order to return the result set.
>
db.testindx.find({"Name":"user101"}).explain("allPlansExecution")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydbproc.testindx",
"indexFilterSet" : false,
"parsedQuery" : {
"Name" : {
"$eq" : "user101"
}
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"Name" : {
"$eq" : "user101"
}
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 645,
"totalKeysExamined" : 0,
"totalDocsExamined" : 1000000,
"executionStages" : {
"stage" : "COLLSCAN",
"filter" : {
"Name" : {
"$eq" : "user101"
}
},
"nReturned" : 1,
"executionTimeMillisEstimate" : 20,
"works" : 1000002,
"advanced" : 1,
"needTime" : 1000000,
"needFetch" : 0,
"saveState" : 7812,
"restoreState" : 7812,
"isEOF" : 1,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 1000000
},
"allPlansExecution" : [ ]
},
"serverInfo" : {
"host" : " ANOC9",
"port" : 27017,
"version" : "3.0.4",
"gitVersion" :
"534b5a3f9d10f00cd27737fbcd951032248b5952"
},
"ok" : 1
As you can see, the database scanned the entire table. This has
a significant performance impact and it is happening because there
are no indexes.
6.1.8.1 Single Key Index
Let’s create an index on the Name field of the document. Use
ensureIndex() to create the index.
> db.testindx.ensureIndex({"Name":1})
The index creation will take few minutes depending on the
server and the collection size.
Let’s run the same query that you ran earlier with explain()
to check what the steps the database is executing post index
creation. Check the n , nscanned , and millis fields in the
output.
>db.testindx.find({"Name":"user101"}).explain("allPathsExecution")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydbproc.testindx",
"indexFilterSet" : false,
"parsedQuery" : {
"Name" : {
"$eq" : "user101"
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"Name" : 1
},
"indexName" : "Name_1",
"isMultiKey" : false,
"direction" : "forward",
"indexBounds" : {
"Name" : [
"[\"user101\", \"user101\"]"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 0,
"totalKeysExamined" : 1,
"totalDocsExamined" : 1,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needFetch" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 1,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needFetch" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"Name" : 1
},
"indexName" : "Name_1",
"isMultiKey" : false,
"direction" : "forward",
"indexBounds" : {
"Name" : [
"[\"user101\", \"user101\"]"
]
},
"keysExamined" : 1,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0
}
},
"allPlansExecution" : [ ]
},
"serverInfo" : {
"host" : "ANOC9",
"port" : 27017,
"version" : "3.0.4",
"gitVersion" :
"534b5a3f9d10f00cd27737fbcd951032248b5952"
},
"ok" : 1
}
>
As you can see in the results, there is no table scan. The
index creation makes a significant difference in the query
execution time.
6.1.8.2 Compound Index
When creating an index, you should keep in mind that the index
covers most of your queries. If you sometimes query only the Name
field and at times you query both the Name and the Age field,
creating a compound index on the Name and Age fields will be more
beneficial than an index that is created on either of the fields
because the compound index will cover both queries.
The following command creates a compound index on fields Name
and Age of the collection testindx.
> db.testindx.ensureIndex({"Name":1, "Age":
1})
Compound indexes help MongoDB execute queries with multiple
clauses more efficiently. When creating a compound index, it is
also very important to keep in mind that the fields that will be
used for exact matches (e.g. Name : "S1" ) come first,
followed by fields that are used in ranges (e.g. Age :
{"$gt":20}).
Hence the above index will be beneficial for the following
query:
>db.testindx.find({"Name":
"user5","Age":{"$gt":25}}).explain("allPlansExecution")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydbproc.testindx",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"Name" : {
"$eq" : "user5"
}
},
{
"Age" : {
"$gt" : 25
}
}
]
},
"winningPlan" : {
"stage" : "KEEP_MUTATIONS",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"Age" : {
"$gt" : 25
}
},
............................
"indexBounds" : {
"Name" : [
"[\"user5\", \"user5\"
},
"rejectedPlans" : [
{
"stage" : "FETCH",
......................................................
"indexName" : "Name_1_Age_1",
"isMultiKey" : false,
"direction" : "forward",
.....................................................
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 0,
"totalKeysExamined" : 1,
"totalDocsExamined" : 1,
.....................................................
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"Age" : {
"$gt" : 25
}
},
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"allPlansExecution" : [
{
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"totalKeysExamined" : 1,
"totalDocsExamined" : 1,
"executionStages" : {
.............................................................
"serverInfo" : {
"host" : " ANOC9",
"port" : 27017,
"version" : "3.0.4",
"gitVersion" :
"534b5a3f9d10f00cd27737fbcd951032248b5952"
},
"ok" : 1
}
>
6.1.8.3 Support for sort Operations
In MongoDB, a sort operation that uses an indexed field to
sort documents provides the greatest performance.
As in other databases, indexes in MongoDB have an order due to
this. If an index is used to access documents, it returns results
in the same order as the index.
A compound index needs to be created when sorting on multiple
fields. In a compound index, the output can be in the sorted
order of either an index prefix or the full index.
The index prefix is a subset of the compound index, which
contains one or more fields from the start of the index.
For example, the following are the index prefix of the
compound index: { x:1, y: 1, z: 1}.
The sort operation can be on any of the combinations of index
prefix like {x: 1}, {x: 1, y: 1}.
A compound index can only help with sorting if it is a prefix
of the sort.
For example, a compound index on Age , Name, and Class, like
> db.testindx.ensureIndex({"Age": 1, "Name":
1, "Class": 1})
will be useful for the following queries:
> db.testindx.find().sort({"Age":1})
> db.testindx.find().sort({"Age":1,"Name":1})
> db.testindx.find().sort({"Age":1,"Name":1,
"Class":1})
The above index won’t be of much help in the following
query:
> db.testindx.find().sort({"Gender":1, "Age":1,
"Name": 1})
You can diagnose how MongoDB is processing a query by using
the explain() command.
6.1.8.4 Unique Index
Creating index on a field doesn’t ensure uniqueness, so if
an index is created on the Name field, then two or more documents
can have the same names. However, if uniqueness is one of the
constraints that needs to be enabled, the unique property needs
to be set to true when creating the index.
First, let’s drop the existing indexes.
>db.testindx.dropIndexes()
The following command will create a unique index on the Name
field of the testindx collection:
> db.testindx.ensureIndex({"Name":1},{"unique":true})
Now if you try to insert duplicate names in the collection as
shown below, MongoDB returns an error and does not allow
insertion of duplicate records:
> db.testindx.insert({"Name":"uniquename"})
> db.testindx.insert({"Name":"uniquename"})
"E11000 duplicate key error index:
mydbpoc.testindx.$Name_1 dup key: { : "uniquename" }"
If you check the collection, you’ll see that only the first
uniquename was stored.
> db.testindx.find({"Name":"uniquename"})
{ "_id" : ObjectId("52f4b3c3958073ea07f092ca"),
"Name" : "uniquename" }
>
Uniqueness can be enabled for compound indexes also, which
means that although individual fields can have duplicate values,
the combination will always be unique.
For example, if you have a unique index on {"name":1,
"age":1},
> db.testindx.ensureIndex({"Name":1,
"Age":1},{"unique":true})
>
then the following inserts will be permissible:
> db.testindx.insert({"Name":"usercit"})
> db.testindx.insert({"Name":"usercit",
"Age":30})
However, if you execute
> db.testindx.insert({"Name":"usercit",
"Age":30})
it’ll throw an error like
E11000 duplicate key error index:
mydbpoc.testindx.$Name_1_Age_1 dup key: { : "usercit",
: 30.0 }
You may create the collection and insert the documents first
and then create an index on the collection. If you create a
unique index on the collection that might have duplicate values
in the fields on which the index is being created, the index
creation will fail.
To cater to this scenario, MongoDB provides a dropDups option.
The dropDups option saves the first document found and remove any
subsequent documents with duplicate values.
The following command will create a unique index on the name
field and will delete any duplicate documents:
>db.testindx.ensureIndex({"Name":1},{"unique":true,
"dropDups":true})
>
6.1.8.5 system.indexes
Whenever you create a database, by default a system.indexes
collection is created. All of the information about a database’s
indexes is stored in the system.indexes collection. This is a
reserved collection, so you cannot modify its documents or remove
documents from it. You can manipulate it only through ensureIndex
and the dropIndexes database commands.
Whenever an index is created, its meta information can be seen
in system.indexes. The following command can be used to fetch all
the index information about the mentioned collection:
db.collectionName.getIndexes()
For example, the following command will return all indexes
created on the testindx collection:
> db.testindx.getIndexes()
6.1.8.6 dropIndex
The dropIndex command is used to remove the index.
The following command will remove the Name field index from
the testindx collection:
> db.testindx.dropIndex({"Name":1})
{ "nIndexesWas" : 3, "ok" : 1 }
>
6.1.8.7 reIndex
When you have performed a number of insertions and deletions
on the collection, you may have to rebuild the indexes so that
the index can be used optimally. The reIndex command is used to
rebuild the indexes.
The following command rebuilds all the indexes of a
collection. It will first drop the indexes, including the default
index on the _id field, and then it will rebuild the indexes.
db.collectionname.reIndex()
The following command rebuilds the indexes of the testindx
collection:
> db.testindx.reIndex()
{
"nIndexesWas" : 2,
"msg" : "indexes dropped for collection",
"nIndexes" : 2,
..............
"ok" : 1
}
>
We will be discussing in detail the different types of indexes
available in MongoDB in the next chapter.
6.1.8.8 How Indexing Works
MongoDB stores indexes in a BTree structure , so range queries
are automatically supported.
If multiple selection criteria are used in a query, MongoDB
tries to find the best single index to select a candidate set.
After that, it sequentially iterates through the set to evaluate
the other criteria.
When the query is executed for the first time, MongoDB creates
multiple execution plans for each index that is available for the
query. It lets the plans execute within certain number of ticks
in turns, until the plan that executes the fastest finishes. The
result is then returned to the system, which remembers the index
that was used by the fastest execution plan.
For subsequent queries, the remembered index will be used
until some certain number of updates has happened within the
collection. After the updating limit is crossed, the system will
again follow the process to find out the best index that is
applicable at that time.
The reevaluation of the query plans will happen when either of
the following events has occurred:
-
The collection receives 1,000 write operations.
-
An index is added or dropped.
-
A restart of the mongod process happens.
-
A reindexing for rebuilding the index happens.
If you want to override MongoDB’s default index selection,
the same can be done using the hint() method .
The index filter is introduced in version 2.6. It is made of
indexes that an optimizer will evaluate for a query, including
the query, projections, and the sorting. MongoDB will use the
index as provided by the index filter and will ignore the hint().
Before version 2.6, at any point in time MongoDB uses only one
index, so you need to ensure that composite indexes exist to
match your queries better. This can be done by checking the sort
and search criteria of the queries.
Index intersection is introduced in version 2.6. It enables
intersection of indexes for fulfilling queries with compound
conditions where part of condition is fulfilled by one index and
the other part is fulfilled by the other index.
In general, an index intersection is made up of two indexes;
however, multiple index intersections can be used for resolving a
query. This capability provides better optimization.
As in other databases, index maintenance has a cost attached.
Every operation that changes the collection (such as creation,
updating, or deletion) has an overhead because the indexes also
need to be updated. To maintain an optimal balance, you need to
periodically check the effectiveness of having an index that can
be measured by the ratio of reads and writes you are doing on the
system. Identify the less-used indexes and delete them.
6.2 Stepping Beyond the Basics
This section will cover advanced querying using conditional
operators and regular expressions in the selector part. Each of
these operators and regular expressions provides you with more
control over the queries you write and consequently over the
information you can fetch from the MongoDB database.
6.2.1 Using Conditional Operators
Conditional operators enable you to have more control over the
data you are trying to extract from the database. In this section,
you will be focusing on the following operators: $lt , $lte , $gt
, $gte , $in , $nin , and $not .
The following example assumes a collection named Students that
contains the following types of documents:
{
_id: ObjectID(),
Name: "Full Name",
Age: 30,
Gender: "M",
Class: "C1",
Score: 95
}
You will first create the collection and insert few sample
documents.
>db.students.insert({Name:"S1",Age:25,Gender:"M",Class:"C1",Score:95})
>db.students.insert({Name:"S2",Age:18,Gender:"M",Class:"C1",Score:85})
>db.students.insert({Name:"S3",Age:18,Gender:"F",Class:"C1",Score:85})
>db.students.insert({Name:"S4",Age:18,Gender:"F",Class:"C1",Score:75})
>db.students.insert({Name:"S5",Age:18,Gender:"F",Class:"C2",Score:75})
>db.students.insert({Name:"S6",Age:21,Gender:"M",Class:"C2",Score:100})
>db.students.insert({Name:"S7",Age:21,Gender:"M",Class:"C2",Score:100})
>db.students.insert({Name:"S8",Age:25,Gender:"F",Class:"C2",Score:100})
>db.students.insert({Name:"S9",Age:25,Gender:"F",Class:"C2",Score:90})
>db.students.insert({Name:"S10",Age:28,Gender:"F",Class:"C3",Score:90})
> db.students.find()
{ "_id" : ObjectId("52f874faa13cd6a65998734d"),
"Name" : "S1", "Age" : 25, "Gender"
: "M", "Class" : "C1", "Score"
: 95 }
.......................
{ "_id" : ObjectId("52f8758da13cd6a659987356"),
"Name" : "S10", "Age" : 28, "Gender"
: "F", "Class" : "C3", "Score"
: 90 }
>
6.2.1.1 $lt and $lte
Let’s start with the $lt and $lte operators . They stand for
“less than” and “less than or equal to,” respectively.
If you want to find all students who are younger than 25 (Age
< 25), you can execute the following find with a selector:
> db.students.find({"Age":{"$lt":25}})
{ "_id" : ObjectId("52f8750ca13cd6a65998734e"),
"Name" : "S2", "Age" : 18, "Gender"
: "M", "Class" : "C1", "Score"
: 85 }
.............................
{ "_id" : ObjectId("52f87556a13cd6a659987353"),
"Name" : "S7", "Age" : 21, "Gender"
: "M", "Class" : "C2", "Score"
: 100 }
>
If you want to find out all students who are older than 25
(Age <= 25), execute the following:
> db.students.find({"Age":{"$lte":25}})
{ "_id" : ObjectId("52f874faa13cd6a65998734d"),
"Name" : "S1", "Age" : 25, "Gender"
: "M", "Class" : "C1", "Score"
: 95 }
....................
{ "_id" : ObjectId("52f87578a13cd6a659987355"),
"Name" : "S9", "Age" : 25, "Gender"
: "F", "Class" : "C2", "Score"
: 90 }
>
6.2.1.2 $gt and $gte
The $gt and $gte operators stand for “greater than” and
“greater than or equal to,” respectively.
Let’s find out all of the students with Age > 25. This
can be achieved by executing the following command:
> db.students.find({"Age":{"$gt":25}})
{ "_id" : ObjectId("52f8758da13cd6a659987356"),
"Name" : "S10", "Age" : 28,
"Gender" : "F", "Class" : "C3",
"Score" : 90 }
>
If you change the above example to return students with Age >=
25, then the command is
> db.students.find({"Age":{"$gte":25}})
{ "_id" : ObjectId("52f874faa13cd6a65998734d"),
"Name" : "S1", "Age" : 25, "Gender"
: "M", "Class" : "C1", "Score"
: 95 }
......................................
{ "_id" : ObjectId("52f8758da13cd6a659987356"),
"Name" : "S10", "Age" : 28,
"Gender" : "F", "Class" : "C3",
"Score" : 90 }
>
6.2.1.3 $in and $nin
Let’s find all students who belong to either class C1 or C2.
The command for the same is
> db.students.find({"Class":{"$in":["C1","C2"]}})
{ "_id" : ObjectId("52f874faa13cd6a65998734d"),
"Name" : "S1", "Age" : 25, "Gender"
: "M", "Class" : "C1", "Score"
: 95 }
................................
{ "_id" : ObjectId("52f87578a13cd6a659987355"),
"Name" : "S9", "Age" : 25, "Gender"
: "F", "Class" : "C2", "Score"
: 90 }
>
The inverse of this can be returned by using $nin.
Let’s next find students who don’t belong to class C1 or
C2. The command is
> db.students.find({"Class":{"$nin":["C1","C2"]}})
{ "_id" : ObjectId("52f8758da13cd6a659987356"),
"Name" : "S10", "Age" : 28,
"Gender" : "F", "Class" : "C3",
"Score" : 90 }
>
Let’s next see how you can combine all of the above
operators and write a query. Say you want to find out all
students whose gender is either “M” or they belong to class
“C1” or ‘C2” and whose age is greater than or equal to
25. This can be achieved by executing the following command:
>db.students.find({$or:[{"Gender":"M","Class":{"$in":["C1","C2"]}}],
"Age":{"$gte":25}})
{ "_id" : ObjectId("52f874faa13cd6a65998734d"),
"Name" : "S1", "Age" : 25, "Gender"
: "M", "Class" : "C1", "Score"
: 95 }
>
6.2.2 Regular Expressions
In this section, you will look at how to use regular
expressions. Regular expressions are useful in scenarios where you
want to find students with name starting with “A”.
In order to understand this, let’s add three or four more
students with different names.
> db.students.insert({Name:"Student1", Age:30,
Gender:"M", Class: "Biology", Score:90})
> db.students.insert({Name:"Student2", Age:30,
Gender:"M", Class: "Chemistry", Score:90})
> db.students.insert({Name:"Test1", Age:30,
Gender:"M", Class: "Chemistry", Score:90})
> db.students.insert({Name:"Test2", Age:30,
Gender:"M", Class: "Chemistry", Score:90})
> db.students.insert({Name:"Test3", Age:30,
Gender:"M", Class: "Chemistry", Score:90})
>
Say you want to find all students with names starting with “St”
or “Te” and whose class begins with “Che”. The same can be
filtered using regular expressions, like so:
> db.students.find({"Name":/(St|Te)*/i,
"Class":/(Che)/i})
{ "_id" : ObjectId("52f89ecae451bb7a56e59086"),
"Name" : "Student2", "Age" : 30,
"Gender" : "M", "Class" :
"Chemistry", "Score" : 90 }
.........................
{ "_id" : ObjectId("52f89f06e451bb7a56e59089"),
"Name" : "Test3", "Age" : 30,
"Gender" : "M", "Class" :
"Chemistry", "Score" : 90 }
>
In order to understand how the regular expression works, let’s
take the query "Name":/(St|Te)*/i.
-
//i indicates that the regex is case insensitive.
-
(St|Te)* means the Name string must start with either
“St” or “Te”.
-
The * at the end means it will match anything after
that.
When you put everything together, you are doing a case
insensitive match of names that have either “St” or “Te”
at the beginning of them. In the regex for the Class also the same
Regex is issued.
Next, let’s complicate the query a bit. Let’s combine it
with the operators covered above.
Fetch Students with names as student1, student2 and who are
male students with age >=25. The command for this is as
follows:
>db.students.find({"Name":/(student*)/i,"Age":{"$gte":25},"Gender":"M"})
{ "_id" : ObjectId("52f89eb1e451bb7a56e59085"),
"Name" : "Student1", "Age" : 30,
"Gender" : "M", "Class" :
"Biology", "Score" : 90 }
{ "_id" : ObjectId("52f89ecae451bb7a56e59086"),
"Name" : "Student2", "Age" : 30,
"Gender" : "M", "Class" :
"Chemistry", "Score" : 90 }
6.2.3 MapReduce
The MapReduce framework enables division of the task, which in
this case is data aggregation across a cluster of computers in
order to reduce the time it takes to aggregate the data set. It
consists of two parts: Map and Reduce.
Here’s a more specific description: MapReduce is a framework
that is used to process problems that are highly distributable
across enormous datasets and are run using multiple nodes. If all
the nodes have the same hardware, these nodes are collectively
referred as a cluster; otherwise, it’s referred as a grid. This
processing can occur on structured data (data stored in a
database) and unstructured data (data stored in a file system).
-
“Map”: In this step, the node that is acting as the
master takes the input parameter and divides the big problem
into multiple small sub-problems. These sub-problems are then
distributed across the worker nodes. The worker nodes might
further divide the problem into sub-problems. This leads to a
multi-level tree structure. The worker nodes will then work on
the sub-problems within them and return the answer back to the
master node.
-
“Reduce”: In this step, all the sub-problems’
answers are available with the master node, which then combines
all the answers and produce the final output, which is the
answer to the big problem you were trying to solve.
In order to understand how it works, let’s consider a small
example where you will find out the number of male and female
students in your collection.
This involves the following steps: first you create the map and
reduce functions and then you call the mapReduce function and pass
the necessary arguments.
Let’s start by defining the map function:
> var map = function(){emit(this.Gender,1);};
>
This step takes as input the document and based on the Gender
field it emits documents of the type {"F", 1} or {"M",
1} .
Next, you create the reduce function:
> var reduce = function(key, value){return
Array.sum(value);};
>
This will group the documents emitted by the map function on
the key field, which in your example is Gender, and will return
the sum of values, which in the above example is emitted as “1”.
The output of the reduce function defined above is a gender-wise
count.
Finally, you put them together using the mapReduce function,
like so:
> db.students.mapReduce(map, reduce, {out:
"mapreducecount1"})
{
"result" : "mapreducecount1",
"timeMillis" : 29,
"counts" : {
"input" : 15,
"emit" : 15,
"reduce" : 2,
"output" : 2
},
"ok" : 1,
}
>
This actually is applying the map , reduce function, which you
defined on the students collection. The final result is stored in
a new collection called mapreducecount1.
In order to vet it, run the find() command on the
mapreducecount1 collection, as shown:
> db.mapreducecount1.find()
{ "_id" : "F", "value" : 6 }
{ "_id" : "M", "value" : 9 }
>
Here’s one more example to explain the workings of MapReduce.
Let’s use MapReduce to find out a class-wise average score. As
you saw in the above example, you need to first create the map
function and then the reduce function and finally you combine them
to store the output in a collection in your database. The code
snippet is
> var map_1 = function(){emit(this.Class,this.Score);};
> var reduce_1 = function(key, value){return
Array.avg(value);};
>db.students.mapReduce(map_1,reduce_1,
{out:"MR_ClassAvg_1"})
{
"result" : "MR_ClassAvg_1",
"timeMillis" : 4,
"counts" : {
"input" : 15, "emit" : 15,
"reduce" : 3 , "output" : 5
},
"ok" : 1,
}
> db.MR_ClassAvg_1.find()
{ "_id" : "Biology", "value" : 90
}
{ "_id" : "C1", "value" : 85 }
{ "_id" : "C2", "value" : 93 }
{ "_id" : "C3", "value" : 90 }
{ "_id" : "Chemistry", "value" :
90 }
>
The first step is to define the map function, which loops
through the collection documents and returns output as {"Class":
Score}, for example {"C1":95}. The second step does a
grouping on the class and computes the average of the scores for
that class. The third step combines the results; it defines the
collection to which the map, reduce function needs to be applied
and finally it defines where to store the output, which in this
case is a new collection called MR_ClassAvg_1.
In the last step, you use find in order to check the resulting
output.
6.2.4 aggregate()
The previous section introduced the MapReduce function. In this
section, you will get a glimpse of the aggregation framework of
MongoDB.
The aggregation framework enables you find out the aggregate
value without using the MapReduce function. Performance-wise, the
aggregation framework is faster than the MapReduce function. You
always need to keep in mind that MapReduce is meant for batch
approach and not for real-time analysis.
You will next depict the above two discussed outputs using the
aggregate function. First, the output was to find the count of
male and female students. This can be achieved by executing the
following command:
> db.students.aggregate({$group:{_id:"$Gender",
totalStudent: {$sum: 1}}})
{ "_id" : "F", "totalStudent" : 6
}
{ "_id" : "M", "totalStudent" : 9
}
>
Similarly, in order to find out the class-wise average score,
the following command can be executed:
> db.students.aggregate({$group:{_id:"$Class",
AvgScore: {$avg: "$Score"}}})
{ "_id" : "Biology", "AvgScore" :
90 }
{ "_id" : "C3", "AvgScore" : 90 }
{ "_id" : "Chemistry", "AvgScore"
: 90 }
{ "_id" : "C2", "AvgScore" : 93 }
{ "_id" : "C1", "AvgScore" : 85 }
>
6.3 Designing an Application’s Data Model
In this section, you will look at how to design the data model
for an application. The MongoDB database provides two options for
designing a data model: the user can either embed related objects
within one another, or it can reference each other using ID. In this
section, you will explore these options.
In order to understand these options, you will design a blogging
application and demonstrate the usage of the two options.
A typical blog application consists of the following scenarios:
You have people posting blogs on different subjects. In addition
to the subject categorization, different tags can also be used. As
an example, if the category is politics and the post talks about a
politician, then that politician’s name can be added as a tag to
the post. This helps users find posts related to their interests
quickly and also lets them link related posts together.
The people viewing the blog can comment on the blog posts.
No comments:
Post a Comment