The people viewing the blog can comment on the blog posts.
6.3.1 Relational Data Modeling and Normalization
Before jumping into MongoDB's approach, let's take a little
detour into how you would model this in a relational (SQL)
database.
In relational databases, data modeling typically progresses
by defining the tables and gradually removing data redundancy to
achieve a normal form.
6.3.1.1 What Is a Normal Form?
In relational databases, normalization typically begins by
creating tables as per the application requirements and then
gradually removing redundancy until the data reaches the targeted
normal form, commonly the third normal form, or 3NF. In order to
understand this better, let's put the blogging application data
in tabular form. The initial data is shown in Figure 6-2.
This data is actually in the first normal form. You will have
lots of redundancy because you can have multiple comments against
the posts and multiple tags can be associated with the post. The
problem with redundancy, of course, is that it introduces the
possibility of inconsistency, where various copies of the same
data may have different values. To remove this redundancy, you
need to further normalize the data by splitting it into multiple
tables. As part of this step, you must identify a key column that
uniquely identifies each row in the table so that you can create
links between the tables. When the above scenario is modeled in
3NF, it looks like the RDBMS diagram shown in
Figure 6-3.
In this case, you have a data model that is free of redundancy,
allowing you to update it without having to worry about updating
multiple rows. In particular, you no longer need to worry about
inconsistency in the data model.
6.3.1.2 The Problem with Normal Forms
As mentioned, the nice thing about normalization is that it
allows for easy updating without any redundancy (i.e. it helps
keep the data consistent). Updating a user name means updating the
name in the Users table.
However, a problem arises when you try to get the data back
out. For instance, to find all tags and comments associated with
posts by a specific user, the relational database programmer uses
a JOIN. By using a JOIN, the database returns all data as per the
application screen design, but the real problem is what operation
the database performs to get that result set.
Generally, when an RDBMS reads a row from disk, the seek
accounts for well over 99% of the time spent reading that row. When it comes
to disk access, random seeks are the enemy. The reason why this is
so important in this context is because JOINs typically require
random seeks. The JOIN operation is one of the most expensive
operations within a relational database. Additionally, if you end
up needing to scale your database to multiple servers, you
introduce the problem of generating a distributed join, a complex
and generally slow operation.
6.3.2 MongoDB Document Data Model Approach
As you know, in MongoDB, data is stored in documents.
Fortunately for us as application designers, this opens up some new
possibilities in schema design. Unfortunately for us, it also
complicates our schema design process. Now when faced with a schema
design problem there’s no longer a fixed path of normalized
database design, as there is with relational databases. In MongoDB,
the schema design depends on the problem you are trying to solve.
If you have to model the above using the MongoDB document model,
you might store the blog data in a document as follows:
{
    "_id" : ObjectId("509d27069cc1ae293b36928d"),
    "title" : "Sample title",
    "body" : "Sample text.",
    "tags" : [
        "Tag1",
        "Tag2",
        "Tag3",
        "Tag4"
    ],
    "created_date" : ISODate("2015-07-06T12:41:39.110Z"),
    "author" : "Author 1",
    "category_id" : ObjectId("509d29709cc1ae293b369295"),
    "comments" : [
        {
            "subject" : "Sample comment",
            "body" : "Comment Body",
            "author" : "author 2",
            "created_date" : ISODate("2015-07-06T13:34:23.929Z")
        }
    ]
}
As you can see, you have embedded the comments and tags within a
single document only. Alternatively, you could “normalize” the
model a bit by referencing the comments and tags by the _id field:
// Authors document:
{
    "_id": ObjectId("509d280e9cc1ae293b36928e"),
    "name": "Author 1",
    ...
}
// Tags document:
{
    "_id": ObjectId("509d35349cc1ae293b369299"),
    "TagName": "Tag1",
    ...
}
// Comments document:
{
    "_id": ObjectId("509d359a9cc1ae293b3692a0"),
    "Author": ObjectId("508d27069cc1ae293b36928d"),
    ...
    "created_date" : ISODate("2015-07-06T13:34:59.336Z")
}
// Category document:
{
    "_id": ObjectId("509d29709cc1ae293b369295"),
    "Category": "Category1",
    ...
}
// Posts document:
{
    "_id" : ObjectId("509d27069cc1ae293b36928d"),
    "title" : "Sample title",
    "body" : "Sample text.",
    "tags" : [
        ObjectId("509d35349cc1ae293b369299"),
        ObjectId("509d35349cc1ae293b36929c")
    ],
    "created_date" : ISODate("2015-07-06T13:41:39.110Z"),
    "author_id" : ObjectId("509d280e9cc1ae293b36928e"),
    "category_id" : ObjectId("509d29709cc1ae293b369295"),
    "comments" : [
        ObjectId("509d359a9cc1ae293b3692a0")
    ]
}
The remainder of this chapter is devoted to identifying which
solution will work in your context (i.e. whether to use referencing
or whether to embed).
6.3.2.1 Embedding
In this section, you will see if embedding will have a positive
impact on the performance. Embedding can be useful when you want
to fetch some set of data and display it on the screen, such as a
page that displays comments associated with the blog; in this case
the comments can be embedded in the Blogs document.
The benefit of this approach is that since MongoDB stores the
documents contiguously on disk, all the related data can be
fetched in a single seek.
Apart from this, since JOINs are not
supported, if you instead used referencing in this case, the
application would have to do something like the following to fetch
the comments data associated with the blog:
1. Fetch the associated comments_id from the blogs document.
2. Fetch the comments document based on the comments_id found in the first step.
If you take this approach, which is referencing, not only does
the database have to do multiple seeks to find your data, but
additional latency is introduced into the lookup since it now
takes two round trips to the database to retrieve your data.
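For illustration, the following is a minimal mongo shell sketch of that two-step lookup, assuming the referenced design shown earlier, where the post document holds an array of comment _ids in its comments field:
var post = db.posts.findOne(
    {_id: ObjectId("509d27069cc1ae293b36928d")},
    {comments: 1})                               // round trip 1: get the comment ids
var comments = db.comments.find(
    {_id: {$in: post.comments}}).toArray()       // round trip 2: get the comments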
If the application frequently accesses the comments data along
with the blogs, then almost certainly embedding the comments
within the blog documents will have a positive impact on the
performance.
Another concern that weighs in favor of embedding is the desire
for atomicity and isolation when writing data. MongoDB is designed
without multi-document transactions. In MongoDB, atomicity is
provided only at the single-document level, so data
that needs to be updated together atomically needs to be placed
together in a single document.
When you update data in your database, you must ensure that
your update either succeeds or fails entirely, never having a
“partial success,” and that no other database reader ever sees
an incomplete write operation.
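For example, with the embedded design, appending a hypothetical new comment is a single-document update that MongoDB applies atomically; readers see the post either with or without the comment, never a partial write (field names follow the earlier example):
db.posts.update(
    {_id: ObjectId("509d27069cc1ae293b36928d")},
    {$push: {comments: {
        subject: "Another comment",
        body: "Comment Body",
        author: "author 3",
        created_date: new Date()
    }}})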
6.3.2.2 Referencing
You have seen that embedding is the approach that will provide
the best performance in many cases; it also provides data
consistency guarantees. However, in some cases, a more normalized
model works better in MongoDB.
One reason for having multiple collections and adding
references is the increased flexibility it gives when querying the
data. Let’s understand this with the blogging example mentioned
above.
You saw how to use embedded schema, which will work very well
when displaying all the data together on a single page (i.e. the
page that displays the blog post followed by all of the associated
comments).
Now suppose you have a requirement to search for the comments
posted by a particular user. The query (using this embedded
schema) would be as follows:
db.posts.find({'comments.author': 'author2'},{'comments': 1})
The result of this query, then, would be documents of the
following form:
{
    "_id" : ObjectId("509d27069cc1ae293b36928d"),
    "comments" : [
        {
            "subject" : "Sample Comment 1",
            "body" : "Comment1 Body.",
            "author" : "author2",
            "created_date" : ISODate("2015-07-06T13:34:23.929Z")
        }...]
}
{
    "_id" : ObjectId("509d27069cc1ae293b36928e"),
    "comments" : [
        {
            "subject" : "Sample Comment 2",
            "body" : "Comment2 Body.",
            "author" : "author2",
            "created_date" : ISODate("2015-07-06T13:34:23.929Z")
        }...]
}
The major drawback to this approach is that you get back much
more data than you actually need. In particular, you can’t ask
for just author2’s comments; you have to ask for posts that
author2 has commented on, which includes all of the other comments
on those posts as well. This data will require further filtering
within the application code.
On the other hand, suppose you decide to use a normalized
schema. In this case you will have three collections: “Authors,”
“Posts,” and “Comments.”
The “Authors” document will have Author-specific content
such as Name, Age, Gender, etc., and the “Posts” document will
have posts-specific details such as post creation time, author of
the post, actual content, and the subject of the post.
The “Comments” document will hold the post's comments,
with details such as the CommentedOn datetime, the author who
created the comment, and the text of the comment. This is depicted as follows:
// Authors document:
{
    "_id": ObjectId("508d280e9cc1ae293b36928e"),
    "name": "Jenny",
    ...
}
// Posts document:
{
    "_id" : ObjectId("508d27069cc1ae293b36928d"),
    ...
}
// Comments document:
{
    "_id": ObjectId("508d359a9cc1ae293b3692a0"),
    "Author": ObjectId("508d27069cc1ae293b36928d"),
    "created_date" : ISODate("2015-07-06T13:34:59.336Z"),
    "Post_id": ObjectId("508d27069cc1ae293b36928d"),
    ...
}
In this scenario, the query to find the comments by “author2”
can be fulfilled by a simple find() on the comments collection:
db.comments.find({"author": "author2"})
In general, if your application’s query pattern is well
known, and data tends to be accessed in only one way, an embedded
approach works well. Alternatively, if your application may query
data in many different ways, or you are not able to anticipate the
patterns in which data may be queried, a more “normalized”
approach may be better.
For instance, in the above schema, you will be able to sort the
comments or return a more restricted set of comments using the
limit and skip operators. In the embedded case, you're stuck
retrieving all the comments in the same order in which they are
stored in the post.
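For example, with the separate comments collection (assuming the Post_id and created_date fields shown earlier), you could return just the second page of the latest comments for a given post:
db.comments.find({Post_id: ObjectId("508d27069cc1ae293b36928d")})
           .sort({created_date: -1})
           .skip(10)
           .limit(10)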
Another factor that may weigh in favor of using document
references is a large one-to-many relationship.
For instance, a popular blog with a large amount of reader
engagement may have hundreds or even thousands of comments for a
given post. In this case, embedding carries significant penalties
with it:
- Effect on read performance: As the document size increases, it will occupy more memory. The problem with memory is that MongoDB caches frequently accessed documents in memory, and the larger the documents become, the lower the probability of them fitting into memory. This will lead to more page faults while retrieving the documents, which will lead to random disk I/O, which will further slow down performance.
- Effect on update performance: As the size increases and an update operation is performed on such documents to append data, eventually MongoDB is going to need to move the document to an area with more space available. This movement, when it happens, significantly slows update performance.
Apart from this, MongoDB documents have a hard size limit of
16MB. Although this is something to be aware of, you will usually
run into problems due to memory pressure and document copying well
before you reach the 16MB size limit.
One final factor that weighs in favor of using document
references is the case of many-to-many or M:N relationships.
For instance, in the above example, there are tags. Each blog
can have multiple tags and each tag can be associated with multiple
blog entries.
One approach to implement the blogs-tags M:N relationship is to
have the following three collections:
- The Tags collection, which will store the tag details
- The Blogs collection, which will have the blog details
- A third collection, called Tag-To-Blog Mapping, which will map between the tags and the blogs
This approach is similar to the one in relational databases,
but this will negatively impact the application’s performance
because the queries will end up doing a lot of application-level
“joins.”
Alternatively, you can use the embedding model where you embed
the tags within the blogs document, but this will lead to data
duplication. Although this will simplify the read operation a bit,
it will increase the complexity of the update operation, because
when updating a tag, you need to ensure that the change is applied
in each and every blog document where the tag has been embedded.
Hence for many-to-many joins, a compromise approach is often
best, embedding a list of _id values rather than the full
document:
// Tags document:
{
    "_id": ObjectId("508d35349cc1ae293b369299"),
    "TagName": "Tag1",
    ...
}
// Posts document with tag _ids added as references:
{
    "_id" : ObjectId("508d27069cc1ae293b36928d"),
    "tags" : [
        ObjectId("509d35349cc1ae293b369299"),
        ObjectId("509d35349cc1ae293b36929a"),
        ObjectId("509d35349cc1ae293b36929b"),
        ObjectId("509d35349cc1ae293b36929c")
    ],
    ...
}
Although querying will be a bit complicated, you no longer need
to worry about updating a tag everywhere.
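As a sketch of this compromise (the renamed value is illustrative), renaming a tag touches exactly one document, and the posts carrying that tag can still be found by its _id:
// Rename the tag in one place
db.tags.update({_id: ObjectId("508d35349cc1ae293b369299")},
               {$set: {TagName: "Tag1-renamed"}})
// Find all posts that reference the tag
db.posts.find({tags: ObjectId("508d35349cc1ae293b369299")})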
In summary, schema design in MongoDB is one of the very early
decisions that you need to make, and it is dependent on the
application requirements and queries.
As you have seen, when you need to access the data together or
you need to make atomic updates, embedding will have a positive
impact. However, if you need more flexibility while querying or if
you have many-to-many relationships, using references is a good
choice.
Ultimately, the decision depends on the access patterns of your
application, and there are no hard-and-fast rules in MongoDB. In
the next section, you will learn about various data modelling
considerations.
6.3.2.3 Data Modeling Decisions
This involves deciding how to structure the documents so that
the data is modeled effectively. An important point to decide is
whether you need to embed the data or use references to the data
(i.e. whether to use embedding or referencing).
This point is best demonstrated with an example. Suppose you
have a book review site which has authors and books as well as
reviews with threaded comments.
Now the question is how to structure the collections. The
decision depends on the number of comments expected per book
and on how frequently read vs. write operations will be
performed.
6.3.2.4 Operational Considerations
In addition to the way the elements interact with each other
(i.e. whether to store the documents in an embedded manner or use
references), a number of other operational factors are important
when designing a data model for the application. These factors are
covered in the following sections.
Data Lifecycle Management
This feature is relevant if your application has datasets
that need to be persisted in the database only for a limited time
period.
Say you need to retain the data related to the reviews and
comments for only a month; this feature can then be used.
It is implemented by using the Time to Live (TTL) feature of
the collection. The TTL feature of the collection ensures that
the documents expire after a period of time.
Additionally, if the application requirement is to work with
only the recently inserted documents, using capped collections
will help optimize the performance.
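As a sketch (the 30-day expiry, field name, and collection names are illustrative), a TTL index and a capped collection can be created as follows:
// Documents expire roughly 30 days (2,592,000 seconds) after their created_date value
db.comments.createIndex({created_date: 1}, {expireAfterSeconds: 2592000})
// A capped collection that retains only the most recently inserted documents
db.createCollection("recent_posts", {capped: true, size: 1048576, max: 1000})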
Indexes
Indexes can be created to support commonly used queries to
increase the performance. By default, an index is created by
MongoDB on the _id field.
The following are a few points to consider when creating
indexes:
- At least 8KB of data space is required by each index.
- For write operations, an index addition has some negative performance impact. Hence for collections with heavy writes, indexes might be expensive, because for each insert the keys must be added to all the indexes.
- Indexes are beneficial for collections with heavy read operations, such as where the proportion of read-to-write operations is high. Read operations that do not use an index are not affected by it.
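For example, to support the earlier query for comments by a particular author, you might create a compound index (the field names follow the earlier comments examples):
db.comments.createIndex({author: 1, created_date: -1})
// The index supports both the filter and the sort in this query:
db.comments.find({author: "author2"}).sort({created_date: -1})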
Sharding
One of the important factors when designing the application
model is whether to partition the data or not. This is
implemented using sharding in MongoDB.
Sharding is also referred to as the partitioning of data. In MongoDB,
a collection is partitioned with its documents distributed across a
cluster of machines, which are referred to as shards. This can have
a significant impact on the performance. We will discuss sharding
more in Chapter tk.
A Large Number of Collections
The design considerations for having multiple collections vs.
storing data in a single collection are the following:
- There is no performance penalty in choosing multiple collections for storing data.
- Having distinct collections for different types of data can have performance improvements in high-throughput batch processing applications.
When you are designing models that have a large number of
collections, you need to take into consideration the following
behaviors:
- A certain minimum overhead of a few kilobytes is associated with each collection.
- At least 8KB of data space is required by each index, including the _id index.
You know by now that the metadata for each database is stored
in the <database>.ns file. Each collection and index has
its own entry in the namespace file, so you need to consider the
limits on the size of namespace files when deciding to implement
a large number of collections.
Growth of the Document
Some updates, such as pushing an element to an array, adding
new fields, etc., can lead to an increase in the document size,
which can lead to the document being moved from one slot to
another in order to fit. This process of document
relocation is both resource and time consuming. Although MongoDB
provides padding to minimize the relocation occurrences, you may
need to handle the document growth manually.
6.4 Summary
In this chapter you learned the basic CRUD operations plus
advanced querying capabilities. You also examined the two ways of
storing and retrieving data: embedding and referencing.
In the following chapter, you will learn about the MongoDB
architecture, its core components, and features.
7. MongoDB Architecture
7.1 Core Processes
The core components of a MongoDB installation are the mongod,
mongos, and mongo processes. These components are available as
applications under the bin folder. Let's discuss these components
in detail.
7.1.1 mongod
The primary daemon in a MongoDB system is
known as mongod . This daemon handles all the data requests,
manages the data format, and performs operations for background
management.
When mongod is run without any
arguments, it uses the default data directory, which is
C:\data\db on Windows or /data/db on Linux, and the default port 27017,
where it listens for socket connections.
It’s important to ensure that the data
directory exists and you have write permissions to the directory
before the mongod process is started.
If the directory doesn’t exist or you
don’t have write permissions on the directory, the start of this
process will fail. If the default port 27017 is not available, the
server will fail to start.
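For example, on Linux you might create the data directory and start mongod explicitly (the path and port shown are simply the defaults):
mkdir -p /data/db
mongod --dbpath /data/db --port 27017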
mongod also has an HTTP server that
listens on a port 1000 higher than the main port, so if you
started mongod with the default port 27017, the
HTTP server will be on port 28017 and will be accessible using the
URL http://localhost:28017. This basic HTTP server provides
administrative information about the database.
7.1.2 mongo
mongo provides an interactive JavaScript
interface for the developer to test queries and operations
directly on the database and for the system administrators to
manage the database. This is all done via the command line. When
the mongo shell is started, it will connect to the default
database called test. This database connection value is assigned
to the global variable db.
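A typical first session might look like the following (blog is just an example database name):
$ mongo
> db
test
> use blog
switched to db blog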
7.2 MongoDB Tools
7.3 Standalone Deployment
Standalone deployment is used for
development purposes; it doesn't ensure any redundancy of data and
it doesn't ensure recovery in case of failures, so it's not
recommended for use in a production environment. Standalone
deployment has the following components: a single mongod and a
client connecting to the mongod, as shown in Figure 7-1.
7.4 Replication
In a standalone deployment, if the mongod
is not available, you risk losing all the data, which is not
acceptable in a production environment. Replication is used to offer
protection against such data loss.
Replication provides for data redundancy by
replicating data on different nodes, thereby providing protection of
data in case of node failure. Replication provides high availability
in a MongoDB deployment.
Replication also simplifies certain
administrative tasks where the routine tasks such as backups can be
offloaded to the replica copies, freeing the main copy to handle the
important application requests.
In some scenarios, it can also help in
scaling the reads by enabling the client to read from the different
copies of data.
In this section, you will learn how
replication works in MongoDB and its various components. There are
two types of replication supported in MongoDB: traditional
master/slave replication and replica set.
7.4.1 Master/Slave Replication
In MongoDB, the traditional master/slave
replication is available, but it is recommended only when more than
50 nodes need to be replicated. The preferred replication approach is replica
sets, which we will explain later. In this type of replication,
there is one master and a number of slaves that replicate the data
from the master. The only advantage of this type of replication
is that there's no restriction on the number of slaves within a
cluster. However, thousands of slaves will overburden the master
node, so in practical scenarios it's better to have fewer than a
dozen slaves. In addition, this type of replication doesn't
automate failover and provides less redundancy.
In a basic master/slave setup , you have
two types of mongod instances: one instance is in the master mode
and the remaining are in the slave mode, as shown in Figure 7-2.
Since the slaves are replicating from the master, all slaves need
to be aware of the master’s address.
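As a rough, era-appropriate sketch of such a setup (directories and ports are illustrative), the master and a slave could be started with the legacy --master, --slave, and --source options:
mongod --master --dbpath /data/master --port 27017
mongod --slave --source localhost:27017 --dbpath /data/slave --port 27018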
The master node maintains a capped
collection (oplog) that stores an ordered history of logical writes
to the database.
The slaves replicate the data using this
oplog collection. Since the oplog is a capped collection, if the
slave’s state is far behind the master’s state, the slave may
become out of sync. In that scenario, the replication will stop and
manual intervention will be needed to re-establish the replication.
7.4.2 Replica Set
The replica set is a sophisticated form of
the traditional master-slave replication and is a recommended method
in MongoDB deployments.
Replica sets are basically a type of
master-slave replication but they provide automatic failover. A
replica set has one master, termed the primary, and multiple
slaves, termed secondaries in the replica set context;
however, unlike master-slave replication, no single node is fixed
as the primary in a replica set.
If a master goes down in a replica set,
one of the slave nodes is automatically promoted to master. The
clients start connecting to the new master, and both data and
application will remain available. In a replica set, this failover
happens in an automated fashion. We will explain the details of how
this process happens later.
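As a minimal sketch (the hostnames and set name are illustrative), assuming three mongod instances were started with --replSet myReplSet, the set can be initiated from the mongo shell:
rs.initiate({
    _id: "myReplSet",
    members: [
        {_id: 0, host: "node1.example.net:27017"},
        {_id: 1, host: "node2.example.net:27017"},
        {_id: 2, host: "node3.example.net:27017"}
    ]})
rs.status()   // shows each member's state, including which one is primary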
The primary node is selected through an
election mechanism. If the primary goes down, a new primary is
chosen through another election.
Figure 7-3
shows how a two-member replica set failover happens and the various
steps that occur during the failover.
Replica set replication has a limitation on
the number of members. Prior to version 3.0, the limit was 12, but
this was changed to 50 in version 3.0. So replica set
replication can now have a maximum of 50 members, and at any given
point of time in a 50-member replica set, only 7 can participate in
a vote. We will explain the voting concept in a replica set in
detail later.
Starting from Version 3.0, replica set
members can use different storage engines. For example, the
WiredTiger storage engine might be used by the secondary members
whereas the MMAPv1 engine could be used by the primary. In the
coming sections, you will look at the different storage engines
available with MongoDB.
7.4.2.1 Primary and Secondary Members
Before you move ahead and look at how the
replica set functions, let’s look at the type of members that a
replica set can have. There are two types of members: primary
members and secondary members.
- Primary member: A replica set can have only one primary, which is elected by the voting nodes in the replica set. Any node with a priority of 1 (the default) or higher can be elected as primary. The client directs all the write operations to the primary member; these writes are then replicated to the secondary members.
7.4.2.2 Types of Secondary Members
Priority 0 members are secondary members
that maintain the primary’s data copy but can never become a
primary in case of a failover. Apart from that, they function as a
normal secondary node, and they can participate in voting and can
accept read requests. The Priority 0 members are created by setting
the priority to 0.
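For example, assuming the third member of the set (index 2 of the members array) should never become primary, a sketch of the reconfiguration is:
cfg = rs.conf()
cfg.members[2].priority = 0
rs.reconfig(cfg)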
Hidden members are 0-priority members that
are hidden from the client applications. Like the 0-priority
members, this member also maintains a copy of the primary's data,
cannot become the primary, and can participate in the voting, but
unlike 0-priority members, it can't serve any read requests or
receive any traffic beyond what replication requires. A node can be
set as a hidden member by setting the hidden property to true. In a
replica set, these members can be dedicated to reporting needs or
backups.
Delayed members are secondary members that
replicate data with a delay from the primary’s oplog. This helps
to recover from human errors, such as accidentally dropped
databases or errors that were caused by unsuccessful application
upgrades.
When deciding on the delay time, consider
your maintenance period and the size of the oplog. The delay time
should be either equal to or greater than the maintenance window
and the oplog size should be set in a manner to ensure that no
operations are lost while replicating.
Note that since the delayed members will
not have up-to-date data as the primary node, the priority should
be set to 0 so that they cannot become primary. Also, the hidden
property should be true in order to avoid any read requests.
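A sketch of configuring such a member (the member index and the one-hour delay are illustrative; slaveDelay is the setting used in this version of MongoDB):
cfg = rs.conf()
cfg.members[3].priority = 0        // can never become primary
cfg.members[3].hidden = true       // receives no client read traffic
cfg.members[3].slaveDelay = 3600   // lags the primary by one hour
rs.reconfig(cfg)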
Arbiters are secondary members that do not
hold a copy of the primary's data, so they can never become the
primary. They are used solely to participate in voting.
This enables the replica set to have an uneven number of nodes
without incurring the cost that comes with data replication.
Non-voting members hold the primary’s
data copy, they can accept client read operations, and they can
also become the primary, but they cannot vote in an election.
The voting ability of a member can be
disabled by setting its votes to 0. By default every member has one
vote. Say you have a replica set with seven members. Using the
following commands in the mongo shell, the votes for the fourth,
fifth, and sixth members can be set to 0.
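The following is a sketch of those commands (the members array is zero-indexed, so the fourth, fifth, and sixth members are at indexes 3, 4, and 5):
cfg_1 = rs.conf()
cfg_1.members[3].votes = 0
cfg_1.members[4].votes = 0
cfg_1.members[5].votes = 0
// Note: recent MongoDB versions also require priority 0 on non-voting members
rs.reconfig(cfg_1)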
7.4.2.3 Elections
In this section, you will look at the
process of electing a primary member. In order to get
elected, a server needs not just a majority but a majority of
the total votes in the set.
If there are X servers with each server
having 1 vote, then a server can become primary only when it has at
least [(X/2) + 1] votes.
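For example, in a five-member replica set where every member has one vote, a member needs at least (5/2) + 1 = 3 votes (using integer division) to become primary.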
The primary that went down still remains
part of the set; when it is up, it will act as a secondary server
until the time it gets a majority of votes again.
The complication with this type of voting
system is that you cannot have just two nodes acting as master and
slave. In this scenario, you will have a total of two votes, and to
become a master, a node will need the majority of votes, which will
be both of the votes in this case. If one of the servers goes down,
the other server will end up having one vote out of two, and it
will never be promoted to master, so it will remain a slave.
In case of network partitioning, the master
will lose the majority of votes, since it will have only its own one
vote, and it'll be demoted to slave. The node that is acting as
slave will also remain a slave in the absence of a majority of
the votes. You will end up having two slaves until both servers
reach each other again.
A replica set has a number of ways to avoid
such situations. The simplest way is to use an arbiter to help
resolve such conflicts. An arbiter is very lightweight and is just a
voter, so it can run on either of the servers itself.
Let’s now see how the above scenario will
change with the use of an arbiter. Let’s first consider the
network partitioning scenario. If you have a master, a slave, and
an arbiter, each has one vote, totalling three votes. If a network
partition occurs with the master and arbiter in one data center and
the slave in another data center, the master will remain master
since it will still have the majority of votes.
If the master fails with no network
partitioning, the slave can be promoted to master because it will
have two votes (slave + arbiter).
Example - How the Election Process Works in More Detail
Let’s assume you have a replica set with
the following three members: A1, B1, and C1. Each member exchanges
a heartbeat request with the other members every few seconds. The
members respond with their current situation information to such
requests. A1 sends out heartbeat request to B1 and C1. B1 and C1
respond with their current situation information, such as the
state they are in (primary or secondary), their current clock
time, their eligibility to be promoted as primary, and so on. A1
receives all this information and updates its “map” of the
set, which maintains information such as members' changed
states, members that have gone down or come up, and the round-trip
time.
While updating its map, A1
will check a few things depending on its state:
- Demotions: There's a problem when A1 undergoes a demotion. By default in MongoDB, writes are fire-and-forget (i.e. the client issues the writes but doesn't wait for a response). If an application is doing the default writes when the primary is stepping down, it will never realize that the writes are actually not happening and might end up losing data. Hence it's recommended to use safe writes. In this scenario, when the primary is stepping down, it closes all its client connections, which will result in socket errors to the clients. The client libraries then need to recheck who the new primary is and will be saved from losing their write operations data.
- The first task A1 will do is run a sanity check, asking a few questions such as: Does A1 think it's already primary? Does another member think it's primary? Is A1 not eligible for election? If the answer to any of these basic questions is problematic, A1 will continue idling as is; otherwise, it will proceed with the election process:
- When B1 and C1 receive the message, they will check the view around them. They will run through a big list of sanity checks, such as: Is there any other node that can be primary? Does A1 have the most recent data or is there any other node that has the most recent data? If all the checks seem OK, they send a “go-ahead” message; however, if any of the checks fail, a “stop election” message is sent.
7.4.2.4 Data Replication Process
Let’s look at how the data replication
works. The members of a replica set replicate data continuously.
Every member, including the primary member, maintains an oplog. An
oplog is a capped collection where the members maintain a record of
all the operations that are performed on the data set.
The secondary members copy the primary
member’s oplog and apply all the operations in an asynchronous
manner.
Oplog
Oplog stands for the operation log . An
oplog is a capped collection where all the operations that modify
the data are recorded.
The oplog is maintained in a special
database, namely local, in the collection oplog.$main. Every
operation is maintained as a document, where each document
corresponds to one operation that is performed on the master
server. The document contains various keys, including ts (the
timestamp of the operation), op (the type of operation, such as an
insert, update, or delete), ns (the namespace, i.e. the collection,
on which the operation was performed), and o (the document
representing the operation itself).
Only operations that change the data are
maintained in the oplog because it’s a mechanism for ensuring
that the secondary node data is in sync with the primary node data.
The operations that are stored in the
oplog are transformed so that they remain idempotent, which means
that even if it’s applied multiple times on the secondary, the
secondary node data will remain consistent. Since the oplog is a
capped collection, with every new addition of an operation, the
oldest operations are automatically moved out. This is done to
ensure that it does not grow beyond a pre-set bound, which is the
oplog size.
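For instance, in a replica set deployment you can inspect the most recent oplog entry from the mongo shell (the oplog lives in the local database):
use local
db.oplog.rs.find().sort({$natural: -1}).limit(1).pretty()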
Whenever a replica set member first starts up, MongoDB creates an
oplog of a default size, which depends on the OS. By default,
MongoDB uses 5% of the available free disk space for the oplog on
Windows and 64-bit Linux instances. If this amount is lower than
1GB, then 1GB of space is allocated by MongoDB.
Although the default size is sufficient in
most cases, you can use the --oplogSize option to specify the
oplog size in MB when starting the server.
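For example, a replica set member could be started with a 2GB oplog as follows (the set name and data path are illustrative; the size is given in MB):
mongod --replSet myReplSet --dbpath /data/db --oplogSize 2048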
Initial Sync and Replication
Syncing – Normal Operation
In normal operations, the secondary
chooses a member from where it will sync its data, and then the
operations are pulled from the chosen source’s oplog collection
(local.oplog.rs).
1. The secondary applies the operation to its own copy of the data.
2. It then writes the operation to its own oplog.
3. It then requests the next operation.
Suppose it crashes between step 1 and step
2, and then it comes back again. In this scenario, it’ll assume
the operation has not been performed and will re-apply it.
Starting Up
Whom to Sync From?
In this section, you will look at how the
source to sync from is chosen. As of version 2.0, servers
automatically sync from the “nearest” node, based on the
average ping time.
When you bring up a new node, it sends
heartbeats to all nodes and monitors the response time. Based on
the data received, it then decides the member to sync from,
essentially picking the nearest member whose data is more up to
date than its own.
Making Writes Work with Chaining Slaves
You have seen that the above algorithm for
choosing a source to sync from implies that slave chaining is
semi-automatic. When a server is started, it’ll most probably
choose a server within the same data center to sync from, thus
reducing the WAN traffic.
However, this will never lead to a loop
because a node will sync only from a secondary whose
lastOpTimeWritten value is greater than its own. You will
never end up in a scenario where N1 is syncing from N2 and N2 is
syncing from N1. It will always be either N1 is syncing from N2 or
N2 is syncing from N1.
In this section, you will see how the write
concern (w) works with slave chaining. If N1 is syncing from N2,
which is in turn syncing from N3, how will N3 know up to which
point N1 is synced?
When N1 starts its sync from N2, a special
“handshake” message is sent, which intimates to N2 that N1 will
be syncing from its oplog. Since N2 is not primary, it will forward
the message to the node it is syncing from (i.e. it opens a
connection to N3 pretending to be N1). By the end of the above step,
N2 has two connections that are opened with N3: one connection for
itself and the other for N1.
Whenever N1 requests an op from N2, N2 sends
the op from its oplog and forwards a dummy request on N1's link to
N3, as shown in Figure 7-4.
Although this minimizes network traffic, it
increases the absolute time for the write to reach all of the
members.