Replication
Introduction to Replication
Replication is the process of synchronizing
data across multiple servers. Replication provides redundancy and increases
data availability with multiple copies of data on different database servers.
Replication protects a database from the loss of a single server. Replication
also allows you to recover from hardware failure and service interruptions.
With additional copies of the data, you can dedicate one to disaster recovery,
reporting, or backup.
Why Replication
• To keep your data safe
• High (24*7) availability of data
• Disaster recovery
• No downtime for maintenance (like backups,
index rebuilds, compaction)
• Read scaling (extra copies to read from)
• Replica set is transparent to the application
• Replica set: a group of mongod instances that hold the same data. The purpose of replication is to ensure high availability in case one of the servers goes down.
How Replication Works in MongoDB
MongoDB achieves replication through the use of replica sets. A replica set is a group of mongod instances that maintain the same data set. Replica sets provide redundancy and high availability. In a replica set, one node is the primary node, which receives all write operations. All other instances, the secondaries, apply operations from the primary so that they have the same data set. A replica set can have only one primary node.
• A replica set is a group of two or more nodes (generally a minimum of three nodes is required).
• In a replica set, one node is the primary node and the remaining nodes are secondaries.
• All data replicates from the primary to the secondary nodes.
• At the time of automatic failover or maintenance, an election is held and a new primary node is elected.
• After the failed node recovers, it rejoins the replica set and works as a secondary node.
A typical diagram of MongoDB replication is shown, in which the client application always interacts with the primary node, and the primary node then replicates the data to the secondary nodes.
Master-Slave Replication
Master-slave replication is the oldest mode of replication that MongoDB supports. In earlier versions of MongoDB, master-slave replication was used for failover, backup, and read scaling. However, in newer versions it has been replaced by replica sets for most use cases. Although MongoDB still supports master-slave replication, replica sets are recommended for new production deployments to replicate data in a cluster. In the next screen, we will discuss replica sets in MongoDB.
Replica Set in MongoDB
A replica set consists of a group of mongod (read as mongo D) instances
that host the same data set. In a replica set, the primary mongod receives all
write operations and the secondary mongod replicates the operations from the
primary and thus both have the same data set. The primary node receives write
operations from clients. A replica set can have only one primary and therefore
only one member of the replica set can receive write operations. A replica set
provides strict consistency for all read operations from the primary. The primary logs any changes or updates to its data sets in its oplog (read as op log). The secondaries replicate the primary's oplog and apply all the operations to their own data sets. When the primary becomes unavailable, the replica set elects a secondary as the new primary. By default, clients read data from the primary. However, they can also specify read preferences and request that read operations be sent to the secondaries. Reads from secondaries may return data that does not reflect the state of the primary. The diagram given
on the screen depicts that the primary member of the replica set receives all
the read and write requests by default. However, read requests can be directly
sent to the secondary member of the replica set too. We will continue our
discussion on replica sets in MongoDB in the next screen.
You can choose to add an extra mongod instance to a replica set to act as an arbiter. Note that arbiters do not maintain a data set; they only take part in electing the primary. An arbiter is a node that participates in the election to select the primary node, does not store any data, and hence does not require dedicated hardware. Secondary members in a replica set apply operations from the primary asynchronously. Because they apply operations after the primary, a replica set can continue to function even when some members are unavailable; as a result, secondary members may not always return the most current data to clients. A primary can step down to become a secondary, and a secondary can be elected primary; an arbiter, however, always remains an arbiter. We will discuss how a replica set handles automatic failover in the next screen.
Replica Set Features
• A cluster of N nodes
• Any one node can be primary
• All write operations go to the primary
• Automatic failover
• Automatic recovery
• Consensus election of the primary
Set Up a Replica Set
In this tutorial, we will convert a standalone MongoDB instance into a replica set. To convert to a replica set, follow these steps:
• Shut down the already running MongoDB server.
• Start the MongoDB server by specifying the --replSet option. Following is the basic syntax of --replSet −
mongod --port "PORT" --dbpath "YOUR_DB_DATA_PATH" --replSet "REPLICA_SET_INSTANCE_NAME"
Example
mongod --port 27017 --dbpath "D:\set
up\mongodb\data" --replSet rs0
• This will start a mongod instance that is a member of the replica set named rs0, on port 27017.
• Now start the command prompt and connect to this mongod instance.
• In the Mongo client, issue the command rs.initiate() to initiate a new replica set.
• To check the replica set configuration, issue the command rs.conf(). To check the status of the replica set, issue the command rs.status().
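For reference, a minimal mongo shell session for these steps might look like the following sketch (host and port are illustrative):
> rs.initiate()    // start a new replica set on this member
> rs.conf()        // inspect the current replica set configuration
> rs.status()      // check member states and replication health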
Automatic Failover
When the primary node of a replica set stops communicating with other
members for more than 10 seconds or fails, the replica set selects another
member as the new primary. The selection of the new primary happens through an election process, and whichever secondary node gets the majority of the votes becomes the primary. A replica set supports application needs in various ways.
For example, you may deploy a replica set in multiple data centers, or
manipulate the primary election by adjusting the priority of members. In
addition, replica sets support dedicated members for functions, such as
reporting, disaster recovery, or backup. In the next screen, we will discuss
replica set members.
Replica Set Members
In addition to the primary and secondaries, a replica set can also have
an arbiter. Unlike secondaries, arbiters do not replicate or store data.
However, arbiters play a crucial role in selecting a secondary to take the
place of the primary when the primary becomes unavailable. Therefore, a typical
replica set contains a primary, a secondary, and an arbiter. Typically, most of
the replica set deployments keep three members that store data, one primary and
two secondaries. A replica set in MongoDB version 3.0 (read as three point O) can have up to 50 members, but only 7 members are capable of voting. In previous versions of MongoDB, replica sets could have a maximum of 12 members. In the next screen, we will discuss priority zero replica set members.
Priority 0 Replica Set Members
A priority zero member in a replica set is a secondary member that
cannot become the primary. These members can act as normal secondaries, but
cannot trigger any election. A priority zero member performs functions, such as
maintaining data set copies, accepting and performing read operations, and
electing the primary. By configuring a priority zero member, you can prevent
secondaries from becoming the primary. This is particularly useful in
multi-data center deployments. In a replica set containing three members, one
data center hosts both, the primary and a secondary, and the second data center
hosts one priority zero member. Typically, a priority zero member acts as a standby. For example, in some replica sets it may not be possible to add a new member immediately; the standby member already holds up-to-date data and can immediately replace an unavailable member. In the next screen, we will discuss hidden replica set members.
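As an illustration, the following mongo shell sketch marks the member at index 2 of the members array as priority zero (the index is an assumption; adjust it to your own configuration):
cfg = rs.conf()               // fetch the current replica set configuration
cfg.members[2].priority = 0   // this member can no longer become primary
rs.reconfig(cfg)              // apply the new configuration on the primary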
Hidden Replica Set Members
Hidden members of a replica set are invisible to client applications. They store a copy of the primary's data and are good for workloads with usage patterns that differ from the other members of the replica set. Hidden replica set members are priority zero members and can never replace the primary, but they do vote in the election of the primary. In the five-member replica set shown on the screen, all four secondary members contain copies of the primary's data set, and one of them is a hidden member. Clients do not distribute read operations with appropriate read preferences to the hidden members; therefore, the hidden members receive only the basic replication traffic. You can utilize hidden members for dedicated functions such as reporting and backups. In a sharded cluster, mongos (read as mongo s) instances do not interact with hidden replica set members. In the next screen, we will discuss delayed replica set members.
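A minimal mongo shell sketch for hiding a member, assuming it sits at index 0 of the members array:
cfg = rs.conf()
cfg.members[0].priority = 0   // a hidden member must also be priority zero
cfg.members[0].hidden = true  // hide the member from clients
rs.reconfig(cfg)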
Delayed Replica Set Members
Delayed replica set members are those secondaries that copy data from
the primary node’s oplog file after a certain interval or delay. Delayed
replica set members store their data set copies. However, they reflect a
previous version, or a delayed state of the set. For example, if the data shown
in the primary is for 10 AM, then a member delayed by one hour will show data for 9 AM. Delayed members act as a "rolling backup" or a running "historical" snapshot of the data set. Therefore, they help you recover from various human errors, such as an unsuccessful application upgrade or dropped databases and collections. A delayed member:
• Must be a priority zero member,
• Must be hidden and not visible to applications, and
• Does participate in electing the primary.
In the next screen, we will discuss the configuration settings of delayed replica set members.
You can configure a delayed secondary member with the following settings:
• priority value: zero,
• hidden value: true, and
• slaveDelay value: the number of seconds to delay.
For example, you can set a 1-hour delay on a secondary member currently at index 0 in the members array. To set the delay, issue the sequence of operations given on the screen in a mongo shell connected to the primary. After the replica set reconfigures, the delayed secondary member cannot become the primary and is hidden from applications. The slaveDelay value delays both replication and the member's oplog by 3600 seconds, or 1 hour. In the next screen, we will view a demo on how to start a replica set.
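The sequence referred to above is presumably along these lines; the member index is an assumption:
cfg = rs.conf()
cfg.members[0].priority = 0        // cannot become primary
cfg.members[0].hidden = true       // invisible to applications
cfg.members[0].slaveDelay = 3600   // lag one hour behind the primary
rs.reconfig(cfg)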
Write Concern
The write concern provides an assurance of the success of a write
operation. The strength or weakness of a write concern determines the level of
assurance. With weak write concerns, write operations such as inserts, updates, and deletes return quickly; however, write operations issued with weak write concerns may fail. When stronger write concerns are used in an operation, clients wait after sending the operation for MongoDB to confirm
their success. MongoDB provides different levels of write concerns to address
the varied application needs. Clients may manage write concerns to ensure that
the critical operations are successfully carried out on the MongoDB deployment.
For less critical operations, clients can manage the write concern to ensure
faster performance rather than ensure persistence to the entire deployment.
Typically, regardless of the write concern level or journaling configuration,
MongoDB allows clients to read the inserted or modified documents before
committing the modifications to the disk. As a result, if an application has
multiple readers and writers, MongoDB allows a reader to read the data of
another writer even when the data is still not committed to the journal. We
will continue our discussion on Write Concern in the next slide.
MongoDB modifies each document of a collection separately. For
multi-document operations, MongoDB does not provide any multi-document
transactions or isolation. When a standalone mongod returns a successful journaled write concern, the data is stored to the disk and remains available after mongod restarts. Write operations in a replica set become durable only after the write has been replicated and committed to the journal on a majority of the voting members of the replica set. MongoDB periodically commits data to the journal at the interval defined by the commitIntervalMs (read as commit interval milliseconds) parameter. In the next screen, we will discuss write concern
levels.
Write Concern Levels
MongoDB has two levels of conceptual write concern—unacknowledged and
acknowledged. With an unacknowledged write concern, MongoDB does not acknowledge the receipt of write operations, which is similar to errors being ignored. However, the driver can still receive and handle network errors whenever possible. The driver's ability to detect and handle network errors depends on the networking configuration of the system. With an acknowledged write
concern, the mongod confirms the receipt of the write operation. It also
confirms that the changes to the in-memory data view are applied. An
acknowledged write concern can detect network, duplicate key, and other errors.
By default, MongoDB uses the acknowledged write concern in the driver. The
mongo shell includes the write concern in the write methods and provides the
default write concern whether it runs interactively or in a script. An
acknowledged write concern does not confirm that the write operation has persisted to disk. Note that journaling must be enabled to use the journaled write concern. In the next screen, we will discuss the write concern for a replica set.
Write Concern for a Replica Set
Replica sets have additional considerations for write concerns.
Typically, the default write concerns in a replica set require acknowledgement
only from the primary member. A write concern acknowledged by a replica set
ensures that the write operation spreads to the other members of the set. The
default write concern confirms write operations only for the primary. However,
you can choose to override this write concern to check write operations on some
replica set members. You can override the default write concern by specifying a
write concern for each write operation. For example, the method given on the screen contains a write concern specifying that the method should return a response only after the write operation has propagated to the primary and at least one secondary, or the method times out after 5 seconds. In the next screen, we will discuss how to modify default write concerns.
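A write of that kind would look roughly like the following sketch; the collection name and document are illustrative:
db.products.insert(
  { item: "envelopes", qty: 100, type: "Clasp" },
  { writeConcern: { w: 2, wtimeout: 5000 } }   // primary plus at least one secondary, 5-second timeout
)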
Modify Default Write Concern
You can modify the default write concern by changing the "getLastErrorDefaults" (read as get last error defaults) setting in the replica set configuration. With this setting, a write operation waits to complete on a majority of the voting members before returning a result. The sequence of commands given on the screen creates a configuration that waits for the write operation to complete on a majority of the voting members.
Additionally, you can include a timeout threshold for a write concern to
prevent blocking write operations. For example, a write concern must be
acknowledged by four members of a replica set. However, only three members are
available in the replica set. The operation blocks until all the four members
are available. You can prevent the operation from blocking by adding a timeout.
In the next screen, we will discuss Read Preferences.
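The command sequence referenced above is presumably similar to this sketch, which makes "majority" the default write concern; the timeout value is illustrative:
cfg = rs.conf()
cfg.settings = cfg.settings || {}
cfg.settings.getLastErrorDefaults = { w: "majority", wtimeout: 5000 }
rs.reconfig(cfg)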
Read Preference
Read preferences define how MongoDB clients route read operations to
replica set members. By default, a client directs its read operations to the
primary. Therefore, reading from the primary member means that you have the
latest version of a document. However, you can distribute reads among the
secondary members to improve the read performance or reduce latency for
applications that do not require the most up-to-date data. The typical use cases for non-primary read preferences are as follows:
• Running system operations that do not affect the front-end application.
• Providing local reads for geographically distributed applications.
• Maintaining availability during a failover.
You can use Mongo.setReadPref() (read as mongo dot set read preference method) to set a read preference. We will discuss the read preference modes in the next screen.
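For example, a shell connection can be told to prefer secondaries with something like the following; the collection and query are illustrative:
db.getMongo().setReadPref("secondaryPreferred")   // route reads to a secondary when one is available
db.users.find({ city: "London" })                 // this query may now be served by a secondary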
Read Preference Modes
The read preference modes supported in a replica set are as follows:
• primary: This is the default read preference mode. All operations read from the current replica set primary.
• primaryPreferred: Operations mostly read from the primary member. However, when the primary is unavailable, operations read from a secondary member.
• secondary: All operations read from the secondary members of the replica set.
• secondaryPreferred: In most situations operations read from a secondary, but when no secondary member is available, operations read from the primary.
• nearest: Operations read from the member of the replica set with the least network latency, irrespective of the member's type.
In the next screen, we will discuss blocking for replication.
Blocking for Replication
The getLastError (read as get last error) command in MongoDB lets you check how up-to-date replication is by using the optional "w" parameter. In the query shown on the screen, the getLastError command blocks until at least N servers have replicated the last write operation. The command returns immediately if N is either not present or less than two. If N equals two, the master responds to the command only after one slave has replicated the last operation; note that the master is included in N. The master uses the "syncedTo" information stored in the slaves collection of the local database to identify how up-to-date each slave is. When specifying "w", the getLastError command takes an additional parameter, "wtimeout" (read as w timeout), a timeout in milliseconds. This allows the getLastError command to time out and return an error before the last operation has replicated to N servers. By default, however, the command has no timeout. Blocking on replication significantly slows down write operations, particularly for large values of "w". Setting "w" to two or three for important operations yields a good combination of efficiency and safety. In the next screen, we will discuss tag sets.
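A call of the kind described above would look roughly like this sketch:
db.runCommand({ getLastError: 1, w: 2, wtimeout: 500 })
// blocks until the last write has reached 2 servers (the master plus one slave),
// or returns an error after 500 milliseconds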
Tag Set
Tag sets allow read operations to target specific replica set members. Custom read preferences and write concerns evaluate tag sets differently: read preferences consider the tag value when selecting a replica set member to read from, while write concerns ignore the tag value when selecting a member and only check whether the value is unique. You can specify tag sets with the following read preference modes:
• primaryPreferred
• secondary
• secondaryPreferred, and
• nearest
Remember, tags are not compatible with the primary mode and, in general, apply only when selecting a secondary member for a read operation. The exception is the nearest mode: when tags are combined with it, the nearest mode selects the matching member, primary or secondary, with the lowest network latency. All interfaces use the same member selection logic to choose the member to direct read operations to, basing the choice on the read preference mode and tag sets. In the next screen, we will discuss how to configure tag sets for replica sets.
Configure Tag Sets for Replica set
Using tag sets, you can customize write concerns and read preferences in
a replica set. MongoDB stores tag sets in the replica set configuration object,
which is the document returned by rs.conf() (read as R-S dot conf), in the
members.tags (read as members dot tags) embedded document. When using either
write concern or read scaling, you may want to have some control over which
secondaries receive writes or reads. For example, you have deployed a five-node
replica set across two data centers, NYK (read as New York) and LON (read as
London). New York is the primary data center that contains three nodes. The
secondary data center, London contains two. Suppose you want to use a write
concern to block until a certain write is replicated to at least one node in
the London data center. You cannot use a “w” value of majority, because this will
translate into a value of 3, and the most likely scenario is that the three
nodes in New York will acknowledge first. Replica set tagging resolves such
problems by allowing you to define special write concern modes that target the
replica set members with defined tags. To see how this works, you first need to
learn how to tag a replica set member. In the config document, each member can
have a key called tags pointing to an object containing key-value pairs. You
can add tag sets to the members of a replica set with the following sequence of
commands in the mongo shell, as given on the screen. Custom modes are defined as a document with one or more keys in the "getLastErrorModes" setting. Here, the command defines a mode containing two keys, "dc" and "rackNYK" (read as rack new york), whose values are the integer 2. These integers indicate how many distinct values of each tag must acknowledge the write in order for the getLastError command to complete. In the next screen, we will discuss Replica Set Deployment.
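The shell sequence referenced above is presumably along these lines; the member indexes, tag values, and mode name are illustrative:
conf = rs.conf()
// Tag the three New York members and the two London members
conf.members[0].tags = { dc: "NYK", rackNYK: "rack1" }
conf.members[1].tags = { dc: "NYK", rackNYK: "rack2" }
conf.members[2].tags = { dc: "NYK", rackNYK: "rack3" }
conf.members[3].tags = { dc: "LON" }
conf.members[4].tags = { dc: "LON" }
// Custom write concern mode: acknowledgements must cover two distinct "dc"
// values and two distinct "rackNYK" values
conf.settings = conf.settings || {}
conf.settings.getLastErrorModes = { multiDC: { dc: 2, rackNYK: 2 } }
rs.reconfig(conf)
A write can then block on the custom mode, for example:
db.events.insert({ type: "signup" }, { writeConcern: { w: "multiDC", wtimeout: 5000 } })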
Replica Set Deployment Strategies
A replica set architecture impacts the set’s capability. In a standard
replica set for a production system, a three-member replica set is deployed.
These sets provide redundancy and fault tolerance, avoid complexity, but let
your application requirements define the architecture. Use the following deployment strategies for a replica set. Deploy an Odd Number of Members: to elect a primary, you need an odd number of voting members in a replica set; add an arbiter if a replica set has an even number of members. Consider Fault Tolerance: fault tolerance for a replica set is the number of members that can become unavailable while still leaving enough members in the set to elect a primary. A replica set cannot accept any write operations if the primary
member is not available. Remember that adding a new member to a replica set
does not always increase its fault tolerance capability. However, the
additional members added in a replica set can provide support for some
dedicated functions, such as backups or reporting. Use Hidden and Delayed
Members: Add hidden or delayed members to support dedicated functions, such as
backup or reporting. Load Balance on Read-Heavy Deployments: In deployments
with high read traffic, to improve read performance, distribute reads to
secondary members. As your deployment grows, add or move members to alternate
data centers and improve redundancy and availability. However, ensure that the
main facility is able to elect the primary. In the next screen, we will continue
with the Replica Set Deployment strategy.
Replica Set Deployment Strategies (contd.)
Some more replica set deployment strategies are given below. Add
Capacity Ahead of Demand: You must be able to add a new member to an existing
replica set to meet the increased data demands. Add new members before new
demands arise. Distribute Members Geographically: For data safety in case of
any failure, keep at least one member in an alternate data center. Also set the
priorities of these members to zero so that they do not become the primary.
Keep a Majority of Members in One Location: when replica set members are distributed across multiple locations, a network partition may hinder communication between the data centers and affect data replication. When electing the primary, the members must be able to see each other to form a majority. To enable the replica set members to form a majority and elect the primary, ensure that a majority of the members are in one location. Use Replica Set Tag Sets: tag sets ensure that operations are replicated at specific data centers and also help route read operations to specific machines. Use Journaling to Protect Against Power Failures: to avoid data loss, use journaling so that data is safely written to disk in case of shutdowns, power failures, and other unexpected failures. In the next screen, we
will discuss replica set deployment patterns.
Replica Set Deployment Patterns
The common deployment patterns for a replica set are as follows: • Three
Member Replica Sets: This is the minimum recommended architecture for a replica
set. • Replica Sets with Four or More Members: This set provides greater
redundancy and supports greater distribution of read operations and dedicated
functionality. • Geographically Distributed Replica Sets: These include members
in multiple locations to protect data against facility-specific failures, such
as power outages. In the next screen, we will discuss the oplog file.
Oplog File
The record of operations maintained by the master server is called the
operation log or oplog (read as op log). The oplog is stored in a database
called local, in the “oplog.$main” (read as op log dot main) collection. Each
oplog document denotes a single operation performed on the master server and
contains the following keys:
• ts, the timestamp of the operation: an internal type used to track operations, containing a 4-byte timestamp and a 4-byte incrementing counter.
• op, the type of operation performed, as a 1-byte code, for example, "i" for an insert.
• ns, the namespace, that is, the collection name on which the operation is performed.
• o, a document that further specifies the operation to perform. For an insert, this is the document to insert.
The oplog stores only those operations that change the state of the database and is intended only as a mechanism for keeping the data on slaves in sync with the master. The oplog does not store operations exactly as they were performed on the master; each operation is first transformed into an equivalent, idempotent form and then stored in the oplog file. In the next screen, we will
discuss replication state and local database.
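To peek at the most recent oplog entry, you can query the local database directly; this sketch assumes a replica set, where the collection is oplog.rs rather than the legacy master-slave oplog.$main:
db.getSiblingDB("local").oplog.rs.find().sort({ $natural: -1 }).limit(1).pretty()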
Replication State and Local Database
MongoDB maintains a database called "local" to keep information about the replication state and the list of masters and slaves. The content of this database is not replicated and remains local to the master and the slaves. On the master, the list of slaves is stored in the "slaves" collection. Slaves also store replication information in their own local database: the unique slave identifier is saved in the "me" collection, and the list of masters is saved in the "sources" collection. Both the master and the slave use the timestamp stored in "syncedTo" to understand how up-to-date a slave is. A slave uses "syncedTo" to query the oplog for new operations and to find out whether any operation is out of sync. In the next screen, we will discuss Replication Administration.
Replication Administration
MongoDB provides administrative helpers for inspecting the replication status. To check the replication status, use the function given on the screen when connected to the master. This function provides information on the oplog size and the date range of the operations contained in the oplog. In the given example, the oplog size is 10 megabytes and can accommodate only about 30 seconds of operations; in such a case, you should increase the size of the oplog. The log length is computed as the time difference between the first and the last operation in the oplog. If the server has only recently started, the first logged operation will be relatively recent, so the log length will be small even though the oplog may still have free space available. The log length is therefore a more useful metric for servers that have been operational long enough for the oplog to "roll over." You can also get some information when connected to the
slave, using the functions given on the screen. This function will populate a
list of sources for a slave, each displaying information such as how far behind
it is from the master. In the next screen, we will view a demo on how to check
the status of a replica set.
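The helpers referred to above are presumably the standard shell functions, used along these lines:
db.printReplicationInfo()        // on the master/primary: oplog size and date range of operations
db.printSlaveReplicationInfo()   // on a slave/secondary: each source and how far behind it is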
Demo: Add Members to a Replica Set
To add members to a replica set, start mongod instances on multiple machines. Then start a mongo client and issue the rs.add() command.
Syntax
The basic syntax of the rs.add() command is as follows −
>rs.add("HOST_NAME:PORT")
Example
Suppose your mongod instance host name is mongod1.net and it is running on port 27017. To add this instance to the replica set, issue the command rs.add() in the Mongo client.
>rs.add("mongod1.net:27017")
You can add a mongod instance to a replica set only when you are connected to the primary node. To check whether you are connected to the primary or not, issue the command db.isMaster() in the mongo client.
Replica set replication has a limit on the number of members. Prior to version 3.0 the limit was 12, but this was raised to 50 in version 3.0. A replica set can therefore have a maximum of 50 members, and at any given point of time in a 50-member replica set, only 7 can participate in a vote.
Sharding
Sharding is the process of distributing data across multiple servers for
storage. MongoDB uses sharding to manage massive data growth. With an increase
in the data size, a single machine may not be able to store data or provide an
acceptable read and write throughput. Sharding supports horizontal scaling and
thus is capable of distributing data across multiple machines. Sharding allows
you to add more servers to your database to support data growth and
automatically balances data and load across various servers. Sharding provides
additional write capacity by distributing the write load over a number of
mongod instances. It splits the data set and distributes them across multiple
databases, or shards. Each shard serves as an independent database, and
together, the shards make up a single logical database. Sharding reduces the number of operations each shard handles; as a cluster grows, each shard handles fewer operations and stores less data. As a result, a cluster can increase its capacity and throughput horizontally. For example, to insert data into a particular record, the application needs to access only the shard that holds the record. If a database has a 1 terabyte data set distributed among 4 shards, then each shard may hold only 256 gigabytes of data; if the database contains 40 shards, then each shard will hold only about 25 gigabytes of data. In the next screen, we will discuss when sharding should be used.
When to Use Sharding?
Typically, sharded clusters require a proper infrastructure setup. This
increases the overall complexity of the deployment. Therefore, consider
deploying sharded clusters only when there is an application or operational
requirement. You must consider deploying a sharded cluster when your system
shows the following characteristics: • The data set outgrows the storage
capacity of a single MongoDB instance. • The size of the active working set
exceeds the capacity of the maximum available RAM. • A single MongoDB instance
is unable to manage write operations. In the absence of these characteristics,
sharding will not benefit your system, rather, it will add complexity.
Deploying sharding consumes time and resources. If your database system has already surpassed its capacity, deploying sharding without impacting the application is not possible. Therefore, deploy sharding in advance if you expect read and write operations to increase in the future. In the next screen, we will discuss what a shard is.
Sharding Components
You will next look at the components that enable sharding in MongoDB.
Sharding is enabled in MongoDB via sharded clusters.
The following are the components of a sharded cluster:
• Shards
• mongos
• Config servers
The shard is the component where the actual data is stored. In a sharded cluster, each shard holds a subset of the data and can be either a standalone mongod or a replica set. The data of all the shards combined forms the complete dataset for the sharded cluster.
Sharding is enabled on a per-collection basis, so there might be collections that are not sharded. Every sharded cluster has a primary shard where all the unsharded collections are placed, in addition to its share of the sharded collection data.
Config servers are special mongods that hold the sharded cluster’s
metadata. This metadata depicts the sharded system state and organization.
The config server stores data for a single sharded cluster. The config
servers should be available for the proper functioning of the cluster.
Running a single config server creates a single point of failure for the cluster. For production deployments it's recommended to have at least three config servers, so that the cluster keeps functioning even if one config server is not accessible.
A config server stores the data in the config database, which enables routing of the client requests to the respective data. You should not update this database manually.
MongoDB writes data to the config servers only when the data distribution has changed, in order to keep the cluster balanced.
The mongos instances act as routers. They are responsible for routing the read and write requests from the application to the shards.
An application interacting with a sharded MongoDB database need not worry about how the data is stored internally on the shards. For the application this is transparent, because it interacts only with the mongos, which in turn routes the reads and writes to the shards.
The mongos instances cache the metadata from the config server so that they do not overburden the config server on every read and write request.
However, in the following cases the data is read from the config server:
• An existing mongos has restarted, or a new mongos has started for the first time.
• A chunk has migrated. We will explain chunk migration in detail later.
There are three main components −
Shards − Shards are used to store data. They provide high availability and data consistency. In a production environment, each shard is a separate replica set.
Config Servers − Config servers store the cluster's metadata. This data contains a mapping of the cluster's data set to the shards. The query router uses this metadata to target operations to specific shards. In production environments, sharded clusters have exactly 3 config servers.
Query Routers − Query routers are basically mongos instances that interface with client applications and direct operations to the appropriate shard. The query router processes and targets the operations to the shards and then returns the results to the clients. A sharded cluster can contain more than one query router to divide the client request load, although each client sends its requests to exactly one query router. Generally, a sharded cluster has many query routers.
What is a Shard?
A shard is a replica set or a single mongod instance that holds a subset of the data in a sharded cluster. Together, the shards hold the entire data set for the cluster. In production, each shard is a replica set that provides redundancy and high availability for the data it holds. MongoDB shards data on a per-collection basis and lets you access the sharded data through mongos instances. If you connect directly to a shard, you will be able to view only a fraction of the data contained in the cluster, and the data will not be organized in any particular order. In addition, MongoDB does not guarantee that any two contiguous chunks will reside on any particular shard. Note that every database has a "primary" shard that holds all the un-sharded collections in that database. In the next screen, we will discuss the shard key.
What is a Shard Key
When deploying sharding, you need to choose a key from a collection and
split the data using the key’s value. This key is called a shard key that
determines how to distribute the documents of a collection among the different
shards in a cluster. The shard key is a field that exists in every document in
the collection and can be an indexed or indexed compound field. . MongoDB
performs data partitions in a collection using the different ranges or chunks
of shard key values. Each range or chunk defines a non-overlapping range of
shard key values. MongoDB distributes chunks, and their documents, among the
shards in a cluster. MongoDB also distributes documents according to the range
of values in the shard key. In the next screen, we will discuss how to choose a
shard key.
Choosing a Shard Key
To enhance and optimize the performance, functioning and capability of
your database, you need to choose the correct shard key. Choosing the
appropriate shard key depends on two factors—the schema of your data and the
way applications in your database query and perform write operations. In the
next screen, we will discuss the characteristics of an ideal shard key.
Ideal Shard Key
An ideal shard key must have the following characteristics: Must be
easily divisible— an easily divisible shard key enables MongoDB to perform data
distribution among shards. If shard keys contain limited number of possible
values, then the chunks in shards cannot be split. For example, if a chunk or a
range represents a single shard key value, then the chunk cannot be split even
if it exceeds the recommended size. High Degree of Randomness: a shard key should possess a high degree of randomness. This ensures that write operations are distributed among the shards of the cluster and that no single shard becomes a bottleneck. Target a Single Shard: a shard key should target a single shard to
enable the mongos program to return most of the query operations directly from
a single mongod instance. In addition, the shard key should be the primary
field in queries. Fields having a high degree of “randomness” cannot target
operations to specific shards. Use a Compound Shard Key: If an existing field
in your collection is not the ideal key, compute a special purpose shard key or
use a compound shard key. In the next screen, we will discuss range-based
sharding.
Range-Based Shard Key
In range-based sharding, MongoDB divides data sets into different ranges
based on the values of shard keys. Thus it provides range-based partitioning.
For example, consider a numeric shard key. If an imaginary number line goes
from negative infinity to positive infinity, each shard key value falls at some
point on that line. MongoDB partitions this line into chunks where each chunk
can have a range of values. In range-based sharding, documents having “close”
shard key values reside in the same chunk and shard. Range-based partitioning
supports range queries because for a given range query of a shard key, the
query router can easily find which shards contain those chunks. Data
distribution in range-based partitioning can be uneven, which may negate some
benefits of sharding. For example, if a shard key field size increases
linearly, such as time, then all requests for a given time range will map to
the same chunk and shard. In such cases, a small set of shards may receive most
of the requests and the system would fail to scale. In the next screen, we will
discuss Hash based sharding.
Hash-Based Sharding
For hash based partitioning, MongoDB first calculates the hash of a
field’s value, and then creates chunks using those hashes. Unlike range-based partitioning,
in hash based partitioning, documents with “close” shard key values may not
reside in the same chunk. This ensures that the collection in a cluster is
randomly distributed. In hash based partitioning, data is evenly distributed.
Hashed key values randomly distribute data across chunks and shards. The random
distribution of data across chunks in hash-based partitioning makes range queries on the shard key inefficient, as the data will be spread across many shards rather than a few shards, as in the case of range-based partitioning. In the next screen, we will discuss the impact of shard keys on cluster operations.
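To shard a collection on a hashed key, the shell command looks roughly like this; the namespace and field name are illustrative:
sh.shardCollection("records.events", { userId: "hashed" })   // distribute documents by the hash of userId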
Impact of Shard Keys on Cluster Operation
Some shard keys are capable of scaling write operations. Typically, a
computed shard key that possess “randomness,” to some extent allows a cluster
to scale write operations. For example, keys computed with a cryptographic hash, such as MD5 or SHA1 (read as M-D five or S-H-A one), can scale write operations. However, random shard keys do not provide query isolation. To improve write scaling, MongoDB enables sharding a collection on a hashed index. The shard key also affects two aspects of query behaviour: querying and query isolation. Querying: A mongos instance provides an interface that enables applications to interact with sharded clusters. When mongos receives queries from client applications, it uses metadata from the config server and routes the queries to the mongod instances. mongos makes querying operational in sharded environments; however, the shard key you select can impact query performance. Query Isolation: Query execution is fast and efficient if mongos can route the query to a single shard using the shard key and the metadata stored from the config server. Queries that must be routed to many shards are less efficient, as they have to wait for a response from all of those shards. If your query contains the first component of a compound shard key, then mongos can route the query to a single shard or a minimal number of shards, which provides good performance. In the
next screen, we will discuss production cluster architecture.
Production Cluster Architecture
Special mongod instances, such as config servers store metadata for a
sharded cluster. Config servers provide consistency and reliability using a
two-phase commit. Config servers do not run as replica sets and must be
available to deploy a sharded cluster or to make changes to a cluster metadata.
The production sharded cluster shown in the image on the screen has three config servers, each hosted on a separate machine. A sharded cluster must have exclusive use of its config servers; if you have multiple sharded clusters, you will need a separate group of config servers for each cluster. Config Database: Config servers store the
metadata in the config databases. The mongos instance caches this data and uses
it to route the reads and writes to the shards. MongoDB performs write operations on the config server only in the following cases: • To split existing chunks, and • To migrate a chunk between shards. MongoDB reads data from the
config server when: • A new mongos starts for the first time, or an existing
mongos restarts. • After a chunk migration, the mongos instances update
themselves with the new cluster metadata. MongoDB also uses the config server
to manage the distributed locks. In the next screen, we will discuss Config
server availability.
Config Server Availability
When one or more config servers are unavailable, the metadata of the
cluster becomes read-only. You can still read and write data from shards.
However, chunk migrations occur only when all three config servers are available.
If all three config servers are unavailable, you can still use the cluster
provided the mongos instances do not restart. If the mongos instances are
restarted before the config servers are available, the mongos become unable to
route reads and writes. Clusters are inoperable without the cluster metadata.
Therefore, ensure that the config servers remain available and intact
throughout. Query routers are the mongos processes that interface with the
client applications and direct queries to the appropriate shard or shards. All
the queries from client applications are processed by the query routers, and in a sharded cluster it is recommended to use 3 query routers. In the next
screen, we will discuss production cluster deployment.
Production Cluster Deployment
When deploying a production cluster, ensure data redundancy and systems
availability. A production cluster must have the following components: Config Servers: three config servers, each hosted on a separate machine. A sharded cluster must have exclusive use of its config servers; for multiple sharded clusters, you must have a separate group of config servers for each cluster. Shards: A production cluster must have two
or more replica sets or shards. Query Routers or mongos: A production cluster
must have one or more mongos instances. The mongos instances act as the routers
for the cluster. Production cluster deployments typically have one mongos
instance on each application server. You may deploy more than one mongos
instances and also use a proxy or load balancer between the application and
mongos. When making such deployments, configure the load balancer so that connections from a single client always reach the same mongos. Cursors and other resources are specific to a single mongos instance, and hence each client must
interact with only one mongos instance. In the next screen, we will discuss how
to deploy a sharded cluster.
Deploy a Sharded Cluster
To deploy a sharded cluster, perform the following sequence of tasks:
Create data directories for each of the three config server instances. By
default, a config server stores its data files in the /data/configdb directory
(read as slash data slash config D-B directory). You can also choose an
alternate location to store the data. To create a data directory, issue the
command as shown on the screen. Start each config server by issuing a command
using the syntax given on the screen. The default port for config servers is
27019 (read as 2-7-0-1-9). You can also specify a different port. The example
given on the screen starts a config server using the default port and default
data directory. By default, a mongos instance runs on the port 27017. To start
a mongos instance, issue a command using the syntax given on the screen: For
example, to start a mongos that connects to a config server instance running on
the following hosts and on the default ports, issue the last command given on
the screen. In the next screen, we will discuss how to add shards to a cluster.
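The commands referenced above are presumably of the following form; the host names, port, and path are illustrative:
mkdir /data/configdb
mongod --configsvr --dbpath /data/configdb --port 27019
mongos --configdb cfg0.example.net:27019,cfg1.example.net:27019,cfg2.example.net:27019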
Add Shards to a Cluster
To add shards to a cluster, perform the following steps. Step 1. From a
mongo shell, connect to the mongos instance and issue a command using the
syntax given on the screen. If a mongos is accessible at node1.example.net
(read as node one dot example dot net) on the port 27017, (read as 2-7-0-1-7)
issue the following command given on the screen: Step 2. Add each shard to the
cluster using the sh.addShard() (read as Add shard method on the sh console )
method, as shown in the examples given on the screen. Next, issue the
sh.addShard() method separately for each shard. If a shard is a replica set,
specify the name of the replica set and specify a member of the set. In
production deployments, all shards should be replica sets. To add a shard for a
replica set named rs1 (read as RS one) with a member running on the port 27017
on mongodb0.example.net(read as mongo D-B zero dot example dot net), issue the
following command given on the screen. To add a shard for a standalone mongod
on the port 27017 of mongodb0.example.net, issue the following command given on
the screen: In the next screen, we will view a demo on how to create a sharded
cluster in MongoDB.
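The commands described above presumably look like this sketch; the host and replica set names are illustrative:
// From a terminal: mongo --host node1.example.net --port 27017
// Then, in the mongo shell connected to the mongos:
sh.addShard("rs1/mongodb0.example.net:27017")   // shard backed by replica set rs1
sh.addShard("mongodb0.example.net:27017")       // standalone mongod as a shard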
Enable Sharding for Database
Before you start sharding a collection, first enable sharding for the
database of the collection. When you enable sharding for a database, it does
not redistribute data but allows sharding the collections in that database.
MongoDB assigns a primary shard for that database where all the data is stored
before sharding begins. To enable sharding, perform the following steps. Step
1: From a mongo shell, connect to the mongos instance. Issue a command using
the syntax given on the screen. Step 2. Issue the sh.enableSharding() (read as
Sh dot enable sharding) method, specify the name of the database for which you
want to enable sharding. Use the syntax given on the screen: Optionally, you
can enable sharding for a database using the “enableSharding” command. For
this, use the syntax given on the screen. In the next screen, we will discuss
how to enable sharding for a collection.
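A concrete version of these steps might look like the following; the database name is illustrative:
sh.enableSharding("records")                     // helper method
db.adminCommand({ enableSharding: "records" })   // equivalent database command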
Enable Sharding for Collection
You can enable sharding on a per-collection basis. To enable sharding
for a collection, perform the following steps. Step 1: Determine the shard key
value. The selected shard key value impacts the efficiency of sharding. Step 2:
If the collection already contains data, create an index on the shard key using
the createIndex() method. If the collection is empty, then MongoDB will create the index as part of the sh.shardCollection() (read as S-H dot shard collection) step. Step 3. To enable sharding for a collection, open the mongo shell and issue the sh.shardCollection() (read as S-H dot shard collection)
method. The method uses the syntax given on the screen. We will continue with
our discussion on enabling sharding for a collection.
Enable Sharding for Collection (contd.)
To enable sharding for a collection, replace the string <database>.<collection> (read as database dot collection) with the full namespace of your collection. This string consists of the name of your database, a dot, and the full name of the collection. The shard-key-pattern represents your shard key, which you specify in the same form as you would an index key pattern. The example given on the screen shows sharding a collection based on its partition key. In the next screen, we will discuss maintaining a balanced data distribution.
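A sketch of such a command, with an assumed namespace and shard key:
db.people.createIndex({ zipcode: 1 })                   // run against the records database; index the shard key if data already exists
sh.shardCollection("records.people", { zipcode: 1 })    // shard the collection on zipcode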
Maintaining a Balanced Data Distribution
When you add new data or new servers to a cluster, it may impact the
data distribution balances within the cluster. For example, some shards may
contain more chunks than another shard or the chunk sizes may vary and some can
be significantly larger than other chunks. MongoDB uses two background
processes, splitting and a balancer to ensure a balanced cluster. In the next
screen, we will discuss how MongoDB uses splitting to ensure a balanced
cluster.
Splitting
Splitting is a background process that checks the chunk sizes. When a
chunk exceeds a specified size, MongoDB splits the chunk into two halves.
MongoDB uses insert and update operations to trigger a split. When creating splits, MongoDB does not perform any data migration or affect the shards. However, splitting may leave a collection's chunks distributed unevenly across the shards. In such cases, the mongos instances will redistribute the chunks across the shards. In the next screen, we will discuss Chunk Size.
Chunk Size
In MongoDB, the default chunk size is 64 megabytes. You can alter a
chunk size. However, this may impact the efficiency of the cluster. Smaller
chunks enable an even distribution of data but may result in frequent
migrations. This creates expense at the query routing or mongos layer. Larger
chunks encourage fewer migrations and are more efficient from the networking
perspective and in terms of internal overhead at the query routing layer.
However, this may lead to an uneven distribution of data. The chunk size also affects the maximum number of documents per chunk to migrate. If you have many deployments to make, you can use larger chunks; this helps avoid frequent and unnecessary migrations, although the data distribution may be more uneven. Changing the chunk size is subject to the following limitations:
• Automatic splitting occurs only during inserts or updates, so when you reduce the chunk size, it may take time for all chunks to split to the new size.
• Splits cannot be "undone". When you increase the chunk size, the existing chunks must grow through inserts or updates until they reach the new size.
In the next screen, we will discuss special chunk types.
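For reference, the global chunk size can be changed by updating the settings collection of the config database; a sketch (the value is in megabytes, 64 being the default):
use config
db.settings.save({ _id: "chunksize", value: 32 })   // lower the chunk size to 32 MB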
Special Chunk Type
Following are the special chunk types. Jumbo Chunks MongoDB does not
migrate a chunk if its size is more than the specified size or if the number of
documents contained in the chunk exceeds the Maximum Number of Documents per
Chunk to Migrate. In such situations, MongoDB splits the chunk. If the chunk
cannot be split, MongoDB labels the chunk as jumbo. This avoids repeated
attempts to migrate the same chunk.
Indivisible Chunks
Sometimes, chunks can grow beyond the specified size but cannot be split, for example, when a chunk represents a single shard key value. Indivisible chunks are those chunks that cannot be broken up by the split process. In the next screen, we will discuss shard balancing.
Shard Balancing
Chunk migrations carry bandwidth and workload overheads, which may
impact database performance. Shard balancing minimizes this impact by:
• Moving only one chunk at a time.
• Starting a balancing round only when the difference between the shard with the greatest number of chunks for a sharded collection and the shard with the lowest number of chunks for that collection reaches the migration threshold.
In some scenarios, you may want to disable the balancer temporarily, such as during MongoDB maintenance or during peak load times, to avoid impacting the performance of MongoDB. In the next screen, we will discuss customized data distribution.
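Disabling and re-enabling the balancer from a mongos can be done with the standard shell helpers, for example:
sh.getBalancerState()   // check whether the balancer is currently enabled
sh.stopBalancer()       // disable balancing, e.g. before maintenance
sh.startBalancer()      // re-enable balancing afterwards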
Customized Data Distribution with Tag Aware Sharding
A MongoDB administrator can control sharding by using tags. Tags help
control the balancer behaviour and chunks distribution in a cluster. As an
administrator, you can create tags for ranges of shard keys and assign those tags to the shards. The balancer process then migrates the tagged data to the shards assigned to those tags. Tag aware sharding helps you improve the locality of data for a sharded cluster. In the next screen, we will discuss Tag Aware Sharding.
Tag Aware Sharding
In mongoDB, you can create tags for a range of shard keys to associate
those ranges to a group of shards. Those shards receive all inserts within that
tagged range. The balancer that moves chunks from one shard to another always
obeys these tagged ranges. The balancer moves or keeps a specific subset of the
data on a specific set of shards and ensures that the most relevant data
resides on shards that are geographically closer to the client/application server. In the next screen, we will discuss how to add shard tags.
Add Shard Tags
When connected to a mongos instance, use the sh.addShardTag() (read as
s-h dot add shard tag) method to associate tags with a particular shard. A
single shard may have multiple tags whereas multiple shards may have a common
tag. Consider the example given on the screen. This adds the tag NYC to two
shards, and adds the tags SFO and NRT to a third shard. To assign a tag to a
range of shard keys, connect to the mongos instance and use the
sh.addTagRange()(read as s-h dot add tag Range) method. Any given shard key
range may only have one assigned tag. You cannot overlap defined ranges, or tag
the same range more than once. For example, a collection named users in the records database is sharded by the zipcode field. The following operations assign:
• the NYC tag to two ranges of zip codes in Manhattan and Brooklyn, and
• the SFO tag to one range of zip codes in San Francisco.
Note that shard key range tags are distinct from replica set member tags. Also note:
• Hash-based sharding only supports tag-aware sharding on an entire collection.
• Shard key ranges include the lower value and exclude the upper value.
In the next screen, we will discuss how to remove shard tags.
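The operations described above presumably follow the standard helpers; the shard names and zip code boundaries are illustrative:
sh.addShardTag("shard0000", "NYC")
sh.addShardTag("shard0001", "NYC")
sh.addShardTag("shard0002", "SFO")
sh.addShardTag("shard0002", "NRT")
sh.addTagRange("records.users", { zipcode: "10001" }, { zipcode: "10281" }, "NYC")   // Manhattan
sh.addTagRange("records.users", { zipcode: "11201" }, { zipcode: "11240" }, "NYC")   // Brooklyn
sh.addTagRange("records.users", { zipcode: "94102" }, { zipcode: "94135" }, "SFO")   // San Francisco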
Remove Shard Tags
Shard tags are stored in each shard's document in the shards collection of the config database. To return all shards with a specific tag, use the operations given on the screen; this returns only those shards that are tagged with NYC. You can find the tag ranges for all namespaces in the tags collection of the config database, and the output of sh.status() (read as s-h dot status) displays all tag ranges. To return all shard key ranges tagged with NYC, use the sequence of operations given on the screen. The mongod does not provide a helper for removing a tag range; to delete a tag assignment from a shard key range, remove the corresponding document from the tags collection of the config database. Each document in the tags collection holds the namespace of the sharded collection and a minimum shard key value. The third example given on the screen removes the NYC tag assignment for the range of zip codes within Manhattan.
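These operations are presumably along the following lines; the zip code boundary is illustrative:
use config
db.shards.find({ tags: "NYC" })   // all shards tagged NYC
db.tags.find({ tag: "NYC" })      // all shard key ranges tagged NYC
db.tags.remove({ _id: { ns: "records.users", min: { zipcode: "10001" } }, tag: "NYC" })   // drop the NYC tag from the Manhattan range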