Tuesday 13 February 2018

MongoDB Sharding and Replication in Detail

 

Replication

Introduction to Replication

Replication is the process of synchronizing data across multiple servers. Replication provides redundancy and increases data availability with multiple copies of data on different database servers. Replication protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.

Why Replication

      To keep your data safe

      High (24*7) availability of data

      Disaster recovery

      No downtime for maintenance (like backups, index rebuilds, compaction)

      Read scaling (extra copies to read from)

      Replica set is transparent to the application

      A replica set refers to a group of mongod instances that hold the same data. The purpose of replication is to ensure high availability in case one of the servers goes down.

 

How Replication Works in MongoDB

MongoDB achieves replication through replica sets. A replica set is a group of mongod instances that maintain the same data set. Replica sets provide redundancy and high availability. In a replica set, one node is the primary node and receives all write operations. All other instances, the secondaries, apply operations from the primary so that they have the same data set. A replica set can have only one primary node.

      Replica set is a group of two or more nodes (generally minimum 3 nodes are required).

      In a replica set, one node is primary node and remaining nodes are secondary.

      All data replicates from primary to secondary node.

      At the time of automatic failover or maintenance, an election is held and a new primary node is elected.

      After the failed node recovers, it rejoins the replica set and works as a secondary node.

In a typical MongoDB replication setup, the client application always interacts with the primary node, and the primary node then replicates the data to the secondary nodes.

Master-Slave Replication

The master-slave replication is the oldest mode of replication that MongoDB supports. In the earlier versions of MongoDB, master-slave replication was used for failover, backup, and read scaling. However, in newer versions, it has been replaced by replica sets for most use cases. Although MongoDB still supports master-slave replication, replica sets are recommended for new production deployments to replicate data in a cluster. In the next screen, we will discuss replica sets in MongoDB.

Replica Set in MongoDB

A replica set consists of a group of mongod instances that host the same data set. In a replica set, the primary mongod receives all write operations and the secondary mongods replicate the operations from the primary so that all members have the same data set. The primary node receives write operations from clients. A replica set can have only one primary, and therefore only one member of the replica set can receive write operations. A replica set provides strict consistency for all read operations from the primary. The primary logs any changes or updates to its data set in its oplog. The secondaries replicate the primary's oplog and apply all the operations to their own data sets. When the primary becomes unavailable, the replica set elects a secondary as the new primary.

By default, clients read data from the primary. However, they can also specify read preferences and request that read operations be sent to the secondaries. Reads from secondaries may return data that does not reflect the state of the primary. So, by default, the primary member of the replica set receives all read and write requests, but read requests can also be sent directly to a secondary member of the replica set. We will continue our discussion on replica sets in MongoDB in the next screen.

You can choose to add an extra mongod instance to a replica set to act as an arbiter. Arbiters do not maintain a data set; they only vote in elections for the primary. Because an arbiter does not store any data, it does not require dedicated hardware. Secondary members in a replica set apply operations from the primary asynchronously. Because secondaries apply operations after the primary, they may lag behind it, and as a result secondary members may not always return the most up-to-date data to clients. A primary can step down to become a secondary, and a secondary can be elected primary, but an arbiter always remains an arbiter. We will discuss how a replica set handles automatic failover in the next screen.

Replica Set Features

         A cluster of N nodes

         Any one node can be primary

         All write operations go to primary

         Automatic failover

         Automatic recovery

         Consensus election of primary

Set Up a Replica Set

In this tutorial, we will convert a standalone MongoDB instance to a replica set. To convert to a replica set, follow these steps −

         Shut down the already running MongoDB server.

         Start the MongoDB server by specifying the --replSet option. Following is the basic syntax of --replSet −

mongod --port "PORT" --dbpath "YOUR_DB_DATA_PATH" --replSet "REPLICA_SET_INSTANCE_NAME"

Example

mongod --port 27017 --dbpath "D:\set up\mongodb\data" --replSet rs0

         It will start a mongod instance on port 27017 as a member of the replica set named rs0.

         Now open the mongo shell from the command prompt and connect to this mongod instance.

         In Mongo client, issue the command rs.initiate() to initiate a new replica set.

         To check the replica set configuration, issue the command rs.conf(). To check the status of the replica set, issue the command rs.status().

 Automatic Failover

When the primary node of a replica set stops communicating with other members for more than 10 seconds or fails, the replica set selects another member as the new primary. The selection of the new primary happens through an election process, and the secondary node that gets a majority of the votes becomes the primary. A replica set supports application needs in various ways. For example, you may deploy a replica set in multiple data centers, or influence the primary election by adjusting the priority of members. In addition, replica sets support dedicated members for functions such as reporting, disaster recovery, or backup. In the next screen, we will discuss replica set members.

 Replica Set Members

In addition to the primary and secondaries, a replica set can also have an arbiter. Unlike secondaries, arbiters do not replicate or store data; however, they play a crucial role in electing a secondary to take the place of the primary when the primary becomes unavailable. A typical replica set therefore contains a primary, a secondary, and an arbiter, although most replica set deployments keep three members that store data: one primary and two secondaries. A replica set in MongoDB version 3.0 can have up to 50 members, but only 7 members are capable of voting. In previous versions of MongoDB, replica sets could have a maximum of 12 members. In the next screen, we will discuss priority zero replica set members.

Priority 0 Replica Set Members

A priority zero member in a replica set is a secondary member that cannot become the primary. These members act as normal secondaries but cannot trigger any election. A priority zero member performs functions such as maintaining copies of the data set, accepting and serving read operations, and voting in elections for the primary. By configuring a priority zero member, you can prevent a secondary from becoming the primary, which is particularly useful in multi-data center deployments. In a replica set containing three members, for example, one data center hosts both the primary and a secondary, and the second data center hosts one priority zero member. Typically, a priority zero member acts as a backup; for example, in some replica sets it may not be possible to add a new member immediately, and the backup or standby member stores the up-to-date data and can immediately replace an unavailable member. In the next screen, we will discuss hidden replica set members.
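
As a rough sketch, assuming the member to demote is at index 2 of the members array (a hypothetical choice), a priority zero member can be configured from a mongo shell connected to the primary as follows:

cfg = rs.conf()                  // fetch the current replica set configuration
cfg.members[2].priority = 0      // this member can no longer become primary
rs.reconfig(cfg)                 // apply the new configuration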

Hidden Replica Set Members

Hidden members of a replica set are invisible to client applications. They store a copy of the primary's data and are good for workloads with usage patterns that differ from the other members in the replica set. Although hidden replica set members are priority zero members and can never replace the primary, they can vote in elections for the primary. In a five-member replica set, for example, all four secondary members contain copies of the primary's data set, and one of them is a hidden member. Client read operations are never distributed to hidden members, even with a read preference, so hidden members receive only the basic replication traffic. You can utilize the hidden members for dedicated functions like reporting and backups. In a sharded cluster, mongos instances do not interact with hidden replica set members. In the next screen, we will discuss delayed replica set members.

 Delayed Replica Set Members

Delayed replica set members are secondaries that apply operations from the primary's oplog only after a certain interval or delay. Delayed replica set members store copies of the data set, but they reflect an earlier, or delayed, state of the set. For example, if the data shown on the primary is as of 10 AM, then the delayed member shows the data as of 9 AM. Delayed members act as a “rolling backup” or a running “historical” snapshot of the data set. Therefore, they help you recover from various human errors, such as an unsuccessful application upgrade or dropped databases and collections. A delayed member: • must be a priority zero member, • must be hidden and not visible to applications, and • can still participate in electing the primary. In the next screen, we will discuss the configuration settings of delayed replica set members.

You can configure a delayed secondary member with the following settings: a priority value of zero, a hidden value of true, and a slaveDelay value set to the number of seconds to delay. For example, to set a 1-hour delay on the secondary member currently at index 0 in the members array, issue a sequence of operations like the one shown below in a mongo shell connected to the primary. After the replica set reconfigures, the delayed secondary member cannot replace the primary and is hidden from applications. The slaveDelay value delays both replication and the member's oplog by 3600 seconds, or 1 hour. In the next screen, we will discuss write concern.
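
A minimal sketch of that sequence, assuming the member to delay is at index 0:

cfg = rs.conf()                    // fetch the current replica set configuration
cfg.members[0].priority = 0        // a delayed member must not become primary
cfg.members[0].hidden = true       // hide it from client applications
cfg.members[0].slaveDelay = 3600   // delay replication by 3600 seconds (1 hour)
rs.reconfig(cfg)                   // apply the new configuration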

Write Concern

The write concern provides an assurance of the success of a write operation. The strength or weakness of a write concern determines the level of assurance. With a weak write concern, write operations such as updates, inserts, and deletes return quickly, but they may fail. When stronger write concerns are used, clients wait after sending the operation for MongoDB to confirm its success. MongoDB provides different levels of write concern to address varied application needs. Clients may strengthen the write concern to ensure that critical operations are successfully carried out on the MongoDB deployment; for less critical operations, clients can weaken the write concern to favor faster performance over persistence to the entire deployment. Typically, regardless of the write concern level or journaling configuration, MongoDB allows clients to read inserted or modified documents before the modifications are committed to disk. As a result, if an application has multiple readers and writers, MongoDB allows a reader to read another writer's data even when that data has not yet been committed to the journal. We will continue our discussion on write concern in the next screen.

MongoDB modifies each document of a collection separately. For multi-document operations, MongoDB does not provide multi-document transactions or isolation. When a standalone mongod returns a successful journaled write concern, the data is stored on disk and becomes available after mongod restarts. Write operations in a replica set become durable only after the write is replicated and committed to the journal on a majority of the voting members of the replica set. MongoDB periodically commits data to the journal, at an interval defined by the commitIntervalMs parameter. In the next screen, we will discuss write concern levels.

Write Concern Levels

MongoDB has two conceptual levels of write concern: unacknowledged and acknowledged. With an unacknowledged write concern, MongoDB does not acknowledge the receipt of write operations, similar to ignored errors; however, the driver can still receive and handle network errors whenever possible. The driver's ability to detect and handle network errors depends on the networking configuration of the system. With an acknowledged write concern, the mongod confirms the receipt of the write operation and that the changes were applied to the in-memory view of the data. An acknowledged write concern can detect network errors, duplicate key errors, and other errors. By default, MongoDB uses the acknowledged write concern in the drivers. The mongo shell includes the write concern in the write methods and provides the default write concern whether it runs interactively or in a script. Note that an acknowledged write concern does not confirm that the write operation has persisted to the disk system; for that, journaling must be enabled and a journaled write concern used. In the next screen, we will discuss the write concern for a replica set.

Write Concern for a Replica Set

Replica sets have additional considerations for write concerns. The default write concern in a replica set requires acknowledgement only from the primary member. A write concern acknowledged by more members of a replica set gives greater assurance that the write operation has propagated through the set. You can override the default write concern by specifying a write concern for each write operation, requiring acknowledgement from additional replica set members. For example, the method shown below contains a write concern specifying that the method should return a response only after the write operation has propagated to the primary and at least one secondary, or the method times out after 5 seconds. In the next screen, we will discuss how to modify the default write concern.
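
A hedged sketch of such a call, using a hypothetical products collection and document:

db.products.insert(
   { item: "envelopes", qty: 100, type: "Clasp" },
   { writeConcern: { w: 2, wtimeout: 5000 } }   // wait for the primary plus one secondary, or time out after 5 seconds
)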

Modify Default Write Concern

You can modify the default write concern by changing the “getLastErrorDefaults” setting in the replica set configuration. The sequence of commands shown below creates a configuration that waits for the write operation to complete on a majority of the voting members before returning the result. Additionally, you can include a timeout threshold for a write concern to prevent write operations from blocking indefinitely. For example, suppose a write concern must be acknowledged by four members of a replica set, but only three members are available; the operation blocks until all four members are available. You can prevent the operation from blocking by adding a timeout. In the next screen, we will discuss read preferences.
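
A minimal sketch of that reconfiguration, assuming an illustrative 5-second timeout:

cfg = rs.conf()
cfg.settings = cfg.settings || {}                                      // make sure the settings sub-document exists
cfg.settings.getLastErrorDefaults = { w: "majority", wtimeout: 5000 }  // wait for a majority of voting members, or time out
rs.reconfig(cfg)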

Read Preference

Read preferences define how MongoDB clients route read operations to replica set members. By default, a client directs its read operations to the primary, so reading from the primary means you have the latest version of a document. However, you can distribute reads among the secondary members to improve read performance or reduce latency for applications that do not require up-to-date data. The typical use cases for non-primary read preferences are: • running system operations that do not affect the front-end application, • providing local reads for geographically distributed applications, and • maintaining availability during a failover. You can use the Mongo.setReadPref() method to set a read preference. We will discuss the read preference modes in the next screen.
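
A small sketch of setting a read preference from the mongo shell (the users collection and query are hypothetical):

db.getMongo().setReadPref("secondaryPreferred")   // prefer secondaries for subsequent reads on this connection
db.users.find({ status: "active" })               // this query may now be served by a secondary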

Read Preference Modes

The read preference modes supported in a replica set are as follows. primary: this is the default read preference mode; all operations read from the current replica set primary. primaryPreferred: operations read from the primary member in most situations, but when the primary is unavailable, they read from a secondary member. secondary: all operations read from the secondary members of the replica set. secondaryPreferred: in most situations operations read from a secondary, but when no secondary member is available, they read from the primary. nearest: operations read from the member of the replica set with the least network latency, irrespective of the member's type. In the next screen, we will discuss blocking for replication.

Blocking for Replication

The getLastError command in MongoDB lets you ensure how up to date replication is by using the optional "w" parameter. In the query shown below, the getLastError command blocks until at least N servers have replicated the last write operation. The command returns immediately if N is either not present or less than two. If N equals two, the master responds to the command only after one slave has replicated the last operation; note that the master is included in N. The master uses the "syncedTo" information stored in local.slaves to identify how up to date each slave is. When specifying "w", the getLastError command takes an additional parameter, "wtimeout", a timeout in milliseconds. This allows the getLastError command to time out and return an error before the last operation has replicated to N servers; by default the command has no timeout. Blocking on replication significantly slows down write operations, particularly for large values of "w". Setting "w" to two or three for important operations yields a good combination of efficiency and safety. In the next screen, we will discuss tag sets.
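
A minimal sketch of such a call, with "w" set to two and an illustrative 10-second timeout:

db.runCommand({ getLastError: 1, w: 2, wtimeout: 10000 })   // block until the master and one slave have the last write, or time out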

Tag Set

Tag sets allow you to target read operations to specific replica set members. Customized read preferences and write concerns evaluate tag sets differently: read preferences consider the tag value when selecting a replica set member to read from, whereas write concerns ignore the tag value when selecting a member and only confirm whether the value is unique. You can specify tag sets with the following read preference modes: • primaryPreferred, • secondary, • secondaryPreferred, and • nearest. Remember, tags are not compatible with the primary mode and, in general, apply only when selecting a secondary member for a read operation. The exception is the nearest mode: when combined with a tag set, nearest selects the matching member, primary or secondary, with the lowest network latency. Typically, all interfaces use the same member selection logic to choose the member to direct read operations to, basing the choice on the read preference mode and tag sets. In the next screen, we will discuss how to configure tag sets for replica sets.

Configure Tag Sets for Replica set

Using tag sets, you can customize write concerns and read preferences in a replica set. MongoDB stores tag sets in the replica set configuration object, which is the document returned by rs.conf(), in the members.tags embedded document. When using either write concerns or read scaling, you may want some control over which secondaries receive writes or reads. For example, suppose you have deployed a five-node replica set across two data centers, NYK (New York) and LON (London). New York is the primary data center and contains three nodes; the secondary data center, London, contains two. Suppose you want to use a write concern to block until a certain write has replicated to at least one node in the London data center. You cannot use a “w” value of majority, because this translates into a value of 3, and the most likely scenario is that the three nodes in New York will acknowledge first. Replica set tagging resolves such problems by allowing you to define special write concern modes that target replica set members carrying specific tags. To see how this works, you first need to learn how to tag a replica set member. In the config document, each member can have a key called tags pointing to an object containing key-value pairs. You can add tag sets to the members of a replica set with a sequence of commands in the mongo shell like the one shown below. Custom modes are defined as a document with one or more keys under the “getLastErrorModes” setting. In this example, a mode is defined containing two keys, “dc” and “rackNYK”, whose values are the integer 2; each integer indicates how many distinct values of that tag must acknowledge the write for the getLastError command to complete. In the next screen, we will discuss replica set deployment.
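
A rough sketch of such a sequence, with hypothetical tag values and a hypothetical mode name (multiDC):

conf = rs.conf()
conf.members[0].tags = { dc: "NYK", rackNYK: "rk1" }   // New York members
conf.members[1].tags = { dc: "NYK", rackNYK: "rk2" }
conf.members[2].tags = { dc: "NYK", rackNYK: "rk3" }
conf.members[3].tags = { dc: "LON", rackLON: "rk1" }   // London members
conf.members[4].tags = { dc: "LON", rackLON: "rk2" }
conf.settings = conf.settings || {}
conf.settings.getLastErrorModes = { multiDC: { dc: 2, rackNYK: 2 } }   // each integer is the number of distinct tag values that must acknowledge
rs.reconfig(conf)

A write can then request this mode with a write concern such as { w: "multiDC" }.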

Replica Set Deployment Strategies

A replica set architecture impacts the set's capability. In a standard production system, a three-member replica set is deployed. These sets provide redundancy and fault tolerance and avoid complexity, but let your application requirements define the architecture. Use the following deployment strategies for a replica set. Deploy an odd number of members: an odd number of voting members is needed to elect a primary; add an arbiter if a replica set has an even number of members. Consider fault tolerance: fault tolerance for a replica set is the number of members that can become unavailable while still leaving enough members in the set to elect a primary. A replica set cannot accept any write operations if a primary is not available. Remember that adding a new member to a replica set does not always increase its fault tolerance, although additional members can support dedicated functions such as backups or reporting. Use hidden and delayed members: add hidden or delayed members to support dedicated functions such as backup or reporting. Load balance on read-heavy deployments: in deployments with high read traffic, distribute reads to secondary members to improve read performance. As your deployment grows, add or move members to alternate data centers to improve redundancy and availability; however, ensure that the main facility is able to elect a primary. In the next screen, we will continue with the replica set deployment strategies.

Replica Set Deployment Strategies (contd.)

Some more replica set deployment strategies are given below. Add capacity ahead of demand: you must be able to add a new member to an existing replica set to meet increased data demands, so add new members before new demands arise. Distribute members geographically: for data safety in case of a failure, keep at least one member in an alternate data center, and set the priority of these members to zero so that they do not become the primary. Keep a majority of members in one location: when replica set members are distributed across multiple locations, a network partition may hinder communication between the data centers and affect data replication. When electing a primary, the members must be able to see each other to create a majority; to enable the replica set members to create a majority and elect a primary, ensure that a majority of the members are in one location. Use replica set tag sets: this ensures that all operations are replicated to specific data centers; using tag sets also helps to route read operations to specific machines. Use journaling to protect against power failures: to avoid data loss, use journaling so that data is safely written to disk in case of shutdowns, power failures, and other unexpected failures. In the next screen, we will discuss replica set deployment patterns.

Replica Set Deployment Patterns

The common deployment patterns for a replica set are as follows: • Three Member Replica Sets: This is the minimum recommended architecture for a replica set. • Replica Sets with Four or More Members: This set provides greater redundancy and supports greater distribution of read operations and dedicated functionality. • Geographically Distributed Replica Sets: These include members in multiple locations to protect data against facility-specific failures, such as power outages. In the next screen, we will discuss the oplog file.

Oplog File

The record of operations maintained by the master server is called the operation log, or oplog. The oplog is stored in a database called local, in the oplog.$main collection. Each oplog document denotes a single operation performed on the master server and contains the following keys: ts — the timestamp for the operation, an internal type used to track operations that contains a 4-byte timestamp and a 4-byte incrementing counter; op — the type of operation performed, as a 1-byte code (for example, “i” for an insert); ns — the namespace, that is, the collection name where the operation was performed; and o — a document further specifying the operation to perform (for an insert, this would be the document to insert). The oplog stores only those operations that change the state of the database, and it is intended only as a mechanism for keeping the data on slaves in sync with the master. The operations stored in the oplog are not exactly the operations that were performed on the master; each operation is first transformed and then stored in the oplog file. In the next screen, we will discuss the replication state and the local database.
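
As an illustrative sketch, an insert into a hypothetical test.users collection might appear in the oplog roughly as follows (all field values are made up):

{
  ts: Timestamp(1518473700, 1),                                      // 4-byte timestamp plus 4-byte incrementing counter
  op: "i",                                                           // "i" denotes an insert
  ns: "test.users",                                                  // namespace the operation was performed on
  o: { _id: ObjectId("5a82a8c2e1d3f00001a1b2c3"), name: "alice" }    // the document that was inserted
}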

Replication State and Local Database

MongoDB maintains a database called “local” to keep information about the replication state and the list of masters and slaves. The content of this database is not replicated and remains local to the master and the slaves. The list of slaves is stored in the master's local slaves collection, and slaves store their own replication information in their local database: the unique slave identifier is saved in the “me” collection and the list of masters is saved in the “sources” collection. Both the master and the slaves use the timestamp stored in the “syncedTo” field to determine how up to date a slave is. A slave uses “syncedTo” to query the oplog for new operations and to find out whether it is out of sync. In the next screen, we will discuss replication administration.

Replication Administration

MongoDB provides administrative helpers to inspect the replication status. To check the replication status, use the first function shown below when connected to the master. It provides information on the oplog size and the date ranges of operations contained in the oplog. If, for example, the oplog size is 10 megabytes and can accommodate only about 30 seconds of operations, you should increase the size of the oplog. You can compute the log length by determining the time difference between the first and the last operation. If the server has started recently, the first logged operation will be relatively recent and the log length will be small, even though the oplog may still have free space available. The log length is therefore a more useful metric for servers that have been operational long enough for the oplog to “roll over.” You can also get information when connected to the slave, using the second function shown below; it prints a list of sources for the slave, each displaying information such as how far behind the master it is. Next, we will discuss how to add members to a replica set.
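
The helpers referred to are presumably the standard shell functions db.printReplicationInfo() and db.printSlaveReplicationInfo():

db.printReplicationInfo()        // on the master: prints the oplog size and the date range of logged operations
db.printSlaveReplicationInfo()   // on the slave: prints each source and how far behind the master it is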


Add Members to Replica Set

To add members to a replica set, start mongod instances on multiple machines. Then start a mongo client and issue the command rs.add().

Syntax

The basic syntax of the rs.add() command is as follows −

>rs.add(HOST_NAME:PORT)

Example

Suppose your mongod instance is running on host mongod1.net on port 27017. To add this instance to the replica set, issue the command rs.add() in the mongo client.

>rs.add("mongod1.net:27017") >

You can add a mongod instance to a replica set only when you are connected to the primary node. To check whether you are connected to the primary or not, issue the command db.isMaster() in the mongo client.

Replica set replication has a limitation on the number of members. Prior to version 3.0, the limit was 12, but this has been raised to 50 in version 3.0. So a replica set can now have a maximum of 50 members, and at any given point of time in a 50-member replica set, only 7 can participate in a vote.

 

Sharding

 

Sharding is the process of distributing data across multiple servers for storage. MongoDB uses sharding to manage massive data growth. With an increase in data size, a single machine may not be able to store the data or provide acceptable read and write throughput. Sharding supports horizontal scaling and is thus capable of distributing data across multiple machines. Sharding allows you to add more servers to your database to support data growth, and it automatically balances data and load across the various servers. Sharding provides additional write capacity by distributing the write load over a number of mongod instances. It splits the data set and distributes it across multiple databases, or shards. Each shard serves as an independent database, and together the shards make up a single logical database. Sharding reduces the number of operations each shard handles; as a cluster grows, each shard handles fewer operations and stores less data. As a result, a cluster can increase its capacity and throughput horizontally. For example, to insert data into a particular record, the application needs to access only the shard that holds that record. If a database has a 1 terabyte data set distributed among 4 shards, each shard may hold only 256 gigabytes of data; if the database contains 40 shards, each shard will hold only about 25 gigabytes of data. In the next screen, we will discuss when sharding should be used.

 When to Use Sharding?

Typically, sharded clusters require a proper infrastructure setup, which increases the overall complexity of the deployment. Therefore, consider deploying sharded clusters only when there is an application or operational requirement. You should consider deploying a sharded cluster when your system shows the following characteristics: • the data set outgrows the storage capacity of a single MongoDB instance, • the size of the active working set exceeds the capacity of the maximum available RAM, or • a single MongoDB instance is unable to manage write operations. In the absence of these characteristics, sharding will not benefit your system; rather, it will add complexity. Deploying sharding consumes time and resources, and if your database system has already surpassed its capacity, deploying sharding without impacting the application is not possible. Therefore, deploy sharding if you expect read and write operations to increase in the future. In the next screen, we will discuss what a shard is.

Sharding Components

You will next look at the components that enable sharding in MongoDB. Sharding is enabled in MongoDB via sharded clusters.

The following are the components of a sharded cluster:

•        Shards

•        mongos

•        Config servers

The shard is the component where the actual data is stored. In a sharded cluster, each shard holds a subset of the data and can be either a mongod instance or a replica set. The data from all shards combined forms the complete data set for the sharded cluster.

Sharding is enabled on a per-collection basis, so there might be collections that are not sharded. Every sharded cluster has a primary shard where all the unsharded collections are placed, in addition to its share of the sharded collection data.

Config servers are special mongod instances that hold the sharded cluster's metadata. This metadata describes the state and organization of the sharded system.

The config server stores data for a single sharded cluster. The config servers should be available for the proper functioning of the cluster.

A single config server can become the cluster's single point of failure. For production deployments it is recommended to have at least three config servers, so that the cluster keeps functioning even if one config server is not accessible.

A config server stores the data in the config database, which enables routing of the client requests to the respective data. This database should not be updated.

MongoDB writes data to the config server only when the data distribution has changed for balancing the cluster.

The mongos act as routers. They are responsible for routing the read and write requests from the application to the shards.

An application interacting with a MongoDB database does not need to worry about how the data is stored internally on the shards. This is transparent to the application, because it interacts only with the mongos, which in turn route the reads and writes to the shards.

The mongos cache the metadata from the config servers so that they do not overburden the config servers with every read and write request.

However, in the following cases, the data is read from the config server:

•        Either an existing mongos has restarted or a new mongos has started for the first time.

•        Migration of chunks. We will explain chunk migration in detail later.

To summarize, there are three main components −

Shards − Shards are used to store data. They provide high availability and data consistency. In a production environment, each shard is a separate replica set.

Config Servers − Config servers store the cluster's metadata, which contains a mapping of the cluster's data set to the shards. The query router uses this metadata to target operations to specific shards. In a production environment, sharded clusters have exactly 3 config servers.

Query Routers − Query routers are mongos instances that interface with client applications and direct operations to the appropriate shard. The query router processes and targets operations to the shards and then returns the results to the clients. A sharded cluster can contain more than one query router to divide the client request load; a client sends requests to one query router. Generally, a sharded cluster has many query routers.

 What is a Shard?

A shard is a replica set or a single mongod instance that holds a subset of the data in a sharded cluster. Together, the shards hold the entire data set for the cluster. In production, each shard is a replica set that provides redundancy and high availability for the data it holds. MongoDB shards data on a per-collection basis and lets you access the sharded data through mongos instances. If you connect directly to a shard, you will see only a fraction of the data contained in the cluster, and the data will not be organized in any particular order. In addition, MongoDB does not guarantee that any two contiguous chunks will reside on any particular shard. Note that every database has a “primary” shard that holds all the un-sharded collections in that database. In the next screen, we will discuss the shard key.

What is a Shard Key

When deploying sharding, you need to choose a key from a collection and split the data using that key's value. This key is called the shard key, and it determines how the documents of a collection are distributed among the different shards in a cluster. The shard key is a field that exists in every document in the collection and can be an indexed field or an indexed compound field. MongoDB partitions the data in a collection into ranges, or chunks, of shard key values. Each chunk defines a non-overlapping range of shard key values. MongoDB distributes the chunks, and their documents, among the shards in a cluster according to these ranges of shard key values. In the next screen, we will discuss how to choose a shard key.

Choosing a Shard Key

To enhance and optimize the performance, functioning and capability of your database, you need to choose the correct shard key. Choosing the appropriate shard key depends on two factors—the schema of your data and the way applications in your database query and perform write operations. In the next screen, we will discuss the characteristics of an ideal shard key.

Ideal Shard Key

An ideal shard key must have the following characteristics. Easily divisible: an easily divisible shard key enables MongoDB to distribute data among the shards. If a shard key has only a limited number of possible values, then chunks cannot be split; for example, if a chunk represents a single shard key value, it cannot be split even if it exceeds the recommended size. High degree of randomness: a shard key should possess a high degree of randomness, which ensures that write operations are distributed across the cluster and that a single shard does not become a bottleneck. Targets a single shard: a shard key should allow the mongos program to return most query operations directly from a single mongod instance; for this, the shard key should be the primary field used in queries. Note that fields with a high degree of “randomness” make it difficult to target operations to specific shards. Use a compound shard key: if no existing field in your collection is an ideal key, compute a special-purpose shard key or use a compound shard key. In the next screen, we will discuss range-based sharding.

Range-Based Shard Key

In range-based sharding, MongoDB divides the data set into ranges based on the values of the shard key, providing range-based partitioning. For example, consider a numeric shard key: if an imaginary number line runs from negative infinity to positive infinity, each shard key value falls at some point on that line. MongoDB partitions this line into chunks, each covering a range of values. In range-based sharding, documents with “close” shard key values reside in the same chunk and shard. Range-based partitioning supports range queries, because for a given range query on the shard key the query router can easily determine which shards contain the relevant chunks. However, data distribution in range-based partitioning can be uneven, which may negate some benefits of sharding. For example, if the shard key value increases linearly, such as a timestamp, then all requests for a given time range will map to the same chunk and shard. In such cases, a small set of shards may receive most of the requests and the system will not scale. In the next screen, we will discuss hash-based sharding.

Hash-Based Sharding

For hash-based partitioning, MongoDB first computes the hash of a field's value and then creates chunks using those hashes. Unlike range-based partitioning, in hash-based partitioning documents with “close” shard key values may not reside in the same chunk. This ensures that the collection is randomly and evenly distributed across the cluster: hashed key values distribute data randomly across chunks and shards. However, this random distribution makes range queries on the shard key inefficient, because the data will be spread across many shards rather than a small number of shards as in range-based partitioning. In the next screen, we will discuss the impact of shard keys on cluster operation.
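
As a hedged sketch, the shardCollection helper can shard a collection on a hashed key (assuming a hypothetical records.users collection and sharding already enabled for the database):

sh.shardCollection("records.users", { _id: "hashed" })   // documents are distributed by the hash of _id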

Impact of Shard Keys on Cluster Operation

Some shard keys are capable of scaling write operations. Typically, a computed shard key that possesses some degree of “randomness” allows a cluster to scale write operations; for example, keys that include a cryptographic hash, such as MD5 or SHA-1, can scale write operations. However, random shard keys do not provide query isolation. To improve write scaling, MongoDB also supports sharding a collection on a hashed index. The shard key affects cluster operation in two further ways: querying and query isolation. Querying: a mongos instance provides the interface through which applications interact with a sharded cluster. When the mongos receives queries from client applications, it uses metadata from the config server to route queries to the mongod instances. The mongos makes querying operational in sharded environments, but the shard key you select can impact query performance. Query isolation: query execution is fast and efficient if the mongos can route the query to a single shard using the shard key and the metadata from the config server. Queries that must be routed to many shards are less efficient, because they have to wait for a response from all of those shards. If your query contains the first component of a compound shard key, the mongos can route the query to a single shard or a minimum number of shards, which provides good performance. In the next screen, we will discuss production cluster architecture.

Production Cluster Architecture

Special mongod instances called config servers store the metadata for a sharded cluster. Config servers provide consistency and reliability using a two-phase commit. Config servers do not run as replica sets and must be available to deploy a sharded cluster or to make changes to the cluster metadata. A production sharded cluster has three config servers, each of which must be on a separate machine. A single sharded cluster must have exclusive use of its config servers; if you have multiple sharded clusters, you need a separate group of config servers for each cluster. Config database: config servers store the metadata in the config database. The mongos instances cache this data and use it to route reads and writes to the shards. MongoDB performs write operations on the config servers only in the following cases: • to split existing chunks, and • to migrate a chunk between shards. MongoDB reads data from the config server when: • a new mongos starts for the first time, or an existing mongos restarts, and • after a chunk migration, when the mongos instances update themselves with the new cluster metadata. MongoDB also uses the config servers to manage distributed locks. In the next screen, we will discuss config server availability.

Config Server Availability

When one or more config servers are unavailable, the cluster's metadata becomes read-only. You can still read and write data from the shards; however, chunk migrations occur only when all three config servers are available. If all three config servers are unavailable, you can still use the cluster, provided the mongos instances do not restart. If the mongos instances are restarted before the config servers become available, they are unable to route reads and writes. Clusters are inoperable without the cluster metadata, so ensure that the config servers remain available and intact. Query routers are the mongos processes that interface with the client applications and direct queries to the appropriate shard or shards. All queries from client applications are processed by the query routers, and in a sharded cluster it is recommended to use three query routers. In the next screen, we will discuss production cluster deployment.

Production Cluster Deployment

When deploying a production cluster, ensure data redundancy and system availability. A production cluster must have the following components. Config servers: three config servers, each hosted on a separate machine. A single sharded cluster must have exclusive use of its config servers; for multiple sharded clusters, you must have a separate group of config servers for each cluster. Shards: a production cluster must have two or more shards, each a replica set. Query routers (mongos): a production cluster must have one or more mongos instances, which act as the routers for the cluster. Production cluster deployments typically have one mongos instance on each application server. You may deploy more than one mongos instance and also use a proxy or load balancer between the application and the mongos; in that case, configure the load balancer so that connections from a single client always reach the same mongos, because cursors and other resources are specific to a single mongos instance. In the next screen, we will discuss how to deploy a sharded cluster.

 Deploy a Sharded Cluster

To deploy a sharded cluster, perform the following sequence of tasks. Create data directories for each of the three config server instances; by default, a config server stores its data files in the /data/configdb directory, but you can choose an alternate location. Start each config server; the default port for config servers is 27019, but you can specify a different port. Then start a mongos instance, which by default runs on port 27017, and point it at the config server hosts and ports, as in the last command shown below. In the next screen, we will discuss how to add shards to a cluster.
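
A rough sketch of those commands, with hypothetical config server host names and the mirrored (pre-3.2) config server topology described above:

mkdir -p /data/configdb                                   # create the data directory on each config server host
mongod --configsvr --dbpath /data/configdb --port 27019   # start a config server with the default port and data directory
mongos --configdb cfg0.example.net:27019,cfg1.example.net:27019,cfg2.example.net:27019   # start a query router (listens on 27017 by default)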

Add Shards to a Cluster

To add shards to a cluster, perform the following steps. Step 1: from a mongo shell, connect to the mongos instance; for example, if a mongos is accessible at node1.example.net on port 27017, connect to that host and port. Step 2: add each shard to the cluster using the sh.addShard() method, issuing sh.addShard() separately for each shard, as shown in the examples below. If a shard is a replica set, specify the name of the replica set and a member of the set; in production deployments, all shards should be replica sets. For example, to add a shard for a replica set named rs1 with a member running on port 27017 on mongodb0.example.net, or to add a standalone mongod on port 27017 of mongodb0.example.net, issue the corresponding command shown below. In the next screen, we will discuss how to enable sharding for a database.
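
A sketch of those commands, after connecting to the mongos with the mongo shell (for example, mongo --host node1.example.net --port 27017):

sh.addShard("rs1/mongodb0.example.net:27017")   // shard backed by the replica set rs1, identified by one of its members
sh.addShard("mongodb0.example.net:27017")       // standalone mongod added as a shard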

Enable Sharding for Database

Before you start sharding a collection, first enable sharding for the database that contains the collection. Enabling sharding for a database does not redistribute data, but it allows the collections in that database to be sharded. MongoDB assigns a primary shard for that database, where all the data is stored before sharding begins. To enable sharding, perform the following steps. Step 1: from a mongo shell, connect to the mongos instance. Step 2: issue the sh.enableSharding() method, specifying the name of the database for which you want to enable sharding, using the syntax shown below. Optionally, you can enable sharding for a database with the “enableSharding” command, also shown below. In the next screen, we will discuss how to enable sharding for a collection.
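
A minimal sketch, assuming a mongos is already running and a hypothetical database named records:

sh.enableSharding("records")                      // helper method, run from a mongo shell connected to the mongos
db.adminCommand({ enableSharding: "records" })    // equivalent database command form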

Enable Sharding for Collection

You can enable sharding on a per-collection basis. To enable sharding for a collection, perform the following steps. Step 1: determine the shard key; the selected shard key impacts the efficiency of sharding. Step 2: if the collection already contains data, create an index on the shard key using the createIndex() method; if the collection is empty, MongoDB creates the index as part of the sh.shardCollection() step. Step 3: to enable sharding for the collection, open the mongo shell and issue the sh.shardCollection() method, whose syntax is shown in the example below. We will continue our discussion on enabling sharding for a collection.

Enable Sharding for Collection (contd.)

To enable sharding for a collection, replace the string <database>.<collection> with the full namespace of your collection: the name of your database, a dot, and the name of the collection. The shard-key-pattern represents your shard key, which you specify in the same form as an index key pattern. The example below shows sharding a collection based on its partition key. In the next screen, we will discuss maintaining a balanced data distribution.
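
A minimal sketch, assuming a hypothetical records database with a people collection partitioned on a zipcode field:

use records
db.people.createIndex({ zipcode: 1 })                   // create the shard key index first if the collection already holds data
sh.shardCollection("records.people", { zipcode: 1 })    // "<database>.<collection>" plus the shard key pattern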

 Maintaining a Balanced Data Distribution

When you add new data or new servers to a cluster, the data distribution within the cluster may become unbalanced. For example, some shards may contain more chunks than others, or chunk sizes may vary, with some significantly larger than other chunks. MongoDB uses two background processes, splitting and the balancer, to ensure a balanced cluster. In the next screen, we will discuss how MongoDB uses splitting to ensure a balanced cluster.

Splitting

Splitting is a background process that checks chunk sizes. When a chunk exceeds the specified size, MongoDB splits the chunk into two halves. Splits are triggered by insert and update operations. When creating splits, MongoDB does not perform any data migration or affect the shards. However, splits may leave a collection's chunks unevenly distributed across the shards; in such cases, the balancer redistributes the chunks across the shards. In the next screen, we will discuss chunk size.

Chunk Size

In MongoDB, the default chunk size is 64 megabytes. You can alter the chunk size, but this may impact the efficiency of the cluster. Smaller chunks enable a more even distribution of data but may result in frequent migrations, which creates expense at the query routing, or mongos, layer. Larger chunks result in fewer migrations and are more efficient from the networking perspective and in terms of internal overhead at the query routing layer; however, they may lead to an uneven distribution of data. The chunk size also affects the maximum number of documents per chunk to migrate. In large deployments, larger chunks help avoid frequent and unnecessary migrations, although the data distribution may be less even. Changing the chunk size has the following limitations: • automatic splitting occurs only during inserts or updates, so when you reduce the chunk size it may take time for all chunks to split to the new size, and • splits cannot be “undone”; when you increase the chunk size, existing chunks must grow through inserts or updates until they reach the new size. In the next screen, we will discuss special chunk types.
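
If you do decide to change the default, the chunk size is stored in the settings collection of the config database; a hedged sketch of lowering it to 32 megabytes from a mongos:

use config
db.settings.save({ _id: "chunksize", value: 32 })   // chunk size in megabytes; affects future splits only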

Special Chunk Type

Following are the special chunk types. Jumbo chunks: MongoDB does not migrate a chunk if its size is more than the specified chunk size or if the number of documents it contains exceeds the maximum number of documents per chunk to migrate. In such situations, MongoDB splits the chunk; if the chunk cannot be split, MongoDB labels it as jumbo, which avoids repeated attempts to migrate the same chunk.

Indivisible chunks: Sometimes chunks can grow beyond the specified size but cannot be split; for example, when a chunk represents a single shard key value. Indivisible chunks are chunks that cannot be broken by the split process. In the next screen, we will discuss shard balancing.

Shard Balancing

Chunk migrations carry bandwidth and workload overheads, which may impact database performance. Shard balancing minimizes this impact by: • moving only one chunk at a time, and • starting a balancing round only when the difference in the number of chunks between the shard with the greatest number of chunks for a sharded collection and the shard with the lowest number of chunks for that collection reaches the migration threshold. In some scenarios, you may want to disable the balancer temporarily, such as during MongoDB maintenance or peak load times, to avoid impacting performance. In the next screen, we will discuss customized data distribution.
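
For example, the balancer can be paused and resumed from a mongos with the standard shell helpers:

sh.stopBalancer()       // disable balancing, for example during maintenance or peak load
sh.getBalancerState()   // returns false while balancing is disabled
sh.startBalancer()      // re-enable balancing afterwards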

Customized Data Distribution with Tag Aware Sharding

A MongoDB administrator can control sharding behaviour by using tags. Tags help control the balancer's behaviour and the distribution of chunks in a cluster. As an administrator, you can create tags for ranges of shard keys and assign those tags to the shards. The balancer process then migrates the tagged data to the shards that are assigned those tags. Tag aware sharding helps you improve the locality of data for a sharded cluster. In the next screen, we will discuss tag aware sharding.

Tag Aware Sharding

In MongoDB, you can create tags for ranges of shard keys to associate those ranges with a group of shards. Those shards receive all inserts within the tagged range. The balancer, which moves chunks from one shard to another, always obeys these tagged ranges. It moves or keeps a specific subset of the data on a specific set of shards and ensures that the most relevant data resides on the shards that are geographically closer to the client/application servers. In the next screen, we will discuss how to add shard tags.

Add Shard Tags

When connected to a mongos instance, use the sh.addShardTag() method to associate tags with a particular shard. A single shard may have multiple tags, and multiple shards may share a common tag. In the example shown below, the tag NYC is added to two shards, and the tags SFO and NRT are added to a third shard. To assign a tag to a range of shard keys, connect to the mongos instance and use the sh.addTagRange() method. Any given shard key range may have only one assigned tag; you cannot overlap defined ranges or tag the same range more than once. A collection named users in the records database is sharded by the zipcode field. The operations shown below assign the NYC tag to two ranges of zip codes in Manhattan and Brooklyn, and the SFO tag to one range of zip codes in San Francisco. Note that shard key range tags are distinct from replica set member tags, hash-based sharding only supports tag-aware sharding on an entire collection, and shard ranges include the lower value and exclude the upper value. In the next screen, we will discuss how to remove shard tags.
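
A sketch of those operations, with hypothetical shard names and zip code ranges:

sh.addShardTag("shard0000", "NYC")   // first shard tagged NYC
sh.addShardTag("shard0001", "NYC")   // second shard tagged NYC
sh.addShardTag("shard0002", "SFO")   // third shard tagged SFO and NRT
sh.addShardTag("shard0002", "NRT")

sh.addTagRange("records.users", { zipcode: "10001" }, { zipcode: "10281" }, "NYC")   // Manhattan
sh.addTagRange("records.users", { zipcode: "11201" }, { zipcode: "11240" }, "NYC")   // Brooklyn
sh.addTagRange("records.users", { zipcode: "94102" }, { zipcode: "94135" }, "SFO")   // San Francisco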

Remove Shard Tags

Typically, shard tags exist in the shard's document in the shards collection of the config database. To return all shards with a specific tag, use the first operation shown below; it returns only those shards that are tagged with NYC. Tag ranges for all namespaces are stored in the tags collection of the config database, and the output of sh.status() displays all tag ranges. To return all shard key ranges tagged with NYC, use the second operation shown below. The mongod does not provide a helper for removing a tag range; to delete a tag assignment from a shard key range, remove the corresponding document from the tags collection of the config database. Each document in the tags collection holds the namespace of the sharded collection and a minimum shard key value. The third operation shown below removes the NYC tag assignment for the range of zip codes within Manhattan.
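
A hedged sketch of those operations, continuing the records.users example above and assuming the tag field names used by the config database in this era of MongoDB:

use config
db.shards.find({ tags: "NYC" })   // shards tagged NYC
db.tags.find({ tag: "NYC" })      // shard key ranges tagged NYC
db.tags.remove({ _id: { ns: "records.users", min: { zipcode: "10001" } }, tag: "NYC" })   // remove the Manhattan range's NYC tag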

 
