11. MongoDB Limitations
Shakuntala Gupta Edward and Navin Sabharwal
11.1 MongoDB Space Is Too Large (Applicable for MMAPv1)
Let’s start with the issue of disk
space. With the MMAPv1 storage engine, MongoDB’s disk usage appears
too large; in other words, the data directory files are larger than
the database’s actual data.
This is because of preallocated data
files, which are created by design to prevent file system
fragmentation.
The files in the data directory are named
<dbname>.0, <dbname>.1, and so on. The first file allocated by
mongod is 64MB; each subsequent file doubles in size, so the second
file will be 128MB, the third 256MB, and so on until the size
reaches 2GB, after which every new file is 2GB. Although the space
is allocated to the data files at creation time, some files may be
90% empty. For larger databases this unused allocated space is
usually small.
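The preallocation pattern described above can be sketched as follows (a rough illustration; the function name is ours, not part of MongoDB):

```javascript
// Sketch of MMAPv1 data file preallocation: the first file is 64MB,
// each subsequent file doubles in size, capped at 2GB.
function mmapv1FileSizes(fileCount) {
  const MB = 1024 * 1024;
  const sizes = [];
  let size = 64 * MB;
  for (let i = 0; i < fileCount; i++) {
    sizes.push(size);
    size = Math.min(size * 2, 2048 * MB);
  }
  return sizes;
}

// First six files: 64MB, 128MB, 256MB, 512MB, 1GB, 2GB;
// every file after that stays at 2GB.
const sizes = mmapv1FileSizes(7);
```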
-
MongoDB preallocates 3GB of data
for journaling, which is over and above the actual database
size(s), making it a poor fit for small installations. The
available workaround is to use the --smallfiles option in your
command-line flags or /etc/mongod.conf file until you are
running in an environment where you have the required disk space.
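For reference, a sketch of that workaround in the YAML configuration file format (the exact option path varies by MongoDB version; in 3.0-style configuration files it sits under the mmapv1 storage section). With small files enabled, data files start at 16MB and are capped at 512MB:

```
storage:
  mmapv1:
    smallFiles: true
```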
11.2 Memory Issues (Applicable for Storage Engine MMAPv1)
11.3 32-bit vs. 64-bit
11.4 BSON Documents
This section covers the limitations of
BSON documents.
-
Field names: If you store 1,000
documents with the key “col1”, the key is stored 1,000 times
in the data set. Although arbitrary field names are supported
in MongoDB, in practice most documents in a collection share the
same field names. Keeping field names short is considered good
practice for optimizing the usage of space.
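As a rough back-of-the-envelope sketch (ignoring BSON type bytes and any compression; the field names here are made up for illustration), the space spent on repeated field names can be estimated as:

```javascript
// Each document stores its field names in full, so a long name is
// paid for once per document, not once per collection.
function fieldNameOverhead(fieldName, documentCount) {
  // +1 for the C-string terminator BSON stores after each field name.
  return (fieldName.length + 1) * documentCount;
}

// Renaming a hypothetical "dateOfBirth" field to "dob" across
// 1,000,000 documents saves about 8MB of raw key bytes.
const saved =
  fieldNameOverhead('dateOfBirth', 1e6) - fieldNameOverhead('dob', 1e6);
```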
11.8 Sharding Limitations
Sharding is the mechanism of splitting
data across multiple machines, or shards. The following sections
discuss the limitations that you need to be aware of when dealing
with sharding.
11.8.1 Shard Early to Avoid Any Issues
11.8.2 Shard Key Can’t Be Updated
The shard key can’t be updated once the
document is inserted in the collection, because MongoDB uses the
shard key to determine the shard to which the document should be
routed. If you want to change the shard key of a document, the
suggested solution is to remove the document and reinsert it once
the change has been made.
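A sketch of that remove-and-reinsert approach in the mongo shell. The collection, _id, and field names are hypothetical, region is assumed to be the shard key, and this must be run against a live sharded cluster:

```
// Hypothetical example: "region" is the shard key of db.orders.
var doc = db.orders.findOne({_id: 42}); // read the document
db.orders.remove({_id: 42});            // remove it under the old shard key value
doc.region = "emea";                    // change the shard key value
db.orders.insert(doc);                  // reinsert; it is routed to the new shard
```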
11.8.4 Select the Correct Shard Key
It’s very important to choose the correct
shard key, because once the key is chosen it’s not easy to
correct it.
What counts as a wrong shard key depends completely on the
application. Say the application is a news feed; choosing a
timestamp field as the shard key would be wrong, because all
inserting, querying, and migrating of data would then hit one
shard only rather than being spread across the complete cluster.
If you need to correct the shard key, the process that is commonly
used is to dump and restore the collection.
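The hotspot behavior of a monotonically increasing shard key can be illustrated with a toy simulation (the chunk boundaries and shard count are invented for illustration, not taken from MongoDB):

```javascript
// Toy simulation of range-based sharding. A monotonically increasing
// shard key, such as a timestamp, always falls in the highest range,
// so every insert lands on the same shard.
function routeByRange(key, boundaries) {
  let shard = 0;
  for (const b of boundaries) {
    if (key >= b) shard++;
  }
  return shard;
}

const boundaries = [250, 500, 750];   // splits the keyspace into 4 chunks
const hits = [0, 0, 0, 0];
for (let t = 900; t < 1000; t++) {    // 100 ever-increasing "timestamps"
  hits[routeByRange(t, boundaries)]++;
}
// All 100 inserts are routed to the last shard; the other three sit idle.
```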
11.9 Security Limitations
Security is an important matter when it
comes to databases. Let’s look at MongoDB’s limitations from a
security perspective.
11.9.2 Traffic to and from MongoDB Isn’t Encrypted
By default the connections to and from
MongoDB are not encrypted. When running on a public network,
consider encrypting the communication; otherwise it can pose a
threat to your data. Communications on a public network can be
encrypted using the SSL-supported build of MongoDB, which is
available in the 64-bit version only.
11.10 Write and Read Limitations
11.10.1 Case-Sensitive Queries
Queries in MongoDB are case sensitive.
For example, the following two commands
will return different results: db.books.find({name:
'PracticalMongoDB'}) and db.books.find({name:
'practicalmongodb'}). You should ensure that you know in which
case the data is stored. Although regex searches like
db.books.find({name: /practicalmongodb/i}) can be used, they
aren’t ideal because they are relatively slow.
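The difference between an exact match and a /.../i regex match can be sketched in plain JavaScript, with array filtering standing in for a collection scan:

```javascript
// Plain equality is case sensitive; a /.../i regular expression is not.
const names = ['PracticalMongoDB', 'practicalmongodb'];

const exact = names.filter(n => n === 'PracticalMongoDB');
const insensitive = names.filter(n => /^practicalmongodb$/i.test(n));
// exact matches one name; insensitive matches both, but in MongoDB a
// case-insensitive regex forces a scan instead of an efficient index lookup.
```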
11.10.4 Transactions
MongoDB only supports single-document
atomicity. Since a write operation can modify multiple documents,
such an operation is not atomic as a whole. However, you can
isolate write operations that affect multiple documents using the
$isolated operator.
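As a sketch, the $isolated operator goes inside the query document of a multi-document write. The collection and field names here are hypothetical; this requires a running mongod and does not work on sharded collections:

```
// Increments "count" on every matching document; $isolated prevents other
// clients from interleaving reads and writes with this multi-update.
db.books.update(
  {category: "nosql", $isolated: 1},
  {$inc: {count: 1}},
  {multi: true}
);
```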
11.10.4.1 Replica Set Limitations - Number of Replica Set Members
A replica set is used to ensure data
redundancy in MongoDB. One member acts as the primary and
the rest act as secondaries. Due to the way voting works
in MongoDB, you must use an odd number of members.
This is because a node needs a majority of
votes to become primary. If you use an even number of nodes, you
can end up in a tie with no primary being chosen, because no
member has the majority of votes. In this scenario, the
replica set becomes read only.
You can use arbiters to break such ties.
They can help support failover and save on cost. To learn more
about replica set functioning, please refer to Chapter 7.
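The majority rule behind these elections can be sketched as a small calculation (the function names are ours, for illustration only):

```javascript
// Majority needed to elect a primary: more than half the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// With 4 members, a clean 2-2 split leaves neither side with the required
// 3 votes, so no primary can be elected; with 3 members, the larger side
// of any split (2 nodes) still reaches the majority of 2.
function canElectPrimary(visibleMembers, totalMembers) {
  return visibleMembers >= majority(totalMembers);
}
```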
11.12 Summary
12. MongoDB Best Practices
12.1 Deployment
While deciding on the deployment strategy,
keep the following tips in mind so that the hardware sizing is done
appropriately. These tips will also help you decide whether to use
sharding and replication.
-
Disk Type: If speed is not a
primary concern, or if the data set is larger than what any
in-memory strategy can support, it’s very important to select a
proper disk type. IOPS (input/output operations per second) is the
key metric for selecting a disk type; the higher the IOPS, the
better the MongoDB performance. If possible, local disks should be
used, because network storage can cause poor performance and high
latency. It is also advisable to use RAID 10 when creating disk
arrays, wherever possible.
-
CPU: If you anticipate using
MapReduce, then clock speed and the number of available processors
become important considerations. Clock speed can also have a major
impact on overall performance when you are running mongod with the
majority of data in memory. In circumstances where you want to
maximize operations per second, consider including a CPU with a
high clock/bus speed in your deployment strategy.
-
A 2x1 deployment is the most common
configuration for replication with three nodes, where there are
two nodes in one data center and a backup node in a secondary data
center, as depicted in Figure 12-1.