MongoDB: The Definitive Guide
Shannon Bradshaw, Eoin Brazil, and Kristina Chodorow
While the publisher and the authors have made good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions. Use of the information and instructions in this work is at your own risk.
Preface
How This Book Is Organized
Getting Started with MongoDB
Developing with MongoDB
Replication
Sharding
Application Administration
Server Administration
Appendixes
Conventions Used in This Book
Displays text to be replaced with user-supplied or context-specified values.
Using Code Examples
O’Reilly Online Learning
For example: "MongoDB: The Definitive Guide, Third Edition by Shannon Bradshaw, Eoin Brazil, and Kristina Chodorow (O'Reilly)." O'Reilly's online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a large collection of text and video from O'Reilly and 200+ other publishers.
How to Contact Us
For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com. Find us on Facebook: http://facebook.com/oreilly. Follow us on Twitter: http://twitter.com/oreillymedia. Watch us on YouTube: http://www.youtube.com/oreillymedia.
Introduction to MongoDB
Introduction
Ease of Use
Developers can try dozens of models for the data and then choose the best one to pursue.
Designed to Scale
Scaling out is both cheaper and more scalable; however, it is more difficult to manage a thousand machines than it is to maintain one. The topology of a MongoDB cluster, or whether there is actually a cluster rather than a single node at the other end of a database connection, is transparent to the application.
Rich with Features…
Similarly, the application logic can remain the same if the topology of an existing deployment needs to be changed to, for example, scale up to support a higher load. MongoDB supports time-to-live (TTL) collections for data that should expire at a specified time, such as sessions, and fixed-size (capped) collections for storing recent data, such as logs.
Without Sacrificing Speed
The Philosophy
Getting Started
A single instance of MongoDB can host multiple independent databases, each containing its own collections. The mongo shell provides built-in support for administering MongoDB instances and manipulating data using the MongoDB query language.
Documents
As you can see, the values in the documents are not just "blobs". They can be one of several different types of data (or even an entire embedded document; see "Embedded Documents"). The . and $ characters in keys should generally be treated as reserved, and drivers will complain if they are used inappropriately.
Collections
Dynamic Schemas
Keeping different types of documents in the same collection can be a nightmare for developers and administrators. Grouping documents of the same type together in the same collection allows data locality.
Naming
For example, an application that contains a blog might have a collection named blog.posts and a separate collection named blog.authors. In the database shell, db.blog will give you the blog collection, and db.blog.posts will give you the blog.posts collection.
Databases
Sharded MongoDB clusters (see Chapter 14) use the config database to store information about each shard. For example, if you are using the blog.posts collection in the cms database, the namespace of that collection would be cms.blog.posts.
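The namespace rule described here can be sketched in plain JavaScript. This is only an illustration; the helper name namespaceOf is hypothetical and is not part of the shell or any driver.

```javascript
// Hypothetical helper: joins a database name and a collection name
// into the fully qualified namespace described in the text.
function namespaceOf(dbName, collectionName) {
    return dbName + "." + collectionName;
}

console.log(namespaceOf("cms", "blog.posts")); // cms.blog.posts
```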
Getting and Starting MongoDB
For detailed information on installing MongoDB on your system, see Appendix A or the appropriate installation guide in the MongoDB documentation. You can safely stop mongod by pressing Ctrl-C in the command line environment from which you started the mongod server.
Introduction to the MongoDB Shell
Running the Shell
If the statement is not complete, you can continue writing it in the shell on the next line.
A MongoDB Client
Pressing Enter three times in a row will cancel the half-formed command and return you to the > prompt. Now that we can access a collection in the shell, we can perform almost any database operation.
Basic Operations with the Shell
We can see that an "_id" key has been added and the other key/value pairs have been saved as we entered them. If we want to see only one document from a collection, we can use findOne. Both find and findOne can also be passed criteria in the form of a query document.
Data Types
Basic Data Types
Exactly how values of each type are represented varies by language, but this is a list of the commonly supported types and how they are represented as part of a document in the shell. Finally, there are a few types that are mostly used internally (or superseded by other types).
Dates
For more information on the MongoDB data format, see Appendix B. Dates in the shell are displayed using the local time zone settings. However, dates in the database are only stored as milliseconds since the epoch, so they have no timezone information associated with them.
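As a small illustration of the point about time zones, the following plain JavaScript sketch (the sample date is an arbitrary assumption) shows that a Date is fully described by its millisecond count, and that any time zone seen when printing comes from the environment rather than from the value.

```javascript
// A JavaScript Date wraps a count of milliseconds since the Unix epoch.
const stored = new Date("2020-01-01T00:00:00Z"); // arbitrary sample instant
const millis = stored.getTime(); // the only value that needs to be persisted

// Rebuilding the Date from the raw count gives the same instant.
// toISOString always prints UTC; toString would use the local time zone.
const restored = new Date(millis);
console.log(millis, restored.toISOString());
```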
Arrays
In the previous example, MongoDB can query for all documents in which 3.14 is an element of the array. MongoDB also enables atomic updates that change the contents of arrays, such as reaching into the array and changing the value "pie" to pi.
Embedded Documents
We'll discuss schema design in depth later, but even from this basic example we can begin to see how embedded documents can change the way we work with data. In a relational database, when we do a join of people and addresses, we get the updated address for everyone who shares it.
Although synchronized clocks are a good idea for other reasons (see "Synchronizing Clocks"), the actual timestamp doesn't matter to ObjectIds, only that it changes often (once per second) and keeps increasing. These first nine bytes of an ObjectId therefore guarantee its uniqueness across machines and processes for a single second.
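The timestamp portion of an ObjectId occupies its first four bytes, stored as seconds since the epoch. A minimal sketch of reading it from the 24-character hexadecimal form follows; the helper name objectIdSeconds and the sample value are assumptions made for illustration.

```javascript
// Hypothetical helper: reads the creation time embedded in an ObjectId,
// given its 24-character hexadecimal string form.
function objectIdSeconds(hex) {
    if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
        throw new Error("expected a 24-character hex string");
    }
    // The first 4 bytes (8 hex digits) are seconds since the epoch.
    return parseInt(hex.substring(0, 8), 16);
}

const sampleId = "507f1f77bcf86cd799439011"; // illustrative value only
const seconds = objectIdSeconds(sampleId);
console.log(seconds, new Date(seconds * 1000).toISOString());
```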
Using the MongoDB Shell
You can use these commands to connect to another database or server at any time.
Tips for Using the Shell
Database level help is provided by db.help() and collection level help is provided by db.foo.help().
Running Scripts with the Shell
If the script is not in your current directory, you can give the shell a relative or absolute path to it. You can disable the loading of your .mongorc.js by using the --norc option when you start the shell.
Customizing Your Prompt
Note that prompt functions should return strings and be very careful to catch exceptions: it can be extremely confusing if your prompt turns into an exception. The .mongorc.js file is a good place to set your prompt if you always want to use a custom one (or set up some custom prompts that you can switch between in the shell).
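A defensive prompt function along these lines might look like the sketch below. It assumes the shell's convention of calling a global function named prompt; the name safePrompt is hypothetical, and in .mongorc.js you would assign the function to prompt instead.

```javascript
// Sketch of a prompt function for .mongorc.js. In the mongo shell you would
// assign it to the global `prompt` variable; `safePrompt` is a hypothetical name.
var safePrompt = function () {
    try {
        // `db` is the shell's current-database variable; it is undefined
        // when this code runs outside the shell.
        if (typeof db === "undefined") {
            return "(nodb)> ";
        }
        return String(db) + "> ";
    } catch (e) {
        // Never let an exception escape: fall back to a plain prompt string.
        return "> ";
    }
};

console.log(safePrompt());
```

Wrapping the whole body in try/catch means the function always returns a string, whatever state the connection is in.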
Editing Complex Variables
Add EDITOR="/path/to/editor"; to your .mongorc.js file and you don't have to worry about resetting it.
Inconvenient Collection Names
Creating, Updating, and Deleting
Inserting Documents
If a document causes an insertion error, no documents are inserted beyond that point in the array. If we specify unordered inserts instead, the first, second, and fourth documents in the array will be inserted.
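The difference between ordered and unordered inserts can be modeled in memory. The sketch below is not driver code: a Map keyed by "_id" stands in for a collection, the function name insertManyModel is hypothetical, and a duplicate "_id" is assumed to be the only kind of insertion error.

```javascript
// In-memory model of a bulk insert: a Map keyed by "_id" stands in for a collection.
// `insertManyModel` is a hypothetical name, not a driver or shell API.
function insertManyModel(collection, docs, ordered) {
    const inserted = [];
    for (const doc of docs) {
        if (collection.has(doc._id)) {
            // Duplicate "_id": this document causes an insertion error.
            if (ordered) break; // ordered: stop at the first error
            continue;           // unordered: skip it and keep going
        }
        collection.set(doc._id, doc);
        inserted.push(doc._id);
    }
    return inserted;
}

// The third document repeats the "_id" of the second.
const docs = [{ _id: 0 }, { _id: 1 }, { _id: 1 }, { _id: 2 }];
console.log(insertManyModel(new Map(), docs, true));  // [ 0, 1 ]
console.log(insertManyModel(new Map(), docs, false)); // [ 0, 1, 2 ]
```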
Insert Validation
Removing Documents
In this example, we used a filter that could only match one document, because "_id" values are unique in a collection. Which document is found first depends on several factors, including the order in which the documents were inserted and what updates were made to the documents (for some storage engines).
MongoDB drivers introduced the deleteOne and deleteMany methods at the same time as the MongoDB 3.0 server release, and the shell began supporting these methods in MongoDB 3.2. While remove is still supported for backward compatibility, you should use deleteOne and deleteMany in your applications.
Updating Documents
Document Replacement
A common mistake is to match more than one document with the criteria and then create a duplicate "_id" value with the second parameter. The database will try to replace the matched document with the one in the joe variable, but there is already a document in this collection with the same "_id".
Using Update Operators
"$push" adds elements to the end of an array if the array exists and creates a new array if it doesn't. If the array is smaller than 10 elements (after the push), all elements will be kept.
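The behavior described here can be modeled on a plain object. The sketch assumes the passage refers to combining "$push" with "$each" and a "$slice" of -10, which keeps only the last 10 elements; the function name pushAndKeepLast is hypothetical.

```javascript
// Model of {"$push": {field: {"$each": values, "$slice": -limit}}} on a plain object.
// `pushAndKeepLast` is a hypothetical name used only for this illustration.
function pushAndKeepLast(doc, field, values, limit) {
    if (!(field in doc)) {
        doc[field] = []; // "$push" creates the array if it is missing
    }
    doc[field].push(...values); // append to the end
    if (doc[field].length > limit) {
        // Keep only the last `limit` elements.
        doc[field] = doc[field].slice(doc[field].length - limit);
    }
    return doc;
}

const movie = { genre: "horror" };
pushAndKeepLast(movie, "top10", ["a", "b"], 10);
console.log(movie.top10); // [ 'a', 'b' ]
```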
Upserts
If the vote value for the comment identified by elem is less than or equal to -5, we'll add a field called "hidden" to it. This means we're making a round trip to the database, plus sending an update or insert, every time someone visits a page.
Updating Multiple Documents
Returning Updated Documents
Querying
You can perform many meta-operations on a cursor, including skipping a certain number of results, limiting the number of results returned, and sorting results.
Introduction to find
When we start adding key/value pairs to the query document, we start narrowing our search. Multiple conditions can be linked together by adding more key/value pairs to the query document, which is interpreted as "condition1 AND condition2 AND ... AND conditionN".
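How a query document narrows a result set can be sketched with a simplified in-memory matcher. It handles exact equality only (no operators, arrays, or embedded documents); the function name matchesAll and the sample data are assumptions.

```javascript
// Simplified model of a query document: every key/value pair must match
// (exact equality only; no operators, arrays, or embedded documents).
function matchesAll(doc, query) {
    return Object.keys(query).every((key) => doc[key] === query[key]);
}

// Hypothetical sample data.
const users = [
    { username: "joe", age: 27 },
    { username: "joe", age: 40 },
    { username: "ann", age: 27 },
];

// Each added key/value pair acts as another AND condition.
console.log(users.filter((u) => matchesAll(u, {})).length);                             // 3
console.log(users.filter((u) => matchesAll(u, { age: 27 })).length);                    // 2
console.log(users.filter((u) => matchesAll(u, { username: "joe", age: 27 })).length);   // 1
```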
Specifying Which Keys to Return
As you can see from the previous output, the "_id" key is returned by default, even if not specifically requested. You can also use this second parameter to exclude specific key/value pairs from the results of a query.
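The default handling of "_id" can be illustrated with a simplified model of an inclusion-style second parameter. The sketch covers only keys requested with 1 plus the special case of excluding "_id" with 0; the function name project and the sample document are hypothetical.

```javascript
// Simplified model of an inclusion projection: return only the requested keys,
// plus "_id" unless it is explicitly excluded with {"_id": 0}.
function project(doc, fields) {
    const out = {};
    if (fields._id !== 0 && "_id" in doc) {
        out._id = doc._id; // returned by default, even if not requested
    }
    for (const key of Object.keys(fields)) {
        if (key !== "_id" && fields[key] === 1 && key in doc) {
            out[key] = doc[key];
        }
    }
    return out;
}

// Hypothetical sample document.
const user = { _id: 1, username: "joe", email: "joe@example.com" };
console.log(project(user, { username: 1 }));         // { _id: 1, username: 'joe' }
console.log(project(user, { username: 1, _id: 0 })); // { username: 'joe' }
```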
Limitations
Query Criteria
Query Conditionals
To query for documents where a key's value is not equal to a certain value, you must use another conditional operator, "$ne", which stands for "not equal".
OR Queries
In a normal AND-type search, you want to narrow your results as much as possible with as few arguments as possible. OR-type searches are the opposite: they are most efficient when the first arguments match as many documents as possible.
Type-Specific Queries
Regular Expressions
MongoDB uses the Perl Compatible Regular Expression (PCRE) library to match regular expressions; any regular expression syntax allowed by PCRE is allowed in MongoDB. For case-sensitive regular expression queries, if an index exists for the field, MongoDB can match the regular expression against the values in the index.
Querying Arrays
First, you can use "$elemMatch" to force MongoDB to match both clauses with a single array element. However, you can only use min and max when you have an index on the field you're searching for, and you must pass all index fields to min and max.
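Why "$elemMatch" matters for range queries on arrays can be seen in a small in-memory model. The two functions below are hypothetical stand-ins for the two query forms, and the sample values are assumptions.

```javascript
// Without "$elemMatch", each clause of a range query on an array field
// may be satisfied by a different element.
function matchesAnyElementPerClause(arr, gt, lt) {
    return arr.some((v) => v > gt) && arr.some((v) => v < lt);
}

// With "$elemMatch", one single element must satisfy both clauses.
function matchesSingleElement(arr, gt, lt) {
    return arr.some((v) => v > gt && v < lt);
}

const values = [5, 25]; // neither element lies between 10 and 20
console.log(matchesAnyElementPerClause(values, 10, 20)); // true: 25 > 10 and 5 < 20
console.log(matchesSingleElement(values, 10, 20));       // false
console.log(matchesSingleElement([15], 10, 20));         // true
```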
Querying on Embedded Documents
If Joe decides to add a middle name field, this query will suddenly stop working, because it no longer matches the entire embedded document. Embedded document matches must match the entire document, and this query does not match the "comment" key.
"$where" queries should not be used unless strictly necessary: they are much slower than regular queries. You can reduce the penalty by using other query filters in combination with "$where".
Cursors
The advantage of doing this is that you can look at one result at a time. Almost every method on a cursor object returns the cursor itself, so you can chain options in any order.
Limits, Skips, and Sorts
If your collection contains fewer than three documents, you will not receive any documents back. However, large skips do not perform very well; the next section provides suggestions on how to avoid them.
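The effect of skipping and limiting can be modeled over an array of already-matched results. This is only a sketch: the function name skipLimit is hypothetical, and a limit of 0 is treated as "no limit".

```javascript
// Model of cursor skip and limit over an array of already-matched documents.
// `skipLimit` is a hypothetical name; a limit of 0 is treated as "no limit".
function skipLimit(results, skip, limit) {
    const end = limit > 0 ? skip + limit : undefined;
    return results.slice(skip, end);
}

console.log(skipLimit(["a", "b"], 3, 0));                // [] (fewer than three documents)
console.log(skipLimit(["a", "b", "c", "d", "e"], 3, 0)); // [ 'd', 'e' ]
console.log(skipLimit(["a", "b", "c", "d", "e"], 1, 2)); // [ 'b', 'c' ]
```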
Avoiding Large Skips
If there aren't any documents in the collection, this technique will eventually return null, which makes sense. For example, if we want to find a random plumber in California, we can create an index on "profession".
Immortal Cursors
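In that case, many drivers have implemented a function called immortal, or a similar mechanism, that tells the database not to time out the cursor. If you turn off a cursor's timeout, you must iterate through all of its results or kill it to make sure it gets closed. Otherwise, it will sit in the database hogging resources until the server is restarted.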
Designing Your Application
Indexes
This chapter covers what indexes are and why you would want to use them, and how to choose which fields to index. As you will see, choosing the right indexes for your collections is critical to performance.
Introduction to Indexes