DF100 - 02 - Storage and Retrieval Part 1
Release: 20240216
Topics we cover
Creating Documents
Cursors
Updating Documents
Absolute Changes
Relative Changes
Conditional Changes
Deleting Documents
Load Sample Data
sample_training
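Validate the loaded data by checking the collection counts for sample_training.grades and
sample_training.inspections, for example with db.grades.countDocuments({}) in the sample_training database.
countDocuments() makes the query return just the number of matching documents rather than the documents themselves.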
Basic Database CRUD Interactions
MongoDB APIs allow us to perform Create, Read, Update and Delete (CRUD) operations, with options to act
on a single document or on multiple documents.
Creating Documents
Creating New Documents - insertOne()
MongoDB> db.customers.insertOne({
name: "Andi Smith", orders: [], spend: 0,
lastpurchase: null
})
insertOne() adds a document to the collection on which it is called. It is the most basic way to
add a new document to a collection.
There are very few default constraints. The document, which is represented by a language-native
object (a Document, Dictionary or Object, depending on the language), must be smaller than 16MB.
It must have a unique value for _id. If we don't provide one, MongoDB assigns it a value of type
ObjectId, a 12-byte MongoDB GUID type.
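An ObjectId is not just random: its first 4 bytes are a timestamp (seconds since the Unix epoch), followed by a 5-byte random value and a 3-byte counter. The sketch below is a plain JavaScript illustration of that layout, not a driver API; objectIdTimestamp is a hypothetical helper. In mongosh, ObjectId(...).getTimestamp() does the same job for you.

```javascript
// Sketch: read the creation time embedded in an ObjectId hex string.
// Assumes the documented layout: 4-byte big-endian timestamp (seconds
// since the Unix epoch), 5-byte random value, 3-byte counter.
// objectIdTimestamp is a hypothetical helper for illustration.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("an ObjectId is 12 bytes, i.e. 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // first 4 bytes
  return new Date(seconds * 1000);
}

// ObjectId taken from the taxis example later in this section
console.log(objectIdTimestamp("609b9aaccf0c3aa225ce9116").toISOString());
// → 2021-05-12T09:06:52.000Z
```

Because the timestamp leads, ObjectIds generated later generally sort after earlier ones, which is one reason _id order roughly follows insertion time.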
{ "acknowledged": true, ... } means the write succeeded according to the write concern in effect.
We have not specified how many replica set members must have the data, or whether it must be
flushed to disk, so the default write concern applies (w: 1 before MongoDB 5.0; w: "majority" for most deployments since then).
Timings shown: 9106ms vs 51ms
MongoDB> db.collection1.insertMany(friends)
error { errmsg : "E11000 duplicate key error ...",
nInserted : 2 }
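The error above reports nInserted : 2 because insertMany() is ordered by default: it inserts documents in array order and stops at the first failure. The sketch below is an in-memory JavaScript model of that behavior, not the server's implementation; simulateInsertMany and the friends data are hypothetical. The real insertMany() also accepts an { ordered: false } option, which keeps going past errors.

```javascript
// Model of insertMany() against a collection whose _id values must be
// unique. In-memory illustration only: the real shell reports an error
// like the one above rather than returning this plain object.
function simulateInsertMany(existingIds, docs, { ordered = true } = {}) {
  const ids = new Set(existingIds);
  let nInserted = 0;
  const errors = [];
  for (let i = 0; i < docs.length; i++) {
    const id = docs[i]._id;
    if (ids.has(id)) {
      errors.push({ index: i, errmsg: `E11000 duplicate key error: _id ${id}` });
      if (ordered) break; // ordered: stop at the first failure
      continue;           // unordered: skip this document and carry on
    }
    ids.add(id);
    nInserted++;
  }
  return { nInserted, errors };
}

// Hypothetical friends array: the third document repeats an _id
const friends = [
  { _id: "ann@example.com" },
  { _id: "bob@example.com" },
  { _id: "ann@example.com" },
  { _id: "cat@example.com" },
];
console.log(simulateInsertMany([], friends).nInserted);                    // 2
console.log(simulateInsertMany([], friends, { ordered: false }).nInserted); // 3
```

The documents before the duplicate stay inserted; a failed insertMany() is not rolled back.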
findOne() returns the first document it finds where all the fields in the query match. If there are
multiple matches, there is no way to predict which one is 'first'.
Then we query by the _id field, which holds the user's email, and we find the record. This returns an
object, and mongosh prints what is returned.
We can also query by any other field, although only _id has an index by default, so queries on the
other fields here are less efficient for now.
We can supply multiple fields, and if they all match we find the record: someone called Timothy
who has spent 0 dollars.
Note that the order of the fields in the query does not matter here; we can think of the comma as
meaning AND.
db.customers.findOne({ spend: "0" }) finds nothing, because it is looking for the string "0", not the
number 0, so it doesn't match.
An empty object ({}) matches every document. However, because findOne() returns a single document,
it returns only one of them.
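The matching rules in these notes (every query field must match, the comma acts as AND, field order is irrelevant, "0" is not 0, and {} matches everything) can be captured in a few lines of JavaScript. This is a simplified model for equality queries on top-level fields only; matches, findOneModel and the sample customers are hypothetical, and real MongoDB matching also handles operators, arrays, dotted paths and BSON type rules.

```javascript
// Simplified model of a findOne() equality query: every field in the
// query must exist in the document with an equal value of the same
// type. Object.entries({}) is empty, so {} matches any document.
function matches(doc, query) {
  return Object.entries(query).every(([field, value]) => doc[field] === value);
}

function findOneModel(docs, query) {
  // Returns the first match in array order; on a real server, which
  // document is "first" is not predictable without a sort.
  return docs.find((d) => matches(d, query)) ?? null;
}

// Hypothetical customers, keyed by email in _id as in the notes above
const customers = [
  { _id: "andi@example.com", name: "Andi Smith", spend: 0 },
  { _id: "tim@example.com", name: "Timothy", spend: 0 },
];

findOneModel(customers, { name: "Timothy", spend: 0 }); // Timothy's record
findOneModel(customers, { spend: 0, name: "Timothy" }); // same: field order is irrelevant
findOneModel(customers, { spend: "0" });                // null: the string "0" is not 0
findOneModel(customers, {});                            // matches everything, returns one
```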
Find and Retrieve documents
The example uses a JavaScript regular expression because mongosh is a JavaScript REPL. How a
regex is written depends on the language and driver you are working with.
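In mongosh a query such as { name: /^tim/i } passes a native JavaScript RegExp. Drivers whose language has no regex literal can express the same query with the $regex and $options query operators. The sketch below shows how a JS RegExp maps onto that operator form; toRegexOperator is a hypothetical helper, and it keeps only the i, m and s flags on the assumption that other JS flags (such as g) have no $options equivalent.

```javascript
// Map a JavaScript RegExp onto the $regex / $options operator form.
// toRegexOperator is hypothetical; only i, m and s are kept because
// JS-only flags such as g or y have no meaning in a query.
function toRegexOperator(re) {
  return { $regex: re.source, $options: re.flags.replace(/[^ims]/g, "") };
}

const query = { name: toRegexOperator(/^tim/i) };
console.log(JSON.stringify(query)); // {"name":{"$regex":"^tim","$options":"i"}}

// The pattern itself behaves like any JS regex:
console.log(/^tim/i.test("Timothy")); // true
```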
Projection: choosing the fields to return
We can select the fields to return by providing an object with those fields and a value of 1 for each.
Documents can be large; with projection we can have MongoDB return only a subset of the
fields.
We can instead choose which fields NOT to return by providing an object with those fields set to 0.
We cannot mix 0 and 1, because it would be unclear what to do with any other fields.
The exception is _id: 0, which we can use to remove _id from the projection while projecting
only the fields we need: { _id: 0, name: 1 }
There are some more advanced projection options, including projecting parts of an array
and projecting computed fields using aggregation, but those are not covered here.
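The projection rules above can be expressed as a small JavaScript model for top-level fields. It is an illustration, not MongoDB's implementation; applyProjection and the sample customer are hypothetical, and nested paths and advanced projection operators are not modeled.

```javascript
// Simplified model of find() projection on top-level fields:
// an inclusion list (1s) or an exclusion list (0s), never both,
// except that _id: 0 is allowed in an inclusion projection.
// _id is returned by default unless explicitly excluded.
function applyProjection(doc, projection) {
  const nonId = Object.entries(projection).filter(([f]) => f !== "_id");
  const includes =
    nonId.some(([, v]) => v === 1) ||
    (nonId.length === 0 && projection._id === 1); // { _id: 1 } alone
  const excludes = nonId.some(([, v]) => v === 0);
  if (includes && excludes) {
    throw new Error("cannot mix inclusion and exclusion (except _id: 0)");
  }
  const result = {};
  if (includes) {
    if (projection._id !== 0 && "_id" in doc) result._id = doc._id;
    for (const [f] of nonId) if (f in doc) result[f] = doc[f];
  } else {
    // Exclusion (or empty) projection: copy everything not set to 0
    for (const [f, v] of Object.entries(doc)) {
      if (projection[f] !== 0) result[f] = v;
    }
  }
  return result;
}

const customer = { _id: 1, name: "Andi Smith", orders: [], spend: 0 };
applyProjection(customer, { name: 1 });         // { _id: 1, name: "Andi Smith" }
applyProjection(customer, { _id: 0, name: 1 }); // { name: "Andi Smith" }
applyProjection(customer, { orders: 0 });       // { _id: 1, name: "Andi Smith", spend: 0 }
// applyProjection(customer, { name: 1, spend: 0 }) throws
```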
Fetch multiple documents using find()
MongoDB> db.taxis.find({})
{ _id : ObjectId("609b9aaccf0c3aa225ce9116"), plate : 0 }
{ _id : ObjectId("609b9aaccf0c3aa225ce9117"), plate : 1 }
...
{ _id : ObjectId("609b9aaccf0c3aa225ce9129"), plate : 19 }
Type "it" for more
MongoDB> it
...
{ _id : ObjectId("609b9aaccf0c3aa225ce913d"), plate : 39 }
We fetch documents from the cursor to get all matches; mongosh fetches and displays them from the cursor object.
find() returns a cursor object; by default, the shell then tries to print it.
Printing a cursor displays its next 20 documents and sets a variable called it to that
cursor.
If we type it, the shell prints the cursor again and displays the next 20 documents.
We can add .pretty() to a cursor to make the shell display larger documents with newlines
and indentation. This was needed in the legacy mongo shell; mongosh formats output this way by default.
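The "20 at a time, type it for more" behavior is essentially a pager over the cursor. The sketch below models it in plain JavaScript; makeShellPrinter and showNext are hypothetical names (not part of mongosh), and the page size of 20 is taken from the notes above.

```javascript
// Model of how the shell pages through a cursor: each print, or each
// time we type "it", shows the next 20 documents. makeShellPrinter and
// showNext are hypothetical names, not part of mongosh.
function makeShellPrinter(docs, pageSize = 20) {
  let position = 0; // how far through the results we have displayed
  return function showNext() {
    const page = docs.slice(position, position + pageSize);
    position += page.length;
    // "more" is what triggers the 'Type "it" for more' prompt
    return { page, more: position < docs.length };
  };
}

// 40 taxis with plates 0 to 39, as in the example output above
const taxis = Array.from({ length: 40 }, (_, plate) => ({ plate }));
const showNext = makeShellPrinter(taxis);
const first = showNext();  // plates 0-19, first.more is true
const second = showNext(); // plates 20-39, second.more is false
```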
Cursors
Using Cursors
MongoDB> let mycursor = db.taxis.find({}) // No output
We fetch results from the cursor.
mycursor is a cursor object; it knows the database, collection and query we want to run.
Until we do something with it, it has not run the query; it has not even contacted the server.
It has methods, most importantly in mongosh hasNext() and next(), to check for more values and
fetch them.
We can iterate over a cursor in various ways depending on our programming language.
If we never fetch anything from a cursor, the find never executes. This can be unexpected
when doing simple performance tests like the one below.
To pull all the results from a cursor in the shell when testing speed, we can use
db.collection.find(query).itcount()
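The laziness described above can be modeled with a small class whose query function only runs on the first hasNext() or next() call. This is a sketch under that assumption, not the driver's cursor class; LazyCursor and the runQuery callback are hypothetical stand-ins for the real server round trip.

```javascript
// Model of a lazy cursor: creating it does no work; the (simulated)
// query runs only when we first ask for a result. LazyCursor and
// runQuery are hypothetical stand-ins, not the driver's classes.
class LazyCursor {
  constructor(runQuery) {
    this.runQuery = runQuery; // function standing in for the server call
    this.buffer = null;       // null until the query has executed
  }
  ensureExecuted() {
    if (this.buffer === null) this.buffer = this.runQuery();
  }
  hasNext() {
    this.ensureExecuted();
    return this.buffer.length > 0;
  }
  next() {
    // Callers should check hasNext() first
    this.ensureExecuted();
    return this.buffer.shift();
  }
  itcount() {
    // Drain the cursor and count the documents, like itcount() in the shell
    let n = 0;
    while (this.hasNext()) { this.next(); n++; }
    return n;
  }
}

let queriesRun = 0;
const mycursor = new LazyCursor(() => {
  queriesRun++;
  return [{ plate: 0 }, { plate: 1 }, { plate: 2 }];
});
console.log(queriesRun);         // 0: nothing has run yet
console.log(mycursor.itcount()); // 3
console.log(queriesRun);         // 1
```

This is why a timing test that only calls find() measures almost nothing: the work happens when the cursor is drained.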
Cursor modifiers
Skip and limit return us cursors.
MongoDB> db.taxis.find({}).limit(5)
{ _id : ObjectId("609b9aaccf0c3aa225ce9116"), plate : 0 }
{ _id : ObjectId("609b9aaccf0c3aa225ce9117"), plate : 1 }
{ _id : ObjectId("609b9aaccf0c3aa225ce9118"), plate : 2 }
{ _id : ObjectId("609b9aaccf0c3aa225ce9119"), plate : 3 }
{ _id : ObjectId("609b9aaccf0c3aa225ce911a"), plate : 4 }
MongoDB> db.taxis.find({}).skip(2)
{ _id : ObjectId("609b9aaccf0c3aa225ce9118"), plate : 2 }
... REMOVED for clarity ...
{ _id : ObjectId("609b9aaccf0c3aa225ce912b"), plate : 21 }
Type "it" for more
MongoDB> db.taxis.find({}).skip(8).limit(2)
{ _id : ObjectId("609b9aaccf0c3aa225ce911e"), plate : 8 }
{ _id : ObjectId("609b9aaccf0c3aa225ce911f"), plate : 9 }
We can add limit() to the cursor to stop the query once it has found enough results.
We can add skip() to the cursor to tell it to ignore the first N results.
Skip is always applied before limit when computing the answer, whatever order they are chained in.
This can be used for simple paging of results, although it is not the optimal way to do so.
Skip has a cost on the server, so skipping a large number of documents is not advisable.
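The skip-then-limit rule, and the simple paging pattern built on it, can be modeled over an in-memory array. applyCursorModifiers and page are hypothetical helpers; the model treats limit 0 as "no limit", as a MongoDB cursor does, and ignores the special meaning of negative limits.

```javascript
// Model of skip() and limit() on a cursor: whatever order they are
// chained in, skip is applied first, then limit.
// applyCursorModifiers and page are hypothetical helpers.
function applyCursorModifiers(docs, { skip = 0, limit = 0 } = {}) {
  const afterSkip = docs.slice(skip);
  // limit 0 means "no limit", as with a MongoDB cursor
  return limit > 0 ? afterSkip.slice(0, limit) : afterSkip;
}

// Simple paging: page n (0-based) of size k is skip(n * k).limit(k).
// On a real server each page still walks past the n * k skipped documents.
function page(docs, n, k) {
  return applyCursorModifiers(docs, { skip: n * k, limit: k });
}

const taxis = Array.from({ length: 40 }, (_, plate) => ({ plate }));
applyCursorModifiers(taxis, { skip: 8, limit: 2 }); // plates 8 and 9, as in the example above
page(taxis, 3, 5);                                  // plates 15 to 19
```

The cost of page n grows with n * k, which is why deep skip-based paging is discouraged.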
Sorting Results
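When using skip and limit, sorting is very important: without a sort the order of results is not guaranteed, so skip and limit may not return the documents we expect. MongoDB applies the sort before skip and limit.
Sorting results without an index is very inefficient; we cover this when talking about indexes later.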
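A sort specification such as { spend: -1, name: 1 } compares fields left to right, with 1 for ascending and -1 for descending, and the sort happens before skip and limit. The model below illustrates that order of operations in plain JavaScript; sortSkipLimit and the sample data are hypothetical, and it assumes values of the same comparable type (MongoDB defines its own cross-type BSON comparison order, which is not modeled).

```javascript
// Model of sort() combined with skip() and limit(): sort first, then
// skip, then limit. Assumes compared values share a comparable type.
// sortSkipLimit is a hypothetical helper for illustration.
function sortSkipLimit(docs, sortSpec, skip = 0, limit = 0) {
  const keys = Object.entries(sortSpec);
  const sorted = [...docs].sort((a, b) => {
    for (const [field, dir] of keys) {
      if (a[field] < b[field]) return -dir;
      if (a[field] > b[field]) return dir;
    }
    return 0; // a real server gives no guaranteed order for ties
  });
  const afterSkip = sorted.slice(skip);
  return limit > 0 ? afterSkip.slice(0, limit) : afterSkip;
}

const customers = [
  { name: "Andi", spend: 0 },
  { name: "Timothy", spend: 0 },
  { name: "Bea", spend: 120 },
  { name: "Cy", spend: 45 },
];
// Second-highest spender: sort descending, skip 1, limit 1
sortSkipLimit(customers, { spend: -1 }, 1, 1); // [{ name: "Cy", spend: 45 }]
```

Because ties have no guaranteed order on the server, adding a unique field (such as _id) to the sort keeps skip/limit pages stable.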
Cursors work in batches
By default, the first batch returned by a find() holds 101 documents, up to a limit of 16MB.
Rather than calling the server every time we get the next document from a cursor, the driver or
shell fetches results in batches and holds them on the client side until we want them.
Fetching all documents at once could use too much client RAM.
We can change the batch size on the cursor if we need to, but each batch is still limited to 16MB.
Fetching additional data from a cursor uses the getMore command behind the scenes; each getMore
returns up to 16MB of documents.
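The batching rules above can be sketched as a function that splits a result set into a first batch of up to 101 documents followed by getMore batches capped by size. This is a model under stated assumptions, not the wire protocol: document size is approximated by the length of its JSON text (the real limit applies to BSON bytes), the 16MB cap on the first batch is not modeled, and splitIntoBatches is a hypothetical helper.

```javascript
// Model of results arriving in batches: the first batch holds up to
// 101 documents; each later getMore batch is capped only by size.
// Assumption: size is approximated by JSON text length, not BSON bytes,
// and the 16MB cap on the first batch is not modeled.
const FIRST_BATCH_DOCS = 101;
const MAX_BATCH_BYTES = 16 * 1024 * 1024;

function splitIntoBatches(docs, maxBytes = MAX_BATCH_BYTES) {
  const batches = [docs.slice(0, FIRST_BATCH_DOCS)];
  let current = [];
  let bytes = 0;
  for (const doc of docs.slice(FIRST_BATCH_DOCS)) {
    const size = JSON.stringify(doc).length;
    if (current.length > 0 && bytes + size > maxBytes) {
      batches.push(current); // this getMore batch is full
      current = [];
      bytes = 0;
    }
    current.push(doc);
    bytes += size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

const docs = Array.from({ length: 300 }, (_, i) => ({ i }));
console.log(splitIntoBatches(docs).map((b) => b.length)); // [ 101, 199 ]
// With a tiny byte cap, the remainder is spread over many getMore batches
splitIntoBatches(docs, 100);
```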
Exercise
Add four documents to a collection called diaries using the commands shown here.
Write a find() operation to output only diary entries from dug.
Modify it to output the line below using skip, limit and a projection.
MongoDB> db.diaries.drop()
MongoDB> db.diaries.insertMany([
{
name: "dug", day: ISODate("2014-11-04"),
txt: "went for a walk"
},
{
name: "dug", day: ISODate("2014-11-06"),
txt: "saw a squirrel"
},
{
name: "ray", day: ISODate("2014-11-06"),
txt: "met dug in the park"
},
{
#1. When does a find() query get executed on the MongoDB server?
find() returns a cursor object rather than one or more documents. The shell starts retrieving the first 20
results when it prints the cursor, but find() on its own does not retrieve any documents.
Calling find() does not return any values until you start retrieving data with the cursor.
The find() query has no relationship with a connection pool or the driver connection.
Creating a cursor, adding a projection, or creating an index does not execute the find query.
#2. Why is insertMany() faster than multiple insertOne() operations?
A Needs fewer writes to disk.
B Reduces the network time.
C Performs the writes as a single transaction.
D Replicates to other servers faster.
E Allows parallel processing of inserts in sharded clusters.
Recap
Exercise Answers
Answer - Exercise: find, skip and limit
Write a find() to output only diary entries from "dug":
MongoDB> db.diaries.find({name:"dug"})
Modify it to output the line below using skip, limit and a projection:
MongoDB> db.diaries.find({name:"dug"},{_id:0,day:0}).skip(1).limit(1)