
DF100 - 02 - Storage and Retrieval Part 1

The document discusses MongoDB CRUD operations for working with documents. It covers creating documents using insertOne and insertMany, reading documents using findOne and find, updating documents using update operations, and deleting documents.

Uploaded by iambinhnpt

DF100

Storage and Retrieval I


MongoDB Developer Fundamentals

Release: 20240216
Topics we cover
Creating Documents

Cursors

Updating Documents

Absolute Changes

Relative Changes

Conditional Changes

Deleting Documents
Load Sample Data

In your Atlas cluster:
● Click the three dots [...]
● Select Load Sample Dataset
● Click Browse Collections to view the databases and collections we loaded.

We will be using the sample_training database.

Follow the instructions to load the sample dataset in Atlas.

Click the Collections button to see a list of databases, of which we will be using the
sample_training database.
Validate Loaded Data

Connect to the MongoDB Atlas cluster using mongosh and verify that the data is loaded.
Database to use: sample_training
Collections to verify: grades and inspections

MongoDB> use sample_training
switched to db sample_training
MongoDB> db.grades.countDocuments({})
100000
MongoDB> db.inspections.countDocuments({})
80047

Validate the loaded data by checking the collection counts for sample_training.grades and
sample_training.inspections.

countDocuments() returns just the number of documents that match the query, rather than the documents themselves.
Basic Database CRUD Interactions

Operation | Single Document | Multiple Documents
Create | insertOne(doc) | insertMany([doc, doc, doc])
Read | findOne(query, projection) | find(query, projection)
Update | updateOne(query, change) | updateMany(query, change)
Delete | deleteOne(query) | deleteMany(query)

MongoDB APIs allow us to perform Create, Read, Update and Delete operations, with options to act on
a single document or on multiple documents.
Creating Documents
Creating New Documents - insertOne()

insertOne() adds a document to a collection.
Documents are essentially objects.
The _id field must be unique; it will be added if not supplied.

MongoDB> db.customers.insertOne({
  _id : "[email protected]",
  name: "Robert Smith", orders: [], spend: 0,
  lastpurchase: null
})
{ acknowledged: true, insertedId : "[email protected]" }

MongoDB> db.customers.insertOne({
  _id : "[email protected]",
  name: "Bobby Smith", orders: [], spend: 0,
  lastpurchase: null
})
MongoServerError: E11000 duplicate key error ...

MongoDB> db.customers.insertOne({
  name: "Andi Smith", orders: [], spend: 0,
  lastpurchase: null
})
{ acknowledged: true, insertedId: ObjectId("609abxxxxxx254") }

insertOne() adds a document to the collection on which it is called. It is the most basic way to
add a new document to a collection.

There are very few default constraints. The document, which is represented by a language
object (Document, Dictionary, Object), must be smaller than 16MB.

It must have a unique value for _id. If we don't provide one, the driver (or, failing that, the server)
assigns an ObjectId, a 12-byte MongoDB unique identifier type.

{ acknowledged: true, ... } means the server has acknowledged the write. We have not specified here
how many replica set members must hold the data, or whether it must be flushed to disk, so the
defaults apply.

We can request stronger write guarantees, as we will explain later.
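The _id rules above can be modelled with a small in-memory sketch in plain JavaScript, the language mongosh uses. It is an illustration under stated assumptions, not how MongoDB stores documents: the hypothetical MemoryCollection class uses a Map to play the role of the unique _id index, the hypothetical makeId() helper only produces a random 24-character hex string rather than a real ObjectId, and the _id values are placeholders.

```javascript
// Hypothetical in-memory stand-in for a collection. The Map key plays the
// role of the unique index that MongoDB always keeps on _id.
class MemoryCollection {
  constructor() {
    this.docs = new Map();
  }

  // Mimics the observable behaviour of insertOne(): generate an _id when
  // none is supplied, and reject a duplicate _id.
  insertOneSketch(doc) {
    const toStore = { ...doc }; // copy so the caller's object is untouched
    if (toStore._id === undefined) {
      toStore._id = makeId();
    }
    if (this.docs.has(toStore._id)) {
      throw new Error(`E11000 duplicate key error (sketch): _id ${toStore._id}`);
    }
    this.docs.set(toStore._id, toStore);
    return { acknowledged: true, insertedId: toStore._id };
  }
}

// Hypothetical id generator: 12 random bytes as 24 hex characters.
// A real ObjectId is built from a timestamp, a random value and a counter.
function makeId() {
  let hex = "";
  for (let i = 0; i < 12; i++) {
    hex += Math.floor(Math.random() * 256).toString(16).padStart(2, "0");
  }
  return hex;
}

const customers = new MemoryCollection();
console.log(customers.insertOneSketch({ _id: "robert", name: "Robert Smith" }));
try {
  customers.insertOneSketch({ _id: "robert", name: "Bobby Smith" });
} catch (err) {
  console.log(err.message); // the duplicate _id is rejected
}
console.log(customers.insertOneSketch({ name: "Andi Smith" }).insertedId.length); // 24
```

As in the slide, the second insert fails on the duplicate _id, and the third succeeds with a generated identifier.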


Add Multiple Documents - insertMany()

Accepts an array of documents.
Single network call normally.
Reduces network time.
Returns an object with information about each insert.

// 1000 Network Calls
MongoDB> let st = ISODate()
for(let d=0;d<1000;d++) {
  db.orders.insertOne({ product: "socks", quantity: d})
}
print(`${ISODate()-st} milliseconds`)

9106ms

// 1 Network call, same data
MongoDB> let st = ISODate()
let docs = []
for(let d=0;d<1000;d++) {
  docs.push({ product: "socks", quantity: d})
}
db.orders.insertMany(docs)
print(`${ISODate()-st} milliseconds`)

51ms

insertMany() can add many new documents at once, often 1,000 at a time.

This avoids a network round trip per document, which is really slow.
It returns a document showing the success or failure of each insert and any primary keys assigned.
A single call to the server is limited to 48MB or 100,000 documents, but the driver splits a larger
batch up behind the scenes.
There is also a way to bundle insert, update and delete operations into a single network call, called
bulkWrite().
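The round-trip arithmetic can be sketched the same way. This assumes the slide's figure of at most 100,000 documents per server call and ignores the 48MB size limit; splitIntoBatches(), roundTrips() and MAX_DOCS_PER_CALL are hypothetical names, not driver APIs, and the sketch only counts calls rather than sending anything.

```javascript
// Assumption: one server call accepts at most 100,000 documents.
// The 48MB size limit is ignored here for simplicity.
const MAX_DOCS_PER_CALL = 100000;

// Split an array of documents into consecutive batches of at most maxPerCall.
function splitIntoBatches(docs, maxPerCall = MAX_DOCS_PER_CALL) {
  const batches = [];
  for (let i = 0; i < docs.length; i += maxPerCall) {
    batches.push(docs.slice(i, i + maxPerCall));
  }
  return batches;
}

// Count round trips: insertOne() needs one per document,
// insertMany() needs one per batch.
function roundTrips(docs) {
  return {
    insertOne: docs.length,
    insertMany: splitIntoBatches(docs).length,
  };
}

const orderDocs = [];
for (let d = 0; d < 1000; d++) orderDocs.push({ product: "socks", quantity: d });
console.log(roundTrips(orderDocs)); // { insertOne: 1000, insertMany: 1 }
```

The 1,000 socks orders from the slide fit in one batch, which is why the insertMany() version needs a single network call.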
Order of operations in insertMany()

insertMany() can be ordered or unordered.
Ordered (the default) stops on the first error.
Unordered reports errors but continues; the server can reorder the inserts to make the operation faster.

MongoDB> let friends = [
  {_id: "joe" },
  {_id: "bob" },
  {_id: "joe" },
  {_id: "jen" }
]

MongoDB> db.collection1.insertMany(friends)
{ errmsg : "E11000 duplicate key error ...",
  nInserted : 2 }

MongoDB> db.collection2.insertMany(friends,{ordered:false})
{ errmsg : "E11000 duplicate key error ...",
  nInserted : 3 }

MongoDB> db.collection1.find()
{ _id : "joe" }
{ _id : "bob" }

MongoDB> db.collection2.find()
{ _id : "joe" }
{ _id : "bob" }
{ _id : "jen" }

If we opt for strict ordering, then:

● It must stop on the first error.
● No reordering or parallelism can be done, so it is slower in a sharded cluster.
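The stop-versus-continue difference can be sketched against the friends array from the slide. insertManySketch() is a hypothetical helper; it processes documents strictly in sequence even in unordered mode, so it shows only which documents end up stored, not the server's reordering or parallelism.

```javascript
// In-memory sketch (not the server's algorithm) of ordered vs unordered
// insertMany() when the input contains a duplicate _id.
function insertManySketch(existingIds, docs, { ordered = true } = {}) {
  const ids = new Set(existingIds);
  const errors = [];
  let nInserted = 0;
  for (let i = 0; i < docs.length; i++) {
    const id = docs[i]._id;
    if (ids.has(id)) {
      errors.push({ index: i, errmsg: `E11000 duplicate key error (sketch): _id ${id}` });
      if (ordered) break; // ordered: stop at the first error
      continue;           // unordered: record the error and keep going
    }
    ids.add(id);
    nInserted++;
  }
  return { nInserted, errors, storedIds: [...ids] };
}

const friends = [{ _id: "joe" }, { _id: "bob" }, { _id: "joe" }, { _id: "jen" }];
console.log(insertManySketch([], friends).nInserted);                     // 2
console.log(insertManySketch([], friends, { ordered: false }).nInserted); // 3
```

The counts match the slide: the ordered insert stores joe and bob, while the unordered insert also stores jen.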
Reading Documents
Find and Retrieve documents

findOne() retrieves a single document.
It accepts a document as a filter to "query-by-example."
An empty object (or no object) matches everything.

MongoDB> db.customers.insertOne({
  _id : "[email protected]",
  name: "Timothy",
  orders: [], spend: 0,
  lastpurchase: null
})
{ acknowledged: true, insertedId : "[email protected]" }

MongoDB> db.customers.findOne({ _id : "[email protected]" })
{ _id : "[email protected]",
  name : "Timothy",
  orders : [ ],
  spend : 0,
  lastpurchase : null }

MongoDB> db.customers.findOne({ spend: 0 })
MongoDB> db.customers.findOne({ spend: 0 , name: "Timothy" })
MongoDB> db.customers.findOne({ name: "timothy" }) // No match
MongoDB> db.customers.findOne({ spend: "0" }) // No match
MongoDB> db.customers.findOne({}) // All match - returns one

We can retrieve a document using findOne(), which takes an object as its argument.

It returns the first document it finds in which all the specified fields match. If there are multiple
matches, there is no way to predict which one is 'first'.

Here we add a record for customer Timothy using insertOne().

Then we query by the _id field, which holds the user's email, and find the record. This returns an
object, and mongosh prints what is returned.

We can also query by any other field, although only _id has an index by default, so queries on the
other fields here are less efficient for now.

We can supply multiple fields, and if they all match we find the record: someone called Timothy
who has spent 0 dollars.

Note that the order of the fields in the query does not matter here; we can think of the comma as
meaning AND.

db.customers.findOne({ spend: "0" }) does not match, because it looks for the string "0", not the
number 0.

An empty object matches everything. However, findOne() by its nature returns only one document.
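Query-by-example for simple equality can be sketched in a few lines. This is a deliberately simplified model, assuming top-level fields with primitive values only: it ignores query operators, dotted paths, arrays, and MongoDB's rule that { field: null } also matches documents missing the field. matchesExample() and findOneSketch() are hypothetical names, and the _id value is a placeholder.

```javascript
// Simplified query-by-example for top-level fields only: every field in the
// filter must be strictly equal (same value AND same type) to the document's
// field. Arrays and nested objects would need a deep comparison instead.
function matchesExample(doc, filter = {}) {
  // every() on an empty list is true, so {} matches every document
  return Object.keys(filter).every((field) => doc[field] === filter[field]);
}

// findOne-like helper: return the first matching document, or null.
function findOneSketch(docs, filter = {}) {
  return docs.find((doc) => matchesExample(doc, filter)) ?? null;
}

const customerDocs = [
  { _id: "placeholder-id", name: "Timothy", orders: [], spend: 0, lastpurchase: null },
];

console.log(findOneSketch(customerDocs, { spend: 0, name: "Timothy" }) !== null); // true
console.log(findOneSketch(customerDocs, { name: "timothy" }));                    // null
console.log(findOneSketch(customerDocs, { spend: "0" }));                         // null
console.log(findOneSketch(customerDocs, {}) === customerDocs[0]);                 // true
```

Because every filter field must match, the filter behaves like an AND, and the order of its fields makes no difference.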
Find and Retrieve documents

A regex can be used to find string values without needing an exact match, for example to ignore
case sensitivity.
The $regex operator is optional in the syntax and can be omitted to get the same result.

MongoDB> db.customers.findOne({ name: "timothy" }) // No match
MongoDB> db.customers.findOne({ name: {$regex: /timothy/i }}) // Returns a match
MongoDB> db.customers.findOne({ name: /timothy/i }) // Returns a match

The example uses JavaScript regex syntax, since mongosh is a JavaScript REPL. How you write a regex
depends on the language and driver you are working with.
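The difference between exact matching and a case-insensitive regex can be shown with plain JavaScript regular expressions. fieldMatches() is a hypothetical helper that accepts either a bare RegExp or { $regex: RegExp }, mirroring the two query forms above; it is not MongoDB's matcher.

```javascript
// Sketch: test one field value against a condition that is either an exact
// value, a RegExp, or an object of the form { $regex: RegExp }.
function fieldMatches(value, condition) {
  if (condition instanceof RegExp) {
    return typeof value === "string" && condition.test(value);
  }
  if (condition !== null && typeof condition === "object" && condition.$regex instanceof RegExp) {
    return typeof value === "string" && condition.$regex.test(value);
  }
  return value === condition; // exact, case-sensitive equality
}

console.log(fieldMatches("Timothy", "timothy"));              // false
console.log(fieldMatches("Timothy", { $regex: /timothy/i })); // true
console.log(fieldMatches("Timothy", /timothy/i));             // true
```

Note that /timothy/i is unanchored, so it would also match "Timothy Jones"; anchors such as /^timothy$/i restrict the match to the whole value.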
Projection: choosing the fields to return

Find operations can include a projection parameter.
Projections return only a subset of each document.
Projections include or exclude a set of fields.

MongoDB> db.customers.insertOne({
  _id : "[email protected]",
  name: "Ann", orders: [], spend: 0,
  lastpurchase: null
})
MongoDB> db.customers.findOne({ name: "Ann" })
{ _id : "[email protected]",
  name : "Ann",
  orders : [], spend: 0, lastpurchase: null }

MongoDB> db.customers.findOne({ name:"Ann" },{name:1, spend:1})
{ _id : "[email protected]", name : "Ann", spend : 0 }

MongoDB> db.customers.findOne({ name:"Ann" },{name:0, orders:0})
{ _id : "[email protected]", spend : 0, lastpurchase : null }

MongoDB> db.customers.findOne({ name:"Ann" },{name:0, orders:1})
MongoServerError: "Cannot do inclusion on field orders in exclusion projection"

MongoDB> db.customers.findOne({ name:"Ann" },{_id: 0, name:1})
{ name : "Ann" }

We can select the fields to return by providing an object with those fields and a value of 1 for each.

Documents can be large; with projection, MongoDB returns only a subset of the fields.

_id is always returned by default.

We can instead choose which fields NOT to return by providing an object with those fields set to 0.

We cannot mix 0 and 1 in one projection: what should happen to all the other fields?

The exception is _id: we can use _id: 0 to remove _id while projecting only the fields we need,
for example { _id: 0, name: 1 }.

There are some more advanced projection options, including projecting parts of an array
and projecting computed fields using aggregation, but those are not covered here.
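The inclusion and exclusion rules can be captured in a short sketch. applyProjection() is a hypothetical helper that handles top-level fields with the values 1 and 0 only; it does not support the advanced options mentioned above, its error message is illustrative rather than the server's, and the _id value is a placeholder.

```javascript
// Sketch of the top-level projection rules (not the server's code):
// all 1s = inclusion, all 0s = exclusion, _id may be 0 in either mode,
// and mixing 1 and 0 on other fields is an error.
function applyProjection(doc, projection = {}) {
  const fields = Object.keys(projection).filter((f) => f !== "_id");
  const includes = fields.filter((f) => projection[f] === 1);
  const excludes = fields.filter((f) => projection[f] === 0);
  if (includes.length > 0 && excludes.length > 0) {
    throw new Error("cannot mix inclusion and exclusion (except for _id)");
  }
  const keepId = projection._id !== 0; // _id is returned unless explicitly 0

  // Inclusion mode: copy only the listed fields (plus _id), in document order.
  if (includes.length > 0 || (excludes.length === 0 && projection._id === 1)) {
    const out = {};
    for (const f of Object.keys(doc)) {
      if (f === "_id" ? keepId : includes.includes(f)) out[f] = doc[f];
    }
    return out;
  }

  // Exclusion mode (or empty projection): copy everything, then remove fields.
  const out = { ...doc };
  for (const f of excludes) delete out[f];
  if (!keepId) delete out._id;
  return out;
}

const ann = { _id: "ann-id", name: "Ann", orders: [], spend: 0, lastpurchase: null };
console.log(applyProjection(ann, { name: 1, spend: 1 }));  // { _id: 'ann-id', name: 'Ann', spend: 0 }
console.log(applyProjection(ann, { name: 0, orders: 0 })); // { _id: 'ann-id', spend: 0, lastpurchase: null }
console.log(applyProjection(ann, { _id: 0, name: 1 }));    // { name: 'Ann' }
```

Starting from an empty object in inclusion mode, and from a full copy in exclusion mode, is what makes mixing the two ambiguous.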
Fetch multiple documents using find()

find() returns a cursor object rather than a single document.
We fetch documents from the cursor to get all matches.
mongosh fetches and displays 20 documents from the cursor object.

MongoDB> for(let x=0;x<200;x++) {
  db.taxis.insertOne({ plate: x })
}

MongoDB> db.taxis.find({})
{ _id : ObjectId("609b9aaccf0c3aa225ce9116"), plate : 0 }
{ _id : ObjectId("609b9aaccf0c3aa225ce9117"), plate : 1 }
...
{ _id : ObjectId("609b9aaccf0c3aa225ce9129"), plate : 19 }
Type "it" for more
MongoDB> it
{ _id : ObjectId("609b9aaccf0c3aa225ce912a"), plate : 20 }
{ _id : ObjectId("609b9aaccf0c3aa225ce912b"), plate : 21 }
...
{ _id : ObjectId("609b9aaccf0c3aa225ce913d"), plate : 39 }

MongoDB> db.taxis.find({ plate: 5 })
{ _id : ObjectId("609b9aaccf0c3aa225ce911b"), plate : 5 }

find() returns a cursor object; by default, the shell then tries to print it.

The cursor prints by displaying its next 20 documents and setting a variable called it to
itself.

If we type it, the shell prints the cursor again and displays the next 20 documents.

As programmers, note that a cursor does nothing until we read from it.

We can add .pretty() to a cursor object to make the shell display larger documents with newlines
and indentation.
Cursors
Using Cursors

Here, we store the result of find() in a variable.
We then manually iterate over the cursor.
The query is not actually run until we fetch results from the cursor.

MongoDB> let mycursor = db.taxis.find({})
MongoDB> while (mycursor.hasNext()) {
  let doc = mycursor.next();
  printjson(doc)
}
{ _id : ObjectId("609b9aaccf0c3aa225ce9117"), plate : 0 }
{ _id : ObjectId("609b9aaccf0c3aa225ce9118"), plate : 1 }
...
{ _id : ObjectId("609b9aaccf0c3aa225ce91dd"), plate : 199 }

MongoDB> let mycursor = db.taxis.find({}) // No output
MongoDB> mycursor.forEach( doc => { printjson(doc) })

// This does nothing - it does not even contact the server!
MongoDB> for(let x=0;x<100;x++) {
  let c = db.taxis.find({})
}

mycursor is a cursor object; it knows the database, collection and query we want to run.

Until we do something with it, it has not run the query; it has not even contacted the server.

It has methods; importantly, in mongosh, hasNext() checks whether more documents are available and
next() fetches the next one.

We can iterate over a cursor in various ways depending on our programming language.

If we never fetch anything from a cursor, the find never executes. This may be unexpected when doing
simple performance tests like the loop above.

To pull all the results from a cursor in the shell when testing speed, we can use
db.collection.find(query).itcount()
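Cursor laziness can be demonstrated with a toy cursor that counts how often it "contacts the server". LazyCursorSketch is hypothetical: its query function simply returns an in-memory array, and for simplicity it fetches all results at once rather than in batches.

```javascript
// Hypothetical lazy cursor: it stores the query but does no work until the
// first hasNext()/next() call. All results are fetched at once for simplicity.
class LazyCursorSketch {
  constructor(runQuery) {
    this.runQuery = runQuery; // function that stands in for contacting the server
    this.results = null;      // null means the query has not run yet
    this.position = 0;
  }
  ensureRun() {
    if (this.results === null) this.results = this.runQuery();
  }
  hasNext() {
    this.ensureRun();
    return this.position < this.results.length;
  }
  next() {
    this.ensureRun();
    if (this.position >= this.results.length) throw new Error("cursor exhausted");
    return this.results[this.position++];
  }
}

let serverCalls = 0;
const taxis = Array.from({ length: 200 }, (_, plate) => ({ plate }));
const makeTaxiCursor = () => new LazyCursorSketch(() => { serverCalls++; return taxis; });

// Like the loop of 100 find() calls: cursors are created but never read.
for (let x = 0; x < 100; x++) makeTaxiCursor();
console.log(serverCalls); // 0

// Reading from one cursor runs its query exactly once.
const taxiCursor = makeTaxiCursor();
let count = 0;
while (taxiCursor.hasNext()) { taxiCursor.next(); count++; }
console.log(serverCalls, count); // 1 200
```

Creating a hundred cursors costs nothing here, which is exactly why a timing loop that never reads its cursors measures nothing.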
Cursor modifiers

Cursors can include additional instructions like limit, skip, etc.
skip() and limit() return cursors.

MongoDB> for(let x=0;x<200;x++) {
  db.taxis.insertOne({plate:x})
}

MongoDB> db.taxis.find({}).limit(5)
{ _id : ObjectId("609b9aaccf0c3aa225ce9116"), plate : 0 }
{ _id : ObjectId("609b9aaccf0c3aa225ce9117"), plate : 1 }
{ _id : ObjectId("609b9aaccf0c3aa225ce9118"), plate : 2 }
{ _id : ObjectId("609b9aaccf0c3aa225ce9119"), plate : 3 }
{ _id : ObjectId("609b9aaccf0c3aa225ce911a"), plate : 4 }

MongoDB> db.taxis.find({}).skip(2)
{ _id : ObjectId("609b9aaccf0c3aa225ce9118"), plate : 2 }
... REMOVED for clarity ...
{ _id : ObjectId("609b9aaccf0c3aa225ce912b"), plate : 21 }
Type "it" for more

MongoDB> db.taxis.find({}).skip(8).limit(2)
{ _id : ObjectId("609b9aaccf0c3aa225ce911e"), plate : 8 }
{ _id : ObjectId("609b9aaccf0c3aa225ce911f"), plate : 9 }

We can add a limit instruction to the cursor to stop the query when it has found enough results.

We can add a skip instruction to the cursor to tell it to ignore the first N results.

The skip is always applied before the limit when computing the answer, whichever order they are
chained in.

This can be used for simple paging of results, although it's not the optimal way of doing so.

Skip has a cost on the server; skipping a large number of documents is not advisable.
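The skip-then-limit rule and simple paging can be sketched over an in-memory array. applySkipLimit() and page() are hypothetical helpers; treating a limit of 0 as "no limit" follows MongoDB's convention for limit(0).

```javascript
// Sketch: skip is applied first, then limit, regardless of how the
// modifiers were chained. A limit of 0 means "no limit".
function applySkipLimit(results, { skip = 0, limit = 0 } = {}) {
  const afterSkip = results.slice(skip);
  return limit > 0 ? afterSkip.slice(0, limit) : afterSkip;
}

const plates = Array.from({ length: 200 }, (_, plate) => ({ plate }));
console.log(applySkipLimit(plates, { skip: 8, limit: 2 })); // [ { plate: 8 }, { plate: 9 } ]

// Simple paging (page numbers start at 1): skip the earlier pages, limit to one page.
function page(results, pageNumber, pageSize) {
  return applySkipLimit(results, { skip: (pageNumber - 1) * pageSize, limit: pageSize });
}
console.log(page(plates, 3, 20)[0]); // { plate: 40 }
```

The skip grows with the page number, so later pages make the server walk past more documents, which is why deep skip-based paging is discouraged.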
Sorting Results

Use the sort() cursor modifier to retrieve results in a specific order.
Specify an object listing the fields to sort by, in order, and the sort direction for each.

MongoDB> let rnd = (x)=>Math.floor(Math.random()*x)
MongoDB> for(let x=0;x<100;x++) { db.scores.insertOne({ride:rnd(40),swim:rnd(40),run:rnd(40)})}

//Unsorted
MongoDB> db.scores.find({},{_id:0})
{ ride : 5, swim : 11, run : 11 }
{ ride : 0, swim : 17, run : 12 }
{ ride : 17, swim : 2, run : 2 }

//Sorted by ride increasing
MongoDB> db.scores.find({},{_id:0}).sort({ride: 1})
{ ride : 0, swim : 38, run : 10 }
{ ride : 1, swim : 37, run : 37 }
{ ride : 1, swim : 30, run : 20 }

//Sorted by swim increasing then ride decreasing
MongoDB> db.scores.find({},{_id:0}).sort({swim: 1, ride: -1})
{ ride : 31, swim : 0, run : 14 }
{ ride : 11, swim : 0, run : 14 }
{ ride : 30, swim : 1, run : 34 }
{ ride : 21, swim : 1, run : 3 }

With skip and limit, sorting can be very important, so that we skip and limit to the results we expect.

We cannot assume anything about the order of unsorted results.

Sorting results without an index is very inefficient; we cover this when talking about indexes later.
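A sort specification such as { swim: 1, ride: -1 } can be read as a compound comparator: compare on the first field, and move to the next field only on a tie. comparatorFromSortSpec() is a hypothetical helper that assumes the compared fields hold numbers; it ignores MongoDB's ordering rules across different BSON types. The sample rows are the four sorted rows shown on the slide.

```javascript
// Sketch: turn a sort spec like { swim: 1, ride: -1 } into a comparator.
// Fields are compared in the order listed; 1 = ascending, -1 = descending.
function comparatorFromSortSpec(spec) {
  const keys = Object.entries(spec);
  return (a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0; // all sort fields equal: relative order is not meaningful
  };
}

const scores = [
  { ride: 11, swim: 0, run: 14 },
  { ride: 21, swim: 1, run: 3 },
  { ride: 31, swim: 0, run: 14 },
  { ride: 30, swim: 1, run: 34 },
];
const sorted = [...scores].sort(comparatorFromSortSpec({ swim: 1, ride: -1 }));
console.log(sorted.map((s) => `${s.swim}/${s.ride}`).join(" ")); // 0/31 0/11 1/30 1/21
```

The result reproduces the slide's order: swim ascending, and within equal swim values, ride descending.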
Cursors work in batches

Cursors fetch results from the server in batches.

The default batch size in the shell is 101 documents for the initial call to find(), with a limit of 16MB.

If we fetch more than the first 101 documents from a cursor, it fetches further results in batches of up
to 16MB in the shell, or up to 48MB in some drivers.

Rather than making a call to the server every time we get the next document from a cursor, the
client receives results in batches and stores them at the client or shell end until we want them.

Fetching documents one by one would be slow.

Fetching all documents at once would use too much client RAM.

We can change the batch size on the cursor if we need to, but each batch is still limited to 16MB.

Fetching additional data from a cursor uses a command called getMore behind the scenes; it
fetches up to 16MB at a time.
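The batching behaviour can be simulated with a cursor that keeps a client-side buffer and only goes back to the "server" when that buffer is empty. BatchedCursorSketch is hypothetical: it counts batches in documents (101 for the first batch, then an assumed 1,000 per getMore), whereas real later batches are capped by size (16MB).

```javascript
// Hypothetical simulation of batched cursor fetching. The client asks the
// "server" for another batch (a getMore) only when its local buffer is empty.
class BatchedCursorSketch {
  constructor(allResults, { firstBatch = 101, laterBatch = 1000 } = {}) {
    this.source = allResults;
    this.firstBatch = firstBatch;
    this.laterBatch = laterBatch; // assumption: fixed document count per getMore
    this.offset = 0;              // how many results the "server" has sent
    this.buffer = [];             // results held at the client
    this.roundTrips = 0;          // initial find plus getMore calls
  }
  fetchBatch() {
    const size = this.roundTrips === 0 ? this.firstBatch : this.laterBatch;
    this.buffer = this.source.slice(this.offset, this.offset + size);
    this.offset += this.buffer.length;
    this.roundTrips++;
  }
  hasNext() {
    const moreOnServer = this.roundTrips === 0 || this.offset < this.source.length;
    if (this.buffer.length === 0 && moreOnServer) this.fetchBatch();
    return this.buffer.length > 0;
  }
  next() {
    if (!this.hasNext()) throw new Error("cursor exhausted");
    return this.buffer.shift();
  }
}

const batchDocs = Array.from({ length: 2500 }, (_, i) => ({ i }));
const cur = new BatchedCursorSketch(batchDocs);
let seen = 0;
while (cur.hasNext()) { cur.next(); seen++; }
console.log(seen, cur.roundTrips); // 2500 4
```

Reading 2,500 documents takes four round trips here (101, then 1,000, 1,000 and 399), instead of 2,500 calls or one very large transfer.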
Exercise

Add four documents to a collection called diaries using the commands shown here.
Write a find() operation to output only diary entries from dug.
Modify it to output the line below using skip, limit and a projection.

[{name: 'dug', txt: 'saw a squirrel'}]

MongoDB> db.diaries.drop()
MongoDB> db.diaries.insertMany([
  {
    name: "dug", day: ISODate("2014-11-04"),
    txt: "went for a walk"
  },
  {
    name: "dug", day: ISODate("2014-11-06"),
    txt: "saw a squirrel"
  },
  {
    name: "ray", day: ISODate("2014-11-06"),
    txt: "met dug in the park"
  },
  {
    name: "dug", day: ISODate("2014-11-09"),
    txt: "got a treat"
  }
])

Answers at the end


Quiz Time!
#1. When does a find() query get executed on the MongoDB server?

A. When a cursor is iterated
B. When you call the find() function
C. When the driver connects to the database
D. Every time we add a projection
E. Every time an index is created

Answer in the next slide.
#1. When does a find() query get executed on the MongoDB server?

Answer: A. When a cursor is iterated

find() returns a cursor object rather than documents. The shell starts retrieving the first 20
results, but by default find() on its own does not retrieve documents.
Calling find() does not return any values until you start retrieving data with the cursor.
The find() query has no relationship with a connection pool or the driver connection.
Creating a cursor, adding a projection, or creating an index does not execute the find query.
#2. Why is insertMany() faster than multiple insertOne() operations?

A. Needs fewer writes to disk.
B. Reduces the network time.
C. Performs the writes as a single transaction.
D. Replicates to other servers faster.
E. Allows parallel processing of inserts in sharded clusters.

Answer in the next slide.
#2. Why is insertMany() faster than multiple insertOne() operations?

A. Needs fewer writes to disk.
B. Reduces the network time.
C. Performs the writes as a single transaction.
D. Replicates to other servers faster.
E. Allows parallel processing of inserts in sharded clusters.
Recap

Using bulk writes rather than single writes gives better network performance.

find() returns a cursor object, which the shell then pulls from.
Exercise Answers
Answer - Exercise: find, skip and limit
Write a find() to output only diary entries from "dug":

MongoDB> db.diaries.find({name:"dug"})
{"_id" : ObjectId("609ba812cf0c3aa225ce91de"), "name" : "dug", "day" : ISODate("2014-11-04T00:00:00Z"), "txt" : "went for a walk" }
{"_id" : ObjectId("609ba812cf0c3aa225ce91df"), "name" : "dug", "day" : ISODate("2014-11-06T00:00:00Z"), "txt" : "saw a squirrel" }
{"_id" : ObjectId("609ba812cf0c3aa225ce91e1"), "name" : "dug", "day" : ISODate("2014-11-09T00:00:00Z"), "txt" : "got a treat" }

Modify it to output the line below using skip, limit and a projection:

MongoDB> db.diaries.find({name:"dug"},{_id:0,day:0}).skip(1).limit(1)
{ name: "dug", txt: "saw a squirrel" }
