
Unit 4 (MongoDB)


Big Data and Analytics

UNIT-IV
MongoDB
MongoDB: An introduction
• MongoDB, one of the most popular NoSQL databases, is a document-oriented
database whose source code is publicly available. The term 'NoSQL' is
commonly read as 'non-relational': MongoDB is not based on the table-like
relational structure but provides a different mechanism for storing and
retrieving data.
• MongoDB Community Server is free to use. It was originally released under
the GNU AGPL and, since late 2018, under the Server Side Public License
(SSPL). Commercial licenses are also available from the vendor.
• The vendor, 10gen (now MongoDB Inc.), defined MongoDB as:
• "MongoDB is a scalable, open source, high performance, document-
oriented database." - 10gen
• MongoDB was designed to work with commodity servers. It is now used by
companies of all sizes across many industries.
• MongoDB scales by adding more servers and improves developer
productivity with its flexible document model.
History of MongoDB
• The initial development of MongoDB began in 2007, when the company
was building a platform as a service similar to Windows Azure.
• MongoDB was developed by a New York based organization named 10gen,
which is now known as MongoDB Inc. It was initially developed as part of a
PaaS (Platform as a Service). In 2009 it was introduced to the market
as an open source database server, maintained and supported by
MongoDB Inc.
• Version 1.4, released in March 2010, is considered the first
production-ready release of MongoDB.
• MongoDB 2.4.9, released on January 10, 2014, was the stable version at
that time. Many major releases have followed since.
Features of MongoDB
1. Support for ad hoc queries: In MongoDB, you can search by field or by
range, and regular-expression searches are also supported.
2. Indexing: Without an index, a database has to scan every document of a
collection to select those that match the query, which is inefficient.
Indexes make searching efficient, and MongoDB uses them to process large
volumes of data quickly.
3. Replication and High Availability: MongoDB increases data availability
by keeping multiple copies of the data on different servers. This
redundancy protects the database from hardware failures: if one server
goes down, the data can still be retrieved from the other active servers
that hold a copy.
4. Document Oriented: MongoDB stores a subject in as few documents as
possible rather than breaking it up into multiple relational structures as
an RDBMS does. For example, it stores all the information about a
computer in a single document called Computer and not in distinct
relational structures like CPU, RAM, Hard disk, etc.
5. Scalability: MongoDB scales horizontally using sharding (partitioning data
across several servers). Data is partitioned into chunks using the shard
key, and these chunks are distributed evenly across shards that reside
on many physical servers. New machines can also be added to a
running database.
6. Aggregation: Aggregation operations process data records and return
computed results, similar to the GROUP BY clause in SQL. A few
aggregation expressions are sum, avg, min and max.
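As a rough illustration of what a grouping aggregation computes, the following plain JavaScript sketch groups in-memory records and derives sum, avg, min and max per group. The orders array, its field names and the groupStats helper are made up for this example, and the code only models the result; it is not how MongoDB executes aggregations.

```javascript
// Hypothetical sample data; the field names are assumptions for illustration.
const orders = [
  { customer: "A", amount: 50 },
  { customer: "B", amount: 20 },
  { customer: "A", amount: 30 },
];

// Group records by a key and compute sum, avg, min and max of a numeric field,
// comparable to SELECT key, SUM(x), AVG(x), MIN(x), MAX(x) ... GROUP BY key.
function groupStats(records, keyField, valueField) {
  const groups = new Map();
  for (const record of records) {
    const key = record[keyField];
    const value = record[valueField];
    const g = groups.get(key) ?? { sum: 0, count: 0, min: Infinity, max: -Infinity };
    g.sum += value;
    g.count += 1;
    g.min = Math.min(g.min, value);
    g.max = Math.max(g.max, value);
    groups.set(key, g);
  }
  // Convert the accumulators into one result object per group.
  return [...groups].map(([key, g]) => ({
    _id: key, sum: g.sum, avg: g.sum / g.count, min: g.min, max: g.max,
  }));
}

const stats = groupStats(orders, "customer", "amount");
console.log(stats);
```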
Where do we use MongoDB?
MongoDB is preferred over RDBMS in the following scenarios:
• Big Data: If you have a huge amount of data to store, consider
MongoDB before a relational database. MongoDB has a built-in solution for
partitioning and sharding your database.
• Unstable Schema: Adding a new column to a large RDBMS table can be
costly, whereas MongoDB has a flexible schema. Adding a new field is easy
and does not affect existing documents.
• Distributed data: Since multiple copies of the data are stored on different
servers, the data remains available and can be recovered even if there is a
hardware failure.
Language Support by MongoDB:
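• MongoDB provides official drivers for many popular programming
languages, such as C, C++, C#, Go, Java, Node.js, PHP, Python, Ruby, Rust
and Scala. Drivers for other languages, such as Perl and Erlang, have also
been available, though not all of them are officially maintained.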
Installing MongoDB:
• Just go to http://www.mongodb.org/downloads and select your operating
system out of Windows, Linux, Mac OS X and Solaris. A detailed
explanation about the installation of MongoDB is given on their site.
Install MongoDB On Windows
• To install MongoDB on Windows, first download the latest release of
MongoDB from https://www.mongodb.com/download-center.
• Enter the required details and select the Server tab, where you can choose
the MongoDB version, the operating system and the package type.
• Now install the downloaded file. By default, it is installed under the
folder C:\Program Files\.
MongoDB - Data Modelling
• Data in MongoDB has a flexible schema. Documents in the same collection
do not need to have the same set of fields or structure, and common fields
in a collection's documents may hold different types of data.
Data Model Design
• MongoDB provides two types of data models: the Embedded data model
and the Normalized data model. Based on the requirement, you can use either
of the models while preparing your document.
Embedded Data Model
In this model, you can have (embed) all the related data in a single document.
It is also known as the de-normalized data model.
{
_id: <ObjectId>,
Emp_ID: "10025AE336",
Personal_details: {
First_Name: "Radhika",
Last_Name: "Sharma",
Date_Of_Birth: "1995-09-26"
},
Contact: {
"e-mail": "[email protected]",
phone: "9848022338"
},
Address: {
city: "Hyderabad",
Area: "Madapur",
State: "Telangana"
}
}
Normalized Data Model
• In this model, related data is kept in separate documents, and the
original document is linked to its sub-documents using references.
Employee:
{
_id: <ObjectId101>,
Emp_ID: "10025AE336"
}
Personal_details:
{
_id: <ObjectId102>,
empDocID: <ObjectId101>,
First_Name: "Radhika",
Last_Name: "Sharma",
Date_Of_Birth: "1995-09-26"
}
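The difference between the two models can be sketched in plain JavaScript: with the normalized model, reading an employee takes a second lookup that follows the reference, while the embedded model stores the combined shape directly. The arrays below are in-memory stand-ins for collections, the ids are simplified to strings, and loadEmployee is a hypothetical helper, not a MongoDB API.

```javascript
// In-memory stand-ins for two collections; ids are simplified to strings.
const employees = [{ _id: "id101", Emp_ID: "10025AE336" }];
const personalDetails = [
  { _id: "id102", empDocID: "id101", First_Name: "Radhika", Last_Name: "Sharma" },
];

// Normalized model: follow the reference with a second lookup, then combine.
function loadEmployee(empId) {
  const employee = employees.find((e) => e.Emp_ID === empId);
  if (!employee) return null;
  const details = personalDetails.find((d) => d.empDocID === employee._id);
  // The combined shape is what the embedded model would store directly.
  return { ...employee, Personal_details: details ?? null };
}

const loaded = loadEmployee("10025AE336");
console.log(loaded.Personal_details.First_Name);
```

The trade-off shown here is the usual one: references avoid duplicating data, while embedding avoids the extra lookup.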
DataTypes in MongoDB
In MongoDB, documents are stored in BSON, a binary-encoded serialization of
JSON-like documents. BSON is also the format in which documents are sent
between clients and the server. The BSON format supports various data types.
1. String: This is the most commonly used data type in MongoDB. BSON
strings are UTF-8 encoded, so the drivers for each programming
language convert from the string format of the language to UTF-8 while
serializing and de-serializing BSON. The string must be valid UTF-8.
2. Integer: In MongoDB, the integer data type is used to store an integer
value. Integers can be stored in two forms: 32-bit signed integer
and 64-bit signed integer.
3. Double: The double data type is used to store floating-point values.
4. Boolean: The boolean data type is used to store either true or false.
5. Null: The null data type is used to store the null value.
6. Array: An array is an ordered list of values. It can store values of the
same or of different data types. In MongoDB, an array is created using
square brackets ([]).
7. Object: Object data type stores embedded documents. Embedded
documents are also known as nested documents. Embedded document or
nested documents are those types of documents which contain a document
inside another document.
8. Object Id: Whenever we create a new document in a collection,
MongoDB automatically creates a unique ObjectId for that document (if the
document does not already have an _id). Every document in MongoDB has an
_id field. An ObjectId is 12 bytes long and is usually displayed as a
24-character hexadecimal string. It consists of:
• 4 bytes for a timestamp value (seconds since the Unix epoch).
• 5 bytes for a random value. Older versions split these into 3 bytes for a
machine id and 2 bytes for a process id.
• 3 bytes for a counter.
• You can also supply your own value for the _id field, but it must be
unique within the collection.
10. Binary Data: This datatype is used to store binary data.
11. Date: The Date data type stores a date and time. It is a signed 64-bit
integer that represents the number of milliseconds since the Unix epoch
(January 1, 1970, UTC). A negative value represents a date before 1970.
There are several ways to obtain a date in the shell, either as a string or
as a date object:
• Date(): Returns the current date as a string.
• new Date(): Returns a date object. Uses the ISODate() wrapper.
• new ISODate(): Also returns a date object. Uses the ISODate() wrapper.
12. Min & Max key: MinKey compares lower than every other BSON value and
MaxKey compares higher than every other BSON value. Both are
internal data types.
13. Symbol: This data type is similar to the string data type. It is generally
not supported by the mongo shell; if the shell gets a symbol from the
database, it converts it into a string.
14. Regular Expression: This data type is used to store regular
expressions.
15. JavaScript: This data type is used to store JavaScript code in a
document, without a scope.
17. Timestamp: This data type stores a 64-bit timestamp value that MongoDB
mostly uses internally to record when data was modified. Within a single
mongod instance, timestamp values are always unique.
18. Decimal: This data type stores a 128-bit decimal-based
floating-point value. It was introduced in MongoDB version 3.4.
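The ObjectId layout and the Date representation described in the list above can be explored with a short plain JavaScript sketch. It splits an ObjectId hex string (taken from the sample output later in this unit) into its timestamp, middle bytes and counter, and shows that a negative millisecond count is a date before 1970. parseObjectId is a hypothetical helper written for this example, not a driver function.

```javascript
// Split a 24-character hexadecimal ObjectId string into its three parts.
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // 4-byte timestamp in seconds
  return {
    createdAt: new Date(seconds * 1000),   // Date works in milliseconds
    middle: hex.slice(8, 18),              // 5 bytes (10 hex digits)
    counter: parseInt(hex.slice(18), 16),  // 3-byte counter
  };
}

const parsed = parseObjectId("5fa2c0e19577bba42e1db54a");
console.log(parsed.createdAt.toISOString());

// A negative millisecond count denotes an instant before 1970-01-01 UTC.
const dayBeforeEpoch = new Date(-24 * 60 * 60 * 1000);
console.log(dayBeforeEpoch.toISOString()); // 1969-12-31T00:00:00.000Z
```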
MongoDB Operators
MongoDB Query and Projection Operator
The MongoDB query operator includes comparison, logical, element,
evaluation, Geospatial, array, bitwise, and comment operators.
MongoDB Comparison Operators
$eq
The $eq specifies the equality condition. It matches documents where the
value of a field equals the specified value.
Syntax:
• { <field> : { $eq: <value> } }
Example:
• db.books.find ( { price: { $eq: 300 } } )
$gt
• The $gt operator chooses the documents where the value of the field is greater than the specified value.
Syntax:
• { field: { $gt: value } }
Example:
• db.books.find ( { price: { $gt: 200 } } )
$gte
• The $gte operator chooses the documents where the field value is greater than or equal to
the specified value.
Syntax:
• { field: { $gte: value } }
Example:
• db.books.find ( { price: { $gte: 250 } } )
$in
• The $in operator chooses the documents where the value of a field equals any value in the
specified array.
Syntax:
• { field: { $in: [ <value1>, <value2>, ... ] } }
Example:
• db.books.find( { price: { $in: [100, 200] } } )
$lt
• The $lt operator chooses the documents where the value of the field is less than the
specified value.
Syntax:
• { field: { $lt: value } }
Example:
• db.books.find ( { price: { $lt: 20 } } )
$lte
• The $lte operator chooses the documents where the field value is less than or equal to a
specified value.
Syntax:
• { field: { $lte: value } }
Example:
• db.books.find ( { price: { $lte: 250 } } )
$ne
• The $ne operator chooses the documents where the field value is not equal to the
specified value.
Syntax:
• { <field>: { $ne: <value> } }
Example:
• db.books.find ( { price: { $ne: 500 } } )
$nin
• The $nin operator chooses the documents where the field value is not in the specified
array, or where the field does not exist.
Syntax:
• { field : { $nin: [ <value1>, <value2>, .... ] } }
Example:
• db.books.find ( { price: { $nin: [ 50, 150, 200 ] } } )
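To make the behaviour of the comparison operators concrete, here is a minimal plain JavaScript model that applies them to an in-memory array. It is only a sketch of the semantics described above for plain numbers and strings; comparisonOps, filterBy and the sample books array are assumptions for illustration and are not part of MongoDB.

```javascript
// Each comparison operator modeled as a predicate on (fieldValue, operand).
// A missing field is seen as undefined.
const comparisonOps = {
  $eq: (v, x) => v === x,
  $ne: (v, x) => v !== x,
  $gt: (v, x) => v !== undefined && v > x,
  $gte: (v, x) => v !== undefined && v >= x,
  $lt: (v, x) => v !== undefined && v < x,
  $lte: (v, x) => v !== undefined && v <= x,
  $in: (v, xs) => xs.includes(v),
  $nin: (v, xs) => !xs.includes(v), // also true when the field is missing
};

// Return the documents whose field satisfies every operator in `condition`.
function filterBy(docs, field, condition) {
  return docs.filter((doc) =>
    Object.entries(condition).every(([op, operand]) =>
      comparisonOps[op](doc[field], operand)
    )
  );
}

// Hypothetical sample collection.
const books = [
  { title: "A", price: 100 },
  { title: "B", price: 250 },
  { title: "C", price: 500 },
  { title: "D" }, // no price field
];

const titles = (docs) => docs.map((d) => d.title);
console.log(titles(filterBy(books, "price", { $gt: 200 })));         // [ 'B', 'C' ]
console.log(titles(filterBy(books, "price", { $in: [100, 200] })));  // [ 'A' ]
console.log(titles(filterBy(books, "price", { $nin: [100, 250] }))); // [ 'C', 'D' ]
```

Several operators can be combined in one condition, for example { $gte: 100, $lt: 500 } for a range.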

MongoDB Logical Operator


$and
• The $and operator performs a logical AND on an array of one or more
expressions and chooses the documents that satisfy all the expressions
in the array.
Syntax:
• { $and: [ { <exp1> }, { <exp2> }, ....]}
Example:
• db.books.find ( { $and: [ { price: { $ne: 500 } }, { price: { $exists: true } } ] } )
$not
• The $not operator performs a logical NOT on the specified expression and
chooses the documents that do not match the expression, including
documents that do not contain the field.
Syntax:
• { field: { $not: { <operator-expression> } } }
Example:
• db.books.find ( { price: { $not: { $gt: 200 } } } )
$nor
• The $nor operator performs a logical NOR on an array of one or more query
expressions and chooses the documents that fail all the query expressions in
the array.
Syntax:
• { $nor: [ { <expression1> } , { <expression2> } , ..... ] }
Example:
• db.books.find ( { $nor: [ { price: 200 }, { sale: true } ] } )
$or
• The $or operator performs a logical OR on an array of two or more
expressions and chooses the documents that satisfy at least one of the
expressions.
Syntax:
• { $or: [ { <exp_1> }, { <exp_2> }, ... , { <exp_n> } ] }
Example:
• db.books.find ( { $or: [ { quantity: { $lt: 200 } }, { price: 500 } ] } )
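The logical operators can be pictured as combinators over simpler conditions. The plain JavaScript sketch below models them with predicate functions over an in-memory array; the helper names (and, or, nor, not, priceIs and so on) and the catalog data are hypothetical, and the code illustrates the semantics only.

```javascript
// Logical operators modeled as combinators over predicate functions
// (a predicate takes a document and returns true or false).
const and = (...preds) => (doc) => preds.every((p) => p(doc));
const or = (...preds) => (doc) => preds.some((p) => p(doc));
const nor = (...preds) => (doc) => !preds.some((p) => p(doc));
const not = (pred) => (doc) => !pred(doc);

// Small hypothetical field predicates to combine.
const priceIs = (n) => (doc) => doc.price === n;
const priceGt = (n) => (doc) => doc.price !== undefined && doc.price > n;
const hasPrice = (doc) => doc.price !== undefined;
const onSale = (doc) => doc.sale === true;

// Hypothetical sample collection.
const catalog = [
  { title: "A", price: 200, sale: false },
  { title: "B", price: 500, sale: true },
  { title: "C", price: 150, sale: false },
  { title: "D", sale: false }, // no price field
];

const pick = (pred) => catalog.filter(pred).map((d) => d.title);
console.log(pick(and(not(priceIs(500)), hasPrice))); // [ 'A', 'C' ]
console.log(pick(or(priceGt(200), priceIs(150))));   // [ 'B', 'C' ]
console.log(pick(nor(priceIs(200), onSale)));        // [ 'C', 'D' ]
console.log(pick(not(priceGt(200))));                // [ 'A', 'C', 'D' ]
```

Note that the last query also returns document D, which has no price at all: negation selects everything the inner condition does not match.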
MongoDB Create Database
• There is no create database command in MongoDB.
• This may look odd if you come from a traditional SQL background, where
you create a database and a table explicitly before inserting values.
• In MongoDB you don't need to create a database manually: MongoDB
creates it automatically the first time you save a value into a collection
of that database.
• The collection does not need to be declared in advance either; it is also
created automatically at the time you save the first value into it.
• You can create a collection explicitly with "db.createCollection()", but not
a database.
How and when to create database
• The use command switches to a database. If the database does not exist
yet, it is created when data is first stored in it.
• Syntax:
– use DATABASE_NAME
• If the database already exists, the command switches to the existing database.
• To check the currently selected database, use the command db: >db
• To list the databases, use the command show dbs: >show dbs. A new
database does not appear in this list until it contains at least one document.

MongoDB Drop Database


• The dropDatabase command is used to drop a database. It also deletes the
associated data files. It operates on the current database.
• Syntax:
• db.dropDatabase()
What are the Documents?
• A document is the basic unit of data in MongoDB and is roughly
equivalent to a row in an RDBMS.
• A document is an ordered set of keys with their associated values.
{"message" : "Hello, MongoDB"}
• The above sample document contains a key named "message" with a value
of "Hello, MongoDB".
• The keys of a document are strings.
• A key must not contain the null character (\0), because the null character
marks the end of a key.
• Some special characters (such as $ and .) are reserved and should only be
used in keys in special circumstances.
• MongoDB is type-sensitive and case-sensitive.
• A single document should not contain duplicate keys.
What is the Collection?
• In MongoDB, a collection can be thought of as the counterpart of a table
in an RDBMS.
• A collection is a group of documents.
• Collections in MongoDB have dynamic schemas. This means
that documents within a single collection can have any number of different
fields or shapes.
• A collection name can be any UTF-8 string, subject to some
restrictions:
– The empty string ("") is not a valid collection name.
– Collection names cannot contain the null character.
– Collection names cannot start with the prefix system., which is
reserved.
– User-created collection names should not contain reserved
characters such as $.
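The naming restrictions listed above can be expressed as a small checker. This is a plain JavaScript sketch that covers only the rules named in this section; collectionNameProblems is a hypothetical helper, and a real server may enforce additional rules.

```javascript
// Check a proposed collection name against the restrictions listed above.
// Returns a list of problems; an empty list means no listed rule is violated.
function collectionNameProblems(name) {
  const problems = [];
  if (typeof name !== "string" || name.length === 0) {
    problems.push("name must be a non-empty string");
    return problems;
  }
  if (name.includes("\0")) problems.push("name contains a null character");
  if (name.startsWith("system.")) problems.push("the system. prefix is reserved");
  if (name.includes("$")) problems.push("name contains the reserved character $");
  return problems;
}

console.log(collectionNameProblems("employee"));     // []
console.log(collectionNameProblems("system.users")); // one problem reported
```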
MongoDB Shell
• To interact with MongoDB, we can use the MongoDB shell, which is an
interactive JavaScript interface for MongoDB.
• We can use the shell to query, insert, update or delete
documents in MongoDB collections.
• To start the MongoDB shell, open a command prompt and go to the
directory where MongoDB is installed.
Create and Drop Collections in MongoDB
• To create a new collection in the database, use the
db.createCollection(name, options) method. This method takes two
parameters.
– The name parameter takes a string value, which is the collection name.
– The options parameter takes a document that specifies settings such as
the maximum size and indexing options. This parameter is optional and
supports four fields:
Field (Type): Description
– capped (Boolean): If true, the collection is a capped collection, that is,
a fixed-size collection which automatically overwrites its oldest entries
when it reaches its maximum size. The default value is false.
– autoIndexId (Boolean): Controls whether an index is created automatically
on the _id field. An _id index is created by default, and this option is
deprecated in newer MongoDB versions.
– size (Number): The maximum size in bytes of a capped collection. It is
required when capped is true.
– max (Number): The maximum number of documents allowed in a capped
collection. It is optional, and the size limit takes precedence over it.
A collection can be created with the name parameter alone, or with both the
name and options parameters.
To drop a collection, use the drop() method as below.
> db.Employee.drop()
Insert Single Document
• insert() is the basic method for inserting documents into a
collection. Newer shells and drivers also provide insertOne() and
insertMany().
• When it is given a single document, it inserts that document into the
collection. It also adds an "_id" field to the document if the document
does not already have one.
Insert Multiple Records

db.employee.insert(
[
{name:"Sandeep Sharma", email:"[email protected]", age:28, salary:5333.94},
{name:"Manish Fartiyal", email:"[email protected]", age:26, salary:5555.4},
{name:"Santosh Kumar", email:"[email protected]", age:30, salary:7000.74},
{name:"Dhirendra Chauhan", email:"[email protected]", age:29, salary:4848.44}
]
)
Select Record/s
• The db.collection.find() method is used to fetch data from a
collection. Called with no parameters, it fetches all documents in the
collection. Called as db.collection.find(query, projection), it fetches
only the documents that match the query and returns only the fields
selected by the projection.

> db.employee.find()
{ "_id" : ObjectId("5fa2c0e19577bba42e1db54a"), "name" : "Atul Rai", "email" :
"[email protected]", "age" : 28, "salary" : 5000.54 }
{ "_id" : ObjectId("5fa2c2409577bba42e1db54b"), "name" : "Sandeep Sharma",
"email" : "[email protected]", "age" : 28, "salary" : 5333.94 }
{ "_id" : ObjectId("5fa2c2409577bba42e1db54c"), "name" : "Manish Fartiyal", "email"
: "[email protected]", "age" : 26, "salary" : 5555.4 }
{ "_id" : ObjectId("5fa2c2409577bba42e1db54d"), "name" : "Santosh Kumar", "email"
: "[email protected]", "age" : 30, "salary" : 7000.74 }
{ "_id" : ObjectId("5fa2c2409577bba42e1db54e"), "name" : "Dhirendra Chauhan",
"email" : "[email protected]", "age" : 29, "salary" : 4848.44 }
Fetch with condition

db.employee.find({email:"[email protected]"})

{ "_id" : ObjectId("5fa2c0e19577bba42e1db54a"), "name" : "Atul Rai",
"email" : "[email protected]", "age" : 28, "salary" : 5000.54 }

Fetch with multiple conditions

db.employee.find({age:28, salary: { $gt: 5000 }})

{ "_id" : ObjectId("5fa2c0e19577bba42e1db54a"), "name" : "Atul Rai",
"email" : "[email protected]", "age" : 28, "salary" : 5000.54 }
{ "_id" : ObjectId("5fa2c2409577bba42e1db54b"), "name" : "Sandeep Sharma",
"email" : "[email protected]", "age" : 28, "salary" : 5333.94 }
Delete Document
• To delete documents, use the remove() method. With an empty query
document it removes every document in the collection:
db.Employee.remove({})
• This command removes all the documents in the Employee
collection. It does not remove the collection itself; it only deletes
the documents and leaves the collection empty. remove() takes a query
document that specifies the criteria for deletion. If a query is given,
only the documents that match it are deleted.
db.Employee.remove({"Department": "Accounts"})
Document Update
• The db.collection.update({criteria}, {$set: {new value}}) method is used to
update an existing document of the collection. By default it updates only
the first document that matches the criteria.

db.employee.update({email:"[email protected]"},{$set:{salary: 8000.99}})

WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })


MongoDB Capped Collection
• These are fixed-size collections that support high-throughput operations.
Documents are inserted and retrieved in insertion order.
• They work like circular buffers: once a collection has filled its
allocated space, it makes room for new documents by overwriting the oldest
documents in the collection.
• This means the oldest documents are deleted automatically, without any
explicit command. Capped collections reject updates that would
increase the size of a document.
• Documents are stored in insertion order, and the collection never grows
beyond the size allocated to it. Capped collections are well suited to
storing log information, cache data or any other high-volume data.
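The circular-buffer behaviour can be imitated in a few lines of plain JavaScript. The CappedList class below is a toy model invented for this example: it limits the collection by document count only (like the max option), whereas real capped collections are primarily limited by their size in bytes.

```javascript
// A toy capped collection limited by document count (like the max option).
// Real capped collections are primarily limited by size in bytes.
class CappedList {
  constructor(max) {
    this.max = max;
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    // Once the limit is exceeded, the oldest document is dropped automatically.
    if (this.docs.length > this.max) this.docs.shift();
  }
  find() {
    return [...this.docs]; // documents come back in insertion order
  }
}

const log = new CappedList(3);
for (let i = 1; i <= 5; i++) log.insert({ seq: i });
console.log(log.find().map((d) => d.seq)); // [ 3, 4, 5 ]
```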
How to Create Capped Collection in MongoDB?
• To create a MongoDB capped collection, we use the normal createCollection
command with a capped option as true and specifying the maximum size of the
collection in bytes.
>db.createCollection("cappedLogCollection",{capped:true,size:10000})
• We can limit the number of documents in the collection using the max parameter:
>db.createCollection("cappedLogCollection",{capped:true,size:10000,max:1000})
How to Check Collection is Capped or not?
• If you want to check whether a collection is capped or not, then use the
following command:
>db.cappedLogCollection.isCapped()

How to Convert a Collection to Capped?


• If there is an existing collection which you want to convert to a capped
collection, you can do so with the following command:
>db.runCommand({"convertToCapped":"posts",size:10000})
Facts about MongoDB Capped Collection
• You cannot delete individual documents from a capped collection.
Documents are only removed automatically, when new documents are inserted
after the size allocated to the collection has been exhausted.
• When reading from a capped collection, MongoDB returns the documents in
the order in which they are stored on disk, which is insertion order.
Because of this, read operations execute very fast.
• Update operations have one restriction. If an update would increase the
size of a document, the document is not updated, because each document
keeps the size it had when it was first inserted into the capped
collection.
MongoDB Index
• In MongoDB, indexes help to resolve queries more efficiently. An index is a
special data structure used to locate documents in a collection quickly,
without having to traverse every document in the collection.
• MongoDB uses indexes to limit the number of documents that have to be
examined in a collection. The data structure used by an index is a
B-tree.
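The benefit of an ordered index can be sketched in plain JavaScript. The example below is a simplification: instead of a B-tree it keeps a sorted array of (key, position) pairs and searches it with binary search, which shows the same idea of finding a document without scanning the whole collection. The people data and the helpers buildIndex and findByIndex are assumptions made for this illustration.

```javascript
// Hypothetical collection stored in arbitrary order.
const people = [
  { name: "Santosh", age: 30 },
  { name: "Atul", age: 28 },
  { name: "Manish", age: 26 },
  { name: "Dhirendra", age: 29 },
];

// Build an "index": (key, position) pairs sorted by key.
function buildIndex(docs, field) {
  return docs
    .map((doc, position) => ({ key: doc[field], position }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Binary search the index. `steps` counts the index entries inspected.
function findByIndex(docs, index, value) {
  let lo = 0, hi = index.length - 1, steps = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    steps++;
    if (index[mid].key === value) return { doc: docs[index[mid].position], steps };
    if (index[mid].key < value) lo = mid + 1;
    else hi = mid - 1;
  }
  return { doc: null, steps };
}

const nameIndex = buildIndex(people, "name");
const hit = findByIndex(people, nameIndex, "Manish");
console.log(hit.doc, "found after", hit.steps, "index probes");
```

With four documents the saving is small, but the number of probes grows only logarithmically with the size of the collection, whereas a full scan grows linearly.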
a. Default Index
• In MongoDB, every collection has a default index on the _id field.
If we don't specify a value for _id, MongoDB creates the _id field with
an ObjectId value. This index prevents clients from creating two documents
with the same value in the _id field.
b. Create an Index
• In MongoDB, a user can create indexes using the following syntax.
• db.collection_name.createIndex(<key and index type specification>,
<options>)
> db.employee.createIndex({name:1})
Types of Index in MongoDB
i. Single Field Index
• MongoDB supports user-defined indexes such as the single field index. A
single field index is an index on one field of a document.
MongoDB can traverse a single field index in either ascending or
descending order.
ii. Compound Index
• MongoDB supports user-defined indexes on multiple fields as well, called
compound indexes. The order of the fields in a compound index is
significant.
• For example, if a compound index consists of {"name":1,"city":1}, then
the index sorts first by name and then, within each name, by city.
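The ordering implied by such a compound index can be reproduced with an ordinary comparator. The following plain JavaScript sketch sorts hypothetical entries first by name and then by city; it illustrates the ordering only and does not create an index.

```javascript
// Order in which a compound index { name: 1, city: 1 } would arrange entries:
// first by name, and by city only among entries with the same name.
function compareNameThenCity(a, b) {
  if (a.name !== b.name) return a.name < b.name ? -1 : 1;
  if (a.city !== b.city) return a.city < b.city ? -1 : 1;
  return 0;
}

// Hypothetical entries.
const entries = [
  { name: "Ravi", city: "Pune" },
  { name: "Anil", city: "Mumbai" },
  { name: "Ravi", city: "Delhi" },
  { name: "Anil", city: "Agra" },
];

const ordered = [...entries].sort(compareNameThenCity);
console.log(ordered.map((e) => `${e.name}/${e.city}`));
// [ 'Anil/Agra', 'Anil/Mumbai', 'Ravi/Delhi', 'Ravi/Pune' ]
```

Because city is only ordered within each name, such an index helps queries on name alone or on name and city together, but not queries on city alone.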
iii. Multikey Index
• MongoDB uses the multikey indexes to index the values stored in arrays. If
we index a field with an array value, MongoDB creates separate index
entries for each element of the array. These indexes allow queries to select
documents with the matching criteria.
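The idea of one index entry per array element can be sketched in plain JavaScript with a Map from element value to the ids of the documents containing it. buildMultikeyIndex and the posts data are hypothetical and only model the concept.

```javascript
// Build a multikey-style index: one entry per array element, each pointing
// back to the _id of the document that contains it.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) continue; // simplification: skip documents without the field
    const values = Array.isArray(value) ? value : [value];
    for (const v of new Set(values)) {
      if (!index.has(v)) index.set(v, []);
      index.get(v).push(doc._id);
    }
  }
  return index;
}

// Hypothetical sample collection.
const posts = [
  { _id: 1, tags: ["mongodb", "nosql"] },
  { _id: 2, tags: ["nosql", "indexing"] },
  { _id: 3, tags: "mongodb" },
];

const tagIndex = buildMultikeyIndex(posts, "tags");
console.log(tagIndex.get("nosql"));   // [ 1, 2 ]
console.log(tagIndex.get("mongodb")); // [ 1, 3 ]
```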
iv. Geospatial Index
• To query geospatial data, MongoDB supports two types of indexes: 2d
indexes and 2dsphere indexes. 2d indexes use planar geometry when
returning results and 2dsphere indexes use spherical geometry to return
results.
v. Text Index
• A text index supports searching for string content in a collection. Text
indexes do not store language-specific stop words (e.g. "the", "a",
"or"), and they stem the words in a collection so that only root
words are stored.
vi. Hashed Index
• MongoDB supports hash-based sharding and provides hashed indexes.
Hashed indexes store hashes of the values of the indexed field.
MongoDB Index Properties

a. Unique Indexes
• This index property causes MongoDB to reject duplicate values for the
indexed field. In other words, a unique index prevents the insertion of a
document whose indexed field duplicates an existing value. Unique indexes
can be interchanged functionally with other MongoDB indexes.
• We create an index using createIndex() for the name field and set unique to
true.
• > db.dataflair.createIndex({name:1},{unique:true})

b. Partial Indexes
• Partial indexes only index the documents that match a filter
expression. An index created with such a condition (given in the
partialFilterExpression option) is a partial index.
c. Sparse Indexes
• The sparse property ensures that the index only contains entries for
documents that have the indexed field. The index skips documents
without the indexed field.
• We can combine this option with the unique index option in order to reject
documents with duplicate values for a field while ignoring documents
that do not have the indexed field.
d. TTL Indexes
• TTL or "time to live" indexes are special indexes in MongoDB.
They are used to automatically delete documents from a collection after a
specified amount of time. The expireAfterSeconds option provides the
expiration time, and the indexed field must hold a date value.
• This property is ideal for certain types of information like machine-
generated data, logs and session information that only need to be kept in
the database for a finite amount of time.
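The expiry rule can be modelled in plain JavaScript: a document is considered expired once expireAfterSeconds have passed since the date stored in its indexed field. withoutExpired and the sessions data are hypothetical, and the sketch only models the rule; in MongoDB the deletion is done by the server in the background.

```javascript
// Return the documents that have not yet expired: a document expires once
// `expireAfterSeconds` have passed since the date in its indexed field.
function withoutExpired(docs, dateField, expireAfterSeconds, now = new Date()) {
  return docs.filter((doc) => {
    const stamp = doc[dateField];
    if (!(stamp instanceof Date)) return true; // no date value: never expires
    return stamp.getTime() + expireAfterSeconds * 1000 > now.getTime();
  });
}

// Hypothetical session documents, evaluated at a fixed point in time.
const now = new Date("2024-01-01T12:00:00Z");
const sessions = [
  { user: "a", createdAt: new Date("2024-01-01T11:30:00Z") }, // 30 minutes old
  { user: "b", createdAt: new Date("2024-01-01T10:00:00Z") }, // 2 hours old
  { user: "c" }, // no createdAt field
];

const live = withoutExpired(sessions, "createdAt", 3600, now);
console.log(live.map((s) => s.user)); // [ 'a', 'c' ]
```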
