Document Databases – MongoDB and CouchDB

Dr. Richa Sharma


Commonwealth University
Introduction
• A document database is a type of NoSQL database that stores data as
documents, encoded in JSON or another serialization format such as XML
or YAML.

• Document databases are considered a subclass of key-value NoSQL
databases.

• One reason for the popularity of document databases is their
similarity to objects in programming languages: a document in a
document DB is roughly equivalent to the programming concept of an
object.

• Document databases make use of additional metadata that is associated
with, and stored along with, the document content.

• The metadata facilitates organizing documents, providing security, and
other implementation-specific features.
Example of formats
• JSON:
{
  "field1": value1,
  "field2": value2,
  ...
  "fieldN": valueN
}
• XML:
<attr1>
  <field1>value1</field1>
  <field2>value2</field2>
  ...
  <fieldN>valueN</fieldN>
</attr1>
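To make the link between a document and a programming-language object concrete, here is a minimal sketch in plain JavaScript. It only illustrates the round trip between an object and its JSON text; the field names and values are invented for the example and no database is involved.

```javascript
// A plain JavaScript object, comparable to one document in a document database.
// The field names and values here are invented for illustration.
const town = {
  name: "New York",
  population: 22200000,
  famousFor: ["Statue of Liberty", "food"],
  mayor: { name: "Example Mayor", party: "I" },
};

// Serialize the object to JSON text, the form in which a document could be stored.
const stored = JSON.stringify(town);

// Parse the JSON text back into an object, as an application would after a read.
const loaded = JSON.parse(stored);

console.log(loaded.mayor.name);
```

The nested mayor object and the famousFor array survive the round trip unchanged, which is why documents feel natural to work with from application code.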
Differences with KV Pair DB
• In a key-value database, both keys and values can be anything, from
simple values to complex compound objects.

• In a document-oriented DB, which is a special type of key-value DB,
keys can only be strings. The document is encoded using a standard
such as JSON or XML. One can also store PDFs, image files, or text
documents directly as values in the document database.

• The data in a key-value database is considered inherently opaque to
the database, whereas a document-oriented system relies on the internal
structure of the document to extract metadata that the database engine
uses for further optimization.

• A document-oriented database is designed to offer a richer experience
with modern programming techniques.
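The difference between an opaque value and a structured document can be sketched in a few lines of plain JavaScript. This is an in-memory illustration only: the Map, the sample users, and the findByField helper are all hypothetical stand-ins, not the API of any particular database.

```javascript
// Key-value style: the store sees each value only as an opaque string.
const kvStore = new Map();
kvStore.set("user:1", '{"name":"Asha","city":"Pune"}');
kvStore.set("user:2", '{"name":"Ben","city":"Delhi"}');

// The only operation the store itself supports is lookup by key.
const rawValue = kvStore.get("user:2");

// Document style: the store understands the fields inside each document,
// so it can select documents by the value of a field.
const documents = [
  { _id: "user:1", name: "Asha", city: "Pune" },
  { _id: "user:2", name: "Ben", city: "Delhi" },
];

// Hypothetical helper standing in for a document database's query engine.
function findByField(docs, field, value) {
  return docs.filter((doc) => doc[field] === value);
}

const inPune = findByField(documents, "city", "Pune");
console.log(rawValue);
console.log(inPune.map((doc) => doc.name));
```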
Application Areas
• Content management: for content management applications such as
blogs and video platforms, a document DB is an excellent choice!
Each entity in the application is stored as a single document, which
offers the flexibility required to update the application as
requirements evolve over time. If the data model has to change for a
new requirement, only the affected documents need to be updated. No
schema update required! No database downtime required!

• Catalog management: effective for storing catalog information.
Different products usually have different numbers of attributes, and
using an RDBMS in this scenario is inefficient (sparse storage of data
and expensive data retrieval).
With a document DB, each product’s attributes can be stored as a
single document for easy management and faster retrieval. Changing
the attributes of one product won’t affect others.
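The catalog case can be illustrated with a small sketch, assuming an in-memory array in place of a real collection. The products and their attributes are invented; the point is only that each document carries its own set of fields and can change independently.

```javascript
// Two products in the same catalog collection with different attribute sets.
// All names and values are invented for illustration.
const catalog = [
  { _id: 1, type: "laptop", brand: "Acme", ramGb: 16, screenInches: 14 },
  { _id: 2, type: "t-shirt", brand: "Acme", size: "M", colour: "blue" },
];

// Add a new attribute to one product only; no other document is touched
// and there is no table-wide schema to alter.
const laptop = catalog.find((product) => product._id === 1);
laptop.batteryHours = 10;

// Each document carries only the attributes that apply to it.
for (const product of catalog) {
  console.log(product._id, Object.keys(product).join(", "));
}
```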
Application Areas
• Sensor management: the Internet of Things (IoT) has led to data
being collected from smart devices such as sensors and meters.
Sensor data typically comes in as a continuous stream of
variable values. Due to latency issues, some data objects
might be incomplete, duplicated, or missing. The volume of
data collected is usually very large.
A document DB is a better choice for storing voluminous sensor
data as it is, without cleaning it or making it conform to any
pre-determined schema. One can scale the database as
required and delete entire documents once analytics is done!

Source: https://aws.amazon.com/nosql/document/
Advantages
• Scalability: document databases scale horizontally, like other NoSQL
databases, across multiple servers without degrading performance,
which is cost-efficient as well!
Document databases provide fault tolerance and availability through
built-in replication.

• Flexibility: a collection can hold multiple documents with different
fields. This can be handy when storing unstructured data like emails or
social media posts. However, some document databases offer schema
validation, if required, to impose some restrictions on the structure
of the data.

• Ease of use: JSON documents map to objects in most programming
languages, so while developing applications, developers can flexibly
create and update documents directly from the code. This makes
application development fast and efficient!
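As a rough idea of what optional schema validation means, the sketch below checks a document against a list of required fields and types before it would be accepted. The emailSchema object and the validate function are hypothetical and written for this illustration; real document databases each have their own validation syntax.

```javascript
// A hypothetical, minimal schema: required field names mapped to expected types.
const emailSchema = { from: "string", subject: "string", sentAt: "number" };

// Returns a list of problems; an empty list means the document is accepted.
function validate(doc, schema) {
  const problems = [];
  for (const [field, expectedType] of Object.entries(schema)) {
    if (!(field in doc)) {
      problems.push(`missing field: ${field}`);
    } else if (typeof doc[field] !== expectedType) {
      problems.push(`wrong type for ${field}: expected ${expectedType}`);
    }
  }
  return problems;
}

// Extra fields (here: labels) are still allowed, so flexibility is kept.
const good = { from: "a@example.com", subject: "Hi", sentAt: 1700000000, labels: ["inbox"] };
const bad = { from: "b@example.com", sentAt: "yesterday" };

console.log(validate(good, emailSchema));
console.log(validate(bad, emailSchema));
```

The design choice here is to restrict only the fields named in the schema, so documents remain free to carry additional fields.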
MongoDB
Introduction
• MongoDB derives its name from “humongous”, with performance and easy
data access as core design goals!

• A JSON document database. It allows storing document objects nested
to whatever depth we like, and that nested data can be queried in an
ad hoc fashion.

• It enforces no schema: a document contains field–value pairs as a
JSON object, just like entries in a key-value database!

• Each document has a primary key (the _id field) as a unique
identifier, and a set of documents is referred to as a collection
(comparable to a table in an RDBMS).

• MongoDB sits somewhere between the powerful querying ability of a
relational database and the distributed nature of NoSQL databases.
CRUD Operations
• Creating a collection in Mongo is as easy as adding an initial record
to the collection.

• Because Mongo is schemaless, there is no need to define anything up
front; merely using it is enough.

• An example: the following command inserts a document and, in doing
so, creates the towns collection:

db.towns.insert({
  name: "New York",
  population: 22200000
})
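The idea that a collection comes into existence on first use can be mimicked in plain JavaScript. The TinyDb class below is a hypothetical in-memory stand-in written for this illustration; it is not the MongoDB shell or driver, and it only models the "create on first insert" behaviour described above.

```javascript
// Hypothetical in-memory stand-in for a database; it is NOT the MongoDB driver.
class TinyDb {
  constructor() {
    this.collections = new Map();
  }

  // Insert a document; the collection is created the first time it is used.
  insert(collectionName, doc) {
    if (!this.collections.has(collectionName)) {
      this.collections.set(collectionName, []);
    }
    this.collections.get(collectionName).push(doc);
  }

  // List the names of the collections that exist so far.
  showCollections() {
    return [...this.collections.keys()];
  }
}

const db = new TinyDb();
const before = db.showCollections();
db.insert("towns", { name: "New York", population: 22200000 });
const after = db.showCollections();
console.log(before, after);
```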
CRUD Operations
• An example: the following command lists the collections present in
the database:
> show collections
towns

• We can list the contents of a collection using the find() command:
> db.towns.find()
{ "_id" : ObjectId("59093bc08c87e2ff4157bd9f"),
  "name" : "New York",
  "population" : 22200000
}
CRUD Operations
• Unlike a relational database, Mongo is not designed around
server-side joins, i.e. table joins (newer versions offer only a
limited form through the $lookup aggregation stage). A single
JavaScript call will retrieve a document and all of its nested
contents!
• The true power of Mongo stems from its ability to dig down into a
document and return the results of deeply nested subdocuments.
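Digging into nested subdocuments can be sketched with a dotted path, which is also the notation the mongo shell accepts in a query such as db.towns.find({ "mayor.party.code": "D" }). The code below is a plain JavaScript approximation over an in-memory array; the getPath and findNested helpers and the sample mayor data are assumptions made for the example.

```javascript
// Sample documents with nested subdocuments; all values are invented.
const towns = [
  { name: "New York", mayor: { name: "A. Example", party: { code: "I" } } },
  { name: "Portland", mayor: { name: "B. Sample", party: { code: "D" } } },
];

// Follow a dotted path such as "mayor.party.code" down into a document.
// Returns undefined if any step along the path is missing.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value !== null && typeof value === "object" ? value[key] : undefined),
    doc
  );
}

// Select documents whose nested field equals the given value.
function findNested(docs, path, expected) {
  return docs.filter((doc) => getPath(doc, path) === expected);
}

const matches = findNested(towns, "mayor.party.code", "D");
console.log(matches.map((town) => town.name));
```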
Indexing
• One of Mongo’s useful built-in features is indexing, for query
performance enhancement – a feature that is not available on all
NoSQL databases.
◦ Indexes are special data structures that store a small portion of
the data set in an easy-to-traverse form.
◦ The index stores the value of a specific field or set of fields,
ordered by the value of the field as specified in the index.
◦ MongoDB provides several good data structures for indexing, such as
the classic B-tree and two-dimensional and spherical geospatial
indexes.
◦ Without indexes, MongoDB must scan every document of a collection to
select those documents that match the query statement. Such a scan is
highly inefficient.
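The contrast between a full scan and an index lookup can be sketched as follows. A sorted array with binary search stands in here for a B-tree, which is a deliberate simplification, and all helper names and sample values are invented. In the mongo shell, an index on this field would be created with db.towns.createIndex({ population: 1 }).

```javascript
// A small collection; values are invented for illustration.
const towns = [
  { name: "Punxsutawney", population: 6200 },
  { name: "New York", population: 22200000 },
  { name: "Portland", population: 582000 },
  { name: "Springfield", population: 30000 },
];

// Without an index: examine every document (a full collection scan).
function scanByPopulation(docs, target) {
  let examined = 0;
  let found = null;
  for (const doc of docs) {
    examined += 1;
    if (doc.population === target) found = doc;
  }
  return { found, examined };
}

// Build an index: (field value, position) entries ordered by the field value.
function buildIndex(docs, field) {
  return docs
    .map((doc, position) => ({ key: doc[field], position }))
    .sort((a, b) => a.key - b.key);
}

// With an index: binary search over the ordered entries.
function lookupByIndex(docs, index, target) {
  let low = 0;
  let high = index.length - 1;
  let examined = 0;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    examined += 1;
    if (index[mid].key === target) {
      return { found: docs[index[mid].position], examined };
    }
    if (index[mid].key < target) low = mid + 1;
    else high = mid - 1;
  }
  return { found: null, examined };
}

const populationIndex = buildIndex(towns, "population");
const scanned = scanByPopulation(towns, 582000);
const indexed = lookupByIndex(towns, populationIndex, 582000);
console.log(scanned.examined, indexed.examined);
```

The scan touches every document, while the indexed lookup touches only a few index entries; the gap widens as the collection grows.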
Attributes of database to explore!
• Unique characteristic of database – What makes document databases
like MongoDB unique is their ability to efficiently handle arbitrarily
nested, schemaless data documents.
◦ Mongo’s primary strength lies in its ability to handle huge amounts
of data (and huge amounts of requests) through replication and
horizontal scaling.
◦ It also has the added benefit of a very flexible data model. Without
having to conform to a schema, one can simply nest any values!
◦ For big projects, one may set up MongoDB as a cluster of machines,
which provides much higher availability and makes it possible to
replicate data across servers, shard collections into many pieces,
and perform queries in parallel.

• Communication interface of database – MongoDB provides


Attributes of database to explore!
• Nature of problem and usage of database – Mongo is an excellent
choice for an ever-growing class of web projects with large-scale
data storage requirements but very little budget to buy hardware.

• Durability – has a mechanism for creating database dumps and
recovering from failure using those dumps!
◦ To create a backup of a database in MongoDB, use the mongodump
command. This command dumps the entire data of the server into the
dump directory.
◦ To restore backup data, MongoDB's mongorestore command is used. This
command restores all of the data from the backup directory.

• Performance and Scalability – MongoDB is meant to scale


Attributes of database to explore!

• Security – MongoDB provides features such as authentication, access
control, and encryption to secure MongoDB deployments.

• Database Replication – MongoDB achieves replication through the use
of replica sets.
◦ A replica set is a group of 2 or more MongoDB instances that host the
same data set to provide high availability.
◦ In a replica set, one node is the primary node, which receives all
write operations. A replica set can have only one primary node.
◦ All other instances are secondaries, which apply operations from the
primary so that they have the same data set. Thus, data replicates
from the primary to the secondary nodes!
◦ At the time of automatic failover or maintenance, an election is
held and a new primary node is elected.
◦ After a failed node recovers, it rejoins the replica set and works
as a secondary node.
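The replica set behaviour described above (single primary, replication to secondaries, election on failure, rejoining after recovery) can be walked through with a toy model. Everything in this sketch is hypothetical: the ReplicaSet class, its method names, and especially the election rule, which simply picks the reachable node with the most data and is far simpler than what MongoDB actually does.

```javascript
// Hypothetical, highly simplified model of a replica set; it is not how
// MongoDB implements replication or elections internally.
class ReplicaSet {
  constructor(nodeNames) {
    this.nodes = nodeNames.map((name) => ({ name, up: true, data: [] }));
    this.primary = this.nodes[0];
  }

  // All writes go to the primary, then replicate to every reachable secondary.
  write(doc) {
    this.primary.data.push(doc);
    for (const node of this.nodes) {
      if (node !== this.primary && node.up) node.data.push(doc);
    }
  }

  // Mark a node as failed; if it was the primary, elect a replacement.
  fail(name) {
    const node = this.nodes.find((n) => n.name === name);
    node.up = false;
    if (node === this.primary) this.elect();
  }

  // Simplified election: pick the reachable node holding the most data.
  elect() {
    const candidates = this.nodes.filter((n) => n.up);
    candidates.sort((a, b) => b.data.length - a.data.length);
    this.primary = candidates[0];
  }

  // A recovered node copies the primary's data and rejoins as a secondary.
  recover(name) {
    const node = this.nodes.find((n) => n.name === name);
    node.data = [...this.primary.data];
    node.up = true;
  }
}

const rs = new ReplicaSet(["A", "B", "C"]);
rs.write({ town: "New York" });
rs.fail("A");                    // primary fails, election runs
rs.write({ town: "Portland" });  // accepted by the new primary
rs.recover("A");                 // A returns as a secondary
console.log(rs.primary.name, rs.nodes.map((n) => n.data.length));
```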
Challenges - MongoDB!

• MongoDB encourages denormalization of schemas (by not having any
schema at all), but that can be a bit too much at times!
• It can be dangerous to insert any old value of any type into any
collection.
• A single typing error can cause hours of headache if you don’t take
care to check field names and collection names while adding values!
• Security is also a concern, as user authentication is not enabled by
default.

• Because Mongo is focused on large datasets, it works best in large
clusters, which require some effort to design and manage. Unlike
clustered databases where adding new nodes is a transparent and
relatively painless process, setting up a Mongo cluster requires a
little more forethought, as it works on a single-primary-node
principle!
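The typing-error hazard is easy to reproduce with a plain object standing in for a schemaless document. The setField helper below is hypothetical and only illustrates that, with no schema, a misspelled field name is accepted silently.

```javascript
// A stored document; values are taken from the towns example.
const town = { name: "New York", population: 22200000 };

// Hypothetical update helper: with no schema, it accepts any field name.
function setField(doc, field, value) {
  doc[field] = value;
  return doc;
}

// Intended to update "population", but the field name is misspelled.
setField(town, "populaton", 22300000);

// The write succeeds, the real field is unchanged, and a stray field now exists.
console.log(town.population, Object.keys(town));
```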
CouchDB
Introduction

• CouchDB is document oriented, using JSON as its storage and
communication language.

• Couch is an acronym for ‘cluster of unreliable commodity hardware’.

• CouchDB was designed with the web in mind, along with all the
innumerable flaws, faults, failures, and glitches that come with it.
Consequently, CouchDB offers robustness in terms of availability of
the database (a trade-off with consistency).

• Whereas other systems tolerate occasional network drops, CouchDB
thrives even when connectivity is only rarely available.
Introduction (ctd.)

• It is written in Erlang (a functional programming language).

• Like MongoDB, CouchDB stores documents as JSON objects consisting of
key-value pairs, where values may be any of several types, including
other objects nested to any depth!
• What is missing in CouchDB, though, is ad hoc querying.

• CouchDB offers a lot of flexibility in deciding how to structure,
protect, and distribute your data.
• Access to CouchDB happens through its REST interface. All items
(key-value pairs) have a unique URI that is exposed via HTTP.
• The HTTP methods POST, GET, PUT and DELETE are used for CRUD
operations on all resources.
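How the HTTP methods map onto CRUD operations can be shown without a running server. The sketch below only builds descriptions of the requests; it assumes a local CouchDB on its default port 5984 and a database named towns, and the helper function names and the revision string are made up for the example.

```javascript
// Assumed for illustration: a local CouchDB at its default port and a
// database called "towns". No network request is made here; the functions
// only describe the HTTP request that would be sent.
const BASE_URL = "http://localhost:5984";
const DATABASE = "towns";

// Every document is addressed by its own URI: /{database}/{document id}.
function documentUri(id) {
  return `${BASE_URL}/${DATABASE}/${encodeURIComponent(id)}`;
}

// Create or update: PUT the JSON body to the document's URI.
// Updating an existing document requires its current revision in _rev.
function putDocument(id, doc) {
  return { method: "PUT", url: documentUri(id), body: JSON.stringify(doc) };
}

// Create with a server-generated id: POST the JSON body to the database URI.
function postDocument(doc) {
  return { method: "POST", url: `${BASE_URL}/${DATABASE}`, body: JSON.stringify(doc) };
}

// Read: GET the document's URI.
function getDocument(id) {
  return { method: "GET", url: documentUri(id) };
}

// Delete: DELETE the document's URI, naming the revision to delete.
function deleteDocument(id, rev) {
  return { method: "DELETE", url: `${documentUri(id)}?rev=${encodeURIComponent(rev)}` };
}

const requests = [
  putDocument("new-york", { name: "New York", population: 22200000 }),
  getDocument("new-york"),
  deleteDocument("new-york", "1-abc"), // "1-abc" is a made-up revision string
];
for (const request of requests) console.log(request.method, request.url);
```

Each request description could be handed to any HTTP client; keeping the sketch free of network calls makes the method-to-operation mapping the only thing on display.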
Introduction (ctd.)

• CouchDB aims to support a variety of deployment scenarios, from the
datacenter down to the smartphone!
• We can run CouchDB on an Android phone, on any laptop/desktop, or in
any data center.

• CouchDB comes with a web interface, Fauxton, to interact with the DB!
• With its append-only storage model, the data is virtually
incorruptible and easy to replicate, back up, and restore.
• Replication can be one way (from one database to another) or
bidirectional (back and forth between databases, i.e. multi-master
replication), and can be ad hoc (triggered at will) or continuous
(triggered at periodic intervals).
Pros and Cons of CouchDB

• CouchDB’s strengths:
◦ CouchDB is a robust and stable member of the NoSQL family of
databases!
◦ CouchDB offers a highly decentralized approach to data storage that
ensures availability, and it is eventually consistent for reads.
◦ It provides document-level ACID semantics with eventual consistency,
(incremental) MapReduce, and (incremental) replication.
◦ Multi-master replication allows it to scale across machines to build
high-performance systems.
◦ Small enough to live in a smartphone and big enough to support the
enterprise, CouchDB affords a variety of deployment scenarios.
Pros and Cons of CouchDB

• CouchDB’s drawbacks:
◦ Of course, CouchDB isn’t well suited for everything: unlike MongoDB,
it doesn’t support ad hoc queries!
◦ The query system does not lock database objects on writes;
therefore, any conflict resolution or locking mechanism for
consistency needs to be implemented by the application developer.
This adds complexity to the application code.
◦ As with many other NoSQL databases, CouchDB works best when there is
a very good sense of what is needed in advance. In some databases,
that means knowing the key or “address” of an object; in CouchDB, that
means knowing all of the queries in advance.
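Knowing the queries in advance is tied to how CouchDB answers them: through views defined by map (and optionally reduce) functions written ahead of time. The sketch below imitates that flow in plain JavaScript. The buildView and reduceSumByKey helpers are hypothetical stand-ins for the view engine, and passing emit as a parameter is a simplification of how CouchDB supplies it to map functions.

```javascript
// Sample documents; all values are invented for illustration.
const docs = [
  { _id: "1", type: "town", country: "US", population: 22200000 },
  { _id: "2", type: "town", country: "US", population: 582000 },
  { _id: "3", type: "town", country: "IN", population: 3100000 },
];

// A map function in the style CouchDB views use: it is written ahead of
// time and calls emit(key, value) for each row it wants in the view.
function mapByCountry(doc, emit) {
  if (doc.type === "town") emit(doc.country, doc.population);
}

// Hypothetical stand-in for the view engine: run the map function over
// every document and keep the emitted rows ordered by key.
function buildView(allDocs, mapFn) {
  const rows = [];
  for (const doc of allDocs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Reduce step: sum the emitted values per key.
function reduceSumByKey(rows) {
  const totals = {};
  for (const { key, value } of rows) {
    totals[key] = (totals[key] || 0) + value;
  }
  return totals;
}

const view = buildView(docs, mapByCountry);
const totals = reduceSumByKey(view);
console.log(view.map((row) => row.key), totals);
```

A question that was not anticipated, such as grouping by a different field, would need another map function and another view, which is the practical meaning of the drawback above.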
MongoDB vs CouchDB

• CouchDB stores data in JSON format, whereas MongoDB uses BSON, a
binary format optimized for storage and sharing.

• A database in CouchDB contains documents, whereas a database in
MongoDB contains collections, and collections contain documents.

• CouchDB favours availability, whereas MongoDB chooses consistency
over availability.

• CouchDB uses an HTTP/REST-based interface, whereas MongoDB uses a
TCP/IP-based interface.
MongoDB vs CouchDB

• CouchDB offers both master-master and master-slave replication, but
MongoDB supports only master-slave replication.

• CouchDB provides support for mobile devices, which is missing in
MongoDB.

• CouchDB does not support ad hoc queries, but these are supported in
MongoDB.

• CouchDB is not a suitable choice for a rapidly growing database,
whereas MongoDB is an apt choice for one.
