DBMS Unit2

Master of data science certified course, Database Management System notes

Uploaded by

girab87633
Copyright
© All Rights Reserved

Course: MSc DS

Advanced Database Management

Systems

Module: 2
Learning Objectives:

1. Differentiate NoSQL databases from traditional RDBMS.

2. Understand scalability, flexibility, and diverse data models.

3. Recognize key-value, document, column, and graph

databases.

4. Evaluate factors for appropriate NoSQL database

selection.

5. Understand the synergy between NoSQL and Big Data, and the

challenges it presents.

6. Identify common hurdles in NoSQL adoption and plan

solutions for them.

Structure:

2.1 Understanding NoSQL Databases

2.2 Exploring the Different Facets of NoSQL Databases

2.3 Tailoring the NoSQL Choice: How to Make the Right

Decision

2.4 NoSQL's Vital Role in the Big Data Revolution

2.5 Summary

2.6 Keywords
2.7 Self-Assessment Questions

2.8 Case Study

2.9 References

2.1 Understanding NoSQL Databases

NoSQL, which stands for "Not Only SQL," refers to a category of

databases that provide mechanisms to store and retrieve data in

ways other than the tabular relationships utilised in relational

databases. Contrary to the term's implication, it doesn't mean the

exclusion of SQL but rather indicates that these databases don't

solely rely on a relational model. They have risen in prominence

due to the growing needs of businesses to manage vast volumes

of unstructured, semi-structured, or polymorphic data.

A Historical Context: Evolution from RDBMS to NoSQL

In the late 20th century, Relational Database Management

Systems (RDBMS) dominated the database landscape. They were

predicated on the relational model and predominantly utilised

SQL for querying. However, with the emergence of web-scale


applications and the necessity to accommodate massive datasets

in the 21st century, the traditional RDBMS faced challenges in

scalability, flexibility, and performance for specific use cases.

Enter NoSQL databases, which were crafted to address these very

challenges and to meet the demands of modern-day applications.

NoSQL vs. Relational Databases: A Comparative Analysis

Data Model:

● RDBMS: Data is organised in tables of rows and columns.

Transactions typically follow the ACID (Atomicity,

Consistency, Isolation, Durability) properties.

● NoSQL: Can adopt various data models, including document,

key-value, columnar, and graph.

Schema Flexibility:

● RDBMS: Schema-on-write. Requires predefined schemas

which can make alterations difficult.

● NoSQL: Schema-on-read. Offers dynamic schemas, allowing

for on-the-fly modifications.
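
The contrast between schema-on-write and schema-on-read can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SQLite (from the standard library) stands in for an RDBMS, a plain list of dictionaries stands in for a NoSQL collection, and the `users` table, its fields, and the helper `insert_with_email` are hypothetical names chosen for the example.

```python
import sqlite3

# Schema-on-write: the table definition is fixed before any row is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Asha"))


def insert_with_email(connection):
    """Try to store a value for a column named 'email'."""
    try:
        connection.execute(
            "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
            (2, "Ravi", "ravi@example.com"),
        )
        return True
    except sqlite3.OperationalError:
        # SQLite rejects the write while the table has no 'email' column.
        return False


rejected_before_alter = not insert_with_email(conn)

# The schema has to be altered first; only then does the write succeed.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
accepted_after_alter = insert_with_email(conn)

# Schema-on-read: documents in one collection may carry different fields.
# The reader decides how to interpret a missing field.
collection = [
    {"id": 1, "name": "Asha"},
    {"id": 2, "name": "Ravi", "email": "ravi@example.com"},
]
emails = [doc.get("email", "unknown") for doc in collection]
```

In the relational half, the new field is refused until the schema changes. In the document half, the new field simply appears, and the cost moves to the reader, who must handle documents that lack it.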

Scalability:
● RDBMS: Generally scales vertically, which can become costly

and complex.

● NoSQL: Designed for horizontal scaling, making it suitable

for big data and high-velocity applications.

Consistency Model:

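● RDBMS: Strong consistency, typically enforced through ACID

transactions.

● NoSQL: Often favours eventual consistency in exchange for

availability and scale, though some systems offer tunable or

strong consistency levels.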

Advantages of Choosing NoSQL Over Traditional RDBMS

● Scalability: Handling Massive Data Volumes with Ease:

NoSQL databases are designed from the ground up for

horizontal scalability. Distributing data across multiple nodes

or even across geographies is intrinsic to most NoSQL

architectures.

● Flexibility: Adapting to Dynamic Schema Changes: Unlike

traditional RDBMS where schema changes can be onerous,

NoSQL databases allow developers to modify schema

dynamically. This is especially useful when evolving an


application rapidly or when the data's nature is intrinsically

dynamic.

● Agility: Meeting the Demands of Rapid Development Cycles:

The agile methodologies of modern software development

demand tools that can iterate and adapt swiftly. NoSQL,

with its flexible schemas, offers a conducive environment for

rapid prototyping and iterations.

● Diverse Data Models: Going Beyond Tables and Rows:

RDBMS, bound by its tabular model, can sometimes be

limiting. NoSQL databases, with their variety (document

stores, graph databases, columnar stores, etc.), provide

alternatives that can be more suitable for specific use cases,

like hierarchical data or interconnected datasets.

2.2 Exploring the Different Facets of NoSQL Databases

Key-Value Stores: Simplified Data Storage

Characteristics and Strengths

● Simplicity: Key-Value stores are the most straightforward


NoSQL databases. They save data as a collection of key-value

pairs.

● Scalability: These databases are inherently scalable, making

it easy to distribute data across multiple nodes.

● Performance: Due to their simple design, key-value stores

often provide rapid data access.

● Flexibility: They allow developers to store any type of

serialised data.
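
The characteristics above can be made concrete with a toy in-memory key-value store. This is a minimal sketch, not how Redis, DynamoDB, or Riak are implemented: the class `KeyValueStore` and its methods are hypothetical, and JSON is assumed as the serialisation format purely for convenience.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store; values are kept as serialised bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store never inspects the value: it only keeps opaque bytes.
        self._data[key] = json.dumps(value).encode("utf-8")

    def get(self, key, default=None):
        raw = self._data.get(key)
        return default if raw is None else json.loads(raw.decode("utf-8"))

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", {"user": "asha", "cart": ["book", "pen"]})
store.put("counter:visits", 7)

session = store.get("session:42")
visits = store.get("counter:visits")

store.delete("counter:visits")
missing = store.get("counter:visits", default=0)
```

The whole interface is put, get, and delete by key. That narrow interface is what makes such stores simple to partition across nodes, and it is also why they offer little help with queries over the contents of a value.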

Popular Key-Value Store Databases

● Redis: An in-memory data structure store used for caching

and real-time analytics.

● Amazon DynamoDB: A managed key-value and document

database service by Amazon Web Services.

● Riak: A highly scalable and distributed key-value store.

Document Stores: Storing Complex Data Structures

Delving into JSON, BSON, and More

● JSON (JavaScript Object Notation): A lightweight data-


interchange format, which is easy for humans to read and

write and easy for machines to parse and generate.

● BSON (Binary JSON): A binary-encoded serialisation of JSON-

like documents. MongoDB uses BSON to represent

document structures.
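
As a small illustration of what a JSON document looks like in practice, the sketch below builds a nested order document, serialises it, and reads fields back. The field names (including the `_id` key) and values are made up for the example. BSON is not shown because encoding it requires a third-party library.

```python
import json

# A document groups related data, including nested objects and arrays.
order = {
    "_id": "order-1001",
    "customer": {"name": "Asha", "city": "Bengaluru"},
    "items": [
        {"sku": "BK-1", "qty": 2, "price": 250.0},
        {"sku": "PN-7", "qty": 5, "price": 20.0},
    ],
}

# Serialise to JSON text and parse it back; the structure survives intact.
text = json.dumps(order)
restored = json.loads(text)

# Fields inside the document are addressable, unlike an opaque value.
city = restored["customer"]["city"]
total = sum(item["qty"] * item["price"] for item in restored["items"])
```

Because the structure is visible to the database, a document store can index and query inner fields such as the customer's city. A pure key-value store cannot do this.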

Leading Document Store Databases in the Market

● MongoDB: A widely used document-oriented database,

built on an architecture of collections and documents using

BSON.

● CouchDB: A database that uses JSON for documents,

JavaScript for indexing, and regular HTTP for its API.

● Elasticsearch: While it's primarily a search engine, it's built

on a document store principle.

Column Stores: Addressing Analytical Workloads

Design and Advantages

● Efficiency: Designed to read and write data using columns,

which can significantly improve performance for certain

queries.
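● Scalability: They scale out by partitioning data across

nodes. Wide-column stores, such as the databases listed

below, partition rows by key and group related columns into

column families.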

● Compression: By storing data of the same type in each

column, they allow for more efficient data compression.

● Flexibility: Easily adjust to changing workloads and evolving

datasets.
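
The efficiency and compression points above can be illustrated with a toy comparison of row and column layouts. This is a sketch under simplifying assumptions: the data lives in Python lists, the table contents are invented, and run-length encoding stands in for the more elaborate compression schemes that real column stores use.

```python
from itertools import groupby

# Row-oriented layout: each record is stored together.
rows = [
    {"region": "south", "year": 2023, "sales": 120},
    {"region": "south", "year": 2023, "sales": 80},
    {"region": "north", "year": 2023, "sales": 200},
    {"region": "north", "year": 2024, "sales": 50},
]

# Column-oriented layout: each column is stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one column touches only that column's values.
total_sales = sum(columns["sales"])


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


encoded_region = run_length_encode(columns["region"])
encoded_year = run_length_encode(columns["year"])
```

Values within a column share a type and often repeat, so a column such as `region` shrinks to a handful of (value, count) pairs. The same data interleaved row by row offers no such runs.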

Noteworthy Column Store Databases

● Cassandra: An open-source distributed database system

designed to handle vast amounts of data across many

commodity servers.

● HBase: Built on top of Hadoop, it's designed for massive

scalability.

● Google Bigtable: A proprietary service offered by Google

Cloud that underpins many of Google's core services.

Graph Databases: Navigating Relationships and Networks

Graph Theory Basics: Nodes, Edges, and Properties

● Nodes: Represent entities in the graph, like a person, a

business, or an event.
● Edges: Represent relationships between nodes. They can be

directed or undirected.

● Properties: Information that can be attached to nodes and

edges. For example, a node representing a person might

have properties like name, age, or email.
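
Nodes, edges, and properties map naturally onto simple data structures. The sketch below is a hypothetical in-memory property graph with a breadth-first traversal; the names, relationship types, and the helpers `neighbours` and `reachable` are assumptions made for illustration and do not reflect the API of any particular graph database.

```python
from collections import deque

# Nodes and edges both carry properties (a "property graph").
nodes = {
    "alice": {"label": "Person", "age": 30},
    "bob": {"label": "Person", "age": 41},
    "carol": {"label": "Person", "age": 25},
    "acme": {"label": "Business", "city": "Pune"},
}

# Directed edges: (source, relationship, target, properties).
edges = [
    ("alice", "KNOWS", "bob", {"since": 2019}),
    ("bob", "KNOWS", "carol", {"since": 2021}),
    ("carol", "WORKS_AT", "acme", {"role": "engineer"}),
]


def neighbours(node, relationship):
    """Targets reachable from `node` over one edge of the given type."""
    return [dst for src, rel, dst, _ in edges if src == node and rel == relationship]


def reachable(start, relationship):
    """Breadth-first traversal following only edges of one type."""
    seen, queue = {start}, deque([start])
    order = []
    while queue:
        current = queue.popleft()
        for nxt in neighbours(current, relationship):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


friends_of_friends = reachable("alice", "KNOWS")
```

In a relational schema, each hop of this traversal would typically be another self-join. In a graph model, following relationships is the basic operation.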

Top Graph Database Solutions Today

● Neo4j: A leading graph database platform that uses Cypher

Query Language for querying the database.

● Amazon Neptune: A managed graph database service by

Amazon Web Services.

● ArangoDB: A multi-model database that supports graph,

document, and key-value data models.

2.3 Tailoring the NoSQL Choice: How to Make the Right Decision

In Advanced Database Management Systems, it's imperative to

make the right choice of database technology that aligns with

specific business and technical requirements. NoSQL databases

have emerged as powerful tools for various use cases due to their
flexibility, scalability, and diversity. The choice of a NoSQL system

should be based on a set of criteria tailored to the unique needs

of each application or enterprise system.

Evaluating Business and Technical Requirements

Before diving into the specific features and capabilities of NoSQL

systems, it's critical to establish a clear understanding of both

business and technical needs.

● Data Complexity and Structure: What kind of data will be

stored? Is it structured, semi-structured, or unstructured?

The answers will influence whether you should choose a

document-based, key-value, column-family, or graph NoSQL

database.

● Data Volume: If expecting large volumes of data, ensure

that the NoSQL database can handle this efficiently without

performance degradation.

● Query Requirements: Some NoSQL databases excel in

supporting complex queries, while others prioritise write

operations. The nature of the anticipated queries can dictate


the optimal choice.

● Integration Needs: It's essential to consider how the NoSQL

system will fit with other elements of the tech stack, such as

data lakes, ETL processes, and analytics tools.

Performance, Latency, and Throughput Considerations

A system's ability to deliver high performance, low latency, and

high throughput is paramount, especially in real-time applications.

● Performance: Refers to the database's efficiency in

processing read and write operations. Depending on the use

case, some databases are optimised for write-heavy

applications, while others prioritise read operations.

● Latency: This is the delay between a user's action and a

system's response. Lower latency is preferable, especially in

interactive applications.

● Throughput: Represents the number of operations

processed within a given time frame. For high-demand

applications, you'd need a database that offers high

throughput capabilities.
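
Latency and throughput are related but distinct measurements, and a small timing harness makes the difference visible. The sketch below is only an illustration: a Python dictionary stands in for the database, and the function `measure` is a hypothetical helper. Numbers obtained this way say nothing about a real NoSQL system, where network and disk dominate.

```python
import time


def measure(operation, repetitions):
    """Return (mean latency in seconds, throughput in operations/second)."""
    latencies = []
    started = time.perf_counter()
    for i in range(repetitions):
        t0 = time.perf_counter()
        operation(i)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started
    mean_latency = sum(latencies) / len(latencies)
    throughput = repetitions / elapsed if elapsed > 0 else float("inf")
    return mean_latency, throughput


store = {}


def write(i):
    # Stand-in for a single write operation against a database.
    store[f"key:{i}"] = i


mean_latency, throughput = measure(write, 10_000)
```

Latency is measured per operation, while throughput is measured over the whole run. A system that batches or parallelises work can raise throughput without lowering the latency of any individual request.
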
Scalability and Distribution: Planning for Growth

As applications grow and demands evolve, so should the

database's ability to scale and distribute its operations.

● Horizontal vs. Vertical Scalability: While vertical scalability

refers to adding more resources (like CPU or RAM) to an

existing server, horizontal scalability involves adding more

servers to the system. NoSQL databases tend to favour

horizontal scalability.

● Distribution Mechanisms: NoSQL databases use techniques

such as sharding, replication, and partitioning to distribute

data across multiple servers, ensuring resilience and high

availability.

● Operational Complexity: While scaling is necessary, it's also

important to consider the operational overhead. How easy is

it to add or remove nodes? Is there a significant downtime

involved?
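
One common way to keep node additions cheap is consistent hashing, which several distributed stores use in some form to partition data. The sketch below is a toy ring with virtual nodes, not the implementation of any specific database. The class `HashRing`, the node names, and the choice of SHA-256 and 50 virtual nodes are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(text):
    """Map a string to a position on the ring (a large integer)."""
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=50):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node owns several positions to even out the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the key.
        index = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[index % len(self._ring)][1]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {key: ring.node_for(key) for key in keys}

ring.add_node("node-d")
after = {key: ring.node_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

When `node-d` joins, only a fraction of the keys change owner, and every key that moves lands on the new node. With naive `hash(key) % number_of_nodes` sharding, most keys would be reassigned, which is the kind of operational cost the last bullet asks about.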

Consistency, Availability, and Partition Tolerance: The CAP

Theorem
The CAP theorem, proposed by Eric Brewer, is fundamental in the

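database world. It states that a distributed data store can

provide at most two of the following three guarantees at the same

time. Because network partitions cannot be ruled out in practice,

the real choice is between consistency and availability while a

partition lasts: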

● Consistency: Every read will return the most recent write. All

nodes have the same data.

● Availability: Every request (either read or write) sent to a

working node returns a response, though that response is not

guaranteed to reflect the most recent write.

● Partition Tolerance: The system continues to function even

when communication breaks down between nodes.
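
The trade-off described by the three guarantees above can be sketched with a toy two-replica store that behaves differently once a partition occurs. This is a deliberately simplified model: the class `TwoNodeStore`, its `mode` flag, and the `partitioned` switch are hypothetical, and real systems are far more nuanced than a binary "CP" or "AP" label.

```python
class Replica:
    def __init__(self):
        self.value = None


class TwoNodeStore:
    """Toy two-replica store that is either 'CP' or 'AP' during a partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [Replica(), Replica()]
        self.partitioned = False

    def write(self, replica_index, value):
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the write rather than let the replicas diverge.
                raise RuntimeError("unavailable during partition")
            # AP: accept locally; the other replica does not see it yet.
            self.replicas[replica_index].value = value
            return
        for replica in self.replicas:
            replica.value = value

    def read(self, replica_index):
        return self.replicas[replica_index].value


ap = TwoNodeStore("AP")
ap.write(0, "v1")
ap.partitioned = True
ap.write(0, "v2")
fresh_read = ap.read(0)
stale_read = ap.read(1)

cp = TwoNodeStore("CP")
cp.write(0, "v1")
cp.partitioned = True
try:
    cp.write(0, "v2")
    cp_write_accepted = True
except RuntimeError:
    cp_write_accepted = False
```

During the partition, the AP store keeps answering but its replicas disagree. The CP store keeps its replicas identical by rejecting the write. Neither store gets both properties while the partition lasts.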

2.4 NoSQL's Vital Role in the Big Data Revolution

The advent of the Big Data revolution introduced new challenges

and opportunities in the realm of data management. With the

increasing volume, velocity, and variety of data, conventional

relational database systems began to show their limitations. This

ushered in the era of NoSQL databases, offering a more scalable,

flexible, and varied approach to managing vast quantities of


diverse data.

The Confluence of NoSQL and Big Data: A Synergistic

Relationship

● Synergy: NoSQL databases emerged as the natural

counterpart to Big Data due to their inherent ability to

handle vast quantities of unstructured or semi-structured

data. Big Data technologies like Hadoop and Spark, when

integrated with NoSQL, provide comprehensive solutions for

data processing and analytics.

● Flexibility: NoSQL databases, unlike their relational

counterparts, do not rely on fixed schemas. This allows them

to adapt more readily to the unpredictable and varied

nature of Big Data.

Handling Velocity, Volume, and Variety with NoSQL

● Velocity: One of the primary features of Big Data is the rapid

rate at which it accumulates. NoSQL databases, especially

those of the key-value and wide-column types, are

particularly well-suited to handle high-speed data ingestion.


● Volume: NoSQL databases can scale horizontally, meaning

they can expand across multiple nodes or clusters. This

capability makes them adept at managing vast amounts of

data, characteristic of the Big Data paradigm.

● Variety: Given that Big Data can be structured, semi-

structured, or unstructured, NoSQL databases, with their

flexible data models, are poised to manage this varied data.

Challenges and Solutions in NoSQL Big Data Management

Overcoming Initial Integration Hurdles

● Mismatch of Paradigms: The shift from relational to NoSQL

often requires a rethinking of data models and application

architectures.

● Training sessions, proof-of-concept implementations, and

using middleware can help bridge the gap between the two

paradigms.

Ensuring Data Consistency in a Distributed Environment

● As NoSQL databases scale out, ensuring consistency across

all nodes becomes complex.


● Techniques like eventual consistency, tunable consistency

levels, and quorum-based approaches can help address

these challenges.

Addressing Security Concerns in NoSQL Architectures

● NoSQL databases, being relatively newer, might not have as

mature security features as traditional databases.

● Employing role-based access control, encryption (both in-

transit and at-rest), and regular audits can help bolster

security in NoSQL implementations.

Potential Solutions and Best Practices for Common Pitfalls

● Data Modeling: Unlike relational databases, NoSQL

databases require a different approach to data modelling.

Nested structures, denormalization, and understanding

access patterns are key.

● Performance Tuning: As with any system, understanding the

specific characteristics and performance metrics of your

NoSQL database is vital. Regularly monitor and adjust

configurations as necessary.
● Backup and Recovery: Ensure robust backup strategies,

considering the distributed nature of NoSQL databases.

Sharding also complicates backups, because a consistent

snapshot has to be coordinated across all shards.
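
The data-modelling advice in the list above (nested structures, denormalisation, and designing around access patterns) can be sketched by modelling the same data in two ways. The customer and order records and the helper functions are invented for illustration. Plain dictionaries stand in for tables and documents.

```python
# Normalised (relational-style): data split across tables, joined at read time.
customers = {1: {"name": "Asha"}}
orders = [
    {"order_id": 10, "customer_id": 1, "total": 600},
    {"order_id": 11, "customer_id": 1, "total": 150},
]


def orders_with_names_normalised():
    # Every read performs a lookup ("join") against the customers table.
    return [
        {
            "order_id": o["order_id"],
            "customer": customers[o["customer_id"]]["name"],
            "total": o["total"],
        }
        for o in orders
    ]


# Denormalised (document-style): shaped for the access pattern
# "show a customer together with their orders" in a single read.
customer_documents = {
    1: {
        "name": "Asha",
        "orders": [
            {"order_id": 10, "total": 600},
            {"order_id": 11, "total": 150},
        ],
    }
}


def order_totals(customer_id):
    doc = customer_documents[customer_id]  # one lookup, no join
    return doc["name"], sum(o["total"] for o in doc["orders"])


joined = orders_with_names_normalised()
summary = order_totals(1)
```

The denormalised form answers its target query in one read, but the design is only as good as the access pattern it was built for. A query such as "all orders above a given total, across customers" would now have to scan every customer document.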

2.5 Summary

❖ NoSQL databases are non-relational systems that allow for

high-performance, agile processing of information at

massive scale. They differ fundamentally from traditional

RDBMS in their data models.

❖ NoSQL databases offer greater scalability, flexibility in data

modelling, faster development cycles, and the ability to

handle a variety of data types, compared to traditional

relational databases.

❖ NoSQL databases come in four main types. Key-value stores

are simple databases that store data as key-value pairs.

Document stores keep data in documents, usually in JSON-like

formats. Column stores are optimised for operations over

columns and suited for analytics. Graph databases are

designed to handle data about interconnected entities.

❖ When choosing a NoSQL database, considerations include

specific business needs, data volume, desired performance

metrics, and the type of data being dealt with (e.g.,

interconnected data vs. vast amounts of simple data).

❖ NoSQL databases play a pivotal role in the big data paradigm,

efficiently handling the three Vs (Volume, Velocity, Variety)

that traditional databases often struggle with.

❖ Integration, ensuring consistency across distributed data

systems, and security are some challenges in using NoSQL

for big data. However, best practices and solutions are

emerging to address these.

2.6 Keywords

● NoSQL: NoSQL, which stands for "Not Only SQL", refers to a

class of database management systems that deviate from

the traditional relational database (RDBMS) structure. They

are typically designed to handle unstructured data, provide


scalability, and allow for more flexible schema definitions.

NoSQL databases are especially useful when dealing with

large volumes of rapidly changing, diverse data.

● CAP Theorem: This theorem outlines the trade-offs between

consistency, availability, and partition tolerance in

distributed systems. Specifically, it posits that it's impossible

for a distributed data store to simultaneously provide all

three guarantees. NoSQL databases often make

compromises based on the CAP theorem to best suit their

intended use cases.

● Document Store: This type of NoSQL database is designed to

store, retrieve, and manage document-oriented information.

Each document is typically stored as a JSON or BSON object,

and can contain various fields and structures, offering more

flexibility than a traditional row/column model in RDBMS.

● Graph Database: Graph databases use graph structures

(comprising nodes, edges, and properties) to represent and


store data. They excel in scenarios where relationships

between data points are as crucial as the data itself, like

social networks or recommendation engines.

● Key-Value Store: A simple type of NoSQL database where

every single item is stored as an attribute name (or 'key'),

together with its value. Examples include Redis and

DynamoDB. They excel in use cases where quick, simple

access to data is required, often without the need for

complex querying.

● Column Store (or Columnar Database): Unlike traditional

relational databases where data is stored in rows, columnar

databases store data tables as columns. This arrangement is

especially beneficial for analytical query scenarios where

operations are often performed over a subset of data within

a table. Apache Cassandra and HBase are commonly listed in this

category; more precisely, they are wide-column stores that group

each row's columns into column families.

2.7 Self-Assessment Questions

1. How do NoSQL databases differ from traditional Relational


Database Management Systems (RDBMS) in terms of

scalability and flexibility?

2. What are the primary advantages of using a Document Store

over a Key-Value Store in terms of data structure complexity?

3. Which of the following best describes the CAP Theorem?

a. It outlines the relationship between CPU, RAM, and Disk

Space.

b. It states that a distributed computer system can achieve

at most two out of three of Consistency, Availability, and

Partition tolerance.

c. It is a theorem describing the relationship between

nodes, edges, and properties in a graph database.

d. It highlights the fundamental principles of storing data

in columns rather than rows.

4. What are the main considerations when choosing between

different NoSQL databases for a specific business or

technical requirement?

5. Which NoSQL database type is best suited for mapping

intricate relationships and understanding complex network

structures?
2.8 Case Study

Title: Transforming Bhoomi's E-Governance with NoSQL

Introduction:

Bhoomi, a flagship project by the Government of Karnataka, India,

was initiated to digitise land records, making them easily

accessible to the public. Since its inception in 2000, it has managed

millions of records related to land ownership, crop details, and

loan statuses. By 2015, Bhoomi faced significant challenges in

terms of latency, scalability, and the ability to integrate with other

e-governance modules.

Background:

With a rising population and rapid urbanisation, there was an

exponential growth in the number of users accessing Bhoomi. The

relational database system, which had initially been adequate,

began to show strains, primarily in terms of querying complex

relationships and scaling horizontally.

The government sought to revamp the Bhoomi system. After

careful consideration, the decision was made to transition to a


NoSQL database system. The flexibility in schema, ability to

handle large volumes of structured and unstructured data, and

efficient horizontal scaling made NoSQL an ideal choice.

Post-migration, Bhoomi integrated seamlessly with other digital

initiatives like Aadhaar, the nation's biometric identification

system. By leveraging a graph-based NoSQL system, Bhoomi was

able to illustrate intricate land ownership networks, mortgage

associations, and even crop rotation patterns over time.

The outcome? Users experienced a dramatic decrease in query

times, and the government had a robust, scalable system capable

of handling diverse datasets. Moreover, by integrating with

Aadhaar, Bhoomi enhanced its data security and validation

procedures, ensuring that land record manipulations and frauds

decreased substantially. This initiative reinforced the power of

NoSQL in transforming traditional systems to handle modern-day

challenges.

Questions:

1. What challenges did the Bhoomi project face with its initial
relational database system?

2. How did the migration to a NoSQL database benefit the

Bhoomi project in terms of data integration with other

digital initiatives?

3. In what ways did the NoSQL system enhance Bhoomi's

capability in representing and analysing complex land-

related relationships and patterns?

2.9 References

1. "NoSQL Distilled: A Brief Guide to the Emerging World of

Polyglot Persistence" by Pramod J. Sadalage and Martin

Fowler.

2. "Data-Intensive Text Processing with MapReduce" by Jimmy

Lin and Chris Dyer.

3. "Graph Databases" by Ian Robinson, Jim Webber, and Emil

Eifrém.

4. "Cassandra: The Definitive Guide" by Jeff Carpenter and

Eben Hewitt.

5. "Designing Data-Intensive Applications: The Big Ideas Behind


Reliable, Scalable, and Maintainable Systems" by Martin

Kleppmann.
