BD unit 1
Big Data Technologies and Tools
1. Data Storage
This involves storing and organizing large amounts of data in a way that makes it easy to
access and manage. Two widely used tools are:
Apache Hadoop:
o It's open-source software that helps store and process big data by dividing the
work across multiple computers (clusters).
o It processes large workloads in parallel and handles many different types of
data.
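To make "dividing the work across multiple computers" concrete, here is a minimal word-count example in the style of Hadoop Streaming, a standard Hadoop interface that runs scripts over stdin/stdout (the file names and data are illustrative, not part of these notes):

```python
#!/usr/bin/env python3
# mapper.py -- Hadoop runs many copies of this script in parallel,
# one per chunk ("split") of the input, across the cluster.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")   # emit one (word, 1) pair per occurrence
```

```python
#!/usr/bin/env python3
# reducer.py -- Hadoop sorts the mapper output by key, so all counts
# for the same word arrive on consecutive lines and can be summed.
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t")
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, 0
    count += int(value)
if current_word is not None:
    print(f"{current_word}\t{count}")   # flush the final word
```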
MongoDB:
o A database designed to handle massive amounts of unstructured data, like text
or images.
o It uses "key-value pairs" (like labels and their values) to organize information.
2. Data Mining
This is about extracting useful patterns, trends, or information from raw data. Two common
tools for this are:
RapidMiner:
o It helps process data and create machine learning models to predict outcomes
(like predicting future sales).
o Combines data preparation and advanced analytics in one platform.
Presto:
o Originally created by Facebook, Presto is a tool to run queries on large
datasets from different sources (like databases or cloud storage).
o It returns results very quickly and can combine data from several sources in a
single query.
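As a rough sketch, a Presto query can be submitted from Python with the presto-python-client package (the host, catalog, and table names below are placeholders, not values from these notes):

```python
import prestodb  # pip install presto-python-client

conn = prestodb.dbapi.connect(
    host="presto.example.com",  # assumed coordinator address
    port=8080,
    user="analyst",
    catalog="hive",             # Presto can also target other catalogs
    schema="default",
)
cur = conn.cursor()
# One SQL query, even though the data behind the catalog may live in a
# data lake, a relational database, or cloud storage.
cur.execute("SELECT region, count(*) FROM orders GROUP BY region")
for row in cur.fetchall():
    print(row)
```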
3. Data Analytics
Here, tools are used to analyze and make sense of the data to support business decisions.
Examples include:
Apache Spark:
o A fast tool for analyzing big data.
o Unlike Hadoop, it processes data in memory (RAM) instead of relying on
slower storage methods.
o It handles complex analytics tasks with speed.
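A small PySpark sketch of the in-memory idea: once a dataset is cached in RAM, repeated queries reuse it instead of rereading from disk (pyspark is assumed to be installed; the file name and columns are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("demo").getOrCreate()

df = spark.read.csv("sales.csv", header=True, inferSchema=True)
df.cache()  # keep the dataset in memory after it is first computed

# Both queries below reuse the cached, in-memory copy of the data.
df.groupBy("region").count().show()
print(df.filter(df["amount"] > 100).count())

spark.stop()
```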
Splunk:
o A tool to find insights and trends in large datasets.
o It creates visualizations like charts, dashboards, and reports, and supports AI
integration to enhance data analysis.
4. Data Visualization
This step involves creating visual representations (like graphs or dashboards) to make the
data insights easy to understand for decision-makers.
Tableau:
o A popular tool with a simple drag-and-drop feature for creating graphs, pie
charts, and dashboards.
o Visualizations can be securely shared in real-time.
Looker:
o A business intelligence tool that simplifies sharing data insights with others.
o It allows teams to monitor and track important metrics, like social media
performance.
3) Vs of Big Data
Definitions of big data vary slightly, but big data is always described in terms
of volume, velocity, and variety. These characteristics are often referred to as
the "3 Vs of big data" and were first defined by Gartner in 2001.
1. Volume
As its name suggests, the most common characteristic associated with big data
is its high volume. This describes the enormous amount of data that is available
for collection and produced from a variety of sources and devices on a
continuous basis.
2. Velocity
Big data velocity refers to the speed at which data is generated. Today, data is
often produced in real-time or near real-time, and therefore, it must also be
processed, accessed, and analyzed at the same rate to have any meaningful
impact.
3. Variety
Data is heterogeneous, meaning it can come from many different sources and
can be structured, unstructured, or semi-structured. More traditional structured
data (such as data in spreadsheets or relational databases) is now supplemented
by unstructured text, images, audio, video files, or semi-structured formats like
sensor data that can’t be organized in a fixed data schema.
In addition to these three original Vs, three others are often mentioned in
relation to harnessing the power of big data: veracity, variability, and value.
4. Veracity
Big data can be messy, noisy, and error-prone, which makes it difficult to
control the quality and accuracy of the data. Large datasets can be unwieldy and
confusing, while smaller datasets could present an incomplete picture. The
higher the veracity of the data, the more trustworthy it is.
5. Variability
Variability refers to how inconsistent data can be: the speed and volume of data
flows can fluctuate widely (for example, during seasonal peaks), and the meaning
of the same data can change depending on its context, which complicates
processing and analysis.
6. Value
It’s essential to determine the business value of the data you collect. Big data
must contain the right data and then be effectively analyzed in order to yield
insights that can help drive decision-making.
4) Crowdsourcing (note)
7) Mobile BI (padeepz)
Introduction to NoSQL
What is NoSQL?
NoSQL stands for "Not Only SQL", meaning it is not limited to traditional relational
databases. It is designed to handle huge amounts of data that relational databases struggle
with. NoSQL databases are schema-free and non-relational, making them more flexible for
different data structures.
Most NoSQL databases are open-source and distributed, meaning data is copied across
multiple servers (either local or remote). If one server goes offline, the system continues to
run without losing data access.
Unlike traditional databases, most NoSQL databases relax strict consistency rules
(often settling for eventual consistency), which makes them faster and easier to
scale.
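The replication idea can be illustrated with a toy sketch (this is not a real database, just a few dictionaries standing in for servers): every write is copied to all live nodes, so a read still succeeds when one node goes offline.

```python
class TinyReplicatedStore:
    """Toy stand-in for a distributed NoSQL store."""

    def __init__(self, replicas=3):
        self.nodes = [{} for _ in range(replicas)]  # each dict = one server
        self.online = [True] * replicas

    def put(self, key, value):
        for node, up in zip(self.nodes, self.online):
            if up:
                node[key] = value  # copy the data to every live replica

    def get(self, key):
        for node, up in zip(self.nodes, self.online):
            if up and key in node:
                return node[key]   # any live replica can serve the read
        raise KeyError(key)

store = TinyReplicatedStore()
store.put("user:1", {"name": "Asha"})
store.online[0] = False        # simulate one server going offline
print(store.get("user:1"))     # still served by a surviving replica
```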
Why NoSQL?
Advantages of NoSQL:
o Schema-free design that easily stores unstructured and semi-structured data.
o Scales horizontally by simply adding more servers to the cluster.
o High availability, since data is replicated across multiple servers.
o Fast reads and writes for very large workloads.
Disadvantages of NoSQL:
o Weaker consistency guarantees (often eventual consistency instead of strict
ACID transactions).
o No single standard query language comparable to SQL.
o Less mature tooling and a smaller base of experienced administrators than
relational databases.
The CAP theorem states that a database can guarantee only two of the following three
properties:
1. Consistency – Every request returns the most recent data (but may require waiting).
2. Availability – Every request gets a response, but it may not be the latest data.
3. Partition Tolerance – The system remains operational even if some network failures occur.
Since NoSQL databases are distributed, they must choose between Consistency and
Availability while ensuring Partition Tolerance.
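A toy sketch of that trade-off (purely illustrative; real systems are far more involved): during a partition, a consistency-first (CP) node refuses to answer rather than risk serving stale data, while an availability-first (AP) node always answers but may return out-of-date values.

```python
class Node:
    def __init__(self, prefers_consistency):
        self.value = "v1"               # last value this node has seen
        self.partitioned = False        # cut off from the other replicas?
        self.prefers_consistency = prefers_consistency

    def read(self):
        if self.partitioned:
            if self.prefers_consistency:
                # CP choice: stay consistent by refusing to respond.
                raise RuntimeError("unavailable during partition")
            # AP choice: always respond, even if the value may be stale.
            return self.value + " (possibly stale)"
        return self.value

cp = Node(prefers_consistency=True)
ap = Node(prefers_consistency=False)
cp.partitioned = ap.partitioned = True

print(ap.read())                 # answers, possibly with stale data
try:
    cp.read()
except RuntimeError as err:
    print(err)                   # stays consistent, sacrifices availability
```

Real systems make this choice too: MongoDB is commonly classified as leaning toward consistency (CP), while Cassandra leans toward availability (AP).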