
NoSQL Column-Family Stores

Column-family stores are NoSQL databases that map each row key to values grouped into column families. Cassandra is an example of a column-family store; it replicates data across multiple nodes for high availability and scalability. It uses a column-family data model and offers tunable consistency, so each request can trade consistency against availability. Cassandra is suitable for applications that require high write performance and the ability to scale out by adding nodes.

Uploaded by

nguyentthai96

MINISTRY OF EDUCATION AND TRAINING

UNIVERSITY OF SCIENCE, HO CHI MINH CITY

FACULTY OF INFORMATION TECHNOLOGY

NoSQL
Column-Family Stores
Report for the course Advanced Database Systems
NoSQL - Not Only SQL
Supervisor: Dr. Nguyễn Trần Minh Thư
Group 07:
1. 19C11015 - Đỗ Huy Gia Cát
2. 21C12003 - Đào Thanh Danh
3. 21C11026 - Nguyễn Thành Thái
CONTENTS

• Column-Family Stores in NoSQL
o Overview
o Column-Family Databases
• Cassandra's Structure and Features
• Comparing Column-Family Data Stores with Others
• Query Features
• Extended Analysis
• Scaling
• Cassandra and HBase Compared
• Suitable Use Cases
Introduction

Wide Column / Column Family Database
• Column-family stores are databases that store data as key-value mappings, where the values are grouped into multiple column families and each column family is a map of data
• Terminology comparison between RDBMS and Cassandra

RDBMS                      | Cassandra
Database instance          | Cluster
Database                   | Keyspace
Table                      | Column family
Row                        | Row
Column (same for all rows) | Column (can differ per row)

Column Family Database

Column Family Storage Structure
• Column: the basic storage unit. It consists of a name-value pair in which the name also acts as the key, and it is stored with a timestamp value

• Super column: a column whose value is a map of columns

Column Family Storage Structure
• Standard column family: a column family in which all columns are simple columns

• Super column family: a column family that contains at least one super column
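The storage units defined on these two slides can be sketched as plain Python data structures. This is only an illustration of the definitions, not Cassandra's actual storage format; the class names, the `users` sample data, and the helper `is_super_column_family` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Column:
    """Basic storage unit: a name-value pair stored with a timestamp."""
    name: str
    value: str
    timestamp: int


@dataclass
class SuperColumn:
    """A column whose value is a map of columns, keyed by column name."""
    name: str
    columns: Dict[str, Column] = field(default_factory=dict)


# A row maps column names to columns; a column family maps row keys to rows.
Row = Dict[str, Union[Column, SuperColumn]]
ColumnFamily = Dict[str, Row]


def is_super_column_family(family: ColumnFamily) -> bool:
    """Return True if at least one column in any row is a super column."""
    return any(
        isinstance(col, SuperColumn)
        for row in family.values()
        for col in row.values()
    )


# Hypothetical sample data: the two rows do not share the same set of columns.
users: ColumnFamily = {
    "user-1": {"name": Column("name", "Ana", 1)},
    "user-2": {
        "name": Column("name", "Binh", 2),
        "city": Column("city", "HCMC", 3),
    },
}
```

With only simple columns, `users` is a standard column family. Adding a single `SuperColumn` to any row would make it a super column family under the definition above.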

Cassandra's Features
• Consistency
• Transactions
• Availability
• Scaling

Consistency
• Cassandra stores replicas on multiple nodes to ensure reliability
• Cassandra lets each request choose a consistency level. The three most common are:
ONE: only one replica needs to respond to the request; good for high write performance requirements
QUORUM: a majority of the replicas must respond to the request
ALL: all replicas must respond to the request
• If a node is down during a write, it receives the missed data after it comes back, via hints (hinted handoff) or the repair command
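To make the three levels concrete, the sketch below computes how many replicas must respond for a given replication factor. The function name `required_acks` is an assumption for illustration and is not part of any Cassandra driver; the quorum is taken as a strict majority of the replicas.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond at the given consistency level."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")
```

For a replication factor of 3, QUORUM needs 2 responses, so a request still succeeds with one replica down, whereas ALL fails as soon as any replica is unavailable.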

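Transactions
• Cassandra has no transactions in the traditional sense; instead, writes are atomic and isolated at the row level
• Atomic: inserting or updating columns in a row is treated as a single write operation, which either succeeds or fails as a whole
• Isolation: writes to a row are isolated to the client performing them and are not visible to other users until they complete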

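Availability
• The balance between availability and consistency is governed by the formula
(R + W) > N
R, W: the minimum number of nodes that must respond successfully to a read/write request; N: the number of replicas of the data
• When the formula holds, every read reaches at least one replica that has the latest write; lowering R or W below that point increases availability at the cost of this guarantee
• Keyspaces should be set up depending on your needs: higher availability for reads or for writes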
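The (R + W) > N rule can be checked with a few lines of arithmetic. The helper names below are hypothetical; the sketch only tests whether a read set and a write set of replicas are guaranteed to overlap, and how many replicas may be down before a request fails.

```python
def read_overlaps_write(r: int, w: int, n: int) -> bool:
    """True if any R replicas read must include one of the W replicas written."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    return r + w > n


def tolerated_failures(required: int, n: int) -> int:
    """Replicas that may be down while `required` of `n` can still respond."""
    return n - required
```

With N = 3, choosing R = 2 and W = 2 satisfies the rule and tolerates one failed replica for both reads and writes. Choosing R = 1 and W = 1 tolerates two failures but no longer guarantees that a read sees the latest write.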

Scaling
• Cassandra scales by adding nodes to the cluster
• Clusters can be scaled on the fly, without taking them offline => maximum uptime

Database: open-source NoSQL column-family store; stores non-relational data in the column-family model
Scalability: scales out by adding nodes
Replication: replicates data across multiple nodes

Aspect | Cassandra | HBase
Infrastructure | Independent design; can integrate with other DBMSs and with Storm and Hadoop | Based on Hadoop; integrates with ZooKeeper, the HBase master, HBase data nodes, and the name node
Ordered partitioning | Supported | Not supported
Nodes | Multiple seed nodes in a cluster | A master node monitors and coordinates the other nodes
Query language | Cassandra Query Language (CQL), with the Cassandra Query Language Shell (CQLSH) | Only the HBase shell

Basic Queries
Cassandra:
• Cassandra Query Language (CQL)
• Cassandra Query Language Shell (CQLSH)
HBase:
• Only the HBase Shell is supported
• Apache Phoenix -> query engine
https://data-flair.training/blogs/hbase-shell-commands/

Cassandra Query Language
• CREATE KEYSPACE <identifier> WITH <properties>
• CREATE KEYSPACE videodb WITH REPLICATION =
{ 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

• Replication strategy classes: SimpleStrategy, NetworkTopologyStrategy

Cassandra Query Language
CREATE (TABLE | COLUMNFAMILY) <tablename>
('<column-definition>' , '<column-definition>')
(WITH <option> AND <option>)

USE videodb;
CREATE TABLE videos (
  videoid uuid,
  videoname varchar,
  username varchar,
  description varchar,
  location map<varchar,varchar>,
  tags set<varchar>,
  upload_date timestamp,
  PRIMARY KEY (videoid)
);

CREATE TABLE video_rating (
  videoid uuid,
  rating_counter counter,
  rating_total counter,
  PRIMARY KEY (videoid)
);

CREATE TABLE video_event (
  videoid uuid,
  username varchar,
  event varchar,
  event_timestamp timeuuid,
  video_timestamp bigint,
  PRIMARY KEY ((videoid, username), event_timestamp, event)
) WITH CLUSTERING ORDER BY (event_timestamp DESC, event ASC);
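The primary key of video_event combines a partition key, (videoid, username), with two clustering columns, event_timestamp and event. The Python sketch below imitates the resulting layout: rows are grouped by partition key and sorted inside each partition by event_timestamp descending, then event ascending. The function name `layout` and the use of plain integers in place of timeuuid values are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def layout(rows: List[dict]) -> Dict[Tuple[str, str], List[dict]]:
    """Group rows by (videoid, username) and order each group like the table."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[(row["videoid"], row["username"])].append(row)
    for events in partitions.values():
        # Sort by the secondary clustering column first (event ASC) ...
        events.sort(key=lambda r: r["event"])
        # ... then by the primary one (event_timestamp DESC). Python's sort is
        # stable, so ties on the timestamp keep their event ASC order.
        events.sort(key=lambda r: r["event_timestamp"], reverse=True)
    return dict(partitions)
```

A query that fixes videoid and username reads a single partition and gets the newest events first without any extra sorting, which is the reason for declaring the clustering order in the table definition.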
Cassandra Query Language
• Built-in data types: boolean, int, bigint, varint, float, double, decimal, ascii, varchar, text, timestamp, blob, inet, timeuuid, uuid, ...
• Collection data types: LIST, SET, MAP
• User-defined data types

Cassandra Query Language
• User-Defined Data Type
CREATE TYPE <keyspace>.<data type>
(<field name> <field type>, <field name> <field type>, ...)

CREATE TYPE records (
  name text,
  branch text,
  phone int,
  city text,
  id set<int>
);

Cassandra Query Language
SELECT Clause, WHERE Clause & ORDER BY

INSERT INTO <table name>
(<field name 1>, <field name 2>, <field name 3>, ...)
VALUES ('value 1', 'value 2', 'value 3', ...)
USING <update parameter>;

UPDATE <table name> USING <update parameter>
SET <field name 1> = <value 1>,
    <field name 2> = <value 2>,
    <field name 3> = <value 3>, ...
WHERE <field> = <value>;

Cassandra Query Language
DELETE FROM <table name>
USING <update parameter>
WHERE <identifier>;

BEGIN BATCH
// INSERT, UPDATE, and DELETE statements go here
APPLY BATCH;

Cassandra Query Language
• Advanced Queries and Indexing

• CREATE INDEX <index name> ON <table name> (<column name>)


Secondary indexes are kept locally on each node as hidden tables and perform well for low-cardinality column values.

• USE, CREATE, ALTER, DROP, TRUNCATE,...

Write path

• Memory: writes go first to the memtable

• Disk: full memtables are flushed to SSTables

Read path

• Memory: reads consult the memtable

• Disk: reads also consult the SSTables, and the results are merged
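The interplay between the memtable and the SSTables can be sketched with a toy store. This is a deliberately simplified model and not Cassandra's implementation: it leaves out the commit log, compaction, and Bloom filters, and it resolves conflicts by checking the newest data first instead of comparing per-column timestamps. The class name `TinyStore` and the `flush_threshold` parameter are assumptions for the example.

```python
from typing import List, Optional, Tuple


class TinyStore:
    """Toy write and read path: an in-memory memtable plus immutable SSTables."""

    def __init__(self, flush_threshold: int = 2) -> None:
        self.flush_threshold = flush_threshold
        self.memtable: dict = {}
        # Oldest first; each SSTable is an immutable, key-sorted tuple of pairs.
        self.sstables: List[Tuple[Tuple[str, int], ...]] = []

    def write(self, key: str, value: int) -> None:
        """Writes always land in the memtable; a full memtable is flushed."""
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Persist the memtable as a new SSTable and start an empty one."""
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key: str) -> Optional[int]:
        """Check the memtable, then the SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None
```

Because SSTables are never modified in place, an update is simply a newer entry that shadows the older one, which is one reason the write path stays fast.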

Suitable Use Cases
• Event logging: a great choice for storing event information, such as application state or errors encountered by the application
• Content management systems and blogging platforms: store blog entries with tags, categories, links, and trackbacks
• Counters: count and categorize the visitors of a page to calculate analytics
• Expiring data: data that is only needed for a specific time, such as ad banners on a website
When Not to Use
• Systems that require ACID transactions for writes and reads
• Systems that need the database to aggregate data using queries (such as SUM or AVG)
• Early product prototypes or initial tech spikes, where query patterns are still changing


Conclusion