
188 Internet of Things: Architecture and Design Principles

5.5.6 Big Data Analytics


Big data is multistructured, whereas a relational database management system (RDBMS) holds
mostly structured data. The open-source Apache Hadoop software and its MapReduce
implementation enable the storage and analysis of massive amounts of data. The Hadoop
Distributed File System (HDFS), Mahout (a library of machine learning algorithms) and HiveQL
(an SQL-like scripting language) are used for big data analytics in the Hadoop ecosystem.
MapReduce is a programming model and a core component of Hadoop: it processes large data
sets on a cluster of nodes. Each node runs the algorithm on the data blocks it stores in HDFS,
so processing takes place at the node that holds the data.
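The map, shuffle and reduce steps of the MapReduce model can be illustrated with a classic word count. The sketch below is a single-process simulation, not Hadoop's Java API: the dictionary `NODE_BLOCKS` is a hypothetical stand-in for data blocks stored on different cluster nodes, and each node's block is mapped locally before the intermediate pairs are grouped and reduced.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical "cluster": each node holds its own HDFS-style block of text lines.
NODE_BLOCKS: Dict[str, List[str]] = {
    "node1": ["temp high", "temp low"],
    "node2": ["humidity high", "temp high"],
}


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in the node's local block."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group the intermediate values from all nodes by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(node_blocks: Dict[str, List[str]]) -> Dict[str, int]:
    # Each block is mapped where it lives (data locality); only the small
    # intermediate (word, 1) pairs are then shuffled and reduced.
    intermediate: List[Tuple[str, int]] = []
    for block in node_blocks.values():
        intermediate.extend(map_phase(block))
    return reduce_phase(shuffle(intermediate))


if __name__ == "__main__":
    print(word_count(NODE_BLOCKS))  # {'temp': 3, 'high': 3, 'low': 1, 'humidity': 1}
```

In a real cluster the map calls run in parallel on different machines; the sequential loop here only shows the data flow.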
Hadoop is an open-source framework that stores and processes big data. Clusters of
computing nodes process the data in a distributed environment using simple programming
models. The framework scales up from a single server to thousands of machines, each
offering local storage and processing. Hadoop accesses data sequentially and performs batch
processing: a job reads an input data set sequentially and produces a new data set, which
the next job in turn processes sequentially.
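Batch processing in this style chains jobs so that the complete output data set of one job becomes the input of the next. The sketch below is a minimal in-memory illustration of that chaining, assuming hypothetical `(sensor_id, reading)` records; the job functions are invented for the example and are not Hadoop APIs.

```python
from typing import Dict, List, Tuple

# Hypothetical input data set: (sensor_id, reading) records.
Record = Tuple[str, float]


def job_totals(records: List[Record]) -> Dict[str, float]:
    """Batch job 1: scan the whole input sequentially and total readings per sensor."""
    totals: Dict[str, float] = {}
    for sensor, value in records:
        totals[sensor] = totals.get(sensor, 0.0) + value
    return totals


def job_top_sensor(totals: Dict[str, float]) -> Tuple[str, float]:
    """Batch job 2: consume job 1's output data set and pick the largest total."""
    sensor = max(totals, key=lambda s: totals[s])
    return sensor, totals[sensor]


def run_pipeline(records: List[Record]) -> Tuple[str, float]:
    # Job 2 starts only after job 1 has produced its full output data set.
    intermediate = job_totals(records)
    return job_top_sensor(intermediate)


if __name__ == "__main__":
    data = [("s1", 2.0), ("s2", 5.0), ("s1", 4.0)]
    print(run_pipeline(data))  # ('s1', 6.0)
```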
Data Acquiring, Organising, Processing and Analytics 189

HBase is an example of column-oriented data storage that provides real-time read and write
access to very large tables distributed over the Hadoop Distributed File System (HDFS).
HBase is a database for big data. It supports random access, so look-ups in large tables are
fast and access latency is low. HBase stores large, sparse tables whose rows are indexed and
sorted by a row key, and it is modelled on Google's Bigtable.
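The HBase data model (row key, column family and qualifier, timestamped cell versions) can be mimicked with nested dictionaries to show why a row-key look-up is a fast random read. This is a toy model for illustration only: `ToyColumnFamilyTable` and its methods are hypothetical and are not the HBase client API, and real HBase persists data on HDFS rather than in a Python dict.

```python
import time
from typing import Dict, List, Optional, Tuple

# Toy model of an HBase/Bigtable-style table (illustrative, not the HBase API):
# row key -> "family:qualifier" -> list of (timestamp, value) versions, newest first.
Cell = List[Tuple[int, bytes]]


class ToyColumnFamilyTable:
    def __init__(self, families: List[str], max_versions: int = 3) -> None:
        self.families = set(families)
        self.max_versions = max_versions
        self.rows: Dict[str, Dict[str, Cell]] = {}  # direct look-up by row key

    def put(self, row: str, column: str, value: bytes, ts: Optional[int] = None) -> None:
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = ts if ts is not None else time.time_ns()
        versions = self.rows.setdefault(row, {}).setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest version first
        del versions[self.max_versions:]  # keep only the most recent versions

    def get(self, row: str, column: str) -> Optional[bytes]:
        """Random read: jump straight to one row and return its newest cell value."""
        versions = self.rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start: str, stop: str) -> List[str]:
        """Range scan over row keys in sorted order, start inclusive, stop exclusive."""
        return [key for key in sorted(self.rows) if start <= key < stop]
```

For example, after `t = ToyColumnFamilyTable(["d"])` and two `put` calls on `"sensor#001", "d:temp"` with timestamps 1 and 2, `t.get("sensor#001", "d:temp")` returns the value written at timestamp 2.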

5.5.7 Data Analytics Architecture and Stack


Analytics architecture consists of the following layers:
● Data sources layer

● Data storage and processing layer

● Data access and query processing layer

● Data services, reporting and advanced analytics layer

Figure 5.4 gives an overview of a reference model for analytics architecture. The
right-hand side of Figure 5.4 labels the layers of the reference model.

[Figure: layers from top to bottom: services, reporting, data visualisations, OLAP and
analytics, with advanced analytics (predictive/prescriptive) applications; the applications
support layer (data access, SQL, query processing, OLTP, ETL and analytics, plus R descriptive
statistics, in-memory or on-store database processing, MapReduce and others); the organised
data store layer (traditional data store/data warehouse, event stream processing, complex
event processing); and the sources layer for acquiring data (IoT/M2M, enterprise and external
data sources).]

Figure 5.4 Analytics Architecture Reference Model
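The four layers of the reference model can be traced end to end in a few lines of code. The sketch below is a hedged illustration, assuming hypothetical IoT temperature readings as the data source and using Python's built-in `sqlite3` as a stand-in for the organised data store; the function names simply mirror the layer names and are not part of any product.

```python
import sqlite3
from typing import Dict, List, Tuple


def data_sources_layer() -> List[Tuple[str, str, float]]:
    """Data sources layer: hypothetical readings acquired from IoT/M2M devices."""
    return [("dev1", "temp", 21.0), ("dev1", "temp", 23.0), ("dev2", "temp", 30.0)]


def storage_layer(rows: List[Tuple[str, str, float]]) -> sqlite3.Connection:
    """Data storage and processing layer: organise the raw records into a table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (device TEXT, metric TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    return conn


def access_layer(conn: sqlite3.Connection) -> Dict[str, float]:
    """Data access and query processing layer: SQL aggregation over the stored data."""
    cur = conn.execute(
        "SELECT device, AVG(value) FROM readings WHERE metric = 'temp' "
        "GROUP BY device ORDER BY device"
    )
    return {device: avg for device, avg in cur.fetchall()}


def services_layer(averages: Dict[str, float]) -> List[str]:
    """Services and reporting layer: turn query results into a simple report."""
    return [f"{device}: mean temp {avg:.1f}" for device, avg in averages.items()]


if __name__ == "__main__":
    conn = storage_layer(data_sources_layer())
    for line in services_layer(access_layer(conn)):
        print(line)  # dev1: mean temp 22.0, then dev2: mean temp 30.0
```

Each layer only consumes the output of the layer below it, which is what lets a real deployment swap, say, SQLite for a data warehouse without touching the reporting code.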

An analytical sandbox is a set of analytics tools and an analytics environment for performing
predictive analytics on multistructured data. Mesos v0.9 is a resource management platform
that enables multiple frameworks to share a cluster of nodes, and it is compatible with the
open analytics stack [data processing (Hive, Hadoop, HBase, Storm), data management
(HDFS)].
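Sharing one cluster among several frameworks can be pictured as a manager offering free resources to frameworks, each of which decides how much to accept. The sketch below is loosely modelled on that resource-offer idea; it is not Mesos's API or its allocation policy. The `Framework` class, the one-CPU-per-offer rule and the round-robin order are simplifying assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Framework:
    """A framework (e.g. a Hadoop or Storm instance) that needs some CPU slots."""
    name: str
    cpus_needed: int
    allocated: Dict[str, int] = field(default_factory=dict)

    def accept(self, node: str, offered_cpus: int) -> int:
        # The framework, not the manager, decides how much of an offer to take.
        still_needed = self.cpus_needed - sum(self.allocated.values())
        take = min(still_needed, offered_cpus)
        if take <= 0:
            return 0
        self.allocated[node] = self.allocated.get(node, 0) + take
        return take


def share_cluster(nodes: Dict[str, int], frameworks: List[Framework]) -> Dict[str, int]:
    """Offer one free CPU at a time to frameworks in round-robin order.

    Returns the CPUs left unallocated on each node.
    """
    free = dict(nodes)
    progress = True
    while progress:
        progress = False
        for fw in frameworks:  # round-robin over frameworks
            node: Optional[str] = next((n for n, c in free.items() if c > 0), None)
            if node is None:
                return free  # cluster fully allocated
            if fw.accept(node, 1) == 1:  # offer a single CPU slot
                free[node] -= 1
                progress = True
    return free
```

With two 2-CPU nodes, a framework needing 3 CPUs and one needing 2 end up with 2 CPUs each, so neither framework can monopolise the cluster.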

The Berkeley Data Analytics Stack (BDAS) consists of data processing, data management
and resource management layers.
Applications such as AMP-Genomics and Carat run on BDAS. AMP stands for Berkeley's
Algorithms, Machines and People Laboratory. The data processing component provides
in-memory processing, which processes data efficiently across frameworks, and it combines
batch, streaming and interactive computations.
The resource management component shares the infrastructure across frameworks.
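The benefit of in-memory processing is that batch, interactive and streaming computations can all work on the same data once it has been loaded. The sketch below illustrates that idea with a hypothetical `InMemoryDataset` class holding `(device, reading)` pairs; it is not the API of any BDAS component.

```python
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[str, float]


class InMemoryDataset:
    """Hypothetical in-memory store shared by batch, interactive and streaming work."""

    def __init__(self, records: Iterable[Reading]) -> None:
        self.records: List[Reading] = list(records)  # loaded once, kept in memory

    def batch_average(self) -> Dict[str, float]:
        """Batch: a full pass over every cached record."""
        sums: Dict[str, float] = {}
        counts: Dict[str, int] = {}
        for device, value in self.records:
            sums[device] = sums.get(device, 0.0) + value
            counts[device] = counts.get(device, 0) + 1
        return {device: sums[device] / counts[device] for device in sums}

    def interactive_query(self, device: str) -> List[float]:
        """Interactive: an ad hoc look-up answered from memory, with no re-read from disk."""
        return [value for d, value in self.records if d == device]

    def stream_update(self, micro_batch: Iterable[Reading]) -> None:
        """Streaming: append each arriving micro-batch to the same in-memory data."""
        self.records.extend(micro_batch)
```

After `ds = InMemoryDataset([("a", 1.0), ("a", 3.0)])` and `ds.stream_update([("b", 5.0)])`, both `ds.batch_average()` and `ds.interactive_query("b")` see the streamed reading without reloading anything.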
Figure 5.5 gives an overview of the BDAS architecture as a reference model for analytics
architecture. Its right-hand side shows the file system (HDFS), the library of machine
learning algorithms (Mahout) and the SQL-like scripting language (HiveQL) used for big data
analytics in the Hadoop ecosystem.

[Figure: the same layers as the analytics architecture reference model (business
intelligence services, reporting, data visualisations, OLAP, analytics and advanced analytics
applications; the applications support layer; the organised data store layer; and the data
sources layer for IoT/M2M, enterprise and external data), with the Hadoop ecosystem
components on the right: Mahout (distributed and scalable library of machine learning
algorithms), HiveQL (SQL-like scripting language) and HDFS (Hadoop Distributed File System)
for big data.]

Figure 5.5 Berkeley data analytics stack architecture

Reconfirm Your Understanding


● Organised data in a database or data store is used for analytics, which derives new facts and
supports decisions based on those facts. Analytics has three phases that derive new facts and
provide business intelligence: descriptive, predictive and prescriptive analytics.
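The three phases can be seen on a short series of sensor readings: describe what happened, predict the next value, then recommend an action. The sketch below is one simple instance, assuming a least-squares linear trend as the predictive model and a hypothetical temperature threshold for the prescriptive rule; real systems would use richer models and decision logic.

```python
from statistics import mean
from typing import Dict, List


def descriptive(values: List[float]) -> Dict[str, float]:
    """Descriptive analytics: summarise what has happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def predictive(values: List[float]) -> float:
    """Predictive analytics: fit a least-squares line and extrapolate one step ahead."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    xs = range(n)
    x_bar, y_bar = (n - 1) / 2, mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * n  # forecast for the next time step


def prescriptive(forecast: float, threshold: float) -> str:
    """Prescriptive analytics: recommend an action based on the forecast."""
    return "switch on cooling" if forecast > threshold else "no action"


if __name__ == "__main__":
    temps = [20.0, 22.0, 24.0, 26.0]
    print(descriptive(temps))  # {'mean': 23.0, 'min': 20.0, 'max': 26.0}
    forecast = predictive(temps)  # 28.0
    print(prescriptive(forecast, threshold=27.0))  # switch on cooling
```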
