Big Data Introduction
• Variety
• Variability
• Veracity
• Value
• Validity
• Volatility
• Visualization
What is Digital Data?
• In the computing world, digital data is a
collection of facts that is transmitted and
stored in an electronic format and processed
by software systems.
• Digital data is generated by various devices,
such as desktops, laptops, tablets, mobile
phones, and electronic sensors.
• Digital data is stored as strings of binary
values (0s and 1s) on a storage medium
that is either internal or external to the
devices generating or accessing the
information.
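As a small illustration of "strings of binary values", the Python sketch below converts a piece of text into the 0s and 1s that would actually be stored, and back again. It assumes UTF-8 as the text encoding; images, audio, and video use their own encodings but also end up as bytes. The function names `to_bits` and `from_bits` are hypothetical helpers written for this example.

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated 8-bit binary strings."""
    # str.encode turns characters into bytes; each byte is an integer 0-255.
    data = text.encode("utf-8")
    # format(byte, "08b") renders each byte as exactly eight 0/1 digits.
    return " ".join(format(byte, "08b") for byte in data)


def from_bits(bits: str) -> str:
    """Reverse of to_bits: rebuild the text from space-separated 8-bit groups."""
    data = bytes(int(chunk, 2) for chunk in bits.split())
    return data.decode("utf-8")


bits = to_bits("Hi")
print(bits)             # 01001000 01101001
print(from_bits(bits))  # Hi
```

Each character here maps to one byte (eight bits); characters outside ASCII take more than one byte in UTF-8.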
• Storage devices also come in various
types, such as magnetic, optical, or solid-state
storage devices.
• Examples of digital data are electronic
documents, text files, e-mails, e-books, digital
pictures, digital audio, and digital video.
Data and Information Processing
• Processing and analyzing information is critical to any
organization.
• It allows organizations to derive value from information, make informed
decisions, and improve organizational effectiveness.
• Structured data is easier to analyze because it is stored in an organised
format.
• On the other hand, processing unstructured data and extracting value
from it with traditional applications is difficult and time-consuming, and it
requires additional hardware resources.
• New architectures, technologies, and techniques have emerged that make it
possible to store, manage, analyze, and derive value from unstructured data
coming from various sources.
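The contrast between structured and unstructured data can be sketched in a few lines of Python. The customer records and the phone-number pattern below are hypothetical: with named fields, a query is a simple filter; with free text, the same fact has to be dug out with a pattern that has to be guessed and may miss other formats.

```python
import re

# Structured: every record has the same named fields, so querying is a simple filter.
customers = [
    {"name": "Asha", "city": "Pune", "phone": "020-5550-1234"},
    {"name": "Ben", "city": "Leeds", "phone": "0113-555-0199"},
]
pune_phones = [c["phone"] for c in customers if c["city"] == "Pune"]

# Unstructured: the same facts buried in free text. A pattern must be guessed
# (here: hypothetical groups of 3-4 digits joined by hyphens) before any analysis.
notes = "Called Asha in Pune on 020-5550-1234; Ben (Leeds) prefers email."
PHONE_PATTERN = re.compile(r"\b\d{3,4}(?:-\d{3,4})+\b")
found_phones = PHONE_PATTERN.findall(notes)

print(pune_phones)   # ['020-5550-1234']
print(found_phones)  # ['020-5550-1234']
```

Even when the regular expression finds the number, nothing in the free text says which city it belongs to; the structured version gets that link for free from its field names.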
Data Types
Structured Data Type
• It is the type of data that is stored in relational databases such as Microsoft
SQL Server and Oracle, where data is organised in rows and columns within
named tables.
• It is highly specific and is stored in a predefined format.
• Structured data also adheres to predefined rules for formatting and
labeling information.
• It consists of clearly defined data types with patterns that make them
easily searchable.
• It usually resides in relational database management systems (RDBMSs).
Fields store length-delimited data such as phone numbers, Social Security
numbers, or ZIP codes, and records can also contain variable-length text
strings such as names, which makes the data simple to search.
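A minimal sketch of such a table, using Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for a server RDBMS such as SQL Server or Oracle. The table and column names are hypothetical. Note that SQLite does not enforce declared lengths such as `CHAR(5)`; most other RDBMSs do.

```python
import sqlite3

# In-memory SQLite database, used purely for illustration.
conn = sqlite3.connect(":memory:")

# A predefined schema: a named table with named, typed columns.
conn.execute(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,  -- variable-length text
        phone       TEXT,           -- stored as text so formatting is kept
        zip_code    CHAR(5)         -- declared length (not enforced by SQLite)
    )
    """
)
conn.executemany(
    "INSERT INTO customer (first_name, phone, zip_code) VALUES (?, ?, ?)",
    [("Maria", "555-0101", "02139"), ("Chen", "555-0102", "94105")],
)

# Every row has exactly the same columns, so results come back as uniform tuples.
for row in conn.execute("SELECT * FROM customer ORDER BY customer_id"):
    print(row)
# (1, 'Maria', '555-0101', '02139')
# (2, 'Chen', '555-0102', '94105')
```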
Structured Data
• Data may be human- or machine-generated, as long as it is created
within a relational database (RDB) structure.
• This format is highly searchable, both by human-written queries and by
algorithms that use field names and data types, such as alphabetic,
numeric, currency, or date.
• Common relational database applications with structured data include
airline reservation systems, inventory control, sales transactions, and ATM
activity.
• Structured Query Language (SQL) enables queries on this type of
structured data within relational databases.
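To show what an SQL query over structured data looks like, here is a small sketch using Python's built-in `sqlite3` module. The `sale` table and its rows are hypothetical sample data for a sales-transaction application; because each column has a known name and type, the query can filter on a date field, group by an item field, and aggregate an amount field directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sales-transaction table.
conn.execute(
    "CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, item TEXT, amount REAL, sale_date TEXT)"
)
conn.executemany(
    "INSERT INTO sale (item, amount, sale_date) VALUES (?, ?, ?)",
    [
        ("ticket", 120.0, "2024-01-05"),
        ("ticket", 80.0, "2024-01-20"),
        ("baggage", 35.0, "2024-02-02"),
    ],
)

# Filter, group, and aggregate by column name.
query = """
    SELECT item, COUNT(*) AS n, SUM(amount) AS total
    FROM sale
    WHERE sale_date >= '2024-01-01'  -- ISO-8601 dates compare correctly as text
    GROUP BY item
    ORDER BY total DESC
"""
for item, n, total in conn.execute(query):
    print(item, n, total)
# ticket 2 200.0
# baggage 1 35.0
```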
Characteristics of Structured Data
• Structured data conforms to a data model with a predefined
structure.
• Data is organized into entities such as tables, and these tables
are linked together using relationships.
• All values stored in a table column share the same attributes. For
example, if a table defines the [FirstName] column as string data,
that column stores string data for every record.
• It does not allow the structure of an individual record to change dynamically.
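The last point can be demonstrated with a short Python `sqlite3` sketch: a record cannot carry a field that the table's schema does not define. The `person` table and the `Nickname` field are hypothetical. SQLite's type checking is looser than that of most RDBMSs (it uses type affinity), so this sketch shows only the fixed set of columns, not strict type enforcement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: every record has exactly the columns declared here.
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, FirstName TEXT NOT NULL)")
conn.execute("INSERT INTO person (FirstName) VALUES (?)", ("Ada",))

# Try to give one record an extra field that the schema does not define.
error_message = None
try:
    conn.execute(
        "INSERT INTO person (FirstName, Nickname) VALUES (?, ?)",
        ("Grace", "Amazing Grace"),
    )
except sqlite3.OperationalError as exc:
    error_message = str(exc)  # SQLite reports the unknown column

print("rejected:", error_message)
print(conn.execute("SELECT COUNT(*) FROM person").fetchone()[0])  # 1
```

The rejected insert leaves the table unchanged; storing the extra field would require changing the schema for the whole table.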
Merits of Structured Data
• The fixed, well-defined schema makes the data easy to manage, compact to store, and easy
to access.
• The data can be indexed based on its attributes. Indexes help read data from
the database quickly.
• Data security can be implemented at a granular level, i.e., at the row, column, or table level.
• Structured data can be easily accessed by machine learning algorithms, so data
manipulation and calculations can be done quickly.
• More tools are available for performing Business Intelligence (BI) operations on structured data.
• Structured data enables users to understand and analyze different data
relationships quickly.
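The indexing merit can be observed directly with SQLite's `EXPLAIN QUERY PLAN`, sketched below with Python's built-in `sqlite3` module. The `customer` table, its 10,000 generated rows, and the index name `idx_customer_zip` are hypothetical. The exact wording of the plan text varies between SQLite versions, but before the index it describes a scan of the whole table, and afterwards a search that uses the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical customer table: 10,000 rows spread over 1,000 ZIP codes.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, zip_code TEXT)")
conn.executemany(
    "INSERT INTO customer (name, zip_code) VALUES (?, ?)",
    [(f"customer{i}", f"{i % 1000:05d}") for i in range(10_000)],
)

QUERY = "SELECT name FROM customer WHERE zip_code = ?"


def query_plan(sql: str, params: tuple) -> str:
    """Return SQLite's plan description (the last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


before_plan = query_plan(QUERY, ("00042",))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_customer_zip ON customer (zip_code)")
after_plan = query_plan(QUERY, ("00042",))   # now a search using idx_customer_zip

print(before_plan)
print(after_plan)
matches = conn.execute(QUERY, ("00042",)).fetchall()
print(len(matches))  # 10
```

The query result is the same either way; the index only changes how many rows the engine has to examine to find it.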
Demerits of Structured Data
• The schema must be defined well in advance to cover all data requirements. Adding a
column requires modifying the structure for all records in the table. Therefore, structured
data is less flexible.
• It can be used only for its intended purpose, which limits its business use cases.
• Limitations on Use: Because of the way structured data is organized, it is harder to use it
flexibly or for varied use cases.
• Limited Storage: Structured data is stored in specific spaces of data warehouses. While
accessing the data is easy, scaling can be difficult, and changes within data warehouses can
become hard to manage. Using cloud data centers helps with these storage problems.
• High Overhead: Data centers or other storage for structured data can become expensive,
adding to the overall cost of working with structured data. Again, cloud data centers are
recommended, but the storage can still require significant work to keep the data properly maintained.
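The schema-change demerit can be sketched with Python's `sqlite3` module: adding one new attribute means altering the table definition itself, and the new column then appears on every existing record. The `student` table and its `email` column are hypothetical. How expensive such a change is in practice depends on the database engine and the size of the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical student table created without an email column.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO student (name) VALUES (?)", [("Ravi",), ("Lena",), ("Omar",)])

# Needing one new attribute means changing the schema for the whole table;
# every existing record gets the new column, filled with NULL (None in Python).
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")

rows = conn.execute("SELECT id, name, email FROM student ORDER BY id").fetchall()
print(rows)  # [(1, 'Ravi', None), (2, 'Lena', None), (3, 'Omar', None)]
```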
Examples of Structured Data
• Spreadsheets
• Relational databases such as Microsoft SQL Server and Oracle
• Inventory control
• Online Transaction Processing (OLTP) systems
• Phone numbers
• Email addresses
• ATM activity
• Student fee payment databases