Database_print
The three main components of the ER Model are entities, attributes and relationships.
There are three types of relationships that can exist between two entities.
One-to-One Relationship.
One-to-Many or Many-to-One Relationship.
Many-to-Many Relationship.
ER Diagram Symbols
Attributes
Relationships
ENTITY
An entity can be a place, person, object, event, or concept about which data is stored in the
database. Every entity must have at least one attribute and a unique key. Every entity is
made up of attributes that describe it.
Examples of entities:
Entity set:
Student
An entity set is a group of entities of the same kind. It may contain entities whose attributes
share similar values. Entities are represented by their properties, which are also called
attributes. Every attribute has its own value. For example, a student entity may have
name, age, and class as attributes.
Relationship
A relationship is an association among two or more entities. For example, Tom works in
the Chemistry department.
Entities take part in relationships. We can often identify relationships with verbs or verb
phrases; in the example above, the relationship is "works in".
Weak Entities
A weak entity is an entity that has no key attribute of its own. It can be identified
uniquely only in combination with the primary key of another (identifying) entity. For that
reason, a weak entity set must have total participation in its identifying relationship.
Strong Entity Set
A strong entity set always has a primary key.
It is represented by a rectangle symbol.
Its primary key is shown underlined.
A member of a strong entity set is called a dominant entity.
The primary key is one of its attributes and uniquely identifies each member.
In an ER diagram, a relationship between two strong entity sets is shown with a diamond
symbol.
The line connecting a strong entity set to the relationship is a single line.
The relational model (RM) for database management is an approach to managing data
using a structure and language consistent with first-order predicate logic.
Basic relationship:
One-to-One Relationships
One-to-Many Relationships
Many-to-One Relationships
Many-to-Many Relationships
1. One-to-one:
One entity from entity set X can be associated with at most one entity of entity set Y, and
vice versa.
2. One-to-many:
One entity from entity set X can be associated with multiple entities of entity set Y, but an
entity from entity set Y can be associated with at most one entity of entity set X.
Example: One student can register for numerous courses. However, all those courses have a
single line back to that one student.
3. Many-to-one:
More than one entity from entity set X can be associated with at most one entity of entity
set Y. However, an entity from entity set Y may or may not be associated with more than one
entity from entity set X.
4. Many-to-many:
One entity from X can be associated with more than one entity from Y, and vice versa.
For example, students as a group are associated with multiple faculty members, and faculty
members can be associated with multiple students.
Mapping Process
Mapping Entity
Mapping Relationship
Mapping Process
Create a table for the relationship.
Add the primary keys of all participating entities as fields of the table, with their
respective data types.
If the relationship has any attributes, add each attribute as a field of the table.
Declare a primary key composed of the primary keys of all participating entities.
Declare all foreign key constraints.
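The mapping steps above can be sketched with Python's built-in sqlite3 module. This is only
an illustration: the student, course, and enrolls tables, their columns, and the grade
attribute are hypothetical names that do not come from these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Two participating entity sets (hypothetical names).
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT)")

# Table for the relationship "enrolls" between student and course.
conn.execute("""
    CREATE TABLE enrolls (
        student_id INTEGER NOT NULL,  -- primary key of participating entity 1
        course_id  INTEGER NOT NULL,  -- primary key of participating entity 2
        grade      TEXT,              -- attribute of the relationship itself
        PRIMARY KEY (student_id, course_id),
        FOREIGN KEY (student_id) REFERENCES student (student_id),
        FOREIGN KEY (course_id)  REFERENCES course (course_id)
    )
""")

conn.execute("INSERT INTO student VALUES (1, 'Tom')")
conn.execute("INSERT INTO course VALUES (10, 'Chemistry')")
conn.execute("INSERT INTO enrolls VALUES (1, 10, 'A')")


def insert_enrollment(student_id, course_id, grade):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO enrolls VALUES (?, ?, ?)", (student_id, course_id, grade)
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(insert_enrollment(1, 10, "B"))  # duplicate composite primary key
print(insert_enrollment(2, 10, "A"))  # student 2 does not exist
```

Both calls are rejected: the first by the composite primary key, the second by the foreign
key constraint on student_id.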
A weak entity set is one which does not have any primary key associated with it.
Mapping Process
Create a table for the weak entity set.
Add all its attributes to the table as fields.
Add the primary key of the identifying entity set.
Declare all foreign key constraints.
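As a rough sketch of these steps, again using Python's sqlite3 module: the employee
(identifying) and dependent (weak) tables and their columns are hypothetical. The composite
primary key and the ON DELETE CASCADE clause are assumptions added for illustration; the
steps above only require the foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce foreign keys

# Identifying (strong) entity set.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")

# Weak entity set: its own attributes plus the primary key of the identifying set.
conn.execute("""
    CREATE TABLE dependent (
        emp_id   INTEGER NOT NULL,   -- primary key of the identifying entity set
        dep_name TEXT NOT NULL,      -- partial key: unique only per employee
        age      INTEGER,
        PRIMARY KEY (emp_id, dep_name),
        FOREIGN KEY (emp_id) REFERENCES employee (emp_id) ON DELETE CASCADE
    )
""")

conn.execute("INSERT INTO employee VALUES (1, 'Tom'), (2, 'Ann')")
# Two dependents share the name 'Sam'; the owner's key tells them apart.
conn.execute("INSERT INTO dependent VALUES (1, 'Sam', 7), (2, 'Sam', 9)")

# Removing the identifying entity also removes its dependents.
conn.execute("DELETE FROM employee WHERE emp_id = 1")
remaining = conn.execute("SELECT emp_id, dep_name FROM dependent").fetchall()
print(remaining)
```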
Relational Algebra
Relational algebra is a procedural query language that takes relations as input and
generates a relation as output. It mainly provides the theoretical foundation for
relational databases and SQL.
Operators in Relational Algebra
Projection (π)
Projection is used to project required column data from a relation.
Example:
R
A B C
-------
1 2 4
2 2 3
3 2 3
4 3 4
π B,C (R)
B C
-----
2 4
2 3
3 4
Note: By default, projection removes duplicate rows, so the tuple (2, 3) appears only once.
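A minimal Python sketch of projection on the relation R from the example above. Representing
a relation as a list of dictionaries and the function name project are choices made here for
illustration only.

```python
def project(relation, attributes):
    """Keep only `attributes` from each row and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        projected = tuple(row[a] for a in attributes)
        if projected not in seen:  # projection yields a set of tuples
            seen.add(projected)
            result.append(dict(zip(attributes, projected)))
    return result


R = [
    {"A": 1, "B": 2, "C": 4},
    {"A": 2, "B": 2, "C": 3},
    {"A": 3, "B": 2, "C": 3},
    {"A": 4, "B": 3, "C": 4},
]

print(project(R, ["B", "C"]))
# [{'B': 2, 'C': 4}, {'B': 2, 'C': 3}, {'B': 3, 'C': 4}]
```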
Selection (σ)
Selection is used to select the required tuples of a relation.
For the relation R above,
σ C>3 (R)
selects the tuples in which C is greater than 3.
π A,B,C (σ C>3 (R)) shows the following tuples:
A B C
-------
1 2 4
4 3 4
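Selection can be sketched in Python in the same spirit. The list-of-dictionaries
representation and the function name select are assumptions made for this illustration; the
data is the relation R from the projection example.

```python
def select(relation, predicate):
    """Return the rows of `relation` for which `predicate` is true."""
    return [row for row in relation if predicate(row)]


R = [
    {"A": 1, "B": 2, "C": 4},
    {"A": 2, "B": 2, "C": 3},
    {"A": 3, "B": 2, "C": 3},
    {"A": 4, "B": 3, "C": 4},
]

# Keep only the tuples whose C value is greater than 3.
result = select(R, lambda row: row["C"] > 3)
print(result)
# [{'A': 1, 'B': 2, 'C': 4}, {'A': 4, 'B': 3, 'C': 4}]
```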
Union (∪)
The union operation in relational algebra is the same as the union operation in set theory.
The only constraint is that both relations must have the same set of attributes.
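A small Python sketch of union with the attribute check described above. The function name,
the (schema, rows) representation, and the sample data are all hypothetical.

```python
def union(schema_r, rows_r, schema_s, rows_s):
    """Union of two relations; the result uses the attribute order of schema_r."""
    if set(schema_r) != set(schema_s):
        raise ValueError("union requires both relations to have the same attributes")
    # Reorder each row of S so its columns line up with R's attribute order.
    positions = [schema_s.index(attr) for attr in schema_r]
    aligned_s = {tuple(row[i] for i in positions) for row in rows_s}
    return set(rows_r) | aligned_s  # set union removes duplicates


# Same attributes in a different column order: (A, B) versus (B, A).
result = union(("A", "B"), [(1, 2), (3, 4)], ("B", "A"), [(2, 1), (6, 5)])
print(sorted(result))
# [(1, 2), (3, 4), (5, 6)]
```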
Rename (ρ)
Rename is a unary operation used for renaming the attributes of a relation.
ρ (a/b) R renames the attribute ‘b’ of relation R to ‘a’.
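As an illustration, rename can be sketched in Python over a list of dictionaries. The
function name, the argument order (new name first, mirroring a/b), and the sample data are
assumptions made here.

```python
def rename(relation, new_name, old_name):
    """Return a copy of `relation` with attribute `old_name` renamed to `new_name`."""
    renamed = []
    for row in relation:
        if old_name not in row:
            raise KeyError(f"attribute {old_name!r} not found")
        if new_name in row and new_name != old_name:
            raise ValueError(f"attribute {new_name!r} already exists")
        # Rebuild the row, replacing only the key that matches old_name.
        renamed.append(
            {(new_name if key == old_name else key): value for key, value in row.items()}
        )
    return renamed


R = [{"b": 1, "c": 2}, {"b": 3, "c": 4}]
print(rename(R, "a", "b"))
# [{'a': 1, 'c': 2}, {'a': 3, 'c': 4}]
```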
A distributed database system allows applications to access data from local and remote
databases. In Oracle's terminology, a homogeneous distributed database system is one in
which each database is an Oracle Database, and a heterogeneous distributed database system
is one in which at least one of the databases is not an Oracle Database. Distributed
databases use a client/server architecture to process information requests.
Architectural Models
Distributed query processing is the procedure of answering queries (which means mainly
read operations on large data sets) in a distributed environment where data is managed at
multiple sites in a computer network.
1. We can transfer the data from S2 to S1 and then process the query.
2. We can transfer the data from S1 to S2 and then process the query.
3. We can transfer the data from S1 and S2 to S3 and then process the query.
The choice depends on factors such as the size of the relations and of the result, the
communication cost between the sites, and the site at which the result will be used.
Commonly, the data transfer cost is calculated in terms of the size of the messages, using
the formula:
Data transfer cost = C * Size
where C is the cost per byte transferred and Size is the number of bytes transmitted.
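The formula can be applied to compare the three transfer options. The sketch below assumes
that C is a cost per byte, and the relation sizes are made-up numbers for illustration. It
also ignores the cost of shipping the result, which a real comparison would include.

```python
def transfer_cost(cost_per_byte, size_in_bytes):
    """Data transfer cost = C * Size."""
    return cost_per_byte * size_in_bytes


C = 1                # assumed cost per byte
size_at_s1 = 10_000  # hypothetical size of the relation stored at S1, in bytes
size_at_s2 = 2_000   # hypothetical size of the relation stored at S2, in bytes

strategies = {
    "ship S2 data to S1": transfer_cost(C, size_at_s2),
    "ship S1 data to S2": transfer_cost(C, size_at_s1),
    "ship both to S3": transfer_cost(C, size_at_s1 + size_at_s2),
}

cheapest = min(strategies, key=strategies.get)
print(cheapest, strategies[cheapest])
# ship S2 data to S1 2000
```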
2. Using semi-join in distributed query processing:
The semi-join operation is used in distributed query processing to reduce the number of
tuples in a table before transmitting it to another site.
Example: Find the amount of data transferred to execute the same query given in the
above example using the semi-join operation.
Answer: The following strategy can be used to execute the query.
1. Project the required attributes of the EMPLOYEE table at site 1 and then
transfer them to site 3. For this, we transfer NAME and DID (of EMPLOYEE), and
the size is 25 * 1000 = 25000 bytes.
2. Transfer the DEPARTMENT table to site 3 and join the projected attributes of
EMPLOYEE with this table. The size of the DEPARTMENT table is 25 * 50 =
1250 bytes.
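The semi-join idea itself can be sketched in a few lines of Python. The table and column
names EMPLOYEE, DEPARTMENT, NAME, and DID follow the example; the DNAME column and all row
values are hypothetical, and the code models the reduction step only, not the network
transfer.

```python
def semi_join(left, right, attr):
    """Rows of `left` that match at least one row of `right` on `attr`."""
    keys = {row[attr] for row in right}
    return [row for row in left if row[attr] in keys]


employee = [
    {"NAME": "Tom", "DID": 1},
    {"NAME": "Ann", "DID": 2},
    {"NAME": "Raj", "DID": 9},  # no matching department
]
department = [
    {"DID": 1, "DNAME": "Chemistry"},
    {"DID": 2, "DNAME": "Physics"},
]

# Only the join column of DEPARTMENT needs to be shipped to EMPLOYEE's site.
shipped_keys = [{"DID": d["DID"]} for d in department]

# EMPLOYEE is reduced before being transmitted, so fewer tuples cross the network.
reduced_employee = semi_join(employee, shipped_keys, "DID")
print([row["NAME"] for row in reduced_employee])
# ['Tom', 'Ann']
```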
UNIT 3
of XML-based languages.
TYPES
What is XML? The Extensible Markup Language (XML) is a simple text-based format
for representing structured information: documents, data, configuration, books,
transactions, invoices, and much more. It was derived from an older standard format called
SGML (ISO 8879), in order to be more suitable for Web use.
Advantages of XML
XML uses human, not computer, language. XML is readable and understandable,
even by novices, and no more difficult to code than HTML.
XML is completely compatible with Java™ and 100% portable. Any application that
can process XML can use your information, regardless of platform.
XML is extendable.
Disadvantages
1. Structured data –
Structured data is data whose elements are addressable for effective analysis. It
has been organized into a formatted repository, typically a database. It
covers all data that can be stored in an SQL database, in tables with rows and
columns. Such data has relational keys and can easily be mapped into pre-designed
fields. Today, it is the most commonly processed kind of data and the simplest
to manage. Example: Relational data.
2. Semi-structured data –
Semi-structured data is information that does not reside in a relational database
but has some organizational properties that make it easier to analyze. With
some processing, it can be stored in a relational database (this can be very hard
for some kinds of semi-structured data), but semi-structured formats exist to save
space. Example: XML data.
3. Unstructured data –
Unstructured data is data that is not organized in a predefined manner or does
not have a predefined data model, so it is not a good fit for a mainstream
relational database. There are alternative platforms for storing and managing
unstructured data. It is increasingly prevalent in IT systems and is used by
organizations in a variety of business intelligence and analytics
applications. Example: Word, PDF, Text, Media logs.
Transaction management:
Structured data: Matured transaction and various concurrency techniques.
Semi-structured data: Transaction is adapted from DBMS, not matured.
Unstructured data: No transaction management and no concurrency.
What is XQuery?
XQuery is a language for finding and extracting elements and attributes from XML
documents.
An example of a question that XQuery could answer:
"Select all CD records with a price less than $10 from the CD collection stored in
cd_catalog.xml"
XQuery 1.0 and XPath 2.0 share the same data model and support the same functions and
operators. If you have already studied XPath, you will have no problem understanding
XQuery.
XQuery Example
for $x in doc("books.xml")/bookstore/book
where $x/price>30
order by $x/title
return $x/title
"books.xml":
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
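For readers without an XQuery processor, the logic of the XQuery example in this section
(books with a price above 30, ordered by title) can be approximated with Python's standard
xml.etree.ElementTree module. This is a sketch, not XQuery itself: it returns title strings
rather than title elements, and BOOKS_XML is a trimmed copy of books.xml that keeps only the
title and price of each book.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of books.xml: authors and years are omitted for brevity.
BOOKS_XML = """
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <price>49.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>
"""

root = ET.fromstring(BOOKS_XML)

# "for $x in .../bookstore/book where $x/price > 30"
expensive = [book for book in root.findall("book") if float(book.findtext("price")) > 30]

# "order by $x/title return $x/title"
titles = sorted(book.findtext("title") for book in expensive)
print(titles)
# ['Learning XML', 'XQuery Kick Start']
```

"Everyday Italian" is excluded because its price is exactly 30.00, which is not greater
than 30.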
UNIT 4
Introduction
NOSQL
Not only SQL
Most NOSQL systems are distributed databases
or distributed storage systems
Focus on semi-structured data storage, high
performance, availability, data replication, and
scalability
NOSQL systems focus on storage of “big data”
Typical applications that use NOSQL
Social media
Web links
User profiles
Marketing and sales
Posts and tweets
Road maps and spatial data
Email
BigTable
Google’s proprietary NOSQL system
Column-based or wide column store
DynamoDB (Amazon)
Key-value data store
Cassandra (Facebook)
Uses concepts from both key-value store and
column-based systems
MongoDB and CouchDB
Document stores
Neo4J and GraphBase
Graph-based NOSQL systems
OrientDB
Combines several concepts
Database systems classified on the object model
Or native XML model
The CAP theorem is a result from theoretical computer science about distributed data
stores. It states that, in the event of a network failure on a distributed database, it is
possible to provide either consistency or availability, but not both.
What is the CAP theorem?
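The CAP theorem is named after three properties (hence its name) of distributed data
stores:
Consistency: every read receives the most recent write or an error.
Availability: every request receives a response, although it might not contain the most
recent write.
Partition tolerance: the system continues to operate despite network failures between nodes.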
In normal operations, your data store provides all three functions. But the CAP theorem
maintains that when a distributed database experiences a network failure, you can provide
either consistency or availability.
NoSQL
NoSQL databases do not require a fixed schema and do not enforce relations between tables.
In document-oriented NoSQL databases, each record is a JSON-like document, a complete
entity that one can readily read and understand. NoSQL databases are widely recognized for:
Ease-of-use
Scalable performance
Strong resilience
Wide availability
Examples of NoSQL databases include:
Cloud Firestore
Firebase Real-time DB
MongoDB
MarkLogic
Couchbase
CloudDB
Amazon DynamoDB
Consistency in databases
Consistent databases should be used when the value of the information returned needs to be
accurate.
Financial data is a good example. When a user logs in to their banking institution, they do not
want to see an error that no data is returned, or that the value is higher or lower than it
actually is. Banking apps should return the exact value of a user’s account information. In
this case, banks would rely on consistent databases.
Examples of consistent databases include:
MongoDB
Redis
HBase
Availability in databases
Available databases should be used when keeping the service responsive is more important
than returning the most recent information.
Examples of available databases include:
Cassandra
DynamoDB
Cosmos DB
Some database options, like Cosmos DB and Cassandra, allow a user to turn a knob on which
guarantee they prefer: consistency or availability.
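One common form of such a knob in replicated stores is a quorum setting. With N replicas, a
write acknowledged by W replicas and a read that consults R replicas are guaranteed to
overlap on at least one replica when R + W > N. The sketch below only checks that condition;
the function name is hypothetical, and it is not the API of Cassandra, DynamoDB, or Cosmos
DB.

```python
def read_overlaps_latest_write(n_replicas, write_acks, read_replicas):
    """True if every read set must share a replica with every write set (R + W > N)."""
    if not (1 <= write_acks <= n_replicas and 1 <= read_replicas <= n_replicas):
        raise ValueError("W and R must be between 1 and N")
    return read_replicas + write_acks > n_replicas


# Leaning towards consistency: quorum reads and writes on 3 replicas.
print(read_overlaps_latest_write(3, 2, 2))  # True

# Leaning towards availability: a single replica is enough for reads and writes.
print(read_overlaps_latest_write(3, 1, 1))  # False
```

Smaller values of W and R let requests succeed when fewer replicas are reachable, at the
price of possibly reading stale data.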
Unit 5
Insider Threats
Malware
An Evolving IT Environment