BRIEF HISTORY OF THE RELATIONAL
MODEL
Introduction:
The relational model was
first proposed by E. F. Codd . In the relational model, all data is logically
arranged within relations (tables). The relational model is based on the
mathematical concept Relation. E.F. Codd
used mathematical terminology from the set theory and predicate logic. Each
relation has a name and is made up of named attributes (columns) of data. Each tuple
(row) contains one value per attribute. This simple logical structure is a
great strength of the relational model.
The relational model’s objectives are as follows:
• To allow a high degree of data
independence. Application programs must not be affected by changes to the
internal data representation.
• To deal with data semantics,
consistency, and redundancy problems
• To enable the expansion of
set-oriented data manipulation languages.
Three major projects are contributed
for development of the Relational Model:
System R:
This is the first
project that proved practicality the Relational Model. It was developed at
IBM’s San José Research Laboratory in California. This provides an implementation of relational model its data structures and
operations.
The System R project led to two
major developments:
• The development of a structured
query language called SQL.
•The production of various
commercial relational DBMS products for example, DB2 and SQL/DS from IBM and
Oracle from Oracle Corporation.
INGRES:
This is second project for
relational model, which was active at the same time as the System R project. The
INGRES(Interactive Graphics Retrieval System) project involved in development
of a prototype RDBMS. It was developed
at the University of California at
Berkeley.
• This research led to development
of commercial products like INGRES from
Relational Technology Inc. and the Intelligent Database Machine from Britton Lee
Inc.
Peterlee Relational Test Vehicle:
The third project was the Peterlee
Relational Test Vehicle at the IBM UK Scientific Centre in Peterlee .This
project had a more theoretical orientation than the System R and INGRES
projects. It supports query processing and optimization as well as functional extension.
TERMINOLOGY
The relational model is based on the
mathematical concept of a relation, which is physically represented as a
table. Codd used terminology taken from mathematics, principally set theory
and predicate logic.
Relational Data Structure
Relation
A relation is a table
with columns and rows. An RDBMS requires only that database to be perceived by the user as
tables. This applies only to the logical structure of the database: that is,
the external and conceptual levels of the ANSI-SPARC architecture .
Attribute
An attribute is a named column of a
relation. The table columns treated as attributes.
Attributes can appear in any order .
For example, the information on branch offices is represented by the Branch relation, with columns for attributes branchNo, street, city, and postcode. Similarly, the information on staff is represented by the Staff relation, with columns for attributes staffNo, fName, IName, position, sex, DOB, salary, and branchNo.
Domain
A domain is the set of allowable
values for one or more attributes. Every attribute in a relation is defined on
a domain. Domains may be distinct for each attribute, or two or more
attributes may be defined on the same domain .
Tuple
A tuple is means a row in
a relation. Tuples can appear in any order .
Degree
The degree of a relation
is the number of attributes it contains.
1.
A relation with only one
attribute would have degree one and be called a unary relation or
one-tuple.
2.
A relation with two
attributes is called binary.
3.
A relation one with three attributes is called ternary,
and after that the term n-ary is usually used.
Cardinality
The cardinality of a
relation is the number of tuples it contains. This will changes as tuples are
added or deleted.
Relational database
A relational database
consists of relations that are appropriately structured. This can be known as normalization.
Alternative terminology
A relation may be referred as a file, the tuples as records, and the attributes as fields. This terminology comes from the fact that, physically, the RDBMS may store each relation in a file.
Properties of Relations
A relation has the following
properties:
• The relation has a name that is
distinct from all other relation names in the relational schema;
• Each cell of the relation contains
exactly one atomic (single) value;
• Each attribute has a distinct
name;
• The values of an attribute are all
from the same domain;
• Each tuple is distinct; there are
no duplicate tuples;
• The order of attributes has no
significance;
• The order of tuples has no
significance, theoretically.
Relational Keys
We need to be able to identify one
or more attributes (called relational keys) that uniquely identifies
each tuple in a relation.
Superkey
A superkey uniquely
identifies each tuple within a relation.
It may be an attribute, or set of attributes, that uniquely identifies a
tuple within a relation. However, a superkey may contain additional attributes
that are not necessary for unique identification.
Candidate key
A Candidate key is a superkey such
that no proper subset is a superkey within the relation.
A candidate key K for a
relation R has two properties:
• Uniqueness. In each tuple
of R, the values of K uniquely identify that tuple.
• Irreducibility. No proper
subset of K has the uniqueness property.
There may be several candidate keys for a relation. When a key consists of more than one attribute, we call it a composite key.
Consider the Branch relation shown
in figure we can determine one tuple with branchNo. So it is a candidate key. Similarly, postcode is
also a candidate key for this relation.
For example, from the Figure the suitable candidate key for the Staff relation
would be IName, the employee’s surname.
Primary key
Because a relation has
no duplicate tuples, it is always possible to identify each row uniquely. This
means that a relation always has a primary key. In the worst case, the entire
set of attributes could serve as the primary key, but usually some smaller
subset is sufficient to distinguish the tuples.
Alternate keys
The candidate keys that
are not selected to be the primary key are called alternate keys.
For the Branch relation, if we
choose branchNo as the primary key, postcode would then be an alternate key.
Foreign key
An attribute, or set of
attributes, within one relation that matches the candidate key of some relation.
When an attribute appears in more than one relation, its appearance usually
represents a relationship between tuples of the two relations.
For example, the inclusion of
branchNo in both the Branch and Staff relations is it links each branch to the
details of staff working at that branch. In the Branch relation, branchNo is the
primary key. However, in the Staff relation, the branchNo attribute exists to
match staff to the branch office they work in. In the Staff relation, branchNo
is a foreign key.
INTEGRITY CONSTRAINTS
Integrity
constraints are the rules that ensure the data is accurate. There are two
important integrity rules for the relational models. Those are:
1. Entity
integrity
2.
Referential integrity
Nulls
Null
represents a value for an attribute that is currently unknown or is not
applicable for this tuple. It represents the absence of a value. A null mean
the logical value “unknown.” It can mean that a value is not applicable to a
particular tuple, or it could merely mean that no value has yet been supplied.
Nulls are a way to deal with incomplete or exceptional data.
Entity Integrity
Entity integrity is a
constraint that states that in a base relation no attribute of a primary key
can be null.
For example, as branchNo is the primary key of the Branch relation, we should not be able to insert a tuple into the Branch relation with a null for the branchNo attribute.
Referential Integrity
If a foreign key exists
in a relation, either the foreign key value must match a candidate key value of
some tuple in its home relation or the foreign key value must be wholly null.
For example, branchNo in
the Staff relation is a foreign key targeting the branchNo attribute in the Branch
relation, Branch. It should not be possible to create a staff record with
branch number B025, for example, unless there is already a record for branch
number B025 in the Branch relation. However, we should be able to create a new
staff record with a null branch number to allow for the situation where a new
member of staff has joined the company but has not yet been assigned to a particular
branch office.
General Constraints
Additional rules
specified by the users or database administrators of a database that define or
constrain some aspect of the enterprise.
For example, if an upper limit of 20 has been placed upon the number of staff that may work at a branch office, then the user must be able to specify this general constraint and expect the DBMS to enforce it. In this case, it should not be possible to add a new member of staff at a given branch to the Staff relation if the number of staff currently assigned to that branch is 20.
VIEWS
A view is a virtual or
derived relation.
Terminology
Base relation
A named relation
corresponding to an entity in the conceptual schema, whose tuples are
physically stored in the database.
View
A view is a virtual
relation that does not necessarily exist in the database but can be
produced upon request by a particular user, at the time of request. It is
dynamic result of one or more relational
operations operating on the base relations to produce another relation.
·
A view is a relation
that appears to the user to exist, can be manipulated as if it were a base
relation, but does not necessarily exist in storage in the sense that the base
relations do.
·
The contents of a view
are defined as a query on one or more base relations. Any operations on the
view are automatically translated into operations on the relations from which
it is derived.
·
Views are dynamic,
meaning that changes made to the base relations that affect the view are
immediately reflected in the view. When users make permitted changes to the
view, these changes are made to the underlying relations.
Purpose of Views
The view mechanism is desirable for
several reasons:
• It provides a powerful and
flexible security mechanism by hiding parts of the database from certain users
• It permits users to access data in
a way that is customized to their needs, so that the same data can be seen by
different users in different ways, at the same time.
• It can simplify complex operations
on the base relations.
Updating Views
If a view is updated,
then the underlying base relation should reflect the change. The conditions
most systems determine whether an update is allowed through a view:
• Updates are allowed through a view
defined using a simple query involving a single base relation and containing
either the primary key or a candidate key of the base relation.
• Updates are not allowed through
views involving multiple base relations.
• Updates are not allowed through
views involving aggregation or grouping operations.
Based on the updates allowed, Views
are classified into the following classes:
1.
Theoretically not updatable
2. Theoretically updatable
3. Partially updatable
The Relational algebra
is a procedural language . It can be used
to tell the DBMS how to build a new relation from one or more relations
in the database.
The following are the possible
operations of relational algebra:
Unary Operations
The two unary operations are:
1.
Selection
2.
Projection.
Selection (or Restriction) σ
·
Selection
Operator (σ) is a unary operator in relational algebra that performs a
selection operation.
·
It selects those
rows or tuples from the relation that satisfies the selection condition.
Syntax
σ<selection_condition>(R) |
We may use
logical operators like ∧ , ∨ , ! and relational operators like =
, ≠ , > , < , <= , >= with the selection condition.
·
Selection operator only selects the required tuples according to the
selection condition.
·
It does not display the selected tuples.
Examples
·
Select tuples from a relation “Books” where subject is “database”
σsubject = “database” (Books)
·
Select tuples from a relation “Books” where subject is “database” and
price is “450”
σsubject = “database” ^price="450" (Books)
Projection
Projection
Operator (Ï€) is a unary operator in relational algebra that performs a
projection operation.
It
displays the columns of a relation or table based on the specified attributes.
Syntax:
Ï€<attribute
list>(R) |
Projection operator automatically removes all the duplicates while projecting the output relation.
Example:
Set Operations
The relational algebra, starting
with the set operations of Union, Set difference, Intersection, and Cartesian
product.
Union
This
operation is used to fetch data from two relations(tables) or temporary
relation(result of another operation).
For this
operation to work, the relations(tables) specified should have same number of
attributes(columns) and same attribute domain. Also the duplicate tuples are
autamatically eliminated from the result.
Syntax: A ∪ B
where A
and B are relations.
For
example, if we have two tables RegularClass and ExtraClass, both have a column
student to save name of student, then,
∏Student(RegularClass)
∪ ∏Student(ExtraClass)
Above
operation will give us name of Students who are attending both regular classes
and extra classes, eliminating repetition.
Set difference
This operation is used to find data
present in one relation and not present in the second relation. This operation
is also applicable on two relations, just like Union operation.
Syntax: A – B
where A and B are relations.
For example, if we want to find name
of students who attend the regular class but not the extra class, then, we can
use the below operation:
∏Student(RegularClass) - ∏Student(ExtraClass)
Intersection
Let R and S be two relations.
Then-
- R ∩ S is the set of all tuples belonging to
both R and S.
- In R∩S, duplicates are automatically removed.
- Intersection operation is both commutative and associative
d). Cartesian product:
The Cartesian product operation
multiplies two relations to define
another relation consisting of all possible pairs of tuples from the two
relations.
Therefore, if one
relation has I tuples and N attributes and the other has J tuples and M
attributes, the Cartesian product relation will contain (I * J) tuples with (N
+ M) attributes.
Product operation can be denoted by
the following notation:
(A × B)
Example: Find the Cartesian Product
of the relations A and B
Join Operations
The Join operation, which combines
two relations to form a new relation. There are various forms of the Join
operation
·
Theta join
·
Equijoin (a particular
type of Theta join)
·
Natural join
·
Outer join
·
Semijoin
Natural Join (⋈)
Natural
join is a binary operator. Natural join between two or more relations will
result set of all combination of tuples where they have equal common attribute.
Let us see
below example
Equijoin
When a theta join uses only
equivalence condition, it becomes an equi join.
Outer join
An outer join doesn't require each
record in the two join tables to have a matching record. In this type of join,
the table retains each record even if no other matching record exists.
Three types of Outer Joins are:
- Left Outer Join
- Right Outer Join
- Full Outer Join
Left Outer Join
The LEFT
JOIN returns all the rows from the table on the left even if no matching rows
have been found in the table on the right. Where no matching record found in
the table on the right, NULL is returned.
Right Outer join
outer JOIN is the opposite of LEFT
JOIN. The RIGHT JOIN returns all the columns from the table on the right even
if no matching rows have been found in the table on the left. Where no matches
have been found in the table on the left, NULL is returned.
Full Outer join
In a full outer join, all tuples
from both relations are included in the result, irrespective of the matching
condition.
Semi join
Semi-Join matches
the rows of two relations and then show the matching rows of the relation whose
name is mentioned to the left side of ⋉ Semi Join operator.
No comments:
Post a Comment