DBMS1.2

THE THREE-LEVEL ANSI-SPARC ARCHITECTURE
The Three-Level ANSI-SPARC Architecture
The American National Standards Institute (ANSI) Standards Planning and Requirements Committee (SPARC), or ANSI/X3/SPARC, produced Three-Level architecture in 1975 .It is known as ANSI-SPARC ARCHITECTURE.
 Although the ANSI-SPARC model did not become a standard, it still provides a basis for understanding some of the functionality of a DBMS.
The ANSI-SPARC database architecture uses three levels of abstraction: external, conceptual, and internal. The objective of the three-level architecture is to separate each user’s view of the database from the way the database is physically represented.
Reasons why this separation :
• Each user should be able to access the same data, but have a different customized view of the data.
• Users should not have to deal directly with physical database storage details, such as indexing or hashing .
 • The DBA should be able to change the database storage structures without affecting the users’ views.
• The internal structure of the database should be unaffected by changes to the physical aspects of storage, such as  changing to a new storage device.
• The DBA should be able to change the conceptual structure of the database without affecting all users.

 External Level
The external level consists of a number of different external views(users views) of the database. This level describes that part of the database that is relevant to each user.
Different views may have different representations of the same data. For example, one user may view dates in the form (day, month, year), while another may view dates as (year, month, day).

Conceptual Level
The conceptual level is the community view of the database. This is middle level in the three-level architecture is the conceptual level. This level describes what data is stored in the database and the relationships among the data. This level contains the logical structure of the entire database as seen by the DBA. The conceptual level represents:
• all entities, their attributes, and their relationships;
• the constraints on the data;
• semantic information about the data;
• security and integrity information.

Internal Level
The internal level is the computer’s view of the database .This level describes how the data is stored in the database and  how records are sequenced, what indexes and pointers exist, and so on.
The internal level covers the physical implementation of the database .It covers the data structures and file organizations used to store data on storage devices. The internal level is concerned with such things as:
• storage space allocation for data and indexes;
• record descriptions for storage (with stored sizes for data items);
• record placement;
• data compression and data encryption techniques.
Below the internal level there is a physical level that may be managed by the operating system under the direction of the DBMS. The physical level consists info of operating system , such as how the sequencing is implemented and how the internal records(fields) are stored as contiguous bytes on the disk.

Schemas, Mappings, and Instances

The overall description of the database is called the database schema.  There are three different types of schema in the database .
ü  At the highest level, we have multiple external schemas (also called subschemas) that correspond to different views of the data.
ü  At the conceptual level, we have the conceptual schema, which describes all the entities, attributes, and relationships together with integrity constraints.
ü  At the lowest level of abstraction we have the internal schema, which is a complete description of the internal model.
There is only one conceptual schema, one internal schema and more than one external schema per database.  
The DBMS is responsible for mapping between these three types of schema.  The conceptual schema is related to the internal schema through a conceptual/internal mapping. This mapping enables the DBMS to find the actual record or combination of records in physical storage that constitute a logical record in the conceptual schema. It also allows any differences in entity names, attribute names, attribute order, data types, and so on to be resolved.
Finally, each external schema is related to the conceptual schema by the external/conceptual mapping. This mapping enables the DBMS to map names in the user’s view to the relevant part of the conceptual schema.
An example of the different levels is shown in Figure  Two different external views of staff details exist: one consisting of a staff number (sNo), first name (fName),last name (IName), age, and salary; a second consisting of a staff number (staffNo), last name (IName), and the number of the branch the member of staff works at (branchNo). These external views are merged into one conceptual view. In this process, the major difference is that the age field has been changed into a date of birth field, DOB. The DBMS maintains the external/conceptual mapping; for example, it maps the sNo field of the first external view to the staffNo field of the conceptual record. The conceptual level is then mapped to the internal level, which contains a physical description of the structure for the conceptual record. Again, the DBMS maintains the conceptual/ internal mapping.
The data in the database at any particular point in time is called a database instance. Therefore, many database instances can correspond to the same database schema. The schema is sometimes called the intension of the database; an instance is called an extension (or state) of the database.

Data Independence
There are two kinds of data independence: logical and physical.
Logical data independence
The immunity of the external schemas to changes in the conceptual schema. Changes to the conceptual schema, such as the addition or removal of new entities, attributes, or relationships, should be possible without having to change existing external schemas or having to rewrite application programs.
Physical data  independence
The immunity of the conceptual schema to changes in the internal schema. Changes to the internal schema, such as using different file organizations or storage structures, using different storage devices, modifying indexes or hashing algorithms, should be possible without having to change the conceptual or external schemas. From the users’ point of view, the only effect that may be noticed is a change in performance.
More efficient mapping, the ANSI-SPARC model allows the direct mapping of external schemas onto  the internal schema, thus by-passing the conceptual schema. This mapping of course reduces data independence. 

Database Languages

A data sublanguage consists of two parts:

1.     Data Definition Language (DDL)

2.     Data Manipulation Language (DML).

 These languages are called data sublanguages because they do not used constructs for all computing needs, such as conditional or iterative statements.

Many DBMSs have a facility for embedding the sublanguage in a high-level programming language such as COBOL, Fortran, Pascal, Ada, C, C++, C#, Java, or Visual Basic.

 The Data Definition Language (DDL)

A language that allows the DBA or user to describe and name the entities, attributes, and relationships required for the application, together with any associated integrity and security constraints.

The DDL is used to define a schema or to modify an existing one. It cannot be used to manipulate data.

 The Data Manipulation Language (DML)

A language that provides a set of operations to support the basic data manipulation operations on the data held in the database.

Data manipulation operations usually include the following:

• insertion of new data into the database;

• modification of data stored in the database;

• retrieval of data contained in the database;

• deletion of data from the database.

The part of a DML that used for data retrieval is called a query language.

There are two types of DML: procedural and nonprocedural.

Procedural DML

A language that allows the user to tell the system what data is needed and exactly how to retrieve the data. Procedural languages treat records individually.  Network and hierarchical DMLs are normally procedural.

Nonprocedural DML

 With nonprocedural DMLs, the user specifies what data is required without specifying how it is to be obtained. Nonprocedural languages operate on sets of records. Nonprocedural languages are also called declarative languages. 

SQL or QBE (Query-By-Example) are the nonprocedural languages. Nonprocedural DMLs are normally easier to learn and use than procedural DMLs

 Fourth-Generation Languages (4GLs)

A 4GL is nonprocedural: the user defines what is to be done, not how.

A 4GL is depends on  higher-level components known as fourth-generation tools. The user  needs to define parameters for the tools that use them to generate an application program. It is claimed that 4GLs can improve productivity by a factor of ten.

Fourth-generation languages encompass:

Presentation Languages, such as query languages and report generators;

Speciality Languages, such as spreadsheets and database languages;

Application Generators that define, insert, update, and retrieve data from the database to build applications.

Very High-Level Languages that are used to generate application code.

SQL and QBE, mentioned previously, are examples of 4GLs.

Forms generators

A forms generator is an interactive facility for rapidly creating data input and display layouts for screen forms. The forms generator allows the user to define what the screen is to look like, what information is to be displayed, and where on the screen it is to be displayed. It also allow the user to define formatting for screen elements .

Report generators

 A report generator is a facility for creating reports from data stored in the database. The report generator also  automatically determine how the output should look or we can create our own customized output reports .

There are two main types of report generator: language-oriented and visually oriented.

 In the first case, we enter a command in a sublanguage to define what data is to be included in the report and how the report is to be laid out.

In the second case, we use a facility similar to a forms generator to define the same information.

Graphics generators

A graphics generator is a facility to retrieve data from the database and display the data as a graph showing trends and relationships in the data. It allows the user to create bar charts, pie charts, line charts, scatter charts, and so on.

Application generators

An application generator is a facility for producing a program that interfaces with the database. It reduces the time to design an entire software application. Application generators typically consist of Pre-written modules .The user specifies what the program is supposed to do  the application generator determines how to perform the tasks.

 

DATA MODELS AND CONCEPTUAL MODELING

Data model

A data model represents the organization itself. A Data model is a integrated collection of concepts for describing and manipulating data, relationships between data, and constraints on the data in an organization.

It provides the basic concepts and notations that will allow database designers and end-users to communicate accurately to understand the organizational data.

The purpose of a data model is to represent data and to be understandable.

 A data model can be  contains three components:

(1) A Structural Part: A set of rules according to which databases can be constructed.

(2) A Manipulative Part: Defining the types of operation that are allowed on the data.

(3) A Set Of Integrity Constraint: Ensures the data is accurate.

 The data models  are 3 categories:

Object-Based Data Models

Object-based data models use concepts such as entities, attributes, and relationships.

An entity is a distinct object (a person, place, thing, concept, event) in the organization that can be represented in the database. An attribute is a property that describes some aspect of the object that we wish to record.

A relationship is an association between entities.

Some  object-based data model are:

• Entity-Relationship (ER)

• Semantic

• Functional

• Object-oriented

Entity-Relationship (ER) is the main technique used for database design.

Record-Based Data Models

The database consists of a number of fixed-format records. Each record type defines a fixed number of fields.

These are three types:

1     The Relational Data Model

2    The Network Data Model

       The Hierarchical Data Model.

Relational data model

In the relational model, data and relationships are represented as tables, each of which has a number of columns with a unique name.

Figure (a)  is a example of DreamHome case study, showing branch and staff  details. For example, it shows that employee John White is a manager with a salary of £30,000, who works at branch (branchNo) B005, which, from the first table, is at 22 Deer Rd in London. It is important to note that there is a relationship between Staff and Branch: a branch office has staff.


Network data model

The network model represented in Figure(b). The data is represented as collections of records, and relationships are represented by sets.

The records are organized as generalized graph structures with records appearing as nodes (also called segments) and sets as edges in the graph.

Hierarchical data model

Hierarchical model is represented in figure (c) .The data is represented as collections of records and relationships are represented by sets. The hierarchical model allows a node to have only one parent. A hierarchical model can be represented as a tree graph, with records appearing as nodes (also called segments) and sets as edges.

Physical Data Models

Physical data models describe how data is stored in the computer, representing information such as record structures, record orderings, and access paths. The most common ones are the unifying model and the frame memory. 

Conceptual Modeling

Conceptual modeling or conceptual database design is the process of constructing a model of the information use in an enterprise that is independent of implementation details, such as the target DBMS, application programs, programming languages, or any other physical considerations. This model is called a conceptual data model. Conceptual models are also referred to as “logical models” .

FUNCTIONS OF A DBMS

 (1) Data storage, retrieval, and update

A DBMS provides users with the ability to store, retrieve, and update data in the database and it provides a catalog in which descriptions of data items are stored and which is accessible to users. The DBMS should hide the internal physical implementation details from the user.

(2) A user-accessible catalog

A DBMS must provide a catalog in which descriptions of data items are stored and is accessible to users.

System catalog OR Data Dictionary:

A system catalog or data dictionary is a repository of information describing the data in the database.

It is the “data about the data” or the metadata. Typically, the system catalog stores:

• Names, types, and sizes of data items;

• Names of  relationships;

• Integrity constraints on the data;

• Names of authorized users who have access to the data;

• The data items that each user can access and the types of access allowed.

• External, conceptual, and internal schemas and the mappings between the schemas.

• Usage statistics, such as the frequencies of transactions and counts on the number of accesses made to objects in the database.

 Some benefits of a system catalog are:

• Information about data can be collected and stored centrally. This helps to maintain control over the data as a resource.

• The meaning of data can be defined, which will help other users understand the purpose of the data.

• Communication is simplified, because exact meanings are stored.

• Redundancy and inconsistencies can be identified more easily as the data is centralized.

• Changes to the database can be recorded.

• The impact of a change can be determined before it is implemented because the system catalog records each data item, all its relationships, and all its users.

• Security can be enforced.

• Integrity can be ensured.

• Audit information can be provided.

 (3) Transaction support

 A transaction is a series of actions, carried out by a single user or application program, which accesses or changes the contents of the database. If a transaction fails during execution , the database will be in an inconsistent state. The DBMS then undoes the changes and returns the database to a consistent state again.

(4) Concurrency control services

When two or more users are accessing the database simultaneously and at least one of them is updating data, there may be interference that can result in inconsistencies. The DBMS must ensure that when multiple users are accessing the database, interference cannot occur.

(5) Recovery services

A DBMS must furnish a mechanism for recovering the database in the event that the database is damage or failure. This failure may be the result of a system crash, media failure, a hardware , software error causing the DBMS to stop, it may be the result of the user detecting an error during the transaction.

(6) Authorization services

A DBMS must furnish a mechanism to ensure that only authorized users can access the database.

(7) Support for data communication

 A DBMS must be capable of integrating with communication software. A DBMS must be capable of being integrated with a variety of Data Communications Managers (DCMs) to handle communication messages.

(8) Integrity services

Integrity is related to security. Integrity is concerned with the quality of data itself. Integrity is usually expressed in terms of constraints, which are consistency rules that the database is not permitted to violate.

(9) Services to promote data independence

A DBMS must include facilities to support the independence of programs from the actual structure of the database. Data independence is normally achieved through a view or subschema mechanism.

(10) Utility services

A DBMS should provide a set of utility services. Utility programs help the DBA administer. Examples of utilities of the latter kind are:

• Monitoring facilities, to monitor database usage and operation.

•Index reorganization facilities, to reorganize indexes and their overflows.

•Garbage collection and reallocation, to remove deleted records.


COMPONENTS OF A DBMS

A DBMS is partitioned into several software components (or  modules), each one is assigned for  a specific operation. 


Query processor. This component transforms queries into a series of low-level instructions directed to the database manager.

Database manager (DM). The DM interfaces with user-submitted application programs and queries. The DM accepts queries and examines the external and conceptual schemas to determine what conceptual records are required to satisfy the

request.

File manager. The file manager manipulates the underlying storage files and manages the allocation of storage space on disk. It establishes and maintains the list of structures and indexes defined in the internal schema.

DML preprocessor. This module converts DML statements embedded in an application program into standard function calls in the host language. The DML preprocessor must interact with the query processor to generate the appropriate code.

DDL compiler. The DDL compiler converts DDL statements into a set of tables containing metadata. These tables are then stored in the system catalog while control information is stored in data file headers.

Catalog manager. The catalog manager manages access to and maintains the system catalog. The system catalog is accessed by most DBMS components.

The major software components for the database manager are as follows:

Authorization control. This module confirms whether the user has the necessary authorization to carry out the required operation.

Command processor. Once the system has confirmed that the user has authority to carry out the operation, control is passed to the command processor.

Integrity checker. For an operation that changes the database, the integrity checker checks whether the requested operation satisfies all necessary integrity constraints (such as key constraints).

Query optimizer. This module determines an optimal strategy for the query execution.

Transaction manager. This module performs the required processing of operations that it receives from transactions.

Scheduler. This module is responsible for ensuring that concurrent operations on the database proceed without conflicting with one another. It controls the order in which transaction operations are executed.

Recovery manager. This module ensures that the database remains in a consistent state in the presence of failures. It is responsible for transaction commit and abort.

Buffer manager. This module is responsible for the transfer of data between main memory and secondary storage, such as disk and tape. The recovery manager and the buffer manager are sometimes referred to collectively as the data manager.

The buffer manager is also known as the cache manager.



Next Topic(Relational Model)

http://degreecsa.blogspot.com


No comments:

Post a Comment