Database Management System Conceptual View
11/25/2008 Institute of Management Sciences Muhammad Atif Nasim
Table of Contents
Database
Database Management Systems
Uses of databases
Type of Databases
  Delimited text files
  Comma-separated variable (CSV) files
  Locking
  Complex data
  Efficiency
  Hierarchical Database Definition
  Network model
  Relational Database
  Object-Oriented Database Definition
Tables and relationships
Entity-Relationship Diagrams (ERD)
Data Flow Diagram (DFD)
  Guidelines
  Decomposition
  Symbols
  Data process
  Data store
  Actor
  Anchor
  Data flow
  Control flow
  Update flow
  Flow names and inheritance
  Data Flow Diagram Layers
  Context Diagrams
  DFD levels
Key
  Primary key
  Secondary/Foreign key
Database Normalization
  1. Eliminate Repeating Groups
  2. Eliminate Redundant Data
  3. Eliminate Columns Not Dependent On Key
  BCNF. Boyce-Codd Normal Form
  4. Isolate Independent Multiple Relationships
  5. Isolate Semantically Related Multiple Relationships
  6. Optimal Normal Form
  7. Domain-Key Normal Form
Components of DBMS
  Data dictionary/directory
  Data languages
  Teleprocessing monitors
  Application development system
  Security software
  Archiving and recovery system
  Report writers
  SQL and other Query languages
Data Redundancy
Data Integrity
  Cascade Updates and Deletes
  Business Rules and Levels of Enforcement
  Field Level Integrity
  Table Level Integrity
  Validation Tables
Database
A database is a collection of related information stored in an organized manner. The data stored in a database is persistent: it stays in place until it is explicitly changed or deleted.
Database Management Systems
A database management system (DBMS) is a program, or a collection of programs, used to create, maintain, and work with databases. A client/server database system is one in which the database is stored and managed by a database server, and client software is used to request information from the server or to send commands to it.
Uses of databases
Databases are commonly used to store bodies of data that are too large to be managed on paper or in simple spreadsheets. Most businesses use databases for accounts, inventory, personnel, and other record keeping. Databases are also becoming more widely used by home users for address books, CD collections, recipe archives, and so on. There are very few fields in which databases cannot be used.
Type of Databases
• Flat-file text databases
• Hierarchical databases such as LDAP
• Network databases
• Relational databases
• Object-oriented databases
Delimited text files
A delimited text file is one in which each line of text is a record and the fields are separated by a known character. The delimiter varies according to the type of data; common choices include the tab character (\t in Perl) and various punctuation characters. The delimiter should always be a character that does not appear in the data. Delimited text files are easily produced by most desktop spreadsheet and database applications (e.g. Microsoft Excel, Microsoft Access): you can usually choose "File", then "Save As" or "Export", and select the type of file you would like to save.
Imagine a file which contains people's given names, surnames, and ages, delimited by the pipe (|) symbol:
Fred|Flintstone|40
Wilma|Flintstone|36
Barney|Rubble|38
Betty|Rubble|34
Homer|Simpson|45
Marge|Simpson|39
Bart|Simpson|11
Lisa|Simpson|9
The file above is available in your exercises directory as delimited.txt.
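As a rough illustration of how such a file might be read, the sketch below splits each line on the pipe character. It is written in Python rather than Perl, it embeds the sample records in a string instead of opening delimited.txt, and the function name parse_delimited is only an illustrative choice.

```python
# Sample records from the text, embedded so the sketch is self-contained.
SAMPLE = """Fred|Flintstone|40
Wilma|Flintstone|36
Barney|Rubble|38
Betty|Rubble|34
Homer|Simpson|45
Marge|Simpson|39
Bart|Simpson|11
Lisa|Simpson|9
"""


def parse_delimited(text, delimiter="|"):
    """Split each non-empty line into (given_name, surname, age)."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        given, surname, age = line.split(delimiter)
        records.append((given, surname, int(age)))
    return records


people = parse_delimited(SAMPLE)
for given, surname, age in people:
    print(f"{given} {surname} is {age}")
```

This only works because the pipe character never appears inside a field, which is the reason the delimiter must be chosen so that it does not occur in the data.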
Comma-separated variable (CSV) files
Comma-separated variable files are another format commonly produced by spreadsheet and database programs. CSV files delimit their fields with commas and wrap textual data in quotation marks, allowing the textual data to contain commas if required:
"Fred","Flintstone",40
"Wilma","Flintstone",36
"Barney","Rubble",38
"Betty","Rubble",34
"Homer","Simpson",45
"Marge","Simpson",39
"Bart","Simpson",11
"Lisa","Simpson",9
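The quoting rule described above can be sketched with Python's standard csv module, which handles commas inside quoted fields. The last row of the sample is a made-up record added only to show a comma inside a quoted field; it is not part of the original data, and the function name parse_csv is an assumption.

```python
import csv
import io

# Three records from the text plus one hypothetical record (last line)
# whose first field contains a comma inside the quotation marks.
SAMPLE_CSV = '''"Fred","Flintstone",40
"Wilma","Flintstone",36
"Homer","Simpson",45
"Mary, Ann","Example",30
'''


def parse_csv(text):
    """Return a list of (given_name, surname, age) tuples."""
    reader = csv.reader(io.StringIO(text))
    return [(given, surname, int(age)) for given, surname, age in reader]


rows = parse_csv(SAMPLE_CSV)
for row in rows:
    print(row)
```

Splitting the hypothetical last line on every comma would give four fields instead of three, which is why a CSV-aware parser is preferable to a plain split.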
CSV files are harder to parse than ordinary delimited text files. In Perl, one way to parse them is to use the Text::ParseWords module.
Problems with flat file databases
Locking
When using flat file databases without locking, problems can occur if two or more people open the files at the same time. This can cause data to be lost or corrupted. If you are implementing a flat file database, you will need to handle file locking using Perl's flock function.
Complex data
If your data is more complex than a single table of scalar items, managing your flat file database can become extremely tedious and difficult.
Efficiency
Flat file databases are very inefficient for large quantities of data. Searching, sorting, and other simple activities can take a very long time and use a great deal of memory and other system resources.
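To make the locking problem described above more concrete, here is a minimal sketch of appending a record under an exclusive lock. It uses Python's fcntl.flock, which plays a role similar to Perl's flock, and it assumes a Unix-like system (the fcntl module is not available on Windows). The function name append_record and the temporary file location are illustrative assumptions.

```python
import fcntl
import os
import tempfile


def append_record(path, fields, delimiter="|"):
    """Append one record while holding an exclusive lock on the file."""
    with open(path, "a", encoding="utf-8") as handle:
        # Blocks until no other cooperating process holds the lock.
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            handle.write(delimiter.join(str(f) for f in fields) + "\n")
            handle.flush()
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


# Usage: write two records to a scratch file and read them back.
path = os.path.join(tempfile.mkdtemp(), "delimited.txt")
append_record(path, ["Fred", "Flintstone", 40])
append_record(path, ["Wilma", "Flintstone", 36])
with open(path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
print(lines)
```

Locks taken this way are advisory: they protect the file only if every program that writes to it requests the lock first.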
Hierarchical Database Definition
A kind of database management system that links records together like a family tree, such that each record type has only one owner; for example, an order is owned by only one customer. Hierarchical structures were widely used in the first mainframe database management systems. However, due to their restrictions, they often cannot be used to relate structures that exist in the real world.
Network model
The network model is a database model conceived as a flexible way of representing objects and their relationships. Its original inventor was Charles Bachman, and it was developed into a standard specification published in 1969 by the CODASYL Consortium. Where the hierarchical model structures data as a tree of records, with each record having one parent record and many children, the network model allows each record to have multiple parent and child records, forming a lattice structure.
The chief argument in favour of the network model, in comparison to the hierarchical model, was that it allowed a more natural modelling of relationships between entities. Although the model was widely implemented and used, it failed to become dominant for two main reasons. Firstly, IBM chose to stick to the hierarchical model with semi-network extensions in their established products such as IMS and DL/I. Secondly, it was eventually displaced by the relational model, which offered a higher-level, more declarative interface.
Until the early 1980s the performance benefits of the low-level navigational interfaces offered by hierarchical and network databases were persuasive for many large-scale applications, but as hardware became faster, the extra productivity and flexibility of the relational model led to the gradual obsolescence of the network model in corporate enterprise usage.
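The structural difference between the two models (one owner per record versus several) can be sketched with a small in-memory data structure. This is only an illustration of the idea, not how any hierarchical or network DBMS is implemented; the Record class, the link function, and the single_owner flag are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Record:
    name: str
    parents: list = field(default_factory=list, repr=False)
    children: list = field(default_factory=list, repr=False)


def link(parent, child, single_owner):
    """Attach child to parent.

    With single_owner=True the hierarchical rule is enforced: a record
    may have only one owner. With single_owner=False (network model) a
    record may have several parents.
    """
    if single_owner and child.parents:
        raise ValueError(f"{child.name} already has an owner")
    parent.children.append(child)
    child.parents.append(parent)


# Hierarchical: an order is owned by exactly one customer.
customer, salesperson = Record("customer"), Record("salesperson")
tree_order = Record("order")
link(customer, tree_order, single_owner=True)
try:
    link(salesperson, tree_order, single_owner=True)
    hierarchical_error = False
except ValueError:
    hierarchical_error = True

# Network: the same order can belong to a customer and a salesperson.
net_order = Record("order")
link(customer, net_order, single_owner=False)
link(salesperson, net_order, single_owner=False)
print([p.name for p in net_order.parents])
```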
Relational Database
A relational database is a collection of data items organized as a set of formally described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables. The relational database was invented by E. F. Codd at IBM in 1970.
The standard user and application program interface to a relational database is the structured query language (SQL). SQL statements are used both for interactive queries for information from a relational database and for gathering data for reports.
In addition to being relatively easy to create and access, a relational database has the important advantage of being easy to extend. After the original database creation, a new data category can be added without requiring that all existing applications be modified.
A relational database is a set of tables containing data fitted into predefined categories. Each table (which is sometimes called a relation) contains one or more data categories in columns. Each row contains a unique instance of data for the categories defined by the columns. For example, a typical business order entry database would include a table that described a customer with columns for name, address, phone number, and so forth. Another table would describe an order: product, customer, date, sales price, and so forth. A user of the database could obtain a view of the database that fitted the user's needs. For example, a branch office manager might like a view or report on all customers that had bought products after a certain date. A financial services manager in the same company could, from the same tables, obtain a report on accounts that needed to be paid.
When creating a relational database, you can define the domain of possible values in a data column and further constraints that may apply to that data value. For example, a domain of possible customers could allow up to ten possible customer names, while one table is constrained to accept only three of those names. The definition of a relational database results in a table of metadata, or formal descriptions of the tables, columns, domains, and constraints.
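The order entry example and the idea of column constraints can be tried out with a small sketch using Python's built-in sqlite3 module and an in-memory database. The column types, the CHECK rule on sales_price, the view name customer_orders, and the sample rows are all assumptions made for illustration; the text above names only the kinds of columns involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    address TEXT,
    phone   TEXT
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    product     TEXT NOT NULL,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    order_date  TEXT NOT NULL,                          -- ISO dates compare correctly as text
    sales_price REAL NOT NULL CHECK (sales_price >= 0)  -- assumed constraint on the column
);
CREATE VIEW customer_orders AS
    SELECT c.name, o.product, o.order_date, o.sales_price
    FROM customer AS c JOIN orders AS o ON o.customer_id = c.id;
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO customer (id, name) VALUES (?, ?)",
                 [(1, "Customer A"), (2, "Customer B")])
conn.executemany(
    "INSERT INTO orders (product, customer_id, order_date, sales_price) "
    "VALUES (?, ?, ?, ?)",
    [("widget", 1, "2008-01-15", 9.95), ("gadget", 2, "2008-06-01", 3.27)])

# Customers who bought something after a certain date, via the view.
recent = [row[0] for row in conn.execute(
    "SELECT DISTINCT name FROM customer_orders WHERE order_date > ? ORDER BY name",
    ("2008-03-01",))]
print(recent)

# A value that violates the constraint is rejected by the database.
try:
    conn.execute(
        "INSERT INTO orders (product, customer_id, order_date, sales_price) "
        "VALUES (?, ?, ?, ?)", ("widget", 1, "2008-07-01", -1.0))
    constraint_rejected = False
except sqlite3.IntegrityError:
    constraint_rejected = True
print(constraint_rejected)
```

Both managers in the example would query the same two tables; only the SELECT statement (or view) differs.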
Object-Oriented Database Definition
An object-oriented database (OODB) is a system offering DBMS facilities in an object-oriented programming environment. Data is stored as objects and can be interpreted only using the methods specified by its class. The relationship between similar objects is preserved (inheritance), as are references between objects. Queries can be faster because joins are often not needed (as they are in a relational database). This is because an object can be retrieved directly without a search, by following its object ID.
The same programming language can be used for both data definition and data manipulation. The full power of the database programming language's type system can be used to model data structures and the relationship between the different data items.
Multimedia applications are facilitated because the class methods associated with the data are responsible for its correct interpretation.
OODBs typically provide better support for versioning. An object can be viewed as the set of all its versions. Also, object versions can be treated as fully fledged objects. OODBs also provide systematic support for triggers and constraints, which are the basis of active databases. Most, if not all, object-oriented application programs that have database needs will benefit from using an OODB.
Ode is an example of an OODB built on C++.
Tables and relationships
In a relational database, data is stored in tables. Each table contains data about a particular type of entity (either physical or conceptual). For instance, our sample database is the inventory and sales system for Acme Widget Co. It has tables containing data for the following entities:

Table 4-1. Acme Widget Co Tables
Table         Description
stock_item    Inventory items
customer      Customer account details
salesperson   Sales people working for Acme Widget Co.
sales         Sales events which occur
Tables in a database contain fields and records. Each record describes one entity. Each field describes a single item of data for that entity. You can think of it like a spreadsheet, with the rows being the records and the columns being the fields, thus:

Table 4-2. Sample table
ID number   Description   Price   Quantity in stock
1           widget        $9.95   12
2           gadget        $3.27   20
Every table should have a primary key: a field (or combination of fields) that uniquely identifies each record. In the example above, the ID number is the primary key. The following tables list the fields of each table in our database; in each case the primary key is the Id field.

Table 4-3. The stock_item table
stock_item: Id (primary key), Description, Price, Quantity

Table 4-4. The customer table
customer: Id (primary key), Name, Address, Suburb, State, Postcode

Table 4-5. The salesperson table
salesperson: Id (primary key), Name

Table 4-6. The sales table
sales: Id (primary key), sale_date, salesperson_id, customer_id, stock_item_id, quantity, Price
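As a rough sketch, the four Acme Widget Co. tables could be declared as follows using Python's built-in sqlite3 module. The column types, the lowercase column names, and the foreign key references from the sales table are assumptions made for illustration; the text only lists the field names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Column types and REFERENCES clauses are assumed, not taken from the text.
conn.executescript("""
    CREATE TABLE stock_item (
        id INTEGER PRIMARY KEY, description TEXT, price REAL, quantity INTEGER
    );
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY, name TEXT, address TEXT,
        suburb TEXT, state TEXT, postcode TEXT
    );
    CREATE TABLE salesperson (
        id INTEGER PRIMARY KEY, name TEXT
    );
    CREATE TABLE sales (
        id             INTEGER PRIMARY KEY,
        sale_date      TEXT,
        salesperson_id INTEGER REFERENCES salesperson(id),
        customer_id    INTEGER REFERENCES customer(id),
        stock_item_id  INTEGER REFERENCES stock_item(id),
        quantity       INTEGER,
        price          REAL
    );
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
sales_primary_key = [row[1] for row in conn.execute("PRAGMA table_info(sales)") if row[5]]
print(tables, sales_primary_key)
```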
• A database table contains fields and records of data about one entity
• SQL (Structured Query Language) can be used to manipulate and retrieve data in a database
• A SELECT query may be used to retrieve records which match certain criteria
• An INSERT query may be used to add new records to the database
• A DELETE query may be used to delete records from the database
• An UPDATE query may be used to modify records in the database
• A CREATE query may be used to create new tables in the database
• A DROP query may be used to remove tables from the database
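The query types summarised above can be tried against the sample stock data from Table 4-2. The following is a minimal sketch using Python's built-in sqlite3 module; the column types and the particular WHERE conditions are chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: define a new table (column types are assumed).
conn.execute("""
    CREATE TABLE stock_item (
        id INTEGER PRIMARY KEY, description TEXT, price REAL, quantity INTEGER
    )
""")

# INSERT: add the two sample records.
conn.executemany(
    "INSERT INTO stock_item (id, description, price, quantity) VALUES (?, ?, ?, ?)",
    [(1, "widget", 9.95, 12), (2, "gadget", 3.27, 20)],
)

# SELECT: retrieve the records that match a criterion.
cheap = conn.execute(
    "SELECT description FROM stock_item WHERE price < ?", (5.0,)
).fetchall()

# UPDATE: modify an existing record.
conn.execute("UPDATE stock_item SET quantity = quantity - 2 WHERE id = ?", (1,))
widget_quantity = conn.execute(
    "SELECT quantity FROM stock_item WHERE id = 1"
).fetchall()[0][0]

# DELETE: remove a record.
conn.execute("DELETE FROM stock_item WHERE id = ?", (2,))
remaining = conn.execute("SELECT COUNT(*) FROM stock_item").fetchall()[0][0]

# DROP: remove the table itself.
conn.execute("DROP TABLE stock_item")
tables_left = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
).fetchall()[0][0]
print(cheap, widget_quantity, remaining, tables_left)
```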
Entity-Relationship Diagrams (ERD)
Data models are tools used in analysis to describe the data requirements and assumptions in the system from a top-down perspective. They also set the stage for the design of databases later on in the SDLC. There are three basic elements in ER models:
• Entities are the "things" about which we seek information.
• Attributes are the data we collect about the entities.
• Relationships provide the structure needed to draw information from multiple entities.
Generally, ERDs look like this:
Developing an ERD
Developing an ERD requires an understanding of the system and its components. Before discussing the procedure, let's look at a narrative created by Professor Harman.
Consider a hospital: Patients are treated in a single ward by the doctors assigned to them. Usually each patient will be assigned a single doctor, but in rare cases they will have two. Healthcare assistants also attend to the patients; a number of these are associated with each ward. Initially the system will be concerned solely with drug treatment. Each patient is required to take a variety of drugs a certain number of times per day and for varying lengths of time. The system must record details concerning patient treatment and staff payment. Some staff are paid part time, and doctors and care assistants work varying amounts of overtime at varying rates (subject to grade). The system will also need to track what treatments are required for which patients and when, and it should be capable of calculating the cost of treatment per week for each patient (though it is currently unclear to what use this information will be put).
How do we start an ERD?
1. Define Entities: these are usually nouns used in descriptions of the system, in the discussion of business rules, or in documentation; identified in the narrative (see highlighted items above).
2. Define Relationships: these are usually verbs used in descriptions of the system or in discussion of the business rules (entity ______ entity); identified in the narrative (see highlighted items above).
Fully attributed ERD with keys
3. Add attributes to the relations; these are determined by the queries, and may also suggest new entities, e.g. grade; or they may suggest the need for keys or identifiers. What questions can we ask?
   a. Which doctors work in which wards?
   b. How much will be spent in a ward in a given week?
   c. How much will a patient cost to treat?
   d. How much does a doctor cost per week?
   e. Which assistants can a patient expect to see?
   f. Which drugs are being used?
4. Add cardinality to the relations.
   • Many-to-many relationships must be resolved into two one-to-many relationships with an additional entity
   • This usually happens automatically
   • It sometimes involves the introduction of a link entity (which will be all foreign key)
   • Example: Patient-Drug
5. This flexibility allows us to consider a variety of questions such as:
   a. Which beds are free?
   b. Which assistants work for Dr. X?
   c. What is the least expensive prescription?
   d. How many doctors are there in the hospital?
   e. Which patients are family related?
6. Represent that information with symbols. Generally E-R Diagrams require the use of the following symbols:
Reading an ERD
It takes some practice to read an ERD, but ERDs can be used with clients to discuss business rules. They allow us to represent the information from above, as in the E-R Diagram below:
ERD brings out issues:
• Many-to-Manys
• Ambiguities
• Entities and their relationships
• What data needs to be stored
• The Degree of a relationship
Now, think about a university in terms of an ERD. What entities, relationships and attributes might you consider? Look at this simplified view. There is also an example of a simplified view of an airline on that page.
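To make the many-to-many issue concrete, the Patient-Drug relationship from the hospital narrative can be resolved with a link entity. The sketch below uses Python's built-in sqlite3 module; the table name prescription, the times_per_day column, and the sample rows are hypothetical and are not taken from the narrative's ERD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE drug    (id INTEGER PRIMARY KEY, name TEXT);

    -- Link entity: its key consists entirely of foreign keys, so each
    -- many-to-many pairing becomes one row with two one-to-many links.
    CREATE TABLE prescription (
        patient_id    INTEGER REFERENCES patient(id),
        drug_id       INTEGER REFERENCES drug(id),
        times_per_day INTEGER,
        PRIMARY KEY (patient_id, drug_id)
    );

    INSERT INTO patient VALUES (1, 'Patient A'), (2, 'Patient B');
    INSERT INTO drug VALUES (1, 'Drug X'), (2, 'Drug Y'), (3, 'Drug Z');
    INSERT INTO prescription VALUES (1, 1, 3), (1, 2, 1), (2, 1, 2);
""")

# "Which drugs are being used?" becomes a join through the link entity.
drugs_in_use = conn.execute("""
    SELECT DISTINCT d.name
    FROM drug d JOIN prescription p ON p.drug_id = d.id
    ORDER BY d.name
""").fetchall()
print(drugs_in_use)
```

Drug Z is stored but never prescribed, so it does not appear in the answer.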
Data Flow Diagram (DFD)
DFDs show the flow of data values from their sources in objects through the processes that transform them to their destination in other objects. Values can include input values, output values, and internal data stores. Control information is shown only in the form of control flows. The following table lists the important elements of DFDs.

Symbol         Stands For
Data process   Data processing
Data flow      Data flow or the exchange of data between processes
Data store     Data storage
Actor          Object producing and consuming data
Guidelines
You can follow certain guidelines to draw meaningful DFDs.
• Optional input flows do not exist. A process can perform its function only if all its input flows are always available.
• You cannot assign the same data to two output flows from the same process. If a process produces more than one data flow, these flows are mutually exclusive.
• You can split a flow, and you can merge two flows into one.
Decomposition
To specify what a high-level process does, break it down into smaller units in more DFDs. A high-level process is an entire DFD. Each high-level process is decomposed into other processes with data flows and data stores. Each decomposition is a DFD in itself. You can continue to break down processes until you reach a level on which further decomposition seems impossible or meaningless. The data flows of the opened process are connected in the new diagram to the process related to the opened process. Vertices, and the flows and objects connected to them, are transferred with the flows that are connected to the decomposed process.
Example DFD
The following illustration shows a sample DFD.
Symbols
Data process
A data process transforms data values. You can make a distinction between the following types of processes:

Process Type               Indicates
High-level                 Process containing nonfunctional components such as data stores or external objects that cause side effects
Low-level                  Pure function without side effects, such as the sum of two numbers
Leaf or atomic processes   Process that is not further decomposed

The name of a process is usually a description of the transformation it performs. There are three sorts of transformation:
• Transformation of the structure, for example, reformatting
• Transformation of information contained in data
• Generation of new information
If you open a process, you can either create a new DFD or open an existing DFD in which the process is specified. The data flows of the opened process are connected in the new diagram to the process with the name of the opened process. Vertices, and the flows and objects connected to them, are transferred with the flows that are connected to the decomposed process. If a data process has a decomposition at a lower level, an asterisk is placed inside the ellipse. The data process can be opened only if it has a name.
Data store
A data store stores data passively for later access. A data store responds to requests to store and access data. It does not generate any operations. A data store allows values to be accessed in an order different from the order in which they were generated. Input flows indicate information or operations that modify the stored data such as adding or deleting elements or changing values. Output flows indicate information retrieved from the store; this information can be an entire value or a component of a value.
Actor
An actor produces and consumes data, driving the DFD. Actors lie on the boundary of the diagram; they terminate the flow of data as sources and sinks of data. They are also known as terminators. Data flows between an actor and a diagram are inputs to and outputs of the diagram. The system interacts with people through the actor.
Anchor
A DFD anchor provides a start or end point. In decomposition diagrams, anchors represent the nodes connected to the decomposed process in the higher level diagram.
Data flow
A data flow moves data between processes or between processes and data stores. As such, it represents a data value at some point within a computation and an intermediate value within a computation if the flow is internal to the diagram. This value is not changed. The names of input and output flows can indicate their roles in the computation or the type of the value they move. Data names are preferably nouns. The name of a typical piece of data, the data aspect, is written alongside the arrow.
Control flow
A control flow is a signal that carries out a command or indicates that something has occurred. A control flow occurs at a discrete point in time. The arrow indicates the direction of the control flow. The name of the event is written beside the arrow. Control flows can correspond to messages in CCDs or events in STDs; however, because they duplicate information in the DFD, use them sparingly.
Update flow
Update (or bidirectional) flows are used to indicate an update of a data store, that is, a read, change, and store operation on a data flow.
Flow names and inheritance
Flows in DFDs must be named. However, flows can inherit the names of the objects they are connected to. The rules for inheritance of names are listed below. These rules are applied in the order shown, until nothing more can be inherited. In some situations, the flow's inherited name causes an error when a Check command is carried out, or the result of the inheritance is confusing in the diagram.
1. Diverging flows without names inherit the name of an incoming flow with a name. If the incoming flow has several names, each diverging flow inherits all of them.
2. Converging flows without names inherit the name of an outgoing flow with a name. If the outgoing flow has several names, each converging flow inherits all of them.
3. Flows connected to a data store, control store, message queue, message box, event queue, or event flag inherit the name of that node.
A forked (converging or diverging) data flow is either a split or merging data flow, or a composite data flow. A composite data flow has one name for each branch. A composite flow can split into the original flows again. A split or a merging data flow has only one name. The name of the flow is taken as type name if no data type is specified
Process Notations
Yourdon and Coad Process Notations
Gane and Sarson Process Notation
Datastore Notations
Yourdon and Coad Datastore Notations
Gane and Sarson Datastore Notations
Dataflow Notations
External Entity Notations
Data Flow Diagram Layers
Draw data flow diagrams in several nested layers. A single process node on a high level diagram can be expanded to show a more detailed data flow diagram. Draw the context diagram first, followed by various layers of data flow diagrams.
The nesting of data flow layers
Context Diagrams
A context diagram is a top level (also known as Level 0) data flow diagram. It contains only one process node (process 0) that generalizes the function of the entire system in relationship to external entities.
DFD levels
The first level DFD shows the main processes within the system. Each of these processes can be broken into further processes until you reach pseudocode.
An example first-level data flow diagram
Key
Primary key
Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a key which the database designer has designated for this purpose. The primary key identifies the whole record.
Secondary/Foreign key
A secondary (foreign) key is a key that references the primary key of another table. It is needed to establish the relationships between tables.
Data Redundancy
Data redundancy is the unnecessary duplication of data within the database. To change redundant data, you have to make the change in multiple fields of the database. While this is normal behaviour for spreadsheet and flat-file database structures, it defeats the purpose of a relational database structure.
The data connections should allow you to maintain each data field in just one location, so that the relational model is responsible for propagating any change across the database. A redundant database uses a lot of space unnecessarily and also creates problems for the maintenance of the database.
The database software removes data redundancy by centralizing the data in one database, so that all applications can access the same data.
Data Integrity
The database designer is responsible for incorporating elements to promote the accuracy and reliability of stored data within the database. There are many different techniques that can be used to encourage data integrity, some of which depend on the database technology being used. There are different types of data integrity techniques available whilst working with Microsoft Access:
1. Referential Integrity
2. Cascade Updates & Deletes
3. Table Level Integrity
   1. Field Comparisons
   2. Validation Tables
Referential Integrity - part of the definition of a true relational database product is that it supports referential integrity. Referential Integrity principles may be stated by: "Every non-null foreign key value must match an existing primary key value"
If a value exists in the foreign key field of a table, then there must be a matching value in the primary key field of the table to which it is related. Referential Integrity is all about preserving the validity of the foreign key values.
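The rule that every non-null foreign key value must match an existing primary key value can be observed directly. The following is a minimal sketch using Python's built-in sqlite3 module; the department and employee tables and their rows are hypothetical, and the PRAGMA line is specific to SQLite, which leaves foreign key enforcement off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: turn enforcement on
conn.executescript("""
    CREATE TABLE department (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        id              INTEGER PRIMARY KEY,
        name            TEXT,
        department_code TEXT REFERENCES department(code)
    );
    INSERT INTO department VALUES ('FIN', 'Finance');
""")

conn.execute("INSERT INTO employee VALUES (1, 'Ann', 'FIN')")  # matches a primary key
conn.execute("INSERT INTO employee VALUES (2, 'Bob', NULL)")   # null foreign key is allowed

try:
    # 'XXX' matches no department, so the row would break referential integrity.
    conn.execute("INSERT INTO employee VALUES (3, 'Cy', 'XXX')")
    violation_caught = False
except sqlite3.IntegrityError:
    violation_caught = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchall()[0][0]
print(violation_caught, employee_count)
```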
Cascade Updates and Deletes
As with anything in the real world, things can alter and you will need to ensure that the database can cope with this. Code names such as DepartmentCode will get revised, and departments can close or merge; therefore we need to be able to maintain the data when the required changes would violate referential integrity rules. RDBMS products generally handle these changes through cascading updates and deletes (different products may handle this differently, and have different names and techniques for this). In some database products you may need to create rules or triggers or use an operator.
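As one possible illustration, SQLite expresses cascading behaviour with ON UPDATE CASCADE and ON DELETE CASCADE clauses on the foreign key. The sketch below uses Python's built-in sqlite3 module with hypothetical department and employee tables; other products may use different syntax or triggers, as noted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        id              INTEGER PRIMARY KEY,
        name            TEXT,
        department_code TEXT REFERENCES department(code)
                        ON UPDATE CASCADE ON DELETE CASCADE
    );
    INSERT INTO department VALUES ('FIN', 'Finance'), ('OPS', 'Operations');
    INSERT INTO employee VALUES (1, 'Ann', 'FIN'), (2, 'Bob', 'OPS');
""")

# A revised department code is propagated to the referencing rows.
conn.execute("UPDATE department SET code = 'ACC' WHERE code = 'FIN'")
ann_code = conn.execute(
    "SELECT department_code FROM employee WHERE id = 1"
).fetchall()[0][0]

# Closing a department removes the rows that referenced it.
conn.execute("DELETE FROM department WHERE code = 'OPS'")
remaining = conn.execute("SELECT name FROM employee ORDER BY id").fetchall()
print(ann_code, remaining)
```

Whether a delete should cascade is a business decision; deleting employees along with their department is used here only to show the mechanism.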
Business Rules and Levels of Enforcement
Referential Integrity is enforced at the database level, in that it controls the integrity of the data between tables. As the database designer, you can also do things at both field and table levels to help ensure data integrity. Business rules should be implemented to ensure that the data entered meets the requirements of a particular setting for the database. Business rules should be documented as they are implemented. This should detail each rule, where and how it is implemented and enforced within the database design. Over time these rules may change, and having each and every rule documented will make it much easier to find and modify the design. As you implement a rule, it is important that each one is tested. Does the rule give the intended result? What happens when the rule is violated? Good application design will also give the user feedback (messages) when a rule is broken, and allow them to rectify any changes they were making.
Field Level Integrity
Using Field Properties - Each of the fields that are contained in the database has properties associated with it. These properties may be referred to as elements or attributes of the field. These enable you, as the database designer, to place constraints on the values that may be entered into that field.
Data Types - the most obvious constraint that can be placed on the fields in your database will be done with the selection of a data type for the field. Data types may vary by RDBMS; however, in general they will be pretty much the same, and usually you will also be able to create custom data types through code. As you begin to collect information regarding the design of the database, you will be defining what types of data can, or should be entered into the fields that you define.
A number or numeric data type will only allow the entry of numbers and should be used for most fields on which calculations will be performed; it will however drop leading zeros and may occasionally encounter rounding errors. A currency data type can eliminate rounding errors, but may not hold as many digits as a number data type can. A text field can contain basically anything, but may be limited to a certain number of characters. It can be used for numeric data on fields where no calculations will be required, or where the data needs to retain one or more leading zeros. Memo data types, if available, will allow for a much larger number of characters. Date/Time fields are restricted to only allowing valid dates and times. A Boolean (Yes/No data type in Microsoft Access) will permit the entry of only one of two values - yes/no, true/false or on/off.
Most of these data types can also be restricted further by setting allowable sizes (some may already have default values that cannot be changed). Some of the data types may also allow you to define a format, for example the number of decimal places.
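The effects described for numeric, currency, and text types can be reproduced in a few lines of Python. This is only an analogy: Python's int, float, and Decimal stand in for a DBMS's number and currency types, and the postcode value is made up.

```python
from decimal import Decimal

# Stored as a number, a postcode loses its leading zero.
postcode_as_number = int("0800")
# Stored as text, it is kept exactly as entered.
postcode_as_text = "0800"

# Binary floating point can introduce small rounding errors.
float_total = 0.1 + 0.2
# A fixed-point decimal type, like a currency type, avoids them.
decimal_total = Decimal("0.10") + Decimal("0.20")

print(postcode_as_number, postcode_as_text)
print(float_total == 0.3, decimal_total == Decimal("0.30"))
```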
Table Level Integrity
Field Comparisons - Database tables also have properties that you can use to set a validation rule on records in the table. By doing this, you can set a rule that compares the value of one field in the record to the value of another field in the same record. This rule is run before the record is saved. An example of this would be to compare dates, as part of your business rules. Your business may have a rule in place that an OrderDespatchDate must be no more than 3 days after the OrderPlacedDate. The rule would look something like:
OrderDespatchDate <= OrderPlacedDate + 3
If the rule is violated, an error message can be displayed, and the data must be amended before the record can be saved.
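Outside Microsoft Access, a comparable record-level rule can be written as a table CHECK constraint. The following sketch uses Python's built-in sqlite3 module; the orders table name, the ISO date strings, and the use of SQLite's julianday function to do date arithmetic are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id                INTEGER PRIMARY KEY,
        OrderPlacedDate   TEXT NOT NULL,
        OrderDespatchDate TEXT,
        -- Table-level rule comparing two fields of the same record.
        CHECK (julianday(OrderDespatchDate) <= julianday(OrderPlacedDate) + 3)
    )
""")

# Despatched two days after the order was placed: accepted.
conn.execute("INSERT INTO orders VALUES (1, '2008-11-01', '2008-11-03')")

try:
    # Despatched nine days later: the rule is violated, so the record is not saved.
    conn.execute("INSERT INTO orders VALUES (2, '2008-11-01', '2008-11-10')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

saved = conn.execute("SELECT COUNT(*) FROM orders").fetchall()[0][0]
print(rejected, saved)
```

An application would normally catch the error and show the user a message explaining which rule was broken.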
Validation Tables
A validation table is created to promote data integrity. Normally, a validation table will consist of two fields; one is the primary key, and the other holds the values used by some other field in the database. The validation table normally will hold a static set of values, enabling you to store a master set of values in one location and, by referencing those values instead of entering values directly into a field, you can ensure consistent values are used.
Database Normalization
Database normalization, sometimes referred to as canonical synthesis, is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies. For example, when multiple instances of a given piece of information occur in a table, the possibility exists that these instances will not be kept consistent when the data within the table is updated, leading to a loss of data integrity. A table that is sufficiently normalized is less vulnerable to problems of this kind, because its structure reflects the basic assumptions for when multiple instances of the same information should be represented by a single instance only.
Higher degrees of normalization typically involve more tables and create the need for a larger number of joins, which can reduce performance. Accordingly, more highly normalized tables are typically used in database applications involving many isolated transactions (e.g. an automated teller machine), while less normalized tables tend to be used in database applications that need to map complex relationships between data entities and data attributes (e.g. a reporting application, or a full-text search application).
Database theory describes a table's degree of normalization in terms of normal forms of successively higher degrees of strictness. A table in third normal form (3NF), for example, is consequently in second normal form (2NF) as well; but the reverse is not always the case. Although the normal forms are often defined informally in terms of the characteristics of tables, rigorous definitions of the normal forms are concerned with the characteristics of mathematical constructs known as relations. Whenever information is represented relationally, it is meaningful to consider the extent to which the representation is normalized.

1NF    Eliminate Repeating Groups - Make a separate table for each set of related attributes, and give each table a primary key.
2NF    Eliminate Redundant Data - If an attribute depends on only part of a multi-valued key, remove it to a separate table.
3NF    Eliminate Columns Not Dependent On Key - If attributes do not contribute to a description of the key, remove them to a separate table.
BCNF   Boyce-Codd Normal Form - If there are non-trivial dependencies between candidate key attributes, separate them out into distinct tables.
4NF    Isolate Independent Multiple Relationships - No table may contain two or more 1:n or n:m relationships that are not directly related.
5NF    Isolate Semantically Related Multiple Relationships - There may be practical constraints on information that justify separating logically related many-to-many relationships.
ONF    Optimal Normal Form - a model limited to only simple (elemental) facts, as expressed in Object Role Model notation.
DKNF   Domain-Key Normal Form - a model free from all modification anomalies.
1. Eliminate Repeating Groups
In the original member list, each member name is followed by any databases that the member has experience with. Some might know many, and others might not know any. To answer the question, "Who knows DB2?" we need to perform an awkward scan of the list looking for references to DB2. This is an inefficient and extremely untidy way to store information. Moving the known databases into a separate table helps a lot. Separating the repeating groups of databases from the member information results in first normal form. The MemberID in the database table matches the primary key in the member table, providing a foreign key for relating the two tables with a join operation. Now we can answer the question by looking in the database table for "DB2" and getting the list of members.
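A small sketch of this first normal form design, using Python's built-in sqlite3 module, shows how the "Who knows DB2?" question becomes a simple join. The table names member and member_database, the DatabaseName column, and the sample members are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (MemberID INTEGER PRIMARY KEY, Name TEXT);

    -- One row per member/database pair replaces the repeating group.
    CREATE TABLE member_database (
        MemberID     INTEGER REFERENCES member(MemberID),
        DatabaseName TEXT,
        PRIMARY KEY (MemberID, DatabaseName)
    );

    INSERT INTO member VALUES (1, 'Bill'), (2, 'Mary'), (3, 'Steve');
    INSERT INTO member_database VALUES (1, 'DB2'), (1, 'Oracle'), (3, 'DB2');
""")

who_knows_db2 = conn.execute("""
    SELECT m.Name
    FROM member m JOIN member_database d ON d.MemberID = m.MemberID
    WHERE d.DatabaseName = 'DB2'
    ORDER BY m.Name
""").fetchall()
print(who_knows_db2)
```

Mary knows no databases, so she simply has no rows in member_database.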
2. Eliminate Redundant Data
In the Database Table, the primary key is made up of the MemberID and the DatabaseID. This makes sense for other attributes like "Where Learned" and "Skill Level", since they will be different for every member/database combination. But the database name depends only on the DatabaseID. The same database name will appear redundantly every time its associated ID appears in the Database Table.
Suppose you want to reclassify a database - give it a different DatabaseID. The change has to be made for every member that lists that database! If you miss some, you'll have several members with the same database under different IDs. This is an update anomaly.
Or suppose the last member listing a particular database leaves the group. His records will be removed from the system, and the database will not be stored anywhere! This is a delete anomaly. To avoid these problems, we need second normal form.
To achieve this, separate the attributes depending on both parts of the key from those depending only on the DatabaseID. This results in two tables: "Database" which gives the name for each DatabaseID, and "MemberDatabase" which lists the databases for each member. Now we can reclassify a database in a single operation: look up the DatabaseID in the "Database" table and change its name. The result will instantly be available throughout the application.
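The two-table design can be sketched with Python's built-in sqlite3 module. The SkillLevel column type, the sample rows, and the new name "DB2 UDB" are made up for illustration; the table name Database is quoted because DATABASE is also an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE "Database" (DatabaseID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE MemberDatabase (
        MemberID   INTEGER,
        DatabaseID INTEGER REFERENCES "Database"(DatabaseID),
        SkillLevel TEXT,
        PRIMARY KEY (MemberID, DatabaseID)
    );
    INSERT INTO "Database" VALUES (1, 'DB2'), (2, 'Oracle');
    INSERT INTO MemberDatabase VALUES (1, 1, 'expert'), (2, 1, 'novice'), (2, 2, 'expert');
''')

# The name is stored once, so a single UPDATE changes it for every member.
conn.execute('UPDATE "Database" SET Name = ? WHERE DatabaseID = ?', ("DB2 UDB", 1))
names_seen = conn.execute('''
    SELECT md.MemberID, d.Name
    FROM MemberDatabase md JOIN "Database" d ON d.DatabaseID = md.DatabaseID
    ORDER BY md.MemberID, d.DatabaseID
''').fetchall()

# Removing the last member who lists Oracle no longer loses the database itself.
conn.execute("DELETE FROM MemberDatabase WHERE DatabaseID = 2")
databases_stored = conn.execute('SELECT COUNT(*) FROM "Database"').fetchall()[0][0]
print(names_seen, databases_stored)
```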
3. Eliminate Columns Not Dependent On Key
The Member table satisfies first normal form - it contains no repeating groups. It satisfies second normal form - since it doesn't have a multivalued key. But the key is MemberID, and the company name and location describe only a company, not a member. To achieve third normal form, they must be moved into a separate table. Since they describe a company, CompanyCode becomes the key of the new "Company" table. The motivation for this is the same as for second normal form: we want to avoid update and delete anomalies. For example, suppose no members from IBM were currently stored in the database. With the previous design, there would be no record of its existence, even though 20 past members were from IBM!
BCNF. Boyce-Codd Normal Form
Boyce-Codd Normal Form states mathematically that: a relation R is said to be in BCNF if whenever X -> A holds in R, and A is not in X, then X is a candidate key (or a superset of one) for R. BCNF covers very specific situations where 3NF misses inter-dependencies between non-key (but candidate key) attributes. Typically, any relation that is in 3NF is also in BCNF. However, a 3NF relation won't be in BCNF if (a) there are multiple candidate keys, (b) the keys are composed of multiple attributes, and (c) there are common attributes between the keys.
Basically, a humorous way to remember BCNF is that all functional dependencies are: "The key, the whole key, and nothing but the key, so help me Codd."
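The BCNF condition can be checked mechanically by computing attribute closures. The sketch below is a plain Python illustration, not part of any DBMS: the helper names closure and bcnf_violations, and the Student/Course/Teacher relation with its two functional dependencies, are assumptions chosen because that relation has overlapping composite candidate keys.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """List the dependencies X -> A where A is not in X and X is not a key or superset of one."""
    violations = []
    for lhs, rhs in fds:
        nontrivial = not set(rhs) <= set(lhs)
        determines_all = set(relation) <= closure(lhs, fds)
        if nontrivial and not determines_all:
            violations.append((lhs, rhs))
    return violations


# Hypothetical relation: each teacher teaches one course, and a student
# has one teacher per course.
relation = ("Student", "Course", "Teacher")
fds = [
    (("Student", "Course"), ("Teacher",)),
    (("Teacher",), ("Course",)),
]

violations = bcnf_violations(relation, fds)
print(violations)
```

Here Teacher -> Course holds but Teacher alone does not identify a row, so the relation is reported as violating BCNF.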
4. Isolate Independent Multiple Relationships
This applies primarily to key-only associative tables, and appears as a ternary relationship, but has incorrectly merged 2 distinct, independent relationships. The way this situation starts is by a business request like the one shown below. This could be any 2 M:M relationships from a single entity. For instance, a member could know many software tools, and a software tool may be used by many members. Also, a member could have recommended many books, and a book could be recommended by many members.
Initial business request
So, to resolve the two M:M relationships, we know that we should resolve them separately, and that would give us 4th normal form. But, if we were to combine them into a single table, it might look right (it is in 3rd normal form) at first. This is shown below, and violates 4th normal form.
Incorrect solution
To get a picture of what is wrong, look at some sample data, shown below. The first few records look right, where Bill knows ERWin and recommends the ERWin Bible for everyone to read. But something is wrong with Mary and Steve. Mary didn't recommend a book, and Steve doesn't know any software tools. Our solution has forced us to do strange things like create dummy records in both Book and Software to allow the record in the association, since it is a key-only table.
Sample data from incorrect solution
The correct solution, to cause the model to be in 4th normal form, is to ensure that all M:M relationships are resolved independently if they are indeed independent, as shown below.
Correct 4th normal form
NOTE! This is not to say that ALL ternary associations are invalid. The above situation made it obvious that Books and Software were independently linked to Members. If, however, there were distinct links between all three, such that we would be stating that "Bill recommends the ERWin Bible as a reference for ERWin", then separating the relationship into two separate associations would be incorrect. In that case, we would lose the distinct information about the 3-way relationship.
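The 4th normal form design, with each M:M relationship in its own association table, can be sketched with Python's built-in sqlite3 module. The table and column names are hypothetical, and the sample rows go slightly beyond the text: it is assumed here that Mary knows ERWin and that Steve recommends the ERWin Bible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE software (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book     (id INTEGER PRIMARY KEY, title TEXT);

    -- Each independent M:M relationship gets its own key-only table.
    CREATE TABLE member_software (
        member_id   INTEGER REFERENCES member(id),
        software_id INTEGER REFERENCES software(id),
        PRIMARY KEY (member_id, software_id)
    );
    CREATE TABLE member_book (
        member_id INTEGER REFERENCES member(id),
        book_id   INTEGER REFERENCES book(id),
        PRIMARY KEY (member_id, book_id)
    );

    INSERT INTO member VALUES (1, 'Bill'), (2, 'Mary'), (3, 'Steve');
    INSERT INTO software VALUES (1, 'ERWin');
    INSERT INTO book VALUES (1, 'ERWin Bible');
    INSERT INTO member_software VALUES (1, 1), (2, 1);
    INSERT INTO member_book VALUES (1, 1), (3, 1);
""")

# Mary recommends no book and Steve knows no tool, yet no dummy rows are needed.
no_book = conn.execute("""
    SELECT name FROM member
    WHERE id NOT IN (SELECT member_id FROM member_book) ORDER BY name
""").fetchall()
no_software = conn.execute("""
    SELECT name FROM member
    WHERE id NOT IN (SELECT member_id FROM member_software) ORDER BY name
""").fetchall()
book_count = conn.execute("SELECT COUNT(*) FROM book").fetchall()[0][0]
print(no_book, no_software, book_count)
```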
5. Isolate Semantically Related Multiple Relationships
OK, now let's modify the original business diagram and add a link between the books and the software tools, indicating which books deal with which software tools, as shown below.
Initial business request
This makes sense after the discussion on Rule 4, and again we may be tempted to resolve the multiple M:M relationships into a single association, which would now violate 5th normal form. The ternary association looks identical to the one shown in the 4th normal form example, and is also going to have trouble displaying the information correctly. This time we would have even more trouble because we can't show the relationships between books and software unless we have a member to link to, or we have to add our favorite dummy member record to allow the record in the association table.
Incorrect solution
The solution, as before, is to ensure that all M:M relationships that are independent are resolved independently, resulting in the model shown below. Now information about members and books, members and software, and books and software are all stored independently, even though they are all very much semantically related. It is very tempting in many situations to combine the multiple M:M relationships because they are so similar. Within complex business discussions, the lines can become blurred and the correct solution not so obvious.
Correct 5th normal form
6. Optimal Normal Form
At this point, we have done all we can with Entity-Relationship Diagrams (ERD). Most people will stop here because this is usually pretty good. However, another modeling style called Object Role Modeling (ORM) can display relationships that cannot be expressed in ERD. Therefore there are more normal forms beyond 5th. Optimal Normal Form (ONF) is defined as a model limited to only simple (elemental) facts, as expressed in ORM.
7. Domain-Key Normal Form
This level of normalization is simply a model taken to the point where there are no opportunities for modification anomalies.
• A relation is in DK/NF "if every constraint on the relation is a logical consequence of the definition of keys and domains"
• Constraint: "a rule governing static values of attributes"
• Key: "unique identifier of a tuple"
• Domain: "description of an attribute's allowed values"
1. A relation in DK/NF has no modification anomalies, and conversely.
2. DK/NF is the ultimate normal form; there is no higher normal form related to modification anomalies.
3. Definition: A relation is in DK/NF if every constraint on the relation is a logical consequence of the definition of keys and domains.
4. A constraint is any rule governing static values of attributes that is precise enough for it to be ascertained whether or not it is true.
5. E.g. edit rules, intra-relation and inter-relation constraints, functional and multivalued dependencies.
6. Not including constraints on changes in data values or time-dependent constraints.
7. Key: the unique identifier of a tuple.
8. Domain: a physical and a logical description of an attribute's allowed values.
9. The physical description is the format of an attribute.
10. The logical description is a further restriction of the values the domain is allowed.
11. Logical consequence: find a constraint on keys and/or domains which, if it is enforced, means that the desired constraint is also enforced.
12. Bottom line on DK/NF: If every table has a single theme, then all functional dependencies will be logical consequences of keys. All data value constraints can then be expressed as domain constraints.
13. Practical consequence: Since keys are enforced by the DBMS and domains are enforced by edit checks on data input, all modification anomalies can be avoided by just these two simple measures.
Components of DBMS
Data dictionary/directory
In database management systems, a data dictionary is a file that defines the basic organization of a database. A data dictionary contains a list of all files in the database, the number of records in each file, and the names and types of each field. Most database management systems keep the data dictionary hidden from users to prevent them from accidentally destroying its contents. Data dictionaries do not contain any actual data from the database, only bookkeeping information for managing it. Without a data dictionary, however, a database management system cannot access data from the database.
Data languages
To define the entries in the data dictionary, a special language is used, known as DDL (Data Definition Language or Data Description Language). This language is concerned with the database administrators