Topic 2.4: The Evolution of Data Models
The quest for better data management has led to different models that attempt to resolve the file system’s critical shortcomings. Because each data model evolved from its predecessors, it is essential to examine the major data models in roughly chronological order.
2.4.1 The Hierarchical Model
The first data model, known as the hierarchical model, was developed by Rockwell and IBM in the 1960s. The hierarchical database is a collection of records that is logically organized to conform to an upside-down tree (hierarchical) structure. Within the hierarchy, the top layer (the root) is perceived as the parent of the segment directly beneath it. While this model represents 1:M relationships well, it does not represent M:N relationships.
Basic Structure
Given its manufacturing heritage, the hierarchical model’s basic logical structure is best understood by examining a manufacturing process. For example, let’s examine a somewhat simplified production process that creates a filing cabinet:
1. A filing cabinet has many components: a frame, a set of drawers, and sliding bars for those drawers.
2. A component may be composed of many smaller assemblies. For example, each drawer has a handle with a latching mechanism, a set of rollers that fits into the frame’s sliding bars, and a divider blade.
3. An assembly may contain many parts. For instance, each roller is composed of a small wheel, an axle, and a brace.
4. The production process is based on data relationships that remain fixed over time. Whether a given filing cabinet model is produced today or tomorrow, the same parts are put together in the same ways to produce the same assemblies, which are combined to produce the same components, which are assembled in the same way to create the filing cabinet.
Tracking the parts, the assemblies, and the components we have just described is facilitated by understanding the logical process that is represented by the upside-down “tree,” known as a hierarchical structure, shown in Figure 2.1. We have labeled the structure’s components to help you understand the basic hierarchical model’s vocabulary. As you examine Figure 2.1, note that the user perceives the hierarchical database as a hierarchy of segments.
A segment is the equivalent of a file system’s record type. In other words, the hierarchical database is a collection of record segment structures that is logically organized to conform to the upside-down tree (hierarchical) structure shown in Figure 2.1. Within the hierarchy, the top layer (the root) is perceived as the parent of the segment directly beneath it. For example, in Figure 2.1, the root segment is the parent of the level 1 segments, which, in turn, are the parents of the level 2 segments, and so on. In turn, the segments below other segments are the children of the segment above them. In short:
° Each parent can have many children.
° Each child has only one parent.
In this hierarchical structure, it is easy to trace both the database’s components and the 1:M relationships among them.
Advantages
– Conceptual simplicity
– Database security
– Data independence (the database structure and its data characteristics are defined in the data dictionary component of the DBMS rather than in the programs that access the database, so those programs become independent of the data characteristics)
– Database integrity (because relating the segments or records minimizes data duplication, or data redundancy)
– Efficiency (the hierarchical DBMS’s file storage organization and access methods are based on the hierarchical database structure and are much faster than those used in the old file system)
Disadvantages
– Complex implementation
– Difficult to manage
– Lacks structural independence (because the programmer must still write instructions on how and where to find the data stored on the computer disk, which depends on the database structure)
– Complex applications programming and use
– Implementation limitations (because the hierarchical data model does not support entities or record segments that have multiple parents, as is required to model M:N relationships between two or more entities)
– Lack of standards among the DBMS implementation software developed by various vendors
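The parent/child rules of Section 2.4.1 can be sketched in a few lines of Python. This is only an illustration of the logical structure, not of how a hierarchical DBMS stores or accesses data. The `Segment` class and the segment names are hypothetical, chosen to mirror the filing cabinet example.

```python
class Segment:
    """A node in a hierarchical structure (hypothetical class, for illustration)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # each child has only one parent
        self.children = []        # each parent can have many children
        if parent is not None:
            parent.children.append(self)

    def path_from_root(self):
        """Trace the single path from the root down to this segment."""
        path = []
        node = self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return list(reversed(path))


# Root segment: the final product.
cabinet = Segment("FILING_CABINET")

# Level 1 segments: components.
frame = Segment("FRAME", cabinet)
drawer = Segment("DRAWER", cabinet)
sliding_bar = Segment("SLIDING_BAR", cabinet)

# Level 2 segments: assemblies within a drawer.
handle = Segment("HANDLE", drawer)
roller = Segment("ROLLER", drawer)
divider = Segment("DIVIDER_BLADE", drawer)

# Level 3 segments: parts within a roller.
wheel = Segment("WHEEL", roller)
axle = Segment("AXLE", roller)
brace = Segment("BRACE", roller)

print(" > ".join(wheel.path_from_root()))
```

Because every segment stores exactly one `parent`, a part that belongs to two different assemblies cannot be represented without duplicating it. That is the M:N limitation listed among the disadvantages.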
2.4.2 The Network Model
The network model was created to represent complex data relationships more effectively than the hierarchical model could, to improve database performance, and to impose a database standard.
Basic Structure
In many respects, the network model resembles the hierarchical model. For example, as in the hierarchical model, the user perceives the network database as a collection of records in 1:M relationships. However, unlike the hierarchical model, the network model allows a record to have more than one parent. This feature allows the network model to handle complex (M:N) relationships between two or more entities, such as the commonly encountered M:N relationship depicted in Figure 2.2.
In Figure 2.2, the M:N relationship between ORDER and PART is resolved by the introduction of the ORDER_LINE bridge entity. In network database terminology, a relationship is called a set. Each set is composed of at least two record types: an owner record and a member record. The difference between the hierarchical model and the network model is that the latter allows a record to appear (as a member) in more than one set. In other words, a member may have several owners. A set represents a 1:M relationship between the owner and the member. An example of such a relationship is depicted in Figure 2.3.
Advantages
– Conceptual simplicity
– Handles more relationship types
– Data access flexibility
– Promotes database integrity
– Data independence
– Conformance to standards
Disadvantages
– System complexity
– Lack of structural independence (because the programmer must still write instructions on how and where to find the data stored on the computer disk, which depends on the database structure)
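The owner/member idea in Section 2.4.2 can be sketched as follows. This is a minimal illustration, not a real network DBMS interface. The `NetworkSet` class, the set names `ORDER_HAS_LINE` and `PART_ON_LINE`, and the record keys are all hypothetical. Each set is a 1:M relationship, and an ORDER_LINE record is a member of two sets, so it has two owners.

```python
class NetworkSet:
    """A named 1:M relationship between an owner record type and a member record type."""

    def __init__(self, name, owner_type, member_type):
        self.name = name
        self.owner_type = owner_type
        self.member_type = member_type
        self._members_of = {}   # owner key -> list of member keys
        self._owner_of = {}     # member key -> owner key

    def connect(self, owner_key, member_key):
        # Within a single set, a member occurrence has exactly one owner.
        if member_key in self._owner_of:
            raise ValueError(f"{member_key} already has an owner in set {self.name}")
        self._owner_of[member_key] = owner_key
        self._members_of.setdefault(owner_key, []).append(member_key)

    def members(self, owner_key):
        return list(self._members_of.get(owner_key, []))

    def owner(self, member_key):
        return self._owner_of[member_key]


# Two sets share the same member record type, ORDER_LINE.
order_set = NetworkSet("ORDER_HAS_LINE", "ORDER", "ORDER_LINE")
part_set = NetworkSet("PART_ON_LINE", "PART", "ORDER_LINE")

# Invented sample occurrences: (order line, owning order, owning part).
for line, order, part in [("L1", "O1", "P1"), ("L2", "O1", "P2"), ("L3", "O2", "P1")]:
    order_set.connect(order, line)
    part_set.connect(part, line)


def parts_on_order(order_key):
    """Navigate ORDER -> ORDER_LINE -> PART."""
    return [part_set.owner(line) for line in order_set.members(order_key)]


def orders_for_part(part_key):
    """Navigate PART -> ORDER_LINE -> ORDER."""
    return [order_set.owner(line) for line in part_set.members(part_key)]


print(parts_on_order("O1"))
print(orders_for_part("P1"))
```

Both functions must know which sets to traverse and in which order. This kind of navigational access is why the model still lacks structural independence.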
2.4.3 The Relational Model
The basic building block of the relational model is the table, which is a matrix of rows and columns. Tables are related to each other through a common entity characteristic, or attribute: the primary key in the parent table appears as a foreign key in the child table. The parent table maps to the entity on the “1” side of the relationship, and the child table maps to the entity on the “many” side. All three relationship types (1:1, 1:M, and M:N) are easily represented in this model. One disadvantage of the relational model is that it requires substantial system overhead to run the relational DBMS (RDBMS). However, with currently available computer hardware and software, the processing requirements of relational databases no longer represent a significant overhead problem.
Basic Structure
The relational data model is implemented through a very sophisticated relational database management system (RDBMS). The RDBMS performs the same basic functions provided by the hierarchical and network database systems, plus a host of other functions that make the relational data model easier to understand and to implement. The most important advantage of the RDBMS is its ability to let the user/designer operate in a human logical environment. The RDBMS manages all of the complex physical details. Thus, the relational database is perceived by the user to be a collection of tables in which data are stored. Each table is a matrix consisting of a series of row/column intersections. Tables, also called relations, are related to each other by sharing a common entity characteristic/attribute. For example, the CUSTOMER table in Figure 2.4 might contain a sales agent’s number, which provides a common link to the AGENT table. The common link between the CUSTOMER and AGENT tables thus enables us to match each customer to the customer’s sales agent, even though the agent data are stored in another table. Although the tables are completely independent of one another, we can easily connect the data between tables. The relational model thus provides a minimum level of controlled redundancy to eliminate most of the redundancies found in old file systems.
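The CUSTOMER/AGENT link described above can be tried out with Python’s built-in `sqlite3` module. This is a minimal sketch. Only the table names and AGENT_CODE come from the text. The other column names and all of the sample rows are invented for illustration, and SQLite stands in for whichever RDBMS is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE AGENT (
        AGENT_CODE  INTEGER PRIMARY KEY,   -- primary key in the parent table
        AGENT_LNAME TEXT NOT NULL          -- assumed column
    );
    CREATE TABLE CUSTOMER (
        CUS_CODE    INTEGER PRIMARY KEY,   -- assumed column
        CUS_LNAME   TEXT NOT NULL,         -- assumed column
        AGENT_CODE  INTEGER NOT NULL REFERENCES AGENT (AGENT_CODE)  -- foreign key
    );
    """
)

# Invented sample data.
conn.executemany("INSERT INTO AGENT VALUES (?, ?)", [(501, "Alby"), (502, "Hahn")])
conn.executemany(
    "INSERT INTO CUSTOMER VALUES (?, ?, ?)",
    [(10010, "Ramas", 502), (10011, "Dunne", 501), (10012, "Smith", 502)],
)

# Match each customer to the sales agent through the shared AGENT_CODE value.
rows = conn.execute(
    """
    SELECT C.CUS_LNAME, A.AGENT_LNAME
    FROM CUSTOMER AS C
    JOIN AGENT AS A ON A.AGENT_CODE = C.AGENT_CODE
    ORDER BY C.CUS_CODE
    """
).fetchall()
print(rows)

# A customer that points at a nonexistent agent is rejected by the RDBMS.
try:
    conn.execute("INSERT INTO CUSTOMER VALUES (?, ?, ?)", (10013, "Orlando", 999))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

The query states only which values must match. It says nothing about where or how the rows are stored, which is the structural independence that the hierarchical and network models lack.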
The relationship type (1:1, 1:M, or M:N) is often shown in a relational schema, an example of which is depicted in Figure 2.5. A relational schema is a visual representation of the relational database’s entities, the attributes within those entities, and the relationships among those entities.
As you examine Figure 2.5, note that the relational schema shows the connecting fields (in this case, AGENT_CODE) and the relationship type, 1:M. The MS Access DBMS software used to generate Figure 2.5 employs the ∞ symbol to indicate the “many” side. In this example, CUSTOMER represents the “many” side because an AGENT can serve many CUSTOMERs. AGENT represents the “1” side because each CUSTOMER is served by only one AGENT.
Advantages
– Structural independence
– Improved conceptual simplicity
– Easier database design, implementation, management, and use
– Ad hoc query capability
– Powerful database management system
Disadvantages
– Substantial hardware and system software overhead (no longer a significant issue given currently available hardware and software)
– Can facilitate poor design and implementation (less experienced database designers may develop poor database designs)
– May promote “islands of information” problems (because users in different departments may develop their own database applications)
2.4.4 The Entity Relationship Model
An alternative model is the Entity Relationship (ER) model. In this model, entities are drawn in diagrams, with line connectors that depict their relationships. This model has the advantage of visually depicting relationships. A disadvantage is that there is no corresponding data manipulation language (DML). The ER model, or ERM, is a widely accepted and adopted graphical tool for data modeling. Peter Chen first introduced the ER data model in 1976 in his landmark paper “The Entity Relationship Model: Toward a Unified View of Data”. The ERM yielded a graphical representation that popularized the use of ER diagrams as a tool for conceptual-level data modeling. Better yet, the ER model complemented the relational model concepts, thus providing the foundation for a tightly structured database design environment to ensure the proper design of relational databases.
Basic Structure
ER models are normally represented in an entity relationship diagram (ERD), which uses graphical representations to model the database requirements.
° An entity is represented in the ERD by a rectangle, also known as an entity box. The name of the entity, a noun, is written in the center of the rectangle. The name of the entity is generally written in capital letters and in singular form: PAINTER rather than PAINTERS. Normally, when applying the ERD to the relational model, an entity is mapped to a relational table. Each row in the relational table is known as an entity instance or entity occurrence in the ER model. Each entity is described by a set of attributes that describe particular characteristics of the entity. For example, the entity EMPLOYEE will have attributes such as a Social Security number, a last name, and a first name.
° Relationships describe associations among data entities. Most relationships describe associations between two entities. ERD modelers use the term connectivity to label the types of relationships (1:M, M:N, 1:1). The entity connectivity is written next to each entity box. Relationships are represented by a diamond connected to the related entities through a relationship line. The name of the relationship, an active or passive verb, is written inside the diamond. For example, each of the company’s DEPARTMENTs has many EMPLOYEEs, and a PAINTER paints many PAINTINGs.
Figure 2.6 shows some basic ERD models that illustrate these relationships and connectivity types.
The ERD shown in Figure 2.6 is based on the so-called Chen model. Although the entities and relationships are shown in a horizontal format in Figure 2.6, they also may be oriented vertically. The entity location and the order in which the entities are presented are immaterial – just remember to always read a 1:M relationship from the “1” side to the “M” side. A more current version of the ERD is the Crow’s Foot model shown in Figure 2.7. The label “Crow’s Foot” is derived from the three-pronged symbol used to represent the “many” side of the relationship. The Crow’s Foot model places the relationship name in the relationship line.
As you examine the basic Crow’s Foot ERD in Figure 2.7, note that the connectivity is represented by symbols. For example, the “1” is represented by a short line segment and the “M” is represented by the three-pronged “crow’s foot.” As in the Chen ERD, the entities and the relationships may be represented horizontally or vertically. And, again as in the Chen ERD, the location and the order in which the entities are presented in a Crow’s Foot ERD are immaterial.
Advantages
– Exceptional conceptual simplicity
– Visual representation
– Effective communication tool
– Integrated with the relational data model
Disadvantages
– Limited constraint representation
– Limited relationship representation (relationships between attributes cannot be modeled)
– No data manipulation language
– Loss of information content (limited space is available for drawing a large number of entities in Chen’s original ERD notation)
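The vocabulary of Section 2.4.4 (entity names, a relationship verb, and a connectivity) can be captured in a small data structure. The sketch below is only an aid for reading ERDs. It is not part of the Chen or Crow’s Foot notation, and the `Relationship` class and its methods are hypothetical names.

```python
from dataclasses import dataclass

VALID_CONNECTIVITIES = {"1:1", "1:M", "M:N"}


@dataclass(frozen=True)
class Relationship:
    name: str          # verb written in the diamond (Chen) or on the line (Crow's Foot)
    first: str         # entity read first; the "1" side of a 1:M relationship
    second: str        # entity read second; the "M" side of a 1:M relationship
    connectivity: str  # "1:1", "1:M", or "M:N"

    def __post_init__(self):
        if self.connectivity not in VALID_CONNECTIVITIES:
            raise ValueError(f"unknown connectivity: {self.connectivity}")

    def read(self):
        """Spell out the relationship, reading a 1:M from the "1" side to the "M" side."""
        left, right = self.connectivity.split(":")
        return f"{self.first} ({left}) {self.name} ({right}) {self.second}"

    def child_entity(self):
        """For a 1:M relationship, return the "many" side, which maps to the child table."""
        return self.second if self.connectivity == "1:M" else None


paints = Relationship("paints", "PAINTER", "PAINTING", "1:M")
has = Relationship("has", "DEPARTMENT", "EMPLOYEE", "1:M")

print(paints.read())
print(has.read())
print(paints.child_entity())
```

When such a model is mapped to relational tables, the entity returned by `child_entity()` is the one whose table carries the foreign key.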
2.4.5 The Object Oriented Model
In the Object Oriented model, entities are represented as objects that contain both data and operations. An advantage of this model is the addition of semantic content. A disadvantage is the steeper learning curve. The semantic data model (SDM) modeled both data and their relationships in a single structure known as an object. Because its basic modeling structure is an object, the SDM is said to be an object oriented data model (OODM). In turn, the OODM becomes the basis for the object oriented database management system (OODBMS). An OODM reflects a very different way to define and use entities. Like the relational model’s entity, an object is described by its factual content. But, quite unlike an entity, an object includes information about relationships between the facts within the object, as well as information about relationships with other objects. Therefore, the facts within the objects are given greater meaning.
Basic Structure
The object oriented data model is based on the following components:
° An object is an abstraction of a real-world thing. An object class is a representation of a set of objects with shared attributes and behavior. For example, an object class student is a model of all students in an educational institution. An object class may be considered equivalent to an ER model’s entity. More precisely, an object represents only one individual occurrence of an entity.
° Attributes describe the properties of an object. For example, a PERSON object class includes the attributes ID, Name, Social Security Number, and Date of Birth.
° Objects that share similar characteristics are grouped in classes. A class is a collection of similar objects with shared structure (attributes) and behavior (methods). In a general sense, a class resembles the ER model’s entity set. However, a class is different from an entity set in that it contains a set of procedures known as methods. A class’s method represents a real-world action such as finding a selected PERSON’s name, changing a PERSON’s name, or printing a PERSON’s address. In other words, methods are the equivalent of procedures in traditional programming languages. In object oriented terms, methods define an object’s behavior.
° Classes are organized in a class hierarchy. The class hierarchy resembles an upside-down tree in which each class typically has only one parent. For example, the CUSTOMER and EMPLOYEE classes share a parent PERSON class. However, it is possible for a child class to have multiple parents.
° Inheritance is the ability of an object within the class hierarchy to inherit the attributes and methods of the classes above it. For example, we can create two classes, CUSTOMER and EMPLOYEE, as subclasses of the class PERSON. In this case, CUSTOMER and EMPLOYEE will inherit all attributes and methods from PERSON.
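The PERSON, CUSTOMER, and EMPLOYEE example can be written directly in an object oriented programming language. The sketch below uses Python classes to illustrate attributes, methods, and inheritance. It is not an OODBMS. The `balance` and `job_title` attributes, the `label` method, and the sample values are assumptions added for illustration, and the Social Security Number attribute is omitted for brevity.

```python
class Person:
    """Parent class: attributes and methods shared by all persons."""

    def __init__(self, person_id, name, date_of_birth):
        self.person_id = person_id
        self.name = name
        self.date_of_birth = date_of_birth

    def change_name(self, new_name):
        """A method: the real-world action of changing a PERSON's name."""
        self.name = new_name

    def label(self):
        return f"{self.person_id}: {self.name}"


class Customer(Person):
    """Subclass: inherits everything from Person and adds its own attribute."""

    def __init__(self, person_id, name, date_of_birth, balance=0):
        super().__init__(person_id, name, date_of_birth)
        self.balance = balance


class Employee(Person):
    """Subclass: inherits everything from Person and adds its own attribute."""

    def __init__(self, person_id, name, date_of_birth, job_title):
        super().__init__(person_id, name, date_of_birth)
        self.job_title = job_title


# Each object is one individual occurrence of its class.
customer = Customer(1, "A. Ramas", "1980-01-15", balance=250)
employee = Employee(2, "L. Dunne", "1975-06-30", job_title="Sales agent")

# change_name and label are defined only on Person but are available to both subclasses.
customer.change_name("A. Ramas-Lee")
print(customer.label())
print(employee.label())
```

Neither subclass redefines `change_name` or `label`. Both are inherited from the parent class, as the description of inheritance above states.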
To illustrate the difference between the OO model and the ER model, let’s examine their graphic representations in the simple invoicing problem shown in Figure 2.8. As you examine Figure 2.8, note that:
° The OO data model represents an object class as a box; all of the object’s attributes and relationships to other objects are included within the object class box. The object class representation of the INVOICE includes all related objects within the same object class box.
° The ER model uses three separate entities and two relationships to represent an invoice transaction. Because customers can purchase more than one item at a time, each invoice references one or more lines, one item per line. And, because invoices are generated by customers, the data-modeling requirements include a customer entity and a relationship between the customer and the invoice.
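The contrast drawn from Figure 2.8 can be sketched in code. In the first half, a single `Invoice` object contains its customer and its lines. In the second half, the same facts are spread across three separate record collections linked by key values, as in the ER view. All class names, field names, and sample values here are hypothetical, and prices are kept in whole cents to avoid rounding issues.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# --- OO view: one object class box holds the related objects ---------------
@dataclass
class Customer:
    number: int
    name: str


@dataclass
class Line:
    item: str
    quantity: int
    unit_price_cents: int

    def amount(self) -> int:
        return self.quantity * self.unit_price_cents


@dataclass
class Invoice:
    number: int
    customer: Customer
    lines: list[Line] = field(default_factory=list)

    def total(self) -> int:
        return sum(line.amount() for line in self.lines)


invoice = Invoice(
    number=5001,
    customer=Customer(1001, "Ramas"),
    lines=[Line("Drawer handle", 2, 1500), Line("Roller set", 1, 4000)],
)

# --- ER view: three separate entities linked by two relationships ----------
customer_rows = {1001: {"name": "Ramas"}}
invoice_rows = {5001: {"customer_number": 1001}}
line_rows = [
    {"invoice_number": 5001, "item": "Drawer handle", "quantity": 2, "unit_price_cents": 1500},
    {"invoice_number": 5001, "item": "Roller set", "quantity": 1, "unit_price_cents": 4000},
]


def er_total(invoice_number: int) -> int:
    """Follow the INVOICE-to-LINE relationship by matching key values."""
    return sum(
        row["quantity"] * row["unit_price_cents"]
        for row in line_rows
        if row["invoice_number"] == invoice_number
    )


def er_customer_name(invoice_number: int) -> str:
    """Follow the CUSTOMER-to-INVOICE relationship by matching key values."""
    return customer_rows[invoice_rows[invoice_number]["customer_number"]]["name"]


print(invoice.total(), er_total(5001))
print(invoice.customer.name, er_customer_name(5001))
```

Both views yield the same answers. The difference is whether the relationships live inside the object or are reconstructed by matching key values across separate entities.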
Advantages
– Adds semantic content
– Visual presentation includes semantic content
– Database integrity
– Both structural and data independence
Disadvantages
– Slow pace of OODM standards development
– Complex navigational data access
– Steep learning curve
– High system overhead slows transactions
– Lack of market penetration
2.4.6 Other Models
Another semantic data model, the extended relational data model (ERDM), was developed in response to the increasing complexity of applications. The ERDM, championed by many relational database researchers, constitutes the relational model’s response to the OODM challenge. This model includes many of the OO model’s best features within an inherently simpler relational database structural environment. That is why a DBMS based on the ERDM is often described as an object/relational database management system (O/RDBMS). The OODM and ERDM are similar in the sense that each attempts to address the demand for more semantic information to be incorporated into the model. However, the OODM and the ERDM differ substantially both in underlying philosophy and in the nature of the problem to be addressed. Although the ERDM includes a strong semantic component, it is primarily based on the relational data model’s concepts. In contrast, the OODM is wholly based on the OO semantic data model concepts. The ERDM is primarily geared to business applications, while the OODM tends to focus on very specialized engineering and scientific applications. In the database arena, the most likely scenario appears to be an ever-increasing merging of OO and relational data model concepts and procedures.
2.4.7 Data Models: Summary
The evolution of database management systems has always been driven by the search for new ways of modeling increasingly complex real-world data. A summary of the most commonly recognized data models is shown in Figure 2.9.
Concept Check
What are the major types of data models?
How does the hierarchical data model address the problem of data redundancy?
What are the features of relational data models?