ENTITY RELATIONSHIP MODEL
In software engineering, an entity-relationship model (ERM) is an abstract and conceptual representation of data. Entity-relationship modeling is a database modeling method, used to produce a type of conceptual schema or semantic data model of a system, often a relational database, and its requirements in a top-down fashion. Diagrams created by this process are called entity-relationship diagrams, ER diagrams, or ERDs.
This article refers to the techniques proposed in Peter Chen's 1976 paper.[1] However, variants of the idea existed previously,[2] and have been devised subsequently.
In software engineering, an entity-relationship model (ERM) is an abstract and conceptual representation of data. Entity-relationship modeling is a database modeling method, used to produce a type of conceptual schema or semantic data model of a system, often a relational database, and its requirements in a top-down fashion. Diagrams created by this process are called entity-relationship diagrams, ER diagrams, or ERDs.
This article refers to the techniques proposed in Peter Chen's 1976 paper.[1] However, variants of the idea existed previously,[2] and have been devised subsequently.
Overview
The first stage of information system design uses these models during the requirements analysis to describe information needs or the type of information that is to be stored in a database. The data modeling technique can be used to describe any ontology (i.e. an overview and classifications of used terms and their relationships) for a certain area of interest. In the case of the design of an information system that is based on a database, the conceptual data model is, at a later stage (usually called logical design), mapped to a logical data model, such as the relational model; this in turn is mapped to a physical model during physical design. Note that sometimes, both of these phases are referred to as "physical design".
The building blocks: entities, relationships, and attributes
An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world.[3]
An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order. Although the term entity is the one most commonly used, following Chen we should really distinguish between an entity and an entity-type. An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type. There are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for this term.
Entities can be thought of as nouns. Examples: a computer, an employee, a song, a mathematical theorem.
A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs, linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem.
The model's linguistic aspect described above is utilized in the declarative database query language ERROL, which mimics natural language constructs. ERROL's semantics and implementation are based on Reshaped relational algebra (RRA), a relational algebra which is adapted to the ERM and captures its linguistic aspect.
Entities and relationships can both have attributes. Examples: an employee entity might have a Social Security Number (SSN) attribute; the proved relationship may have a date attribute.
Every entity (unless it is a weak entity) must have a minimal set of uniquely identifying attributes, which is called the entity's primary key.
Entity-relationship diagrams don't show single entities or single instances of relations. Rather, they show entity sets and relationship sets. Example: a particular song is an entity. The collection of all songs in a database is an entity set. The eaten relationship between a child and her lunch is a single relationship. The set of all such child-lunch relationships in a database is a relationship set. In other words, a relationship set corresponds to a relation in mathematics, while a relationship corresponds to a member of the relation.
Relationships, roles and cardinalities
In Chen's original paper he gives an example of a relationship and its roles. He describes a relationship "marriage" and its two roles "husband" and "wife".
A person plays the role of husband in a marriage (relationship) and another person plays the role of wife in the (same) marriage. These words are nouns. That is no surprise, naming things requires a noun.
However as is quite usual with new ideas, many eagerly appropriated the new terminology but then applied it to their own old ideas. Thus the lines, arrows and crows-feet of their diagrams owed more to the earlier Bachman diagrams than to Chen's relationship diamonds. And they similarly misunderstood other important concepts.
In particular, it became fashionable (now almost to the point of exclusivity) to "name" relationships and roles as verbs or phrases.
Relationship Names
A relationship expressed with a verb that implies direction, makes it impossible to discuss the model using correct English. Examples are:
- the song and the performer are related by a 'performs'
- the husband and wife are related by an 'is-married-to'.
Expressing the relationships with a noun resolves this:
- the song and the performer are related by a 'performance'
- the husband and wife are related by a 'marriage'.
Role naming
It has also become prevalent to name roles with phrases e.g. is-the-owner-of and is-owned-by etc. Correct nouns in this case are "owner" and "possession". Thus "person plays the role of owner" and "car plays the role of possession" rather than "person plays the role of is-the-owner-of" etc.
The use of nouns has direct benefit when generating physical implementations from semantic models. When a person has two relationships with car then it is possible to very simply generate names such as "owner_person" and "driver_person" which are immediately meaningful.
[edit] Cardinalities
However some modifications to the original specification are beneficial. Chen described look-across cardinalities. UML perpetuates this. (As an aside, the Barker-Ellis notation used in Oracle Designer, uses same-side for minimum cardinality (analogous to optionality) and role, but look-across for maximum cardinality (the crows foot)).
Other authors (Merise,[4] Elmasri & Navathe [5] amongst others[6]) prefer same-side for roles and both minimum and maximum cardinalities. Recent researchers (Feinerer,[7] Dullea et. alia [8]) have shown that this is more coherent when applied to n-ary relationships of order >2.
In Dullea et. alia "An analysis of structural validity in entity-relationship modeling" one reads "A 'look across' notation such as used in the UML does not effectively represent the semantics of participation constraints imposed on relationships where the degree is higher than binary."
In Feinerer it says "Problems arise if we operate under the look-across semantics as used for UML associations. Hartmann [9] investigates this situation and shows how and why different transformations fail." (Although the "reduction" mentioned is spurious as the two diagrams 3.4 and 3.5 are in fact the same) and also "As we will see on the next few pages, the look-across interpretation introduces several difficulties which prevent the extension of simple mechanisms from binary to n-ary associations."
[edit] Semantic Modelling
The father of ER modelling said in his seminal paper: "The entity-relationship model adopts the more natural view that the real world consists of entities and relationships. It incorporates some of the important semantic information about the real world." [10] He is here in accord with philosophic and theoretical traditions from the time of the Ancient Greek philosophers: Socrates, Plato and Aristotle (428 BC) through to modern epistemology, semiotics and logic of Pierce, Frege and Russell. Plato himself associates knowledge with the apprehension of unchanging Forms (The forms, according to Socrates, are roughly speaking archetypes or abstract representations of the many types of things, and properties) and their relationships to one another. In his original 1976 article Chen explicitly contrasts Entity-Relationship diagrams with record modelling techniques: "The data structure diagram is a representation of the organisation of records and is not an exact representation of entities and relationships." Several other authors also support his program:
Kent in "Data and Reality" : "One thing we ought to have clear in our minds at the outset of a modelling endeavour is whether we are intent on describing a portion of "reality" (some human enterprise) or a data processing activity."
Abrial in "Data Semantics" : "... the so called "logical" definition and manipulation of data are still influenced (sometimes unconsciously) by the "physical" storage and retrieval mechanisms currently available on computer systems."
Stamper: "They pretend to describe entity types, but the vocabulary is from data processing: fields, data items, values. Naming rules don't reflect the conventions we use for naming people and things; they reflect instead techniques for locating records in files."
In Jackson's words: "The developer begins by creating a model of the reality with which the system is concerned, the reality which furnishes its [the system's] subject matter ..."
Elmasri, Navathe: "The ER model concepts are designed to be closer to the user’s perception of data and are not meant to describe the way in which data will be stored in the computer."
A semantic model is a model of concepts, it is sometimes called a "platform independent model". It is an intensional model. At the latest since Carnap, it is well known that:[11] "...the full meaning of a concept is constituted by two aspects, its intension and its extension. The first part comprises the embedding of a concept in the world of concepts as a whole, i.e. the totality of all relations to other concepts. The second part establishes the referential meaning of the concept, i.e. its counterpart in the real or in a possible world". An extensional model is that which maps to the elements of a particular methodology or technology, and is thus a "platform specific model". The UML specification explicitly states that associations in class models are extensional and this is in fact self evident by considering the extensive array of additional "adornments" provided by the specification over and above those provided by any of the prior candidate "semantic modelling languages"."UML as a Data Modeling Notation, Part 2"
Diagramming conventions
Various methods of representing the same one to many relationship. In each case, the diagram shows the relationship between a person and a place of birth: each person must have been born at one, and only one, location, but each location may have had zero or more people born at it.
Two related entities shown using Crow's Foot notation. In this example, an optional relationship is shown between Artist and Song; the symbols closest to the song entity represents "zero, one, or many", whereas a song has "one and only one" Artist. The former is therefore read as, an Artist (can) perform(s) "zero, one, or many" song(s).
Chen's notation for entity-relationship modeling uses rectangles to represent entities, and diamonds to represent relationships appropriate for first-class objects: they can have attributes and relationships of their own. Entity sets are drawn as rectangles, relationship sets as diamonds. If an entity set participates in a relationship set, they are connected with a line.
Attributes are drawn as ovals and are connected with a line to exactly one entity or relationship set.
Cardinality constraints are expressed as follows:
- a double line indicates a participation constraint, totality or surjectivity: all entities in the entity set must participate in at least one relationship in the relationship set;
- an arrow from entity set to relationship set indicates a key constraint, i.e. injectivity: each entity of the entity set can participate in at most one relationship in the relationship set;
- a thick line indicates both, i.e. bijectivity: each entity in the entity set is involved in exactly one relationship.
- an underlined name of an attribute indicates that it is a key: two different entities or relationships with this attribute always have different values for this attribute.
Attributes are often omitted as they can clutter up a diagram; other diagram techniques often list entity attributes within the rectangles drawn for entity sets.
Related diagramming convention techniques:
- Bachman notation
- EXPRESS
- IDEF1X[12]
- Martin notation
- (min, max)-notation of Jean-Raymond Abrial in 1974
- UML class diagrams
- Merise
Crow's Foot Notation
Crow's Foot notation is used in Barker's Notation, SSADM and Information Engineering. Crow's Foot diagrams represent entities as boxes, and relationships as lines between the boxes. Different shapes at the ends of these lines represent the cardinality of the relationship.
Crow's Foot notation was used in the 1980s by the consultancy practice CACI. Many of the consultants at CACI (including Richard Barker) subsequently moved to Oracle UK, where they developed the early versions of Oracle's CASE tools, introducing the notation to a wider audience. The following tools use Crow's Foot notation: ARIS, System Architect, Visio, PowerDesigner, Toad Data Modeler, DeZign for Databases, Devgems Data Modeler, OmniGraffle, MySQL Workbench and SQL Developer Data Modeler. CA's ICASE tool, CA Gen aka Information Engineering Facility also uses this notation.
ER diagramming tools
There are many ER diagramming tools. Some free software ER diagramming tools that can interpret and generate ER models and SQL and do database analysis are MySQL Workbench (formerly DBDesigner), and Open ModelSphere (open-source). A freeware ER tool that can generate database and application layer code (webservices) is the RISE Editor.
Some of the proprietary ER diagramming tools are Avolution, dbForge Studio for MySQL, ER/Studio, ERwin, MEGA International, ModelRight, OmniGraffle, Oracle Designer, PowerDesigner, Rational Rose, Sparx Enterprise Architect, SQLyog, System Architect, Toad Data Modeler, and Visual Paradigm.
Some free software diagram tools just draw the shapes without having any knowledge of what they mean, nor do they generate SQL. These include yEd, LucidChart, Kivio, and Dia.
Entity-relationship modelling
In this chapter I introduce the concepts of entity-relationship (ER) modelling. At the end of this chapter you should be able to explain and apply these concepts.
1 Databases, this class, and you
Data bases, and the data base management systems that lord over them, are the core information systems technology. They are used---and will be used---to store corporate data, web pages, on-line movies, work flow information, document databases---absolutely everything that is of interest to business. After taking this class you will clearly understand and be able to explain why this is a good thing. This understanding will allow you to see opportunities for exploiting this technology in innovative ways.
Both of the above will be useful for a student (and eventual businessperson) whether they be interested in accounting, marketing, human resources, or finance. Business process reengineering (BPR) (by whatever name) virtually demands that data bases take a more central role in a corporation’s life. By its very definition BPR demands that people from throughout an organisation apply information technology solutions to broad problems. Data bases are one of the more frequently applied solutions. Thus, I propose that a great percentage of students interested in business should be knowledgeable about data bases.
2 A very simple modelling exercise
The whole purpose of ER modelling is to create an accurate reflection of the real world in a database. The ER model doesn’t actually give us a database description. It gives us an intermediate step from which it is easy to define a database. Let’s look at an example. (You will see much more detail on these concepts in the rest of this chapter. For now just try to understand the overall process.)
Suppose you are presented with the following situation and are told to create a database for it:
Every department within our company is in only one division. Each division has more than one department in it. We don’t have an upper limit on the number of departments that a division can have. For example, the New Business Development---the one managed by Mackenzie---and Higher Education departments are both in the Marketing division.
This is a fairly clear description of a situation. Many things are left unsaid that we understand about the situation. For example: each division has a name, and that name is unique within the company. For now, though, let’s focus on the description as it is given.
The first step is to figure out the items of interest in this situation. (In this document you will come across Problems. You should attempt to perform these before continuing the reading. Simply reading the problem and then reading the answer is not sufficient---you should attempt the problem yourself before you continue reading. Understanding these problems are integral to understanding the text. The answer to the problem appears in the text immediately after the problem.)
Problem 1: What are the items of interest here?
It seems here that the situation is concerned with divisions, departments, and employees or managers. It gives some details about which contains which, how they are related to each other, and provides some examples of each, but basically the situation is concerned with these three entity types. Here’s a formal, if somewhat ambiguous, definition.
An entity type is a collection of entities that share a common definition. An entity is a person, place, concept, or thing about which the business needs data.
So, Department is the name of one entity type. One instance of this entity type is the New Business Development department. The Marketing division is an instance of the Division entity type. Mackenzie is one instance of the Employee entity type. Instances of entity types are referred to as entities. Put more simply: You can touch an entity but an entity type is simply an idea. Person is an idea (entity type) while Scott, Nancy, Lindsey, and Mackenzie are touchable (entities). Entity types provide us with a means for making generalisations about entities. For example, instead of saying “Every department within our company is in only one division,” we could have gone down the list of all departments (that is, all entities with entity type Department) and asserted that each one is, indeed, in one division:
“The New Business Development department is in one division. The Higher Education department is in one division. ...” And so on until we’ve noted that each is on only one division.
But we know more than the facts about each individual department being in one division. We know that all new departments will also be in just one division. And if there is a new division, it, too, will have departments that are unique to the division. So, instead of providing information in the form of statements about specific entities, we use a more powerful and concise format and provide information in the form of statements about relationships among entity types.
Thus, in ER modelling we look for relationships among entity types because it is easier and more concise to speak of relationships among general entity types rather than the touchable entities themselves.
Problem 2: Each of the following is either an entity type and/or an entity. If it’s an entity type, then provide examples of entities; if it’s an entity, define a possible entity type.
- A municipal bond from Detroit
- Ford
- Clothes
- Employee
The municipal bond is an entity; bond is a possible entity type. Ford is an entity; manufacturer is a possible entity type. Clothes could be either: 1) a type if the entities are pants, shirts, etc.; 2) an entity if the type is product. Employee is an entity type; Angela and Natalie are example entities.
Back to our example: we have identified three entity types and four entities. From the description we can assume that there are more entities for each entity type. Go back and read the situation description if you do not think this is immediately obvious.
From the description there is some sort of relationship between Department and Division and another sort of relationship between Department and Employee. The first relationship is one of containment: each division has one or more departments, but any one department can only be in one division. (Think about an NCAA conference such as the Big 10 (the bucket) having many teams (a bunch of balls). On the other hand, each team (one ball) can only be in one bucket (a conference). In this instance the bucket is the division and the balls are the departments.) The second relationship tells us that an employee has a certain relationship relative to a certain Department, namely, that the employee manages the department. Determining the relationships among entity types is another important step in the process of ER modelling.
A relationship is an association between entity types.
Problem 3: What would you name these two relationships?
The defining characteristic of a relationship is that several entity types are involved. So something like a name or birth date would not be a relationship since only one entity is involved.
Now we have identified three entity types (Employee, Department, Division) and two relationships among these entity types (manages, contains). Now we can begin to represent the problem in the language of ER modelling.
ER models are usually represented graphically. The language we are going to use represents entity types as rectangles and relationships as diamonds. Below is the representation of the situation we are working with.
Notice that the contains relationship is drawn between the two entities that it is associated with. Similarly for the manages relationship. This (simplified) ER model tells us that:
- Division is related to department through a relationship called contains.
- Departments are related to employees through a relationship called manages.
- Employees are not directly related to divisions.
Certainly we know more about the problem than this. Consider the relationship between divisions and departments. We know that divisions have multiple departments and departments can only be contained within one division. Or, for every one division there can be many departments. In the language of ER modelling this is called a 1:M (read: “one to many”) relationship.
Problem 4: What is the relationship between departments and managers? Fill in the blanks with either a one or a many:
- For each department there can be, at most, _____ managing employee(s).
- For each managing employee there can be, at most, _____ department(s).
The relationship between department and a managing employee is different. It doesn’t say so but we can assume that a department has only one manager. (Certainly you can imagine an instance in which a department has co-managers. That possibility is just as viable as the possibility I have assumed. This is part of the attraction of this type of work. The database professional has to read descriptions for what is said and then imagine what isn’t said. If you were actually creating a database in this example, you would have to ask someone what the situation actually is. But since you are just given this description, you have to come up with some assumption. For this situation we’ll make the above assumption.) We’ll also assume that an employee can also be the manager of, at most, one department. In other words, for every one department there can be, at most, one managing employee. In the language of ER modelling this is called a 1:1 (read: “one to one”) relationship. This information can also be represented in the ER diagram:
As you might have determined, the M part of a relationship is represented by putting an M next to the appropriate entity type in the relationship while the 1 part is represented by a 1. The ER diagram now represents much more information than it did above:
- Any one division can contain many departments. Any one department can be contained in, at most, one division.
- Any department can have, at most, one managing employee (or manager). Any manager can manage, at most, one department.
If you are a bit confused about all this 1:M and 1:1 stuff, never fear. You’ll see a lot more clarifying detail later.
Several other questions remain about this situation that are not addressed in the description:
- What is the minimum number of departments in a division?
- Does a department have to be associated with a division?
- Does a department have to have a manager?
These questions would have to be answered before we complete the ER model. And we will answer these questions later. For now we are going to stop this part of the analysis since the purpose of this example is to demonstrate what ER modelling is all about.
The ER modelling process is not something for which a set of steps can be given and then performed. The process contains almost as much art as science. Some steps are performed many times and many decisions are re-visited and revised. Given these conditions, a broad outline can be given:
- Determine what entity types are involved.
- Determine which entity types are related.
- Refine the definition of the relationships.
Understand now that there are several methods for representing ER models graphically. Some don’t use the diamonds for the relationship---they might just put the word above the line. It’s not really important how the entity types and relationships are represented; it’s just important that they are represented.
Notice what has happened with this situation. Initially we had a text description of the problem. After analysing it and making some necessary assumptions, we created an ER diagram that reflects the situation accurately and makes explicit the relationship among the entity types. This is why we perform ER modelling. We don’t know any more than we used to about the problem---we just have made explicit what we do know. It is quite a straight-forward step to go from this ER model to an implemented database. Remember why we are doing all this: We are finding out all we need to know to create a database that will hold our data. And a well-defined database can be a very useful tool for solving business problems---and it is also in high demand by recruiters. You will learn how to perform the steps necessary to create such a database in later chapters.
3 More detail
In this previous section I used an example to present an overview of how and why ER modelling is performed. In this section I present more detail on some of the basic concepts.
3.1 Relationships
Relationships define which entity types are directly associated with which other entity types. In the example in an earlier section, we saw that divisions are directly associated with departments and departments are directly associated with employees. No direct association between division and employee was given. This does not mean that there is no relationship between division and employee. In fact, the ER diagram tells us that there is a relationship between the two:
Given any one division, there can be many employees managing departments within that division.
Certainly, this is not earth shattering news. But it is in the ER diagram. The above fact is not represented as a separate relationship between division and employee because it can be inferred from existing relationships. An ER diagram should contain the minimum number of relationships necessary to reflect the situation.
3.1.1 Cardinality
Once a relationship between entity types has been established, the analyst should determine its cardinality.
A relationship’s cardinality defines the maximum number of entities of one type that can be associated with an entity of another type.
For relationships between two entity types, there are three basic cardinalities. Each of the following descriptions are given in terms of a relationship between entity type X and entity type Y.
1:1
--- one-to- one --- One entity of type X can be associated with, at most, one entity of type Y. One entity of type Y can be associated with, at most, one entity of type X.
An example: the relationship between car and steering wheel. A car has only one steering wheel and a steering wheel can only be installed in one car.
1:M
--- one-to-many --- One entity of type X can be associated with many entities of type Y. One entity of type Y can be associated with, at most, one entity of type X.
An example: the relationship between building and rooms. A building can have many rooms but a room can be in, at most, one building.
M:M
--- many-to-many --- One entity of type X can be associated with many entities of type Y. One entity of type Y can be associated with many entities of type X. An example: the relationship between a car and its options (such as air conditioning, ABS brakes). A car can have many options and an option can be installed on many cars.
Problem 5: Determine the cardinality of the relationships between the following four pairs of entity types. For each relationship you have to answer two questions:
- For each entity of type Y there can be, at most, _____ entity or entities of type X.
- For each entity of type X there can be, at most, _____ entity or entities of type Y.
Answering these two questions gives you the answer to the following questions. For example, if you answered M to the first question and 1 for the second question, then this relationship between entity types X and Y is of cardinality M:1. This is read as “for every X there can be only one Y and for every Y there can be many X.” Realize that you will have to make assumptions about the situations below to clarify some of these relationships.
- Patient under care of primary care physician
- Physician performs operation
- Doctors have speciality in disease
- Needle injected into patient
It would seem that at any particular time a patient can only have one primary care physician and that any physician can have many patients (M:1). One physician can perform many operations and one operation can be performed by many physicians (M:M). One doctor can have specialities in many diseases and one disease can be the speciality of many doctors (M:M). One needle can be injected into one patient and one patient can have many needles injected into him/her (M:1).
3.1.2 Existence
In the previous section we were concerned with the maximum number of entities of one type that can be associated with an entity of another type. In this section we examine the minimum number of entities in a relationship.
A relationship’s existence defines what we know about the existence of any entity on the other side of a relationship from a given entity. Existence is given as optional, mandatory, or unknown.
This is best clarified with an example. Consider again the example discussed in Section 2. Specifically, focus on the manage relationship between department and employee. We know the cardinality is 1:1. This tells us that at most one department is managed by an employee and an employee can manage, at most, one department. (Be sure you understand the distinction between these two phrases.) The existence of this relationship tells us the fewest number of departments that can be managed by an employee and the fewest number of employees that can manage a department. Only one of the following can be true:
optional
A department need not have any manager.
mandatory
A department must have at least one manager.
unknown
It is unknown whether or not a department has to have a manager.
Similarly, only one of the following may be true:
optional
An employee need not manage any department.
mandatory
An employee must manage at least one department.
unknown
It is unknown whether or not an employee must manage a department.
Problem 6: For each set of three above, which ones would you choose?
It is not entirely clear from the situation description which of the above are true. I make the relatively standard assumptions that a department must have at least one manager and that an employee need not be the manager of any department. Thus, the existence of this relationship is mandatory in one direction and optional in the other.
Going back to the definition of existence, we can also look at this situation in this way:
- Given any (randomly chosen) department, there must be an employee on the other side of the manage relationship. Thus, the relationship is mandatory in this direction. This is indicated by a dash on the line (you’ll see this below).
- Given any (randomly chosen) employee, there need not be any department on the other side of the manage relationship. Thus, the relationship is optional in this direction. This is indicated by a circle on the line (you’ll also see this below).
I assume that the contains relationship is mandatory in both directions. Given this information, the ER diagram is modified in the following manner:
This diagram is beginning to look a little complicated but remember the following pieces of information and it gets a little easier:
- You “read” information about a relationship in two passes: going left-to-right and going right-to-left (it doesn’t matter which pass is read first).
- When “reading” information about a relationship off an ER diagram, when going left-to-right, cover up the left side of the line; when going right-to-left, cover up the right side of the line.
- The marks on the lines tell you the minimum number in a relationship. A dash on the line looks like a 1; it tells you the minimum number in the relationship is one so the existence is mandatory. A circle on the line looks line a 0; it tells you the minimum number in the relationship is zero so the existence is optional. If there are not any marks on the line, then the existence is unknown.
- The marks below the line tell you the maximum number in a relationship. An M tells you the cardinality is many while a 1 tells you it is one.
Let’s practice this. Look at the manages relationship again.
- Go from left-to-right.
- Cover up the line connecting the manages diamond with the department rectangle.
- Now all you see are the department entity type, the employee entity type, and the right side of the line with a dash on it and a one below it.
- This tells us that a department is managed by at least (the dash) one employee and at most (the 1) one employee. Rephrasing: A department is managed by one and only one employee.
- Go from right-to-left.
- Cover up the line connecting the manages diamond with the employee rectangle.
- Now all you see are the department entity type, the employee entity type, and the left side of the line with a circle on it and a 1 below it.
- This tells us that an employee does not have to manage any departments (the circle) and may manage at most (the 1) one department. Rephrasing: An employee may manage no more than one department.
Problem 7: For each of the relationships listed in Problem 5:
- Define the existence in both directions.
- Draw the ER diagram for the relationship.
- Write out two sentences that represent what the ER diagram says.
3.2 Entity types
Entity types are things for which it is important that your company capture data. If it is not important, it should not be in the database. In an accounting database you would expect to find entity types for expenses, assets, liabilities, expenditures, deposits, etc. You would not expect to find entity types for colour of check, quality of dollar bills received, etc. The database is supposed to reflect reality---but only the part of reality that is important to the company.
3.2.1 Entity subtypes
Entity types are entities that share a common definition. This allows us to make generalisations about that type. This is a powerful capability; however, sometimes we want to make a generalisation only about a certain subset of those entities and another generalisation about the rest of the entities. Consider a simple example. Suppose you have an accounting database which keeps track of accounts receivable and accounts payable. Of course the database keeps track of the companies to which you owe money and the companies that owe you money. For all these companies, you keep track of their mailing address and a contact person. For the companies that owe you money you keep track of how much they owe you. For the companies that you owe money you keep track of how much you owe them. What to do? Should we have three entity types: one for the whole set and one for each subset? That would be a mess. That is why the concept of entity subtypes was created.
An entity subtype is a collection of entities of the same type to which a narrower definition and additional attributes and/or relationships apply.
Problem 8: In this company example, what is the entity type? What are the subtypes?
In this database you should define a company entity type with two subtypes: AR_co and AP_co. The company entity type stores all facts that are common attributes---in this case, the address and contact person. The AR_co entity subtype tracks the balance owed from this company while the AP_co entity subtype tracks the balance owed to this company.
There are many situations in which subtypes can be created but should not be. Only create subtypes
- if the subtype is involved in relationships that the other subtypes are not or
- if the subtype needs to have additional facts stored with it.
If one of these two requirements is not met, then do not create the subtype.
Problem 9: What is the relationship among the following? List a few facts common to all items for each question. List a few facts about each subtype that is not common to the other subtype. Also draw the ER diagram for each.
- book, publication, magazine
- individual, employee, contact person
- laptop, computer, desktop
On the other hand, there are some situations that are not so clear cut. Consider the following figure.
Many students would first suggest the diagram on the right---divide customers into investors and attendees and show that investors buy stocks and attendees register for seminars. I suggest that the figure on the left is better. Here’s my thought process:
What is it that makes an investor an investor? She buys stocks. And what is it that makes an attendee an attendee? She registers for seminars. Is there anything about an investor that keeps her from being an attendee? No. Vice versa? No. Do you want to prevent investors from being classified as attendees or vice versa? No and no. So, define relationships buy and register for the customer entity. Investors can be listed by choosing only those customers that are in the buy relationship. Attendees can be listed by choosing only those customers that are in the register relationship.
Thus, if a relationship defines the members of a proposed subtype, then use the relationship instead of the subtype.
3.2.2 Type/instance distinction
For some people this can one of the more difficult concepts to understand, so read carefully. What we are trying to discern here is the difference between a type of a thing and an actual thing. This is a pretty easy concept when comparing people and Joe. People is the type and Joe is the instance. However, modellers generally don’t make the type/instance distinction between an entity type and an entity---they generally make it between two entities. For example, think about “CT481” and “section 2 of CT481 in Winter 1962.” The second is an instance of the first. The section is an actual class that meets at an actual time with an actual teacher and actual students. CT481 is a type of thing that is an idea that only becomes real when you come into contact with one of its instances (e.g., section 2 of CT481 in Winter 1962).
Realize that this is a different distinction than that between entity types and entities. In this example, CT481 is one specific instance of the entity type Course and section 2 of CT481 in Winter 1962 is one specific instance of the entity type Section. Thus, both are entities and neither one is an entity type.
Problem 10: Analogously to the CT481 story above, fill out the story for a Boeing 747 relative to the type/instance distinction and the distinction between entity types and entities.
Boeing 747 is a type of plane and a specific Boeing 747 that flies through the air with passengers in it is an instance of this type. Boeing 747 is a specific instance of the entity type plane type and a flying Boeing 747 with passengers is a specific instance of an entity type plane.
3.3 Attributes
3.3.1 Basics
To this point we have focused on entity types and relationships among them. We have mentioned, in passing, “facts” about entity types and “attributes” of entity types. In this section I hope to make these ideas a little more clear.
Attributes are the characteristics of an entity type that we are interested in.
An attribute is a descriptor whose values are associated with individual entities of a specific type.
The attribute value for any single entity can have only one value at a given time. This value can change over time. An attribute of an employee might be salary. At any one time if you asked for the salary level of a certain employee, then you should get one answer. And if someone else asked the same question about that employee at the exact same time, they would expect to get the same answer. Of course, if you asked this question at a later time you might expect to get a different answer.
Think back to the example in Section 2. Few attributes are mentioned in the description but a few can be inferred. The department entity type has a name attribute, as do the division and employee entity types. Possible attributes for the employee entity type that aren’t mentioned include date of hire, home mailing address, work phone, and work address.
3.3.2 Identifier
Every entity type has an identifier. This identifier uniquely identifies a single (at least one, and no more than one) entity. If you know the value of the identifier, then you know exactly which entity you are dealing with. Further, the identifier’s value will never change over time. Thus, if you know the identifier now, then you can be confident that at any time in the future the identifier for that entity will not have changed.
Problem 11: Social security number is a possible identifier for a person. What is a possible problem with using it as an identifier? Why shouldn’t we use people’s names as an identifier?
Problem 12: For each of the following entity types, come up with a possible identifier:
- Book
- Car
- UM student
- UM building
- Colour
Problem 13: Suppose that we have a database that has the table car, and this table has an identifier vin (vehicle id number) and another field called model (e.g., “Ford Mustang GT”, “Ford Mustang SHO”, or “Accord Integra GS-R”). Suppose that we want to add information about the type of the model (e.g., “sporty”, “sedan”, “SUV”) to this database. How would you do it? You have two choices---either add model field to the car table or create another table to hold this information.
Consider the two choices:
Adding a field to an existing table
The new relation will have the attributes [vin, model, type]. Before we added the type attribute to this relation, there seemed to be only one entity type (vehicle). Now that we’ve added the type attribute it seems that we have another entity type that we’re collecting information about (model). Remember that the identifier (in this case, vin) uniquely identifies one and only one entity (again, in this case, car). But type is a piece of data about the model, not about the car itself. Every time we added an entity (a vehicle) to this relation, we would have to repeat information about the model that we might have entered before. For example, if there were 5000 Ford Mustang GTs in the relation, then 5000 times we would have to indicate that a Ford Mustang GT is sporty. All in all, this doesn’t seem like a good solution.
Creating a new table
Creating a new table is required since we now have two entity types---vehicles and models. In this first entity type we would have the attributes vin and model while in the second we would have model and type. In the vehicle entity, the model field tells us what model that particular vehicle is. In the model entity, the model field identifies which model we’re talking about while the type field tells us what type of vehicle that instances of this model are.
4 Advanced concepts
The concepts in the previous two sections of this chapter will allow you to model many business situations. The following concepts are needed to round out your repertoire so that you will be ready for almost any situation that comes your way.
4.1 Degree of a relationship
Relationships can be classified by the number of entity types involved. This is referred to as the degree of a relationship. To this point we have concerned ourselves with relationships between two entity types. This is, by far, the most common type of relationships. The most common degrees of relationships are as follows:
binary
This is a relationship between two entity types.
ternary
This is a relationship between three entity types.
recursive
This is a relationship involving only one entity type.
I will not spend any time on binary relationships now because we have discussed them at length already.
4.1.1 Ternary
In the real world there are relationships other than those involving two things. For example, suppose that we want to capture which employees use which skills on which project. We might try to represent this data in a database as three binary relationships between skills and project, project and employee, and employee and skill.
The applies relationship indicates which employee applies which skill. The used on relationship indicates which skill is used on which project. The works on relationship indicates which employee works on which project. But this is not enough to specify which employee uses which skill on which project. Suppose you know the following:
works-on
Lindsey and Mackenzie have worked on projects A and B.
applies
Lindsey has used skills interface design and database design while Mackenzie only used her database design skill.
used on
Both skills have been used on both projects.
Given this information, it is impossible to figure out on which projects Lindsey used which skills. She could have used interface design on project B and database design on project A---or the other way around. Or she might have used both skills on both projects. The database simply does not give us enough information.
In order to capture the necessary information the database needs a ternary relationship. In this case the database needs a relationship, called used- on, among employee, skill, and project.
The used on relationship captures information three pieces at a time. It stores facts such as:
- Lindsey used interface design skill on project A.
- Mackenzie used database design skill on project A.
- Lindsey used interface design skill on project B.
- Lindsey used database design skill on project B.
- Mackenzie used database design skill on project B.
Notice that this ternary relationship captures the information represented in the three binary relationships:
- Lindsey worked on project A (first fact).
- Lindsey worked on project B (third and fourth facts).
- Mackenzie worked on project A (second fact).
- Mackenzie worked on project B (fifth fact).
- Lindsey used her interface design skill (first and third facts).
- Lindsey used her database design skill (fourth fact).
- Mackenzie used her database design skill (second and fifth facts).
- Both skills have been used on both projects (all the facts).
Implementing ternary relationships does not mean that you have to get rid of the binary relationships. You only get rid of the binary relationships if they capture a subset of the information captured by the ternary relationship. If a binary relationship captures information that differs from the ternary relationship, then the binary relationship should be retained if the information is important to your company. For example, consider the following:
The used on relationship stays the same as in the previous ER diagram. The binary relationships are different.
have skill
An employee has a certain skill. This is different than used on because there are some skills that an employee has that an he or she may not have used on a particular project.
needed
A project needs a particular skill. This is different than used on because there may be some skills for which employees have not been assigned to the project yet.
manages
An employee manages a project. This is a completely different dimension than skill so it could not be captured by used on.
Problem 14: We have three entities: car, customer, and salesperson. Consider the following situations.
#1
There is a ternary relationship called sell relating all three entities. Sells: “Customer cust bought car car from salesperson SP.
Sells | ||
Car | Cust | SP |
Cobra | Sam | Don |
Mustang | Sam | Don |
Mustang | Sam | Sharon |
Cobra | Jenn | Sharon |
Mustang | Jenn | Sharon |
Question: Using this data, who sold Sam the Cobra?
#2
We have three binary relations as shown below:
|
|
|
buys
Customer cust buys car car.
buys from
Customer cust buys from salesperson SP.
sells
Salesperson SP has sold car car.
Question: Using this data, who sold Sam the Cobra?
In situation #1 you can see from the first line that Don sold Sam the Cobra. In situation #2, looking at buys you can see that Sam did actually buy a Cobra (second line). Looking at buys from you can see that both Don and Sharon sold Sam cars. Looking at sells you can see that both Don and Sharon have sold Cobras. So the answer is either Don or Sharon sold Sam the Cobra. This is not good enough. This demonstrates that having three binary relationships does not capture the same information that one ternary relationship does.
It might be asserted (and has been by a former student) that the ambiguity in the problem is a result of the data base keeping information about car types (Cobra, Mustang, etc.)\ instead of actual cars (Cobra VIN=32, Cobra VIN=33, etc.). This is the case and I’d like to demonstrate why here.
Suppose that the four cars in this data base are numbered 1, 2, 3, 4. We’re going to try to answer the same question, “Who sold Sam the Cobra?”, using just the binary relationships but with information about numbered cars rather than the car types that is used above. The Buys relationship shows that Sam bought cars 1 and 2. The Sells relationship will show which sales person sold car #2 (the Cobra). It does not say to whom, but we already know that Sam bought car #2. So without even consulting the Buys from relationship, we know who sold Sam the Cobra.
This is a good observation but does not change the essential point. Breaking down a ternary relationship into its component binary relationships will sometimes result in a loss of information. It will always result in a loss of data if at least one of the entity types is a type of thing (e.g., a car or skill) as opposed to a specific thing (e.g., an actual car).
The question also remains: Why break up a ternary relationship into its component binary relationships if the ternary relationship captures what’s really going on in the world. A customer does buy a car from a salesperson. That’s really how we think about it and how it really occurs. It’s not: a customer buys a car, a car is sold by a sales person, and a customer buys from a sales person. The real world event involves three entities. Why not construct the data base to reflect this reality?
Thus far in this section the ER diagrams have not represented the cardinality of ternary relationships. There is a different method for determining cardinalities of higher order relationships:
- Cover up all but one of the lines going from the relationship to the entities. (For example, begin by covering up the lines from used on to employee and from used on to project.)
- For the one remaining entity, ask yourself: Can there be only one of these items for any single combination of these other entities, or can there be many? (Continuing this example, ask yourself: Can there be only one skill for a certain employee and certain project, or can an employee use many skills on one project?)
- If the answer is many, then put an M below the line going to this entity. If the answer is one, then put a 1 below it.
- Repeat this process until each entity involved in the relationship has been examined.
Problem 15: For the used on relationship, what is the cardinality for each entity type?
To determine the cardinality for this relationship, I had to make several assumptions. Other assumptions are possible but I thought these seemed reasonable.
- An employee can use the same skill on many projects.
- An employee can use many skills on the same project.
- Many employees use the same skill on the same project.
Given these assumptions, the cardinality for each entity type is M. So this is a M:M:M relationship.
4.1.2 Recursive
The final, and possibly the most difficult, relationship is the recursive relationship. This is a relationship that an entity has with itself. But it really doesn’t have to be difficult if you think about it as you would any ordinary binary relationship. Let’s look at an example.
Think of an employee who is the manager of other employees.
A manager manages many employees and an employee has exactly one direct manager. This is pretty straightforward. But, now, realize that a manager is really just another name for an employee. So, replace managers with employees in this diagram.
Now this diagram has the entity type employees represented twice. To remedy this situation, “pull” the relationship diamond down and slide the two employee rectangles so that they are lying on top of each other. Now the diagram looks like the following:
This diagram represents what we want:
- Reading from left, down, and back up: An employee may not manage any other employees but may manage many.
- Reading from right, down, and back up: All employees are managed by exactly one other employee.
Not everyone in the company has a manager. The president will not have a direct manager. This is handled in the data in the table by indicating that the president’s manager is the president. A little trick.
4.2 Attributes of a relationship
When we examined attributes earlier, the attributes were exclusively attached to entity types. However, it is also possible for a relationship to have attributes. Consider the is member relationship below.
A person can be a member of many clubs and a club can have many members. A natural piece of information to want to store is the date the person joined the club. If the attribute is of the person entity, then this would indicate when the person joined a club but we would not know which club. If the attribute is of the club entity, then this would indicate (possibly) when the club was founded or (possibly) when the most recent member joined the club but we would not know the dates on which each person joined. The solution is to make join date an attribute of the is member relationship.
4.3 Entity subtype partitioning
This section describes two different ways in which subtypes of an entity can be related to one another and to the super-type.
4.3.1 Optional versus mandatory
Assume there is an entity type called person, and entity subtypes called customer and employee. When a person is created, the designer of the database has two options:
mandatory
He/she can demand that the person be classified as one of the subtypes.
optional
He/she can allow a person to be created without classifying the person as any subtype.
Neither one is preferable to the other. The proper one to choose depends on the business situation.
Mandatory sub-typing is represented by creating a double line from the super-type (person in the following ER diagram) to the circle. Optional sub-typing is represented by leaving a single line from the super-type to the circle.
So, what does this figure tell you? Since it is a mandatory subtype partitioning (you know this from the double line), whenever data for a new person is entered into the database, it must be classified as either a customer or an employee. The database user cannot simply add information about a generic person---she must know whether this person is a customer or an employee. If this had been an optional subtype partitioning, then when that user was entering data about an employee, she had the option of classifying the person as an employee or as a customer---but did not have to classify the person as either.
4.3.2 Disjoint versus overlapping
Consider now the company super-type and the subtypes AR_co and AP_co. As a designer you can specify whether or not an entity of subtype AR_co can also be an entity of type AP_co. Certainly it is not abnormal to think that you can do business with companies that do business with you. Think of being a consultant for Ameritech or IBM.
The following are the two possibilities:
disjoint
If entities are allowed to be no more than one subtype, then the subtypes are said to be disjoint.
overlapping
If entities can be classified as several subtypes, then the subtypes are said to be overlapping.
Disjoint subtypes are represented by putting a “d” in the circle. Overlapping subtypes are represented by putting a “o” in the circle.
The above figure tells us that this is a disjoint entity sub-typing. This means that whenever data for a new company is entered into the database, the company can be classified as either AR_co or AP_co but not both. If this had been an overlapping entity sub-typing, then when that user was entering data about a company, she would have had the option of classifying the company as both AR_co and AP_co.
Problem 16: Suppose there is a student entity type with subtypes CIS majors, finance majors, and accounting majors. Is this a mandatory or optional sub-typing? Are these subtypes disjoint or overlapping?
Suppose I didn’t tell you that this should be an entity subtype problem. Would you represent it this way? What else would you do?
I would think that these would be optional, overlapping entity subtypes. But if I were not going to represent it this way, I may consider having a M:M relationship between student and major.
Certainly, entity subtypes should be classified along both dimensions---that is, you should identify whether the subtype is mandatory or optional and whether it is disjoint or overlapping. All four combinations are possible and each is appropriate at different times.
4.4 Aggregation of entity types
Subtypes are generally thought of in terms of X is a Y (which is why these are commonly referred to as is-a relationships). Another type of relationship that needs to be represented in a database is the part of relationship, more formally called aggregation. When an entity is made up of several different types of other entities, an aggregation relationship may be called for.
Consider the relationship between a car and its engine and body. The engine and body are both part of the car. The relationship is represented as follows in an ER diagram.
4.5 Parallel relationships
Two entities can have more than one type of relationship. This is not surprising; further, it is not difficult to represent in a database or in an ER diagram. Consider the entity types person and insurance policy and the relationships between them of pays for and is insured under.
Look at these relationships one at a time.
- A person pays for zero or more insurance policies. An insurance policy is paid for by exactly one person.
- A person is insured by zero or more insurance policies. An insurance policy insures one or more persons.
These are two distinct relationships. They mean two different things---that is why they are represented as two separate relationships in the ER diagram.
4.6 Weak entities
Weak entities are entities, but with a difference---weak entities only exist because some other entity exists. For example, if you were to define two entities employee and salary-history, then the second would be a weak entity because the record of an employee’s salary history could only exist if a record of an employee also exists. Joe Smith’s salary history wouldn’t make much sense if Joe Smith doesn’t exist in the data base. A weak entity is represented by a double border as shown below.
4.7 Types of attributes
Sometimes it is instructive to classify an attribute by the means in which the value is determined. Here are the three possibilities.
basic
These are values provided to the business. These are the types of attributes that we have been discussing so far. Think of name, address, etc. These values cannot be deduced from the values of other attributes.
designed
This is invented and exists only in the database. An example might be a unique identifier for a department. This value is not changed once it is set.
derived
This is a value that can be calculated from the value of other attributes in the database. An example might be the age of an employee when the birth date is in the database. These attributes should, generally, not be stored in the database but should be calculated when needed.
4.8 Attribute optionality
Not all entities have a value for every attribute; however, some attributes must have a value for all entities.
optional
An entity need not have a value associated with an optional attribute.
mandatory
An entity must have a value associated with a mandatory attribute.
For example, assume the employee entity type has attributes hire date and termination date. Hire date would certainly be classified as a mandatory attribute; if the employee didn’t have a hire date, then the person couldn’t very well be an employee.
Termination date is an optional attribute. You would expect that many people in the database would not have a termination date while others, who are obviously ex-employees, do have a value associated with the termination date attribute.
The optionality of an attribute depends highly on the business situation, how the information is gathered, and how the business updates its database. One company might classify an attribute of an entity type as optional while another company might classify the same attribute of the same type as mandatory. Consider the following example:
Consider the attribute sale price of the catalog item entity type for a computer mail order company. Company A has a policy that they do not put an item into the catalog until it has a price; thus, they do not create a catalog item entity until they can assign a value to the attribute sale price. For this company the sale price attribute is mandatory.
On the other hand, Company B has a policy that they put an item into their catalog as soon as they decide to stock it. This way they can make their product line look as broad as possible. They put Call us for latest quote in the catalog instead of a price. Thus, they do create catalog items even before they have assigned a value to the attribute sale price. For this company the sale price attribute is optional.
Again, in order to classify an attribute as optional or mandatory, you must understand the business situation and practices.
Problem 17: For each of the following attributes, determine whether it is optional or mandatory:
- Delivery date of a customer order
- Order dateof a customer order
- Title of a book
- Serial number of a television
4.9 Other information about attributes
The database designer should also determine miscellaneous other information about each attribute:
default
This is the value that an attribute should take if it is not assigned a value. For example, the state field of an employee table might have the default value of MI.
permitted range
These are the values that an attribute is allowed to take. This ensures that a value that gets put into the database is valid. For example, the sale_price field of the inventory table might have a permitted range of sale_price > 0.
composite
A composite attribute is an attribute made up of many other attributes. The reason for creating a composite attribute is that the attribute itself is referred to as a whole. Think of an employee’s address that is made up of the house number, street, city, state, and zip. One way of storing this attribute is to create an attribute address that would contain the whole address. For example, a value of address might be
address = 202 Crest Avenue, Ann Arbor, MI 48103
An alternative to this would be to store these pieces of data in separate fields in the table. For example, the above information might be stored as
street = 202 Crest Avenue
city = Ann Arbor
state = MI
zip = 48103
city = Ann Arbor
state = MI
zip = 48103
Separating the attributes in this way allows database users to refer to each field independently. For example, under the second scheme a user could easily and quickly determine the employees who live in Michigan. Under the first scheme this would not be nearly as fast.
5 Interpreting ER diagrams
ER diagram for interpretation exercise
The point of this section is to give you some examples of how ER diagrams are interpreted. I try to give you some of the variations but I certainly do not give you all of them. If your “reading” of a relation is not below, then it is not necessarily wrong. Try to determine if they mean the same thing. If they don’t and you cannot figure out the problem, then come by and talk with me during office hours.
needed
· A skill can be needed by many projects but might not be needed by any.
· A project can need one or more skills.
manages
· An employee can manage many projects but might not manage any. (Or: An employee can manage many projects. There are some employees who don’t manage any projects.)
· A project must be managed by an employee. (Or: A project is managed by exactly one employee.) (Or: A project is managed by one and only one employee.)
has-skill
· An employee may have many skills but might not have any.
· A skill can be possessed by many employees. There are some skills that no employees possess.
used-on
The technique for an n-ary (in this case 3-ary, or ternary) relation is different than for binary relations, but still straight-forward. Hold your hand on n-1 entity types (in this case 2) and determine whether a 1 or an m goes on the remaining arm of the relation. Below, in order, are the project, employee, and skill arms.
· An employee uses one skill on many projects.
· Many employees can use a skill on one project.
· An employee can use one skill on a project.
Notice that the other two entity types are held constant; that is, for the project arm (the first one) you are determining how many projects can be associated with any single pairing of employees and skills. You can think of it the following way: “I have an employee named Fred. He is skilled in woodworking. How many projects can Fred be a woodworker on?” If it’s “many”, then put an m on the project arm; if it’s “one”, then put a 1 on it.
6 ER exercises
Question 1
What is the cardinality and existence of each of the following relationships in just the direction given? State any assumptions you have to make.
- Husband to wife
- Student to degree
- Child to parent
- Player to team
- Student to course
Question 2
For each of the following pairs of rules, identify two entity types and one relationship. State the cardinality and existence of the relationship in each case. If you don’t think enough information is available to define either of these, then state an assumption that makes it clear. Draw the ER diagram.
- A department employs many persons. A person is employed by, at most, one department.
- A manager manages, at most, one department. A department is managed by, at most, one manager.
- An author may write many books. A book may be written by many authors.
- A team consists of many players. A player plays for only one team.
- A lecturer teaches, at most, one course. A course is taught by exactly one lecturer.
- A flight-leg connects two airports. An airport is used by many flight-legs.
- A purchase order may be for many products. A product may appear on many purchase orders.
- A customer may submit many orders. An order is for exactly one customer.
Question 3
Draw an ER diagram for the following. Be sure to indicate the existence and cardinality for each relationship.
- A college runs many classes. Each class may be taught by several teachers, and a teacher may teach several classes. A particular class always uses the same room. Because classes may meet at different times or on different evenings, it is possible for different classes to use the same room.
- Each employee in an engineering company has at most one recognised skill, but a given skill may be possessed by several employees. An employee is able to operate a given machine-type (e.g., lathe, grinder) if he has one of several skills, but each skill is associated with the operation of only one machine type. Possession of a given skill (e.g., mechanic, electrician) allows an employee to maintain several machine-types, although maintenance of any given machine-type requires a specific skill (e.g., a lathe must be maintained by a mechanic).
Question 4
Draw an ER diagram for each of the following situations. On the diagram be sure to identify the cardinality, existence, and optionality (for subtypes) of each relationship.
- A company has a number of employees. Each employee may be assigned to one or more projects, or may not be assigned to a project. A project must have at least one employee assigned, and may have several employees assigned.
- A university has a large number of courses in its catalog. Each course may have one or more other courses as prerequisites, or may have no prerequisites.
- A college course may have one or more scheduled sections, or may not have a scheduled section.
- A hospital patient has a patient history. Each patient has one or more history records (we assume that the initial patient visit is always recorded as an instance of the history). Each patient history record belongs to exactly one patient.
- A video store may stock more than one copy of a given movie. It is also true that the store may not have a single copy of a particular movie.
Question 5
Draw an ER diagram to represent the following entity types and the natural relationships among them: Vehicle, Land-vehicle, Air-vehicle, Water-vehicle, Ocean-vessel, River-raft, Helicopter, Rail-vehicle, Road-vehicle, Car, Airplane, Bicycle.
Question 6
Draw an ER diagram that best represents the following situation. There are three types of accounts in a bank, with these attributes:
- Checking: Acct-no, Date-opened, Balance, Service-charge
- Savings: Acct-no, Date-opened, Balance, Interest-rate
- Loan: Acct-no, Date-opened, Balance, Acct-limit
Question 7
Consider the following diagram:
- What are the minimum and maximum number of Cs that can be associated with any one A? This is the same as asking “What are the existence and cardinality for the C to A relationship?”
- Substitute A:person and B:manage and C:project in the above ER diagram. Suppose that someone told you that the following three facts are true. Are these consistent with the diagram? Answer yes or no for each of the three facts. If your answer is no, then explain why it is inconsistent.
- Mackenzie manages the White project.
- Lindsey manages the Brown and Black projects.
- Joe doesn’t manage any projects.
Question 8
The following facts make up all of the leads relation between person and project:
- Jennifer leads the Southpark project.
- Jennifer leads the Briarwood project.
- Don leads the Eastgate project.
You do not know whether or not there are any other people or any other projects. Which diagram(s) that is (are) consistent with this set of three facts. For example, you might answer “1, 4” if both 1 and 4 are consistent with the above facts.
Question 9
For each of the following sets of sentences, draw the corresponding ER diagram.
- An account can be charged against many projects, though it may not be charged against any. A project must have at least one, though it may have many, accounts charged against it.
- Projects must be classified as either top secret or civilian (but not both). There is information specific to top secret projects and specific to civilian projects that we want to record.
- An employee must manage exactly one department. A department may or may not have one employee manage it.
- Men are only allowed to supervise men. Women are only allowed to supervise women. We do not want to allow the database to hold data representing a man supervising a woman. An employee, regardless of sex, is assigned to exactly one office, with each office having exactly one employee in it. (Be sure to include the office entity in this diagram.)
Question 10
The following descriptions all have to do with a holding company for food service companies. You should answer each one separately from the others.
- Each chain consists of 50 to 300 stores that are owned by ACTME, the holding company for several restaurant chains and two caterers.
- Menu items are wide ranging, and can be classified by the section of the menu (appetiser, dessert, etc.) in which each is presented and by ethnic group (Italian, Hungarian, etc.) to which it belongs.
- The menu of each restaurant changes every couple months; management likes to keep track of past menus to track which ones have been successful.
- Each menu item is made of several ingredients (eggs, bacon, etc.) that are used in a certain quantity.
- Ingredients can be acquired from several suppliers. Ingredients are acquired by sending orders for several goods using the vendor’s item numbers. The item number for each of these ingredients varies across suppliers, so if you are going to order eggs from supplier # 1, then you might order item 52 while from supplier # 2 you might order item J216. The company keeps price and item numbers for all the ingredients for all its menu items---and even for some ingredients which it is not currently using in any menu items. The quantity needed for each ingredient is also kept. The current price of these goods is maintained though the historical price is not. Because of special deals and volume discounts, the price at which a good is acquired is often different from its list price. Thus, the database must retain the price at which a good is actually acquired.
entity-relationship model (ERM or ER model)
The entity-relationship model (or ER model) is a way of graphically representing the logical relationships of entities (or objects) in order to create a database. The ER model was first proposed by Peter Pin-Shan Chen of Massachusetts Institute of Technology (MIT) in the 1970s.
In ER modeling, the structure for a database is portrayed as a diagram, called an entity-relationship diagram
Learn More
(or ER diagram), that resembles the graphical breakdown of a sentence into its grammatical parts. Entities are rendered as points, polygons, circles, or ovals. Relationships are portrayed as lines connecting the points, polygons, circles, or ovals. Any ER diagram has an equivalent relational table, and any relational table has an equivalent ER diagram. ER diagramming is an invaluable aid to engineers in the design, optimization, and debugging of database programs.
In a logical sense, entities are the equivalent of grammatical nouns, such as employees, departments, products, or networks. An entity can be defined by means of its properties, called attributes. Relationships are the equivalent of verbs or associations, such as the act of purchasing, the act of repairing, being a member of a group, or being a supervisor of a department. A relationship can be defined according to the number of entities associated with it, known as the degree.
Related glossary terms: relational database, database management system (DBMS), OLE DB (OLEDB or Object Linking and Embedding Database), flat file, Fast Guide: SQL Server 2000 commands, comma-separated values file (CSV), information, full-text database, DDBMS (distributed database management system), relational database management system (RDBMS)
Tutorial - ER Diagram Examples 1-2
Example 1
A publishing company produces scientific books on various subjects. The books are written by authors who specialize in one particular subject. The company employs editors who, not necessarily being specialists in a particular area, each take sole responsibility for editing one or more publications. A publication covers essentially one of the specialist subjects and is normally written by a single author. When writing a particular book, each author works with on editor, but may submit another work for publication to be supervised by other editors. To improve their competitiveness, the company tries to employ a variety of authors, more than one author being a specialist in a particular subject.
Example 2
A General Hospital consists of a number of specialized wards (such as Maternity, Paediatry, Oncology, etc). Each ward hosts a number of patients, who were admitted on the recommendation of their own GP and confirmed by a consultant employed by the Hospital. On admission, the personal details of every patient are recorded. A separate register is to be held to store the information of the tests undertaken and the results of a prescribed treatment. A number of tests may be conducted for each patient. Each patient is assigned to one leading consultant but may be examined by another doctor, if required. Doctors are specialists in some branch of medicine and may be leading consultants for a number of patients, not necessarily from the same ward.
Example 3
A database is to be designed for a Car Rental Co. (CRC). The information required includes a description of cars, subcontractors (i.e. garages), company expenditures, company revenues and customers. Cars are to be described by such data as: make, model, year of production, engine size, fuel type, number of passengers, registration number, purchase price, purchase date, rent price and insurance details. It is the company policy not to keep any car for a period exceeding one year. All major repairs and maintenance are done by subcontractors (i.e. franchised garages), with whom CRC has long-term agreements. Therefore the data about garages to be kept in the database includes garage names, addressees, range of services and the like. Some garages require payments immediately after a repair has been made; with others CRC has made arrangements for credit facilities. Company expenditures are to be registered for all outgoings connected with purchases, repairs, maintenance, insurance etc. Similarly the cash inflow coming from all sources - car hire, car sales, insurance claims - must be kept of file.CRC maintains a reasonably stable client base. For this privileged category of customers special credit card facilities are provided. These customers may also book in advance a particular car. These reservations can be made for any period of time up to one month. Casual customers must pay a deposit for an estimated time of rental, unless they wish to pay by credit card. All major credit cards care accepted. Personal details (such as name, address, telephone number, driving licence, number) about each customer are kept in the database.
Example 4
A database is to be designed for a college to monitor students' progress throughout their course of study. The students are reading for a degree (such as BA, BA(Hons) MSc, etc) within the framework of the modular system. The college provides a number of module, each being characterised by its code , title, credit value, module leader, teaching staff and the department they come from. A module is co-ordinated by a module leader who shares teaching duties with one or more lecturers. A lecturer may teach (and be a module leader for) more than one module. Students are free to choose any module they wish but the following rules must be observed: some modules require pre-requisites modules and some degree programmes have compulsory modules. The database is also to contain some information about students including their numbers, names, addresses, degrees they read for, and their past performance (i.e. modules taken and examination results).
Example 5
A relational database is to be designed for a medium sized Company dealing with industrial applications of computers. The Company delivers various products to its customers ranging from a single application program through to complete installation of hardware with customized software. The Company employs various experts, consultants and supporting staff. All personnel are employed on long-term basis, i.e. there are no short-term or temporary staff. Although the Company is somehow structured for administrative purposes (that is, it is divided into departments headed by department managers) all projects are carried out in an inter-disciplinary way. For each project a project team is selected, grouping employees from different departments, and a Project Manager (also an employee of the Company) is appointed who is entirely and exclusively responsible for the control of the project, quite independently of the Company's hierarchy. The following is a brief statement of some facts and policies adopted by the Company.
Normalisation Tutorial
- A college keeps details about a student and the various modules the student studied. These details comprise
- regno - registration number
- n - student name
- a - student address
- tno - tutor number
- tna - tutor name
- dc - diploma code
- dn - diploma name
- mc - module code
- mn - module name
- res - module exam result
where
details(regno,n,a,tno,tna,dc,dn,(mc,mn,res))
dc -> dn
tno -> tna
mc,mn -> res
n -> a
mc -> mn
Reduce the relation DETAILS to third normal form.
- Classify the following relations as either UNNORMALISED, 1NF, 2NF or 3NF. If the relation is not in 3NF, normalise the relation to 3NF.
1. EMPLOYEE(empno,empname,jobcode)
2. empno -> empname
3. empno -> jobcode
4.
5. EMPLOYEE(empno,empname,(jobcode,years))
6. empno -> empname
7. empno,jobcode -> years
8.
9. EMPLOYEE(empno,empname,jobcode,jobdesc)
10. empno -> empname,jobcode
11. jobcode -> jobdesc
12.
13. empno -> empname
14. empno,project -> hoursworked
15.