PTK Gred F: Database Design Approach

Importance of the Data Model

An effective Database design ensures that key aspects for a successful project implementation are executed within expected time-lines that eventually lead to a cost-effective approach during the project development phase. It's common that almost every application designed and developed will need some sort of Database or Data storage functionality.

Therefore, it is imperative during the design phase that a Data Model is constructed. A Data Model is required to be strictly followed or updated as and when the design might change. This is one of the crucial reference instruments that not only the project development members consisting of both application programmers and Database Administrators will need but also the client / customer; as many questions that would have not been foreseen otherwise would arise from close observation of these representations.

Database redundancy is a No-No

The creation of Entities in your Data Model leads to a representation of what an actual real life scenario would and should look like. Such an approach leads to minimization of data redundancy, restructuring and Input / Output transaction sizes along with referential integrity constraints - Database Normalization in short. Data should never be replicated and stored in another location whereby a required change for that data instance does not force a dependency upon the DBA or developer to implement the same change on the same data else where. Data redundancy is a definite no-no. Enforcement of Normalization principles is one of the key practices for effective database designing that look into the problem of data redundancy. It helps in organizing data within the database efficiently. There are rules to be followed within Normalization and these rules are defined in the terms of "Normal Forms". It is however another matter that in a real-time scenario, implementing the needed normal form is not accomplished and this is where our design might start floundering.

The First Normal Form

The First Normal Form stipulates that no repeating groups or sets should be stored within a table. These similar groups should be stored in a separate table. To identify each unique column of these tables, a primary key should be established.

The Second Normal Form

Data Redundancy should be non-existent. Foreign key constraints should be enforced.

The Third Normal Form

Every column in the table should be related and dependant on the primary key of the table. If this is not possible, the field should be stored in a new table with a new key.

The Fourth Normal Form:

This one would probably exists somewhere in dream land - the elimination of independent relationships in a Relational Database.

The Fifth Normal Form: Exists in never-never land - it calls for a perfect segregation of logically related many-to-many relationships.

So now you know something about relationships (that's what our whole Relational Database thing is about right?). But just keep in mind that as we increase and tighten our relationship enforcements, there would be a little trade off with performance.

The design approach

I believe that a good design should implement, at the least, a 2nd Normal Form. I also feel that we should try our maximum effort to look into implementing a 3rd Normal Form for large applications that have scope for enhancement. Scalability should always be up our minds while designing various components of our application. We can keep dreaming of a 4th and 5th Normal Form but I like to hear from Architects and DBAs on some of the rules they follow when implementing Normalizations.

However, a third normal form is sometimes an ardent task in itself. This may require you to create a host of smaller entity tables to reduce the prospect of data being replicated across tables. This approach, although an enforcer has potential to reduce performance while running SQL queries. I had a conversation with a fellow developer where he mentioned that their team had knowingly broke one of the 3rd Normalization rules since they knew the consequence would not create any dependency issues for their application. He mentioned that they used comma separated values of primary keys stored in a particular field of another table. I am still unclear on a verdict if such an approach is good. I like to hear views from DBAs and Architects on this one..

Data Objects & Properties: Counters parts for Database Tables and Table columns

While designing the class diagram of our application, we would design classes that would be used as Data Objects. These Data Objects contain Properties defined by getter and setter keywords. These properties are actually a descriptive and quantitative representation of an element of the entity. Data Objects and their Properties are actual class representations of our real-life entity players (Employee, Office, Department etc). On the database side, these would be translated into Tables; while Data Object attributes are counterparts for table columns that are defined in your database table.

We need Primary Keys:

While designing your table, a primary key is imperative. We make this possible by declaring a particular column as PRIMARY. Primary keys are attribute columns that would be used as a unique identifier for an occurrence of our entity (table). It always has to contain some value. Sometimes it is enough to create an identity column that auto increments an integer value starting with 1. But I've come across situations where it is suggested and implemented that we create a primary key naming convention thereby enhance our primary key from being just another primary key but also an effective primary key that describes some extra information where the field data would be display in the presentation layer itself (a ticket no. for instance). I would appreciate it if anyone out there could shed some light on whether this is a good practice or not. There is this school of thought that says that Primary keys should be non-intelligent and their meanings easy to decipher.

PRIMARY keys are great for identifying a particular row in your table. It is also another great way to speed up the retrieval information process from your table.

We may need Unique Keys:

Declaring a column as UNIQUE is another way to identify unique rows in your table. It is also called the alternate key. It is exactly similar to the PRIMARY key concept except that 1. you can have more than one unique key 2. you can also have null value unique keys.

We need Relationships and Enforcers

Referential Integrity among entities (or tables) is enforced by the application of relationships. This is a logical link between entities. A business rule that is implemented via Foreign keys in the Database. This business rule ensures that the value that is in the child table must be present as a primary key value in the parent table. A Foreign key is a column(s) whose values are a resultant from the primary key table.

Some of the business rules that a Foreign Key Constraint enforces:
1. It must reference a PRIMARY key or UNIQUE column(s) in PRIMARY key table

2. Datatypes of the FOREIGN key column(s) and the Constraint Columns(s) must be exactly the same.

3. Operations that perform INSERT and UPDATE on a table are not allowed when corresponding value(s) in primary keys are non-existent.

4. A DELETE operation is not permitted if a REFERENCE CONSTRAINT is cancelled.

5. It will reference the PRIMARY key of the primary key table if no column or group of columns is specified in the constraint.

6. There are no restrictions for a foreign key while other constraints refer the same table.

Different types of relationships exists - the one-to-one relationship, the one-to-many relationship and the many-to-many relationship.

One-To-One and One-To-Many relationships

A one-to-many relationship implies that a table element might be referred by many other entities via your table's PRIMARY KEY. Hence, on the database side, you may have a relationship between a DEPARTMENT table (parent) and an EMPLOYEE table (child) along with a relationship between a DEPARTMENT table (parent) and another child table. This means that your one DEPARTMENT entity instance could contain many EMPOYEE entity instances. I hope this clarifies on both relationships.

Many-To-Many relationships

A many-to-many relationship, we have a situation where many entity instances are related to many other entity instances. The only way to resolve this situation and enforcing the normalization principle of minimizing redundancy is to create an intermediary table that would contain primary keys from the earlier mentioned tables. Take a situation in your organization where many Employees can have multiple Roles (Project Manager and Configuration Manager?). Hence, we would have multiple EMPLOYEE Table instances related to Multiple Roles. Creation of multiple links between two tables would lead to replication of data which is bad and will create intertwining relationships that can definitely wreck any good application you have in mind.

Fix: The solution would be to create a third table which would act as a cross reference table. This cross reference table (commonly known as X-REF) will contain primary key columns from the earlier two tables and hence we have a relationship where our X-REF table is actually a child table to the earlier two parent tables. We map the multiple relationships through a third relationship in a third table. Hence, we're breaking our task by crafting two individual one-to-many relationships so our situation would lead us to create two tables EMPLOYEE (child) and EMPLOYEE_ROLES which would be a parent to a child table called ROLES which contains two columns that are foreign key relationships to the parent table primary keys.

[Source]

PTK Gred F

Monday, October 29, 2007

Database Design Approach

No comments:

Blog Archive

Links