Normalize the following relation to 3NF.
emp_id | emp_name | emp_zip | emp_state | emp_city | emp_district |
---|---|---|---|---|---|
1001 | Hari | 282005 | state7 | Dipayal | Doti |
1002 | Keshab | 555780 | state2 | Nepalgunj | Banke |
1003 | Pawan | 822005 | state7 | Dhangadhi | Kailali |
1004 | Aardi | 772290 | state4 | Pokhara | Kaski |
1005 | sunita | 885009 | state3 | Kathmandu | Ktm |
A: Normalize the following relation ( emp_id, emp_name, emp_zip, emp_state, emp_city, emp_district ) to 3NF
First Normal Form(1NF)
Here we consider emp_id as the Primary Key and all the columns have atomic values. Thus, the table is in 1NF.
R1
emp_id | emp_name | emp_zip | emp_state | emp_city | emp_district |
---|---|---|---|---|---|
1001 | Hari | 282005 | state7 | Dipayal | Doti |
1002 | Keshab | 555780 | state2 | Nepalgunj | Banke |
1003 | Pawan | 822005 | state7 | Dhangadhi | Kailali |
1004 | Aardi | 772290 | state4 | Pokhara | Kaski |
1005 | sunita | 885009 | state3 | Kathmandu | Ktm |
Second Normal Form(2NF)
There aren't any Partial Dependencies in R1 as emp_id is a Single Primary Key. Hence, R1 is also in Second Normal Form(2NF).
Third Normal Form(3NF)
Let's check if R1 has Transitive Dependency.
There are multiple transitive dependencies caused by the column emp_zip.
emp_id → emp_zip
emp_zip → emp_city
emp_zip → emp_district
emp_zip → emp_state
i.e.,
emp_id → emp_zip → emp_city
emp_id → emp_zip → emp_district
emp_id → emp_zip → emp_state
Hence we decompose R1 into R2 and R3 to get rid of them.
R2
emp_id | emp_name | emp_zip |
---|---|---|
1001 | Hari | 282005 |
1002 | Keshab | 555780 |
1003 | Pawan | 822005 |
1004 | Aardi | 772290 |
1005 | sunita | 885009 |
R3
emp_zip | emp_state | emp_city | emp_district |
---|---|---|---|
282005 | state7 | Dipayal | Doti |
555780 | state2 | Nepalgunj | Banke |
822005 | state7 | Dhangadhi | Kailali |
772290 | state4 | Pokhara | Kaski |
885009 | state3 | Kathmandu | Ktm |
The Transitive Dependency caused by emp_zip is now removed.
However, R3 could be decomposed further into R4 and R5 because of the following transitive dependency:
emp_zip → emp_district
emp_district → emp_state
i.e., emp_zip → emp_district → emp_state
R4
emp_zip | emp_city | emp_district |
---|---|---|
282005 | Dipayal | Doti |
555780 | Nepalgunj | Banke |
822005 | Dhangadhi | Kailali |
772290 | Pokhara | Kaski |
885009 | Kathmandu | Ktm |
R5
emp_district | emp_state |
---|---|
Doti | state7 |
Banke | state2 |
Kailali | state7 |
Kaski | state4 |
Ktm | state3 |
Here, another transitive dependency (emp_zip → emp_district → emp_state) is removed. Hence, R2, R4 and R5 are in Third Normal Form(3NF).
Note:
If we are talking strictly about the theoretical aspect of Normalization, then there is a 'transitive dependency' because if the data is only from Nepal, then the district can derive the state name. In that sense, one should decompose R3 to R4 and R5 to get rid of this transitive dependency.
However, depending upon the nature of data and the kind of application, some database designers may choose to keep it to R3. This is because zip code, district, city, and state belong to the same address unit and are generally fetched together. Thus keeping them in separate tables will unnecessarily cause JOIN operations which are expensive processes in DBMS.