Prevent data anomalies in relational databases 

data anomalies

Relational databases is carried out through eight steps:

1. Planning.

2. Collection of requirements.

3. Conceptual design.

4. Logical design.

5. Physical design.

6. Development / implementation / testing.

7. Release (deployment).

8. Ongoing maintenance/support.

The database development life cycle consists of these steps, and you're right if you think it looks like a lot of work. Sometimes there is a desire to shorten the process and jump directly to the DBMS and start building the final product. When it does this, it almost always ends in disaster. The process exists because there is a real need to address all the data requirements of the project carefully. To emphasize this, we can consider some of the problems that may arise when some aspect of the steps listed above has not been taken care of. And see how a well-designed and structured database protects you and your project from future headaches on how to solve certain issues.

Elimination of duplicate information

The typical spreadsheet has several columns, which will contain the same piece of information over and over again. When these types of structures are moved to a DBMS, sometimes the error is made to keep all this information. A first problem for the development of a relational database is to understand if redundant information is being stored, and an effort must be immediately made to eliminate any potential tendency to store duplicate data.

Duplicate data is a problem for at least three reasons:

- Slow down the system (occupying extra network bandwidth and disk space on the server).

- They are more difficult to manage (requiring more read and write accesses to update the records).

- There is a risk that duplicate data will become inconsistent.

Removal of inconsistent information

Having duplicate data increases the risk of having inconsistent information, for example, in the case of typing errors or different ways of storing the same information. In the case of the product table, we have two columns: Product and Publishing House. The Publishing House column repeats the same information several times. Inserting inconsistent information with this structure is very easy. Just an extra space or a capital letter instead of a lowercase one, and for the database, it will be two different information.

1. You make a copy of the redundant information (for example, the Publishing House) and insert it into a new table.

2. Give a clear name (for example, "Publishing Houses") to the new table and rename the field containing the information copied to "Publishing House Name."

3. A new column is created in the "Publishers" table that stores a number that uniquely identifies the information. If there is one that uniquely identifies the information, for example, a tax code for people, you can use that. Alternatively, we can create one for internal use, for example, a self-incrementing numerical value. In our case, we create the column "Publishing House ID" and assign a unique identifier for each data (identifier of which the user is often not aware because it is "hidden" in data extraction).

4. Return to the Products table and replace the "Publishing House" column with the "Publishing House ID" column of the "Publishing Houses" table, with the effect of no longer having the name of the publisher repeated several times. But only his identifier, which is a reference to a row in another table in the database.

5. The name of the "Publishing House" column of the Products table in "Publishing House ID" is updated.

Break down information into smaller components

One of the advantages of having data stored in a well-structured relational database is that you can perform a wide range of data queries very easily. One way to ensure that the database is flexible in the types of queries that can be made is to make the field as descriptive and specific as possible, with the result that the fields store much smaller information.

Prevent data conflicts

Another source of frequent data conflicts occurs when the stored values ​​are calculated from other information that is already being tracked. If a value changes, then the value derived from the calculation must also be updated. For example, we review e-commerce invoices. The customer Mario Bianchi has purchased two copies of the book "Version Control with Git." The price of the book (stored in the product table) is € 15.00. The invoice correctly shows the total price of € 30.00.

What happens is the quantity changed? The field containing the total price of the invoice table must be updated. If, after the change of the quantity, for some reason, the total price is not updated, we have a data conflict: the total price does not equal the quantity ordered multiplied by the price per product. The solution to this problem may seem surprising: simply do not memorize this type of derived information. The DBMS will do the work for us and calculate these values ​​on the fly, as needed, rather than storing them permanently in the tables, where they can become obsolete if precautions are not taken. 

Recover complete information

One of the data anomalies concerns the case of fundamental data that is completely missing or not correctly entered. In the case of e-commerce, to complete a purchase, some information cannot be missing. Let's go back to the invoice table and review each column to identify those that are actually indispensable. It is not, in fact, reasonable to have a form (for example of registration) in which all the fields must be valued (second name, fax number, etc.). In relational databases, to indicate that the value of a field is indispensable, the field is specified as NOT NULL: the DBMS does not allow storing the record if that field is left empty.

Maintain a consistent structure

A limit introduced in the Invoices table consists of the fact that each invoice has only one field for the insertion of products, even if obviously the customers can buy more products in the order only. Inserting multiple values ​​in the same field is absolutely not a good solution.

One of the habits that are sometimes adopted with spreadsheets is the tendency to store more information of the same type in the same field. For example, if the user has multiple current accounts or multiple credit cards, there is a temptation to enter all these values ​​(perhaps separated by a comma) within a field. It's not a good idea either to add to the table a field "Credit card 1" and "Credit card 2" and so on (you cannot know how many credit cards the user has and on which column to look for a certain value ). The solution is always the samedata anomalies, create a new table in which to store this information and avoid multivalued fields.


1127 Words


Dec 17, 2019


3 Pages

Looking for a professional

Order Now