What is Data Modeling - Normalization, and Application in OLTP Systems

Explore the essentials of Data Modeling in this article, which covers key types (Conceptual, Logical, Physical) and techniques like ER Modeling and Normalization. It highlights normalization's role in databases, particularly in OLTP systems

What is Data Modeling - Normalization, and Application in OLTP Systems

Data Modelling

In simple words, Data modeling is the practice of visually representing a business’s data, detailing its types, relationships, and how it’s organized and used. This process involves creating diagrams and using symbols to clarify and define the data needs of a business, ensuring it effectively supports business activities and decisions.

There are three main types of data modeling:

  1. Conceptual Data Modeling: This is the most abstract level of data modeling. It provides an overview of what data is involved, focusing on the big picture rather than details. It’s often used for initial planning and understanding the scope of the system.

Conceptual

 

  1. Logical Data Modeling: This type involves more detail than conceptual modeling. It defines how the system should be implemented regardless of the database management system. It includes things like entity types, attributes, keys, and relationships between entities.

logical

 

  1. Physical Data Modeling: This is the most detailed level of data modeling. It shows exactly how the data will be stored in the database. It includes specific tables, columns, data types, constraints, and how they are physically organized in the database system.

Physical

 

Data Modeling Techniques to Effectively Organize and Represent Data:

  1. Entity-Relationship Modeling (ER Modeling): This technique uses entities (data items) and relationships (how these items are connected) to represent data structures. It’s like drawing a map of how different data points are linked.

  2. Normalization: This is a process to organize data in a database efficiently. It involves dividing large tables into smaller, more manageable ones while ensuring data integrity and reducing redundancy.

  3. Dimensional Modeling: Used in data warehousing, this technique organizes data into measurable events (facts) and contexts (dimensions). It’s often visualized as a star or snowflake schema, which is useful for complex data analysis.

  4. Object-Oriented Modeling: This approach models data as objects, similar to object-oriented programming. It defines data structures and behaviors (methods) for those data objects.

  5. Data Flow Modeling: This method focuses on the flow of data within a system. It maps out how data moves from one process to another, often used in software engineering.

Normalization and Dimensional Modeling in Data Engineering and Data Analysis:

Let’s break down the concept of normalization in databases, particularly focusing on the first three normal forms. Normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics—such as redundancy and inconsistency—that could lead to loss of data integrity.

First Normal Form (1NF):

A table is in First Normal Form if:

  1. Each column contains atomic, indivisible values: There are no repeating groups or arrays.
  2. Each column has a unique name.
  3. The order in which data is stored does not matter.

Example:

Consider a table that records customer orders:

1

This table is not in 1NF because the ‘Order’ column contains multiple values. To convert it to 1NF, we need to split the order items into separate rows:

2

Now, each row has a single value in the ‘Order’ column, satisfying 1NF.

Second Normal Form (2NF):

A table is in Second Normal Form if:

  1. It is in 1NF.
  2. All non-key attributes are fully functionally dependent on the primary key. This means there should be no partial dependency of any column on the primary key.

Example:

Suppose we have a table:

3

This table is in 1NF but not in 2NF because ‘Room’ is fully dependent on the ‘CourseName’, not on the entire primary key (StudentID and CourseName). This is a partial dependency. This means there should be no partial dependency of any column on a part of the primary key in a composite key scenario. To convert it to 2NF, we can split the table into two:

Table 1: Student Enrollments Table

4

Table 2: Courses Instructors Table

5

Third Normal Form (3NF):

A table is in Third Normal Form if:

  1. It is in 2NF.
  2. There are no transitive dependencies for non-prime attributes. In other words, non-prime attributes (attributes not part of a candidate key) must not depend on other non-prime attributes.

Example: Consider a database for a book club.

The table is in 3NF if it is already in 2NF and there are no transitive dependencies for non-prime attributes. In simpler terms, non-prime attributes (those not part of any candidate key) must not depend on other non-prime attributes.

Table: Member Information

6

Here, MembershipFee is dependent on MembershipType, not directly on MemberID.

The table above, which implies that it must first satisfy the conditions for Second Normal Form (2NF). For a table to be in 2NF, it must meet two criteria:

  • It must be in First Normal Form (1NF), meaning that it has no repeating groups of data (each cell contains a single value, and each record is unique).

  • There must be no partial dependencies of any column on the primary key. This means that every non-prime attribute in the table must be functionally dependent on the whole of a composite primary key, not just part of it. Now, looking at the table Member Information table in the image above, we can see that it meets both criteria for 2NF:

  • The table is in 1NF as each column contains atomic values, and each row appears to contain a unique member with their associated information.

  • The primary key of this table would be MemberID, which uniquely identifies each member. The non-prime attributes are MemberName, MembershipType, and MembershipFee. In this scenario, to satisfy the conditions for 2NF, MembershipType and MembershipFee must be fully functionally dependent on MemberID. There is no indication of a composite key; hence, there is no partial dependency possible.

The transition to 3NF is demonstrated by the removal of transitive dependencies. In the original table, MembershipFee is not directly dependent on the primary key MemberID, but is instead dependent on MembershipType — this is a transitive dependency. To normalize to 3NF, the table is split into two:

Converted to 3NF:

Split into two tables:

Table 1: Member Details

7

Table 2: Membership Fees

8

Table 1: Member Details which contains MemberID, MemberName, and MembershipType. Here, MembershipType is functionally dependent on MemberID. Table 2: Membership Fees which contains MembershipType and MembershipFee. Here, MembershipFee is functionally dependent on MembershipType. This normalization removes the transitive dependency of MembershipFee on MemberID through MembershipType, adhering to the rules of 3NF.

By doing this, we ensure that all attributes in each table are only dependent on the primary key, adhering to 3NF.

 

Normalization on OLTP (Online Transactional Processing) System Database:

let’s take a real-life scenario involving an Online Transaction Processing (OLTP) system and see how normalization, including the concepts of the first three normal forms and keys, can be applied for better data management.

Scenario: Online Retail Store: Imagine an online retail store with an OLTP system that handles a large number of transactions daily. The system records customer information, product details, and orders.

Initial Table Design (Not Normalized)

Initially, the store might have a single table that tries to capture all the transaction details:

Transactions Table:

9

  1. First Normal Form (1NF)

    • Separate Repeating Groups: Each cell should contain atomic values.
    • Remove Duplication: Customer information is repeated with each transaction.
  2. Second Normal Form (2NF)

    • Composite Primary Key in Transactions Table: (TransactionID, ProductID)
    • Full Dependency: Ensure all non-key attributes in the Transactions table are fully dependent on the entire composite key.
    • Since Quantity and OrderDate are dependent on the entire composite key, the Transactions table is already in 2NF.
  3. Third Normal Form (3NF)

    • Remove Transitive Dependencies: For example, ProductPrice depends on ProductID, which is not part of the primary key in the Transactions table.
    • Since the Products table holds information about products, and the Customers table about customers, and these are linked to transactions without transitive dependencies, our design is in 3NF.

Normalized Tables

Transactions Table:

10

Customers Table:

11

Products Table:

12

Benefits of This Normalized Design:

Reduces Data Redundancy: Customer and product details are stored only once, reducing the storage space and improving data consistency.

Improves Data Integrity: Updating customer or product information needs to be done in only one place.

Efficiency in Transaction Processing: The Transactions table is more streamlined, which is crucial for the high volume of transactions in an OLTP system.

Flexibility for Queries: Easier to generate reports and queries (e.g., all products bought by a particular customer, total sales of a product, etc.).