Data Modeling

Dimensional Data Modeling For Data Warehouses



Dimensional data model is the most common design concept used by data warehouse designers to build data warehousing systems. The data model design is the underlying data model used by many of the commercial OLAP products available today in the market. Some of the terms commonly used in this type of modeling are: Dimension- a category of information (e.g. The time dimension); Attribute- a unique level within a dimension (e.g. Month is an attribute in the time dimension); and Hierarchy- the specification of levels that represents relationship between different attributes within a dimension (e.g. Year → Quarter → Month → Day).

Dimensional data model contains two types of tables. They are:

Fact Table: Fact table in a dimensional data model contains the measures of all interest, such measurements or metrics or facts of business processes. Take the example of the sales amount of a business. The amount can be a monthly sales number or sales number for a day. This measure is stored in the fact table with the appropriate granularity. For sales measures, a fact table generally contains three columns: a date column, a store column and a sales amount column. Besides the measurements the table will also contain foreign keys for the dimension tables.

Dimension Table: The dimension table in a dimensional model represents the context of the measurements. The context of measurements can also be understood as the characteristics such as who, what, where, when, how of a measurement (subject). For example, in a business process Sales, the characteristics of the ‘monthly sales number’ measurement would be a Location (Where), Time (When) and Product Sold (What). A dimension table contains a number of dimension attributes or columns. In the Location dimension the various attributes can be Location Code, State, Country, Zip code. Further, dimension attributes contain one or more hierarchical relationships.

If you are looking forward to building a data warehouse for your organization, you should first decide what your data warehouse will contain. Depending upon your organizational goals, you can choose the type of dimensions that can best meet your requirements. For example, if you want to build a data warehouse that would contain monthly sales numbers across multiple store locations, across time and across products then your dimensions would be Location, Time and Product.

In designing data models for data warehouses or data marts the most commonly used schema types are Star Schema and Snowflake Schema.

Star Schema: In this type of schema design, a single object or the fact table is placed in the middle and is radially connected to other surrounding objects or dimension tables like a star. Here, each dimension is represented as a single table and the primary key in each dimension table is related to a foreign key in the fact table. A simple start schema consists of one fact table and a complex star schema may contain more than one fact table.

Snowflake Schema: This type of schema design can be called as an extension of the star schema. In this design each point of the star or each dimension table contains more points. In other words, in a star schema each dimension is represented by a single dimensional table, while in a snowflake schema that dimensional table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy.

Choosing a particular type of schema design depends on personal preference as well as business needs. So, it is up to you which one you choose among the two for your data warehouse project.

Introduction to Dimensional Modeling for Data Warehousing Part 2, Dimensional Modeling Principles



In part 1 of this article series, we described the general structure of a dimensional model. In the present article we shall describe the basic design principles of dimensional modeling. Dimensional modeling follows the four steps defined below. A. Selection of the business process (or processes), the performance of which shall be monitored. Business processes the performance of which is considered critical, and relevant data are sufficient (e.g. operations data derived from these processes), should be selected with priority. The selected business process, may relate to a single organizational unit, or spanning more than one organizational unit.

The capture of overlapping information by different departments which can lead to many versions of truth, is avoided through the capture of a single data stream for an ‘end-to-end’ process. B. Determination of the level of detail at which the process shall be monitored (also called grain statement). The grain statement is the first step in a dimensional model design. Examples of grain statement are:

Relational Data Modeling



Relational Data Model is a data management model devised by Edgar F. Codd in the year 1970. It is considered as one of the most beautifully designed and widely used data models in recent times. Based on the predicate logic and set of theory of mathematics, relational data models help in managing the data efficiently.

A relational data model is implemented in a database where a relation is represented by a table, a tuple is represented by a row, an attribute is represented by a column of the table, attribute name is the name of the column such as ‘identifier’, ‘name’, ‘city’ etc., attribute value contains the value for column in the row, constraints are applied to the table and form a logical schema. Mostly, relational data modeling is used in OLTP systems, which are transaction, oriented, while dimensional data modeling is used in OLAP systems that are analytical based. Relational data modeling is closely related to data warehousing as in a data warehouse environment, the staging area is designed on OLTP concepts. Therefore, the data requires be normalizing, cleansing and profiling before it is loaded into a data warehouse.

Relational algebra operations such as Select, Intersection, Product, Union, Difference, Project, Join and Division, Merge can be performed on a relational data model. Below are the fundamental concepts in relational data models:

– Domain: A domain “D” is the original sets of atomic values used to model data. Atomic here refers to each value in the domain that is indivisible as far as the relational model is concerned.
– Relation (Relation state): A relation is a subset of the Cartesian product of a list of domains characterized by a name. Relation can be viewed as a “table”. In that table, each row represents a tuple of data values and each column represents an attribute.
– Attribute: A column of a relation designated by name and the name associated should be meaningful. Further, each attribute associates with a domain.
– Relation schema: Denoted by “R”, relation schema is a list of attributes. The degree of the relation is the number of attributes of its relation schema. The cardinality of the relation is the number of tuples in the relation.

Following are the terms used in relational data model:

– Candidate Key: Candidate key refers to any field or a combination of fields that identifies a record uniquely. The Candidate Key cannot contain NULL value and should always contain a unique value.
– Primary Key: Primary key is a candidate key that identifies a record uniquely.
– Foreign Key: A Foreign key is a primary key for other table, in which it uniquely identifies a record. Such a key defines relation between two (or more) tables. It can contain NULL value.
– Constraints: Constraints are logic rules used to ensure data consistency or avoid certain unacceptable operations on the data.

A relational data model provides basis for:

– Research on theory of data/relationship/constraint.
– Numerous database design methodologies.
– The standard database access language SQL.
– Several modern commercial database management systems.