Parent-child relationship I
Attributes can be related to each other (like in the case of city, province, and country) to form relationships between them. Furthermore, the tables holding attribute data can be normalized, as in the DimProduct
, DimProductSubcategory
, and DimProductCategory
or denormalized as in the DimGeography
table where we have province and country names repeated in every row. It is important that foreign keys are in place to preserve data integrity and help the RDBMS use the less expensive query path. So, for example, we have in the DimProductSubcategory
table a foreign key pointing to the primary key in the DimProductCategory
table.
If you look at the content of the DimProduct
table, you'll see some records with NULL
values in the DimProductSubcategory
column; there are no sales transactions for those products, but this is a very common data quality problem. Missing values in foreign keys is something we usually have to deal with. If those products had sales movements, it would be difficult to aggregate them at the subcategory or category level.
Every developer has his/her silver bullet to solve this, I personally don't like NULL
values in FK and always try to set them to -1
(whenever possible) creating a correspondent -1
value in the parent table with description n/a
. Why -1
? Because it is very very very unlikely that another code exists with the same negative value.
Note
I've seen people setting NULL
values to 9999
, to later discover that the company had more than 10,000 products and the 9999
code was actually allocated; then I saw the same people UPDATE millions of rows to 99999
WHERE key
= 9999
. To cut the story short, they ended up using an impressive Z99999
modifying an unspecified quantity of table columns from number to varchar… just in case, you never know.
Getting ready
You need to have completed the previous recipe to continue.
We are creating the Country
, StateProvince
, and City
attributes and relating them with parent-child relationships.
How to do it...
Follow these steps:
- In the Attributes folder, create a new attribute with these forms:
- ID: GeographyKey from DimGeography
- DESC: ConcatBlank
(
City, PostalCode)
from DimGeography
- Save it with the name
City
. - In the same folder, create a new attribute:
- ID: StateProvinceCode from DimGeography
- DESC: StateProvinceName from DimGeography
- In the Attribute Editor, before saving this, click on the Children tab on the top-left then click on Add and from Child candidates, move City to the right of the shopping cart and hit OK.
- Now click on Save and Close and name it
StateProvince
. - Create a new attribute:
- ID: CountryRegionCode from DimGeography
- DESC: EnglishCountryRegionName from DimGeography
- Before saving, select the Children tab and add both StateProvince and City, hit OK, then Save and Close and name it
Country
. - Update the schema.
How it works...
Since the three attributes come from the same denormalized table we do not need FKs: the data for parent and children will be selected from the same DimGeography
. Nevertheless, we need to specify which attribute is the parent and which is the child. Sometimes it is not so obvious and we need to do some research to detect the one part in a one-to-many relationship. As a rule of thumb:
- In case of normalized tables: The table with the primary key is parent, the table with the foreign key is child (no keys? Too bad, see the next point)
- In case of denormalized tables: An attribute with low cardinality is more likely to be parent, an attribute with high cardinality is probably child.
When it's not clear at a first look, do some
SELECT COUNT(DISTINCT <column_name>)
on the columns likely to be part of the relationship.
There's more...
If you go to Data Explorer | System Hierarchy, you can browse the elements of the three attributes and see if the structure makes sense. Expand Country, select one country and expand StateProvince, click on one and browse through the elements inside the City attribute. Is Paris in France and London in the United Kingdom? Good.
Some cities are repeated, because the granularity is postal code.
Exercise 5
Repeat the recipe using the DimDate
table and create the following attributes:
Date
: (ID =DateKey
and DESC =FullDateAlternateKey
)Month
: (ID =CalendarYearMonth
and no DESC)Year
: (ID =CalendarYear
and no DESC)
The Month
attribute has Date
as child, and Year
has both Month
and Date
as children. If you receive a warning message saying that ID for Month
is Text
, don't worry, it will work as expected. The curious thing is that the year 2005 begins in July, and the year 2010 only has November (I suppose it is intentional).