[bok-callout]Collibra has the permission of the Flemish Department of Economy, Science and Innovation to publish this reference case.[/bok-callout]
Introduction
The Collibra Reference Data Accelerator provides a framework for flexible modeling and governing of reference data. Reference data can be modeled in Collibra in many different ways. The way you choose to model is dependent on the business case for managing it. This cookbook explores one approach through the implementation of reference data requirements for the Flemish Department of Economy, Science, and Innovation.
[bok-callout]For suggestions of different approaches to reference data management, go to: Approaches to Reference Data Management[/bok-callout]
Example
The Flemish Department of Economy, Science, and Innovation maintains a funding code classification system. These funding codes correspond to grants and grant classifications. The grants are arranged as a hierarchy based on that classification, and both have start dates and deprecation dates that indicate the time periods in which the codes are/were effective. Sample data is provided below:
Sample Data
Grant | Parent Grant | Generation 2 Funding Code | Generation 1 Funding Code | Funding Stream Codes | Accounting Codes | Start Date | Deprecation Date |
---|---|---|---|---|---|---|---|
Federal level – Study | 4 | ||||||
PPS Science Policy – Impulse Federal Programs |
Federal level – Study | 4.3 | |||||
Action to support the strategic priorities of the Federal Government |
PPS Science Policy – Impulse Federal Programs | 368 | 4250 | G3 | 01/01/2004 | 12/31/2009 | |
NOW |
PPS Science Policy – Impulse Federal Programs
|
355 | 4231 | G3 | 7020 | 01/01/1999 | 31/12/9999 |
AIDS (1991-1995) |
PPS Science Policy – Impulse Federal Programs
|
340 | 4216 | G3 | 01/01/1991 | 31/12/1995 |
Hierarchy
Grant Classification
This diagram shows an example of the hierarchy produced by the pattern of grants having parent and children grants.
Files
A spreadsheet of the data is available from the link below:
Flemish Dept of Economy, Science, and Innovation.xlsx
Challenge
The data presents many reference data challenges, as listed below:
- Multiple Codes – Each grant may have multiple reference data codes. For example, NOW has four different reference codes: 355, 4231, G3 and 7020 each with different effective time periods.
- Business Term Hierarchy – Each grant has a parent grant.
- Cross-Hierarchy Codes – The mappings for the reference data codes must span across the hierarchy.
- Date Attributes – The reference data codes require special attributes for the start/deprecation date.
Solution
These challenges are addressed through some thoughtful modelling in Collibra:
Solution Model
Community, Domain, Relation and Asset Model
This solution model displays the Communities, Domains, Assets, and Relations required to be created in Collibra DGC. This diagram demonstrates the implementation in Collibra based on a small subset of Assets.
Community and Domain Setup
Within a single Community called Funding Community, there are four Codelist Domains and one Glossary Domain. Each Codelist Domain contains the Code Values and a Code Set that groups them.
The Funding Community
This screen shot demonstrates the community/domain structure for this data. All assets are contained within a single community named Funding Community. The assets are further contained in glossary, codelist, and code set domains.
[bok-callout]For information about Domains and Communities, go to: User Guide > Basic Concepts[/bok-callout]
Multiple Codes
Each grant has multiple reference data codes. For example, NOW has four different reference codes: 355, 4231, G3 and 7020. The solution to this issue of multiple codes is thoroughly discussed in Approaches to Reference Data Management. In summary, every grant and grant classification can be created as a Business Term. Each funding code can be created as a Code Value. Each Code Value is then related to the Business Term.
Traceability Diagram of the grant Action to support the strategic priorities of the Federal Government (Actie ter ondersteuning van de Strategische prioriteiten van de Federale overheid) with the three different assigned code values. See user guide’s section on Traceability to render these diagrams.
The grant has a parent grant and three different Code Values. Each Code Value is contained in a Code Set.
Node | Description |
---|---|
PPS Science Policy – Impulse Federal Programs (POD wetenschapsbeleid – Federale Impulsprogramma’s) | The Business Term representing the grant classification or the parent grant |
Action to support the strategic priorities of the Federal Government (Actie ter ondersteuning van de Strategische prioriteiten van de Federale overheid) | The Business Term representing the grant that we are assigning Code Values to |
4250 | The Generation 1 Funding Code Code Value |
368 | The Generation 2 Funding Code Code Value |
G3 | The Funding Stream Code Value |
Traceability Diagram for the Funding Stream Codes Code Set
The Funding Stream Codes Code Set contains all the Funding Stream Code Values. Creating Code Sets to contain Code Values is important. For example, the Funding Stream Code Set could be related to an existing standard or could be designated as allowable values for a database column. (Example: The ISO 3166 Country Codes Code Set can be related to the ISO 3166 standard Asset.)
Business Term Hierarchy
The grant classification hierarchy can be modeled with a single Glossary containing Business Terms related with a “Groups/Grouped by” Relation.
Creating Traceability within the Grant Classification Hierarchy
Creating Groups/Grouped by relations between business terms, implement the child/parent hierarchy between the grants. Notice the result in this Collibra traceability diagram:
[bok-callout]For a detailed description of how to create Relations, go to: Assets, Attributes and (Complex) Relations[/bok-callout]
Cross-Hierarchy Codes
With this approach, any funding Code Value can be applied to any grant across the hierarchy.
A Screen Shot of the Glossary Containing all Grants.
Both grant-to-grant relationships and grant-to-code relationships can be shown within a single view. Using this modeling approach creates flexibility in assigning codes to grants. A code value can describe the parent and all children grants, or there can be separate code values for each child grant.
[bok-callout]For a detailed description of how to create Hierarchies , go to: User Guide > Views > Hierarchy[/bok-callout]
Date Attributes
Both the grant and the funding codes have a start date and a deprecation date. These custom Attribute Types can be added to the Business Term Asset Type and the Code Value Asset Type by modifying the operating model. Filtered views can be created based on the custom Attributes. For example, a view could be created to see expired grants.
[bok-callout]For a detailed description of how to create custom Attribute Types, go to: Metamodel Configuration > Create an Attribute Type[/bok-callout]
Business Term has Custom Attributes Types of Start Date and End Date.
The Start Date and End Date Attributes indicate the grant’s expiration. These Attribute Types did not exist out-of-the-box, they had to be created in the settings.
Code Value 368 has Custom Attributes Types of Effective Date and Deprecation Date.
The Effective Date and Deprecation Date Attributes indicate the grant’s useful time-frame. They are identical to the Start and End dates of the Grants. These Attribute Types did not exist out-of-the-box, they had to be created in the settings.
A Filtered View of Code Values that are missing Deprecation Dates
This view was created in the Reference Data Accelerator. Creating custom attribute types is beneficial to creating meaningful views. By filtering for missing Deprecation Date, it can quickly be determined which code values are lacking this information.
[bok-callout]For a detailed description of how to filter Views , go to: User Guide > Views > Filtering[/bok-callout]
Alternate Approach: Complex Relation between Grant and Code Value
The use of a custom complex relation provides a means to give information about the relation itself. Without the use of a complex relation, it is only possible to show that the business term NOW is simply related to the code value 7020. It is possible that the code 7020 is only related to NOW under special circumstances. The benefit of this of this approach is being able to add information about the relationship between the assets. The downside is that it is more difficult for the user to comprehend the relationships due to the complex nature.
Example of two relations between the same business term and code value due to special circumstances:
The business term NOW “has code value” 7020 for circumstances when the grant has both domestic and international funding. Notice that after 2/16/2014, international funding will have a new code value of 7303, but the code 7020 remains the code for domestic funding.
Traceablity diagram for the previous complex relation example:
Note that code value 7020 relates to NOW through two different gray circles. These gray circles represent some kind of mapping logic. In this case, the mapping logic is a combination of start data, end date, and special mapping condition.
Result
The result is a complete modeling of reference data where Code Values can be cross-walked through Business Terms across hierarchies and filtered by date Attributes.
You have to login to comment.