Presenters:
- Michael Maes, Big Data Specialist, Data Analytics Manager
- Michaël Tandecki, AI Expert
Description:
Michael Maes and Michaël Tandecki discuss how Collibra is embracing a customer-centric approach to its research initiatives and how that approach is being applied to real-world data issues. One such issue encountered by our customers is duplicate assets across their glossaries. This use case has been explored through the development of Gloria, otherwise known as the Glossary Rationalization portal.
Michaël Tandecki demonstrates using Gloria for duplicate detection and similarity scoring. Gloria has two key features: Compare On-the-Fly and Glossary Lists. The Compare On-the-Fly feature allows you to post two definitions side by side, in a Google Translate-like fashion. The definitions are checked for similarity and given a high or low score. The Glossary Lists feature allows the user to export a glossary from the Data Governance Center (DGC) and import it into Gloria to check for duplicate assets based on the definition, the name, or both. Gloria can also find duplicates across several glossaries. Once the glossaries are imported into Gloria, duplicates are detected and the user provides feedback on the results; the updated glossaries can then be re-imported into the DGC as a CSV file.
This roundtrip experience lets you identify, evaluate, and, if necessary, edit assets before updating your glossary. The Gloria portal uses machine learning to reduce the time and tedium required to identify duplicate assets in the glossary, freeing your team to address other responsibilities.
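
To make the idea of duplicate detection by similarity score more concrete, here is a minimal sketch in Python. It is not Gloria's implementation: Gloria uses machine learning, whereas this sketch swaps in a simple word-overlap (Jaccard) score as a stand-in. The column names `Name` and `Definition`, the sample rows, the 0.6 threshold, and the function names are all assumptions made for illustration; an actual DGC export may look different.

```python
import csv
import io
import re
from itertools import combinations


def tokens(text):
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Jaccard similarity of the two strings' word sets, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0  # two empty strings carry no evidence of a duplicate
    return len(ta & tb) / len(ta | tb)


def find_duplicates(rows, threshold=0.6):
    """Return (name_a, name_b, score) for each pair scoring at or above the threshold.

    A pair is scored on its names and on its definitions; the higher score wins.
    """
    pairs = []
    for r1, r2 in combinations(rows, 2):
        score = max(
            similarity(r1["Name"], r2["Name"]),
            similarity(r1["Definition"], r2["Definition"]),
        )
        if score >= threshold:
            pairs.append((r1["Name"], r2["Name"], round(score, 2)))
    return pairs


# Hypothetical glossary export; the column names are assumptions.
SAMPLE_CSV = """Name,Definition
Customer,A person or organization that buys goods or services
Client,A person or organization that buys services or goods
Invoice,A document requesting payment for delivered goods
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
duplicates = find_duplicates(rows)
for name_a, name_b, score in duplicates:
    print(f"{name_a} <-> {name_b}: {score}")
```

In this sample, "Customer" and "Client" are flagged because their definitions use the same words in a different order, while "Invoice" is left alone. A word-overlap score misses synonyms and paraphrases, which is the kind of gap a learned similarity model is meant to close.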