Centralized Data Hub – What is it and why do we need it?

Why should a centralized data hub be used to capture and store data from different sources?

  • Different sources have different data models, which may also be used in different ways. It is therefore crucial to store the ingested data centrally under a shared meaning and structure, rather than copying it literally. For this to work, each source application needs to create an interface towards the Inbound process that complies with the agreed data definitions and meaning, and thus with the agreed data model structures within the Logical Data Model and Canonical Data Model.
  • Ingested data must be described in a catalog.
  • Data sources must be decoupled from the centralized data hub. This means that the source data models must be converted and translated into a shared organizational data model (Logical Data Model and Canonical Data Model) to ensure a shared and understood meaning of the data. This makes it possible to replace a source with a new source: because the new source also complies with the shared organizational data model, the impact is limited to the source layer rather than the whole data pipeline.
  • Reports are difficult to create when the meaning of data is not properly described and applied at the moment the data is stored.
  • Without a centralized data hub, it can be impossible to find data in your landscape.
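
The decoupling described in the list above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the CanonicalCustomer entity, the two source systems ("crm" and "billing"), their field names and the ingest function are hypothetical and are not taken from an actual Logical or Canonical Data Model. The idea is only that each source gets its own small adapter that translates its records into the shared structure, so nothing downstream ever sees a source-specific model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CanonicalCustomer:
    """Hypothetical canonical entity agreed in the shared data model."""
    customer_id: str
    full_name: str
    birth_date: date


def from_crm(record: dict) -> CanonicalCustomer:
    # Hypothetical CRM source: separate name fields, ISO date string.
    return CanonicalCustomer(
        customer_id=str(record["cust_no"]),
        full_name=f'{record["first"]} {record["last"]}',
        birth_date=date.fromisoformat(record["dob"]),
    )


def from_billing(record: dict) -> CanonicalCustomer:
    # Hypothetical billing source: single name field, DD-MM-YYYY date.
    day, month, year = (int(part) for part in record["BirthDate"].split("-"))
    return CanonicalCustomer(
        customer_id=str(record["CustomerId"]),
        full_name=record["Name"].strip(),
        birth_date=date(year, month, day),
    )


# Source layer: one adapter per source. Replacing a source means
# replacing its adapter here; the rest of the pipeline is untouched.
ADAPTERS = {"crm": from_crm, "billing": from_billing}


def ingest(source: str, record: dict) -> CanonicalCustomer:
    """Translate a source record into the canonical model (no literal copy)."""
    return ADAPTERS[source](record)
```

In this sketch the same customer arriving from either source yields an identical canonical record, which is what lets reports and other consumers rely on one meaning and one structure regardless of where the data originated.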