The Data Hub provides several services across different use cases:
- Data processing and modelling
- A user interface for loading new data
- Analytics tools for querying data
The framework is centralized, modular and configurable. Through cloud services, it allows:
- Receiving data and processing it through event scheduling
- Processing large amounts of data
- Modelling information so that it is easily accessible through analytics tools
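The event-scheduled ingestion described above can be sketched as a simple dispatcher that routes each incoming event to a registered handler. This is a minimal illustration, not the actual implementation; the event types, field names and handlers are all assumptions.

```python
# Minimal sketch of event-driven ingestion: a handler is registered per
# event type and invoked when a matching event arrives.
from typing import Any, Callable, Dict


class EventRouter:
    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], Any]] = {}

    def register(self, event_type: str, handler: Callable[[dict], Any]) -> None:
        self._handlers[event_type] = handler

    def dispatch(self, event: dict) -> Any:
        handler = self._handlers.get(event["type"])
        if handler is None:
            raise ValueError(f"no handler for event type {event['type']!r}")
        return handler(event)


router = EventRouter()
# Illustrative handler: a real one would trigger a processing flow.
router.register("file_arrived", lambda e: f"ingesting {e['path']}")
result = router.dispatch({"type": "file_arrived", "path": "sales.csv"})
```

New event types can then be supported by registering a new handler, without modifying the dispatch logic.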
The framework includes several processing layers, each with its own goal:
- First layer: receives and checks incoming data to automatically identify source-related issues
- Second layer: performs formal/technical checks and enforces data types
- Third/core layer: applies historisation logic
- Fourth/modeled layer: contains the models used for querying
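The four layers above can be pictured as a chain of transformations. The sketch below is purely illustrative: each function stands in for one layer, and the specific checks, field names and historisation rule are assumptions, not the framework's actual logic.

```python
# Illustrative four-layer pipeline: data flows from raw input to a
# query-friendly model. Each function represents one processing layer.

def first_layer(raw):
    # Receive and check: drop records flagged as source issues (here, None).
    return [r for r in raw if r is not None]

def second_layer(records):
    # Formal/technical checks: enforce data types on each field.
    return [{"id": int(r["id"]), "amount": float(r["amount"])} for r in records]

def core_layer(records):
    # Historisation: tag each record with a load date (hypothetical rule).
    return [dict(r, load_date="2024-01-01") for r in records]

def modeled_layer(records):
    # Model for querying: index records by id for fast lookup.
    return {r["id"]: r for r in records}

raw = [{"id": "1", "amount": "9.5"}, None, {"id": "2", "amount": "3.0"}]
model = modeled_layer(core_layer(second_layer(first_layer(raw))))
```

Each layer only depends on the output of the previous one, which mirrors the separation of goals described above.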
These layers are implemented with different technologies, chosen according to the needs and goals of each use case. In particular, the tools are:
- Postgres for relational data
- A key-value store
- Hadoop for large volumes of data
The framework has a modular structure: each component is a separate module with a specific goal. This modular approach brings benefits in terms of extending the solution and integrating new functionality. Each module is implemented with open-source technologies, such as PySpark, to encourage reuse.
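One common way to realize such modularity is to give every module the same interface, so new modules can be plugged into a pipeline without touching existing ones. The sketch below assumes a simple `run()` contract; the module names and operations are invented for illustration.

```python
# Sketch of the modular approach: every module implements the same
# interface, so the pipeline is just an ordered list of modules.
from abc import ABC, abstractmethod


class Module(ABC):
    @abstractmethod
    def run(self, data: list) -> list:
        """Transform the input data and return the result."""


class Deduplicate(Module):
    def run(self, data: list) -> list:
        seen, out = set(), []
        for item in data:
            if item not in seen:
                seen.add(item)
                out.append(item)
        return out


class Uppercase(Module):
    def run(self, data: list) -> list:
        return [s.upper() for s in data]


# Extending the solution means appending a new Module to this list.
pipeline = [Deduplicate(), Uppercase()]
data = ["a", "b", "a"]
for module in pipeline:
    data = module.run(data)
```

The same idea carries over to PySpark jobs, where each module would wrap one transformation step.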
The entire framework is driven by centralized metadata structures, which speed up the creation of new processing flows and make the processes easier to control. The metadata is stored in a Postgres database.
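Metadata-driven processing can be sketched as an engine that iterates over a metadata table and launches only the flows it describes, instead of hard-coding each flow. The sketch below stands an in-memory list in for the Postgres table; the column names (`flow`, `source`, `target`, `active`) are assumptions.

```python
# Sketch of metadata-driven flow execution: each metadata row describes
# one processing flow; the engine reads rows rather than hard-coding flows.
metadata = [
    {"flow": "sales_daily",  "source": "sales.csv", "target": "core.sales", "active": True},
    {"flow": "stock_weekly", "source": "stock.csv", "target": "core.stock", "active": False},
]


def run_flows(rows):
    executed = []
    for row in rows:
        if row["active"]:
            # A real implementation would launch the configured job here,
            # e.g. a PySpark job reading row["source"] into row["target"].
            executed.append(row["flow"])
    return executed


executed = run_flows(metadata)
```

Adding a new processing flow then amounts to inserting one row into the metadata table, which is what makes the approach fast to extend and easy to monitor centrally.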