Scalable Data Warehouse in Hadoop = Data Vault 2.0


Our chosen modelling technique is, of course, Data Vault 2.0. This architecture is well suited to Agile, Big Data environments and is easy to automate from the raw/stage layer through to the presentation (or access) layer.

 

Finally, real Agile on the DW


Data Vault 2.0 is a modelling technique in which you do not refactor what you have already built, and you do not need to repeat regression testing every time the model grows in number of objects or complexity. The hub-and-spoke methodology lets you only append new objects and never delete or modify existing ones. This reduces risk and lets you deliver value every two weeks (or every Agile sprint) without knowing from the very beginning what the final ERD will be.
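To make the append-only idea concrete, here is a minimal Python sketch. The `ModelObject` and `VaultModel` classes and the table names are hypothetical, invented only for illustration; they are not part of any Data Vault tooling. The registry accepts new objects that attach to existing ones and refuses to overwrite anything already there.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelObject:
    """One table in the vault: a hub, a link, or a satellite."""
    name: str
    kind: str            # "hub", "link" or "satellite"
    parents: tuple = ()  # names of the objects this one attaches to


@dataclass
class VaultModel:
    """Hypothetical append-only registry of model objects."""
    objects: dict = field(default_factory=dict)

    def add(self, obj: ModelObject) -> None:
        # Existing objects are never modified or replaced.
        if obj.name in self.objects:
            raise ValueError(f"{obj.name} already exists; add a new object instead")
        # A new object may only attach to objects that already exist.
        missing = [p for p in obj.parents if p not in self.objects]
        if missing:
            raise ValueError(f"unknown parents: {missing}")
        self.objects[obj.name] = obj


model = VaultModel()

# Sprint 1: customers only.
model.add(ModelObject("hub_customer", "hub"))
model.add(ModelObject("sat_customer_details", "satellite", ("hub_customer",)))
sprint_1 = dict(model.objects)

# Sprint 2: orders arrive; nothing from sprint 1 is touched.
model.add(ModelObject("hub_order", "hub"))
model.add(ModelObject("link_customer_order", "link", ("hub_customer", "hub_order")))

# Every sprint 1 object is still present and identical.
unchanged = all(model.objects[name] == obj for name, obj in sprint_1.items())
print(unchanged)
```

Because sprint 2 only adds objects, anything already built and tested against the sprint 1 tables keeps working as it was.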

 

What's the difference between Dimensional or 3NF modelling and Hub-and-spoke (or DV) modelling?

 

Data Vault 101 – Beginners Guide


  • Separation of business keys (Hubs) from the rest of the data
  • Relationships between business keys (Links)
  • Descriptive attributes (Satellites)
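
The three building blocks above can be sketched as plain Python records. This is only an illustration under stated assumptions: the class and field names (`HubRow`, `LinkRow`, `SatelliteRow`, `load_date`, `record_source`), the sample keys and the choice of SHA-256 for the hash key are ours, not a prescribed layout.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Hash one or more normalised business keys into a fixed-length key.

    The normalisation (trim, upper-case, "||" separator) and the hash
    function are assumptions made for this sketch.
    """
    normalised = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class HubRow:
    """A business key and nothing else descriptive."""
    hub_key: str
    business_key: str
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class LinkRow:
    """A relationship between two or more hubs."""
    link_key: str
    hub_keys: tuple
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class SatelliteRow:
    """Descriptive attributes attached to a hub (or link) at a point in time."""
    parent_key: str
    load_date: datetime
    record_source: str
    attributes: dict


now = datetime.now(timezone.utc)

# Hypothetical sample data: one customer, one order, and the link between them.
customer = HubRow(hash_key("C-1001"), "C-1001", now, "crm")
order = HubRow(hash_key("SO-42"), "SO-42", now, "erp")
placed = LinkRow(
    hash_key("C-1001", "SO-42"),
    (customer.hub_key, order.hub_key),
    now,
    "erp",
)
details = SatelliteRow(customer.hub_key, now, "crm", {"name": "Ada", "city": "Madrid"})

print(details.parent_key == customer.hub_key)
```

Note that the satellite carries every descriptive column while the hub stays a bare list of keys, so a new source of customer attributes becomes a new satellite rather than a change to `HubRow`.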