If you’re familiar with data warehousing, you know all about the Star Schema model. Introduced in 1996, the dimensional model contains a central fact table with other dimensional tables acting as points of the star.[1] This model has long been a standard in warehousing, but today, roughly thirty years later, its age and limits are starting to show. With the advances made in technology, not to mention the increased affordability of massive compute and storage today, there are better options available to a business looking for a complete data analytics system. Here are three key issues with the Star Schema model and how the Data Vault resolves those pitfalls.
1. The first issue we come across with the standard Star Schema model is that due to its bottom-up design, it lacks the ability to scale in size organically – meaning it requires refactoring to increase the breadth of business questions a single star schema can address. In a world where a business may have continually changing data needs and ever-increasing questions to be posed, the singular fact table in a Star Schema setup is limiting. With a Data Vault model, scaling is seemingly unlimited thanks to the concept of the separation of concerns implemented through the DV2 hubs, links, and satellites which are designed to manage raw data assets for mass utility.
2. To keep a Star Schema model operating effectively, the data in the fact and dimension tables need to be kept up to date, so what happens when changes are necessary at the source systems? The Data Vault model is designed to absorb source system changes without the need to refactor your historical data. New links are created as new relationships emerge or are discovered, and old links may be closed off to account for changes without losing history; keeping systems organized, but also highly auditable. When an attribute changes, the change history is logged in a satellite for reference purposes. If you are worried about data governance and auditability, DataVault modeling is a smart choice, because you will always have access to the accurate history and lineage of your data.
3. While a Star Schema model is simple in form and easily targeted towards a specific data analysis need, the requirements of data conformity make the model less flexible. The singular fact table inherently limits the kind of dimension tables that can be added. Adding or removing one or more dimensions to a fact table directly impacts the grain of the data that the fact table represents, requiring a recompute / refactoring of summary, aggregate, and other mathematical computations that may be stored in the fact table. A DataVault structure on the other hand is incredibly flexible thanks to its decoupled modeling structures that are not limited in number or size. Additionally, a DataVault model will not leave its consumers tied to a singular type of outcome structure, giving a business more autonomy through its support of cubes, web services, flat wide tables, or any other outcome format required – including Star Schemas.
While it remains a common tool in data warehousing, business data needs are changing and shining a light on the limitations of the Star Schema as the foundational model for enterprise analytics. Does your business need a data warehousing solution that is flexible, scalable, and auditable? Data Vault modeling may be just the solution you are searching for. To find out more, you visit us here.
[1] KIMBALL, R. (1996): The Data Warehouse Toolkit, New York: J. Wiley & Sons