This is part three in a series of posts discussing lessons I learned from helping a partner engineer a 400+ TB VSAN solution. It turns out that high capacity VSAN nodes are a slightly different beast than the standard off-the-shelf solution. Part one discussed density issues related to disk configurations. Part two worked through a hypothetical engineering solution for 1 TB/day of data with 365-day retention. In part three, I’ll continue with that example and explore possible models for purchasing and growing such a solution.
To review where I left off, our hypothetical customer is trying to store meteorological data which is generated at the rate of 1 TB/day. The data needs to be retained for one year, and two copies of it must be kept (Failures to Tolerate = 1). We calculated that this means the underlying disk system would need to de-stage writes at the rate of 5.8 IOPS (assuming 4 MB block writes). The entire storage solution will need 1054 TB of raw space to accommodate the requirements, including maintaining 30% slack space and 1% disk formatting overhead.
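For anyone who wants to check those numbers, here is a minimal sketch of one way to reproduce them. It assumes decimal units (1 TB = 1,000,000 MB) and that slack space and formatting overhead are applied as successive divisions; the constant and function names are mine, not from any VMware tool.

```python
import math

DAILY_INGEST_TB = 1.0      # new data per day
RETENTION_DAYS = 365       # one year of retention
COPIES = 2                 # FTT=1 keeps two copies of every object
SLACK_FRACTION = 0.30      # slack space to keep free
FORMAT_OVERHEAD = 0.01     # disk formatting overhead
BLOCK_SIZE_MB = 4          # assumed size of each de-staged write
SECONDS_PER_DAY = 86_400


def raw_capacity_tb():
    """Raw TB needed to hold the retained data, both copies, plus overheads."""
    stored_tb = DAILY_INGEST_TB * RETENTION_DAYS * COPIES
    return stored_tb / (1 - SLACK_FRACTION) / (1 - FORMAT_OVERHEAD)


def destage_iops():
    """Average de-stage rate, assuming writes are spread evenly over the day."""
    mb_per_day = DAILY_INGEST_TB * COPIES * 1_000_000  # decimal TB to MB
    return mb_per_day / SECONDS_PER_DAY / BLOCK_SIZE_MB


print(math.ceil(raw_capacity_tb()))   # raw TB, rounded up
print(round(destage_iops(), 1))       # average IOPS at 4 MB per write
```

Rounding the raw figure up gives the 1054 TB above, and the average de-stage rate comes out at roughly 5.8 IOPS.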
I’d like to discuss the ways in which a customer might want to think about purchasing this storage. One way is to buy three to five years of projected capacity, just in case. Because our data only needs to be retained for a single year, the capacity required after one year is the same as the capacity required after three or five years. Of course, the customer might have other ideas for what should run on the solution once they have a large vSphere cluster in place. There’s plenty of room for scope creep, and it might actually help to fund the solution, as long as the customer is disciplined about which workloads have priority and whether they are paying their way on the infrastructure.
Another way is to secure funding for the full solution, but to buy less than the full year of capacity up front and periodically purchase and deploy additional high capacity VSAN nodes as needed. This might be done for business reasons (cash flow, spreading CapEx across different periods, etc.) or political reasons (needing to prove the concept in production). With that in mind, here’s a way to start out a production implementation, model the capacity it represents, and model purchasing cycles for additional nodes.
The VSAN 6.0 Design and Sizing Guide points out that though the minimum host count for a production VSAN cluster is three, that leaves workloads (with FTT=1) with no place to recover to if a host outage is experienced. So strong consideration should be given to a four-host cluster as the minimum to deploy (especially when each node represents 72 TB of data which needs to be recovered). Let’s calculate the capacity available in a 3 + 1 solution.
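A minimal sketch of that calculation follows. It assumes the "+ 1" host is held back as rebuild headroom, so only three hosts count toward usable capacity, and that overheads are applied as simple multipliers on raw space. The names are illustrative only.

```python
NODE_RAW_TB = 72.0        # raw capacity per high capacity node
DAILY_INGEST_TB = 1.0     # new data per day
COPIES = 2                # FTT=1 keeps two copies
SLACK_FRACTION = 0.30     # slack space to keep free
FORMAT_OVERHEAD = 0.01    # disk formatting overhead


def effective_days(data_hosts):
    """Days of ingest that the given number of data-bearing hosts can hold."""
    raw_tb = data_hosts * NODE_RAW_TB
    usable_tb = raw_tb * (1 - FORMAT_OVERHEAD) * (1 - SLACK_FRACTION)
    return usable_tb / COPIES / DAILY_INGEST_TB


# 3 + 1: three hosts hold data, the fourth is reserved for rebuilds (assumption).
print(round(effective_days(3)))
```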
So a 3 + 1 host configuration gives us 75 days of capacity with full redundancy, 30% slack space, and 1% formatting overhead. The question now is what amount of capacity the organization wishes to incrementally purchase. For each host, we can do a similar effective capacity calculation:
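As a rough sketch, the per-host version can be broken into steps so each overhead is visible. The same assumptions apply as before (overheads as multipliers, 1 TB/day of ingest), and the helper names are hypothetical.

```python
import math

NODE_RAW_TB = 72.0        # raw capacity per node
DAILY_INGEST_TB = 1.0     # new data per day
RETENTION_DAYS = 365      # retention target
COPIES = 2                # FTT=1
SLACK_FRACTION = 0.30
FORMAT_OVERHEAD = 0.01


def days_per_host():
    """Days of ingest that a single data-bearing host can hold."""
    after_format = NODE_RAW_TB * (1 - FORMAT_OVERHEAD)   # 72 -> 71.28 TB
    after_slack = after_format * (1 - SLACK_FRACTION)    # -> about 49.9 TB
    after_mirror = after_slack / COPIES                  # -> about 24.9 TB
    return after_mirror / DAILY_INGEST_TB


def hosts_for_full_retention():
    """Data-bearing hosts needed for the whole retention period (spare excluded)."""
    return math.ceil(RETENTION_DAYS / days_per_host())


print(round(days_per_host()))
print(hosts_for_full_retention())
```

By this arithmetic, the full year of retention works out to about 15 data-bearing hosts, before counting any host held in reserve for rebuilds.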
Each host represents 25 days of capacity. Given that, we could advocate purchasing an initial supply of storage representing 90+ days and schedule a review with 30 days of supply left to see whether expectations on the solution are being met. The next purchase can then be for some increment of 25 days of capacity (50, 75, 100, etc.), sized to match a comfortable cadence for reviewing and releasing pre-approved funds. In theory, this could be coordinated to take advantage of vendor promotions, end-of-period sales, or just the convenience of the purchasing organization. Perhaps they prefer to review and purchase twice yearly. Perhaps they want to purchase monthly. The point is that this can be adjusted for the comfort of the purchasing organization.
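To get a feel for how such a cadence plays out, here is a small, simplified model. It assumes data arrives at a steady 1 TB/day, that a purchase is triggered as soon as 30 days or less of supply remain, and that new hosts are deployed instantly (no procurement lead time). Host counts are data-bearing hosts only, with any spare host excluded. All of the names and parameters are mine, chosen for illustration.

```python
NODE_RAW_TB = 72.0
COPIES = 2
SLACK_FRACTION = 0.30
FORMAT_OVERHEAD = 0.01
RETENTION_DAYS = 365

# Days of 1 TB/day ingest that one data-bearing host can hold (about 25).
DAYS_PER_HOST = NODE_RAW_TB * (1 - FORMAT_OVERHEAD) * (1 - SLACK_FRACTION) / COPIES


def purchase_schedule(initial_hosts, hosts_per_purchase, review_threshold_days=30):
    """Return (day, data-bearing host count after purchase) for each purchase."""
    hosts = initial_hosts
    schedule = []
    for day in range(RETENTION_DAYS + 1):
        capacity_days = hosts * DAYS_PER_HOST
        remaining_days = capacity_days - day
        # Buy only while the cluster is still short of the full retention target.
        if remaining_days <= review_threshold_days and capacity_days < RETENTION_DAYS:
            hosts += hosts_per_purchase
            schedule.append((day, hosts))
    return schedule


# Start with four data-bearing hosts (roughly 100 days) and buy two at a time.
for day, hosts in purchase_schedule(initial_hosts=4, hosts_per_purchase=2):
    print(f"day {day}: grow to {hosts} data-bearing hosts")
```

Changing `hosts_per_purchase` or `review_threshold_days` shows how a larger increment stretches the time between purchases, which is the lever the purchasing organization would tune to its own comfort.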
Up Next: Examining High Capacity VSAN Nodes From Various Vendors
Image credit: geralt, via Pixabay