Recovering from an HPE StoreVirtual VSA failure

A VSA cluster is a great option for a resilient HA environment, but how can you recover from a node failure? Let’s take a look…

The great thing about a virtual storage solution is that it’ll sit on top of pretty much anything you want, and allow you to provide a shared, resilient storage platform.


Ideal for small deployments of virtual infrastructure or ROBO, the ability to incorporate multiple layers of redundancy into your design is a definite selling point.  Sit the VSA on top of a RAID5 array of local disk in each host and Network RAID10 the data to a second node, for example, and you can tolerate failures at disk or host level- all without the additional rack space, CapEx and specialist knowledge needed to deploy a traditional SAN (but with Enterprise features- Woo!)
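Worth remembering that all that layered redundancy eats into your usable capacity.  Here’s a rough back-of-the-envelope check- a minimal sketch only, with made-up example disk counts and sizes, and ignoring the VSA’s own overheads and any snapshot reserve:

```python
# Rough usable-capacity estimate for a two-node VSA cluster:
# RAID5 locally on each host, Network RAID10 mirroring between the nodes.

def usable_capacity_tb(disks_per_host: int, disk_size_tb: float, hosts: int = 2) -> float:
    """Usable space for Network RAID10 volumes sat on per-host RAID5."""
    raid5_per_host = (disks_per_host - 1) * disk_size_tb  # RAID5 loses one disk to parity
    raw_cluster = raid5_per_host * hosts                   # total capacity the VSAs present
    return raw_cluster / 2                                  # Network RAID10 keeps two copies

if __name__ == "__main__":
    # Example only: 2 hosts, each with 8 x 2TB disks in RAID5
    print(f"Usable: {usable_capacity_tb(8, 2.0):.1f} TB")  # -> 14.0 TB
```

In other words, roughly half of what the two RAID5 arrays give you lands as usable volume space- that’s the price of being able to lose a whole host.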

All well and good, but how easy is it to actually recover from a failure?  If one of your VSA nodes falls off the face of the earth- sure, your environment will stay up because you’ve deployed an HA setup (right?  RIGHT??!)- but how do you get back to full resiliency as quickly as possible, without putting your data at risk?  It’s a great question, so in this post let’s walk through exactly that scenario.

So for the purposes of this example we have two ESXi hosts, both stuffed with disks built into a RAID5 array, with an HPE StoreVirtual VSA deployed on each.  We’ve installed the CMC (Centralized Management Console, no less), created a Management Group and cluster for our VSAs, and all our volumes are built as Network RAID10 to mirror the data between the two hosts.  A VIP (virtual IP) means that should we lose a host or a VSA, the data can still be accessed.  All sweet?  Sweet.
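As an aside, if you ever want a quick sanity check that the VIP is actually answering on the iSCSI port (handy later, when half the cluster is on fire), something along these lines will do.  A minimal sketch- the 10.0.0.50 address is just a made-up example VIP, and 3260 is the standard iSCSI target port:

```python
import socket

VIP = "10.0.0.50"   # hypothetical cluster VIP- substitute your own
ISCSI_PORT = 3260   # standard iSCSI target port

def vip_reachable(host: str, port: int = ISCSI_PORT, timeout: float = 3.0) -> bool:
    """Return True if the VIP accepts TCP connections on the iSCSI port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    state = "reachable" if vip_reachable(VIP) else "NOT reachable"
    print(f"iSCSI VIP {VIP}:{ISCSI_PORT} is {state}")
```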

But then oh no!  A localized air con failure pops two disks in one of your hosts, the local RAID5 array falls up its own backside, and your VSA has the storage pulled out from under it.  It ceases to exist.
Don’t panic- your data is still there, being served from the remaining VSA, except now you’ve got more errors and warnings in the CMC than you can physically read.  Do. Not. Panic.  Here’s what to do:

  1. First things first, fix the air con and swap out those disks.  You’ll need to rebuild that RAID5 from scratch but it’s okay, go for it.
  2. Next deploy yourself a brand new VSA just as you did before – I’d go with a new name and IP address here to avoid any confusion
  3. Now we’re getting to the meat of it.  Launch the CMC and log into your Management Group
  4. Find your new VSA in the CMC, and add it to the Management Group
  5. Over a TB of data and you have a license for the VSA?  Now’s a good time to swap that out.  You’ll still be able to see your failed VSA in the CMC- highlight it in the tree and you can find the MAC address the license would have been locked to.  Contact HPE support to switch the license from the old MAC to the MAC of the new VSA (you’ll more than likely need your entitlement number).
  6. Okay now we’re getting serious.  Right-click your cluster and expand Edit Cluster, then choose Exchange Storage Systems
  7. Select your failed VSA as the system to exchange and click Exchange Storage System
  8. Then choose the new VSA to exchange it with and click OK.
  9. Confirm it all and watch in amazement as your new VSA seamlessly replaces the failed one, and all your data is automatically mirrored back onto the new VSA.  Depending on how much data you have this could take a while (a rough way to estimate how long is sketched just below this list).  You can always ramp up the local data bandwidth in the Management Group settings to speed up the rebuild rate- just make sure you leave enough for your live data access.
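For a very rough idea of how long that re-mirror might take, divide the data to be copied by the bandwidth you’re prepared to give it.  A minimal sketch with example figures plugged in- it ignores protocol overheads and any throttling the restripe does around your live I/O:

```python
def resync_hours(data_tb: float, bandwidth_mb_s: float) -> float:
    """Hours to re-mirror data_tb of Network RAID10 data at bandwidth_mb_s MB/s."""
    data_mb = data_tb * 1024 * 1024          # TB -> MB (binary units)
    return data_mb / bandwidth_mb_s / 3600   # seconds -> hours

if __name__ == "__main__":
    # Example only: 4 TB of volumes to rebuild, local bandwidth set to 40 MB/s
    print(f"~{resync_hours(4.0, 40.0):.1f} hours")  # roughly 29 hours
```

If the answer comes back in days rather than hours, that’s your cue to nudge the local bandwidth setting up a notch.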


So there it is, simple no?  You should be back to full resiliency in no time.  Enjoy.


vM
