OpenStack Swift Backup Strategy – Part 2

In the last post we identified the need for a solution that handles disaster recovery for object storage systems while also complying with data storage and retention laws. We defined a baseline scenario and a likely approach for that scenario, along with its pros and cons. In this post we look at a couple more approaches that could work for the same scenario.

Approach 2: This approach is much more stable and also simpler than the first one. We could call it a modified off-host backup.

  • Here we could back up from a specialized storage node that stores a copy of every object held by all the storage nodes in the setup. This is similar to a journaling mailbox, to which a copy of every message within the organization is sent.
  • This kind of special storage node could be made part of the Swift cluster by tweaking the rings (or the ring code) so that the node acts as a passive node that only receives and acknowledges objects as they are created in Swift.
  • The backup agent of an existing backup product could run as a service/daemon on this node and provide file-level or block-level protection as required.
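The ring tweak is the least obvious part of this approach, so here is a toy model of it. It is a minimal sketch, not Swift's ring code: `get_partition` loosely follows how Swift maps an object path to a partition (top bits of an MD5 of the path), but real Swift also mixes a per-cluster prefix/suffix into the hash, and the round-robin placement in the hypothetical `ToyRing` is not Swift's ring-builder algorithm. `ToyRing`, `ToyCluster`, `backup_snapshot` and the node names are all made up for illustration. The point it shows is that adding one passive "journal" node to every partition's write targets, and excluding it from read targets, gives a single node holding one copy of everything.

```python
import hashlib
import struct

PART_POWER = 8                 # 2**8 = 256 partitions (real rings use many more)
PART_SHIFT = 32 - PART_POWER


def get_partition(account, container, obj):
    """Map an object path to a partition using the top PART_POWER bits of
    MD5(path). Real Swift also mixes a per-cluster prefix/suffix into the hash."""
    path = f"/{account}/{container}/{obj}".encode("utf-8")
    return struct.unpack_from(">I", hashlib.md5(path).digest())[0] >> PART_SHIFT


class ToyRing:
    """Hypothetical ring with an optional passive 'journal' node."""

    def __init__(self, primary_nodes, replicas, journal_node=None):
        if replicas > len(primary_nodes):
            raise ValueError("need at least as many nodes as replicas")
        self.primary_nodes = list(primary_nodes)
        self.replicas = replicas
        self.journal_node = journal_node

    def primaries(self, part):
        # Naive round-robin placement; NOT Swift's real ring-builder algorithm.
        n = len(self.primary_nodes)
        return [self.primary_nodes[(part + i) % n] for i in range(self.replicas)]

    def write_targets(self, part):
        # Every PUT also goes to the journal node, so it sees all objects.
        targets = self.primaries(part)
        if self.journal_node is not None:
            targets.append(self.journal_node)
        return targets

    def read_targets(self, part):
        # The journal node is passive: it never serves client reads.
        return self.primaries(part)


class ToyCluster:
    def __init__(self, ring):
        self.ring = ring
        self.stores = {}       # node name -> {object path: bytes}

    def put(self, account, container, obj, data):
        part = get_partition(account, container, obj)
        path = f"/{account}/{container}/{obj}"
        for node in self.ring.write_targets(part):
            self.stores.setdefault(node, {})[path] = data

    def backup_snapshot(self):
        # What a backup agent on the journal node would see: one copy of everything.
        return dict(self.stores.get(self.ring.journal_node, {}))


ring = ToyRing(["node1", "node2", "node3", "node4"], replicas=3, journal_node="journal")
cluster = ToyCluster(ring)
for i in range(10):
    cluster.put("AUTH_demo", "photos", f"img{i}.jpg", b"...")
print(len(cluster.backup_snapshot()))  # 10
```

In real Swift the proxy answers a PUT once a quorum of the primary replicas succeeds, so a practical version of this design would also have to decide whether a failed write to the journal node should fail the request, since every missed journal write is a gap in the backup.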

Pros and Cons
One significant advantage of this method is faster backups, as only one node needs to be contacted and the load on the production storage nodes is negligible. The main disadvantage is the large amount of storage required: the backup data is first copied to the off-host node, and the backup application then copies the block-level data again to the actual tape/disk.
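To put a rough number on that disadvantage, the sketch below estimates the extra capacity under simple, stated assumptions: the cluster is full, the regular nodes keep three replicas of every object (a common Swift configuration), the journal node keeps exactly one copy, and each retained Full backup on tape/disk is another full copy. Compression, deduplication, incrementals and filesystem overhead are ignored. The function name and the 30-node, 48 TB example cluster are hypothetical.

```python
def approach2_storage(nodes, raw_tb_per_node, replicas=3, retained_fulls=1):
    """Rough capacity estimate for the journaling-node approach (hypothetical)."""
    cluster_raw_tb = nodes * raw_tb_per_node
    unique_tb = cluster_raw_tb / replicas          # logical data actually stored
    journal_tb = unique_tb                         # passive node: one copy of everything
    backup_target_tb = unique_tb * retained_fulls  # what the backup app copies again
    return {
        "unique_tb": unique_tb,
        "journal_tb": journal_tb,
        "journal_vs_regular_node": journal_tb / raw_tb_per_node,
        "extra_tb_total": journal_tb + backup_target_tb,
    }


# Hypothetical cluster: 30 nodes with 48 TB raw each, three replicas.
print(approach2_storage(nodes=30, raw_tb_per_node=48))
# {'unique_tb': 480.0, 'journal_tb': 480.0, 'journal_vs_regular_node': 10.0, 'extra_tb_total': 960.0}
```

Under these assumptions the single journal node must be ten times the size of a regular node, and staging plus one retained Full doubles that again, which is why the off-host copy dominates the cost of this approach.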

Approach 3: This approach attempts to leverage Swift’s own APIs to retrieve the objects and other data that need to be protected. It should be possible to use these APIs as a backup stream for the backup application.

  • This can be achieved by creating a read-only proxy in addition to the customer-facing proxy server.
  • It would function the same way as any other proxy server working in parallel, but without load balancing and without an external-facing network interface.
  • This read-only (RO) proxy has just one purpose: to act as the backup client.
  • Using Swift administrator credentials, we can monitor the status of the Swift setup, gather statistics, and download the data under protection to storage as needed.
  • A good backup strategy combines periodic Full backups with frequent Incremental and Differential backups. Distinguishing between a “Full” and an “Incremental” backup should be possible by querying the SQLite databases Swift uses for its account and container listings for changed objects. Using the Swift client, we should be able to download all objects (the data under protection) to attached disk storage, which can then be backed up by a normal backup application.
  • For a restore, the Swift client should be able to upload the objects through the normal upload process, making the data available as of the point in time of the backup.
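Swift's proxy server is a WSGI application assembled from a paste.deploy pipeline in its configuration file, so one way to make the RO proxy described above actually read-only is a small filter in front of the proxy that refuses every write method. This is a minimal sketch under that assumption; `ReadOnlyFilter` and its allowed-method list are hypothetical, and only the `filter_factory(global_conf, **local_conf)` entry point follows the usual paste.deploy convention that Swift middleware uses.

```python
class ReadOnlyFilter:
    """Hypothetical WSGI filter that turns a Swift proxy into a read-only one."""

    ALLOWED_METHODS = {"GET", "HEAD", "OPTIONS"}

    def __init__(self, app):
        self.app = app  # the next WSGI app in the pipeline (e.g. the proxy server)

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET").upper()
        if method not in self.ALLOWED_METHODS:
            # Reject PUT/POST/DELETE/COPY etc. before they reach the proxy.
            body = b"This proxy is read-only\n"
            start_response("405 Method Not Allowed", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
                ("Allow", ", ".join(sorted(self.ALLOWED_METHODS))),
            ])
            return [body]
        return self.app(environ, start_response)


def filter_factory(global_conf, **local_conf):
    """paste.deploy entry point, the same convention Swift's middleware uses."""
    def make_filter(app):
        return ReadOnlyFilter(app)
    return make_filter
```

The filter would be listed in the RO proxy's pipeline ahead of the proxy server itself. Note that a restore uploads data, so with this filter in place restores would have to go through a proxy that does not include it.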
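The Full/Incremental/Differential distinction in the list above comes down to which reference timestamp is used when asking for changed objects. The sketch below runs that query against an in-memory SQLite table that is a simplified stand-in for the `object` table of a Swift container database; the real schema has more columns and varies by release, so treat the table layout and function names here as assumptions. Reading container databases directly on a storage node is also an assumption of this approach: each container database is replicated on several nodes that may briefly disagree, and the container listing API is the alternative.

```python
import sqlite3

# Simplified stand-in for the `object` table in a Swift container database.
# Only the columns used below are modeled.
SCHEMA = """
CREATE TABLE object (
    name TEXT,
    created_at TEXT,      -- timestamp stored as text; CAST keeps the leading number
    size INTEGER,
    etag TEXT,
    deleted INTEGER DEFAULT 0
)
"""


def objects_to_back_up(db, since=None):
    """Names of live objects to copy.

    since=None           -> Full backup (every live object)
    since=<last backup>  -> Incremental (changes since the previous backup of any kind)
    since=<last Full>    -> Differential (changes since the previous Full)
    """
    sql = "SELECT name FROM object WHERE deleted = 0"
    params = ()
    if since is not None:
        sql += " AND CAST(created_at AS REAL) > ?"
        params = (since,)
    return [row[0] for row in db.execute(sql + " ORDER BY name", params)]


def objects_deleted_since(db, since):
    """Deleted (tombstoned) names, so a restore can reproduce deletions too."""
    sql = ("SELECT name FROM object WHERE deleted = 1 "
           "AND CAST(created_at AS REAL) > ? ORDER BY name")
    return [row[0] for row in db.execute(sql, (since,))]
```

Incremental and Differential runs use the same query; the backup application only has to remember two timestamps (last backup, last Full) and pick the right one.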

Pros and Cons
The main advantage of this approach is that it saves on the storage space needed for staging the backup. In addition, since the backup and restore processes go through Swift’s own API, we can avoid errors related to consistency issues, and restored objects are distributed across the storage nodes by Swift itself, which keeps the nodes balanced.

Web-based services such as Netflix’s video service (hosted on Amazon) and Google Drive have taken hits from infrastructure failures and the subsequent delays in getting their systems back online. Since Swift itself does not recommend classical RAID or other traditional HA measures, and given the need for legal compliance, a solid backup strategy is needed to provide point-in-time recovery from disasters. We have outlined here possible approaches to building a backup system that protects a cloud-based object store from disasters. As we have noted, each approach has its own pros and cons; however, a distributed system in the cloud could leverage existing infrastructure. What do readers think of these approaches? Are there any others you can think of?

About the Author: Mandar Shukla is an avid tech enthusiast working across multiple domains: Backup, Archiving & Storage, Enterprise Software Performance, Visual Computing, and Cloud Lifecycle Management. He has been a QA professional since 2006 and follows emerging trends in technology for the inspiration and imagination his QA activities require.
