“Do as you please, I’ll back you up” – The Dave Matthews Band
Data backup is required not just for disaster recovery, but also for legal compliance. OpenStack Swift's architecture already deals with disasters by replicating data to Zones that can be distributed across geographies.
This, however, does not really constitute a backup strategy: it does not provide a point-in-time state of the data under protection, nor does it comply with the legal requirements for data retention. This is especially true if the cloud storage is being used for email archival.
While there are backup solutions that use cloud storage as their backend, there are none that protect the storage system itself. This suggests a need for a solution that can handle disaster recovery for object storage systems and also comply with data storage and retention laws. In this 2-part post we will try to pull together our own thoughts about possible approaches to address these issues.
Components to protect:
1) Proxy Server: The ring builder files and their metadata are key to recovering from a disaster. Currently, builder files are replicated into Swift only as an explicit administrator action, not by default. (Each ring assigns partitions to devices in a gzipped Python structure, and the same file is copied to the other nodes for reference. The metadata used by the ring builder includes the builder file with the ring information and the additional data required to create new rings.)
2) Object data from storage nodes: The actual data, which can help in recreating the entire Swift setup after the proxy server has been recovered. A simple rebalance of the rings can then redistribute the data to the nodes added or recovered as part of disaster recovery and mitigation.
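As a rough illustration of protecting the first component, the sketch below copies the builder and ring files into a timestamped directory so that each run keeps its own point-in-time copy. The function name and the backup location are hypothetical; `/etc/swift` is the usual place for these files, but the path is passed in rather than assumed.

```python
import shutil
import time
from pathlib import Path


def snapshot_ring_files(swift_dir, backup_root):
    """Copy *.builder and *.ring.gz files into a new timestamped directory.

    Hypothetical helper: swift_dir is typically /etc/swift, backup_root is
    any location outside the cluster being protected.
    """
    src = Path(swift_dir)
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    dest = Path(backup_root) / ("rings-" + stamp)
    dest.mkdir(parents=True, exist_ok=True)

    copied = []
    for pattern in ("*.builder", "*.ring.gz"):
        for path in sorted(src.glob(pattern)):
            # copy2 keeps modification times, which helps when comparing snapshots
            shutil.copy2(path, dest / path.name)
            copied.append(path.name)
    return dest, copied
```

A call such as `snapshot_ring_files("/etc/swift", "/var/backups/swift")` could be scheduled after every ring rebalance; the second path is only an example.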
Let’s take a Swift environment with 100 storage nodes and a replica count of 3. Each file/object is therefore stored on 3 nodes, but which 3 is not fixed per node or assigned in any simple order: the ring derives it from a hash of the object's path. Without consulting the ring, this makes it difficult to enumerate backup targets.
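To make the 100-node example concrete, here is a toy model of hash-based placement. It is not Swift's ring code: it assumes that placement is derived from an MD5 hash of the object path, and the partition power, the partition-to-node rule and the function names are all made up for illustration.

```python
import hashlib
import struct
from typing import List

NODE_COUNT = 100   # nodes in the example cluster
REPLICAS = 3       # replica count from the example
PART_POWER = 10    # assumed for illustration: 2**10 partitions


def partition_for(path: str) -> int:
    """Map an object path to a partition using the top bits of its MD5 hash."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return struct.unpack(">I", digest[:4])[0] >> (32 - PART_POWER)


def nodes_for(path: str) -> List[int]:
    """Toy assignment: each partition lives on three consecutive node ids."""
    part = partition_for(path)
    return [(part * REPLICAS + i) % NODE_COUNT for i in range(REPLICAS)]
```

In this model the same path always resolves to the same three nodes, but there is no way to tell which three without running the hash, which is why a backup tool would have to ask the ring instead of guessing.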
A conventional backup would be inefficient here, since a backup agent has to be deployed and monitored on every node. The likelihood of a backup job failing is also higher because of timeouts, service crashes and similar faults. For such a scenario, here are three backup strategies that could be used.
- In a large system such as this, where many objects could be added or deleted every day, each node could be made a backup target with a backup agent running on it.
- Since every object is stored in triplicate, only one of the three nodes holding it needs to be contacted for backup.
- SQLite or other databases used by Swift could be queried for the location of the objects, and only those nodes would be contacted for backup.
- We can avoid backing up the same file/object from multiple nodes by asking the database for unique object/file locations.
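The points above could be sketched roughly as follows. The sketch assumes the object-to-node lookup has already been done and is available as a plain dictionary; `plan_backup` and the node names are hypothetical. Each object is assigned to exactly one of its replica nodes, preferring the node with the fewest objects assigned so far, so that the backup work is spread across the cluster.

```python
from collections import defaultdict


def plan_backup(locations):
    """Pick one replica node per object and group objects by chosen node.

    locations: dict mapping object name -> list of nodes holding a replica.
    Returns a dict mapping node -> list of objects that node should back up.
    """
    plan = defaultdict(list)
    for obj in sorted(locations):
        replicas = locations[obj]
        # Prefer the replica node with the least work so far; break ties by name.
        target = min(replicas, key=lambda node: (len(plan[node]), node))
        plan[target].append(obj)
    # Drop nodes that were considered but never selected.
    return {node: objs for node, objs in plan.items() if objs}
```

For example, with objects `a`, `b` and `c` each held on three of the nodes `n1` to `n4`, the plan backs up every object once and assigns each to a different node instead of reading all nine replicas.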
Pros and Cons
This approach takes advantage of distributed backup, so it would be faster. The downside is that the backup job will affect node performance to some extent, and in extreme cases the backup job itself could fail.
About the Author: Mandar Shukla is an avid tech enthusiast, working across multiple domains including Backup, Archiving & Storage, Enterprise software performance, Visual computing and Cloud Lifecycle Management. He has been a QA professional since 2006 and follows emerging trends in technology to derive the inspiration and imagination required for his QA activities.