Why Bother With Backups
One of the most often overlooked aspects of any computer-based system is a robust backup solution. In most cases backups are only thought of after something has already gone wrong. Some say there is no point, since they can't afford expensive tape systems or simply have too much data to make tapes a viable solution. Indeed, there does seem to be a serious issue for the average consumer who has a growing collection of digital information at home. Yet with more and more of our day-to-day records, information and priceless memories going digital every minute, can we really afford not to have some sort of reliable backup mechanism in place?
Solutions?
- Tapes - They are slow (comparatively) and expensive.
- CDs/DVDs (Optical discs) - Way too small these days. You can't even fit one trip's worth of photos and videos on a single DVD.
- Secondary hard drive - Because it is connected to your system, a virus could wipe it out at the same time as the original.
- Flash drives - Getting bigger but never as fast as the hard drives you are trying to back up.
- External hard drive - Not too bad, but if you move it around a lot you may have issues with longevity.
So what are we to do?
Coming up with a plan
With no obvious solution we decided to think outside the box. We started by analysing our data and our data usage habits.
- In most cases we didn't tend to generate more than a gigabyte of data at a given time, and usually far less.
- We had approximately enough data to fill somewhere between 5 and 10 percent of the largest hard drive available at any given time.
- We tended to need access to old information about once a year.
- Otherwise access to old information was generally a case of "Did I really just delete that?"
In other words, we normally just wanted access to yesterday's data, not something from a month ago.
- We wanted to keep snapshots of most of our data (about 30 percent of the total volume of data) on a monthly basis for one year.
- We wanted to keep records of certain data for seven years (about 30 percent of the total volume of data).
So, doing a little number crunching (using a maximum hard drive size of 1 TB):
10% of 1 TB = 100 GB
30% of 100 GB = 30 GB
30 GB * 7 years = 210 GB
30 GB * 12 months = 360 GB
Add that all up and we get:
100 GB for instant online access to yesterday's data
210 GB for 7 yearly snapshots worth of archives
360 GB for 12 monthly snapshots worth of archives
= 670 GB
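For anyone who wants to plug in their own numbers, the same arithmetic can be written as a short Python sketch. The function and constant names are ours for illustration only; the percentages and snapshot counts are the ones from the list above, and 1 TB is treated as 1000 GB, as in the figures.

```python
# Illustrative names; the figures come from the text above.
MAX_DRIVE_GB = 1000       # 1 TB, counted as 1000 GB as in the article
LIVE_PERCENT = 10         # our data fills roughly 10% of the largest drive
SNAPSHOT_PERCENT = 30     # share of the live data kept in each snapshot


def storage_budget_gb(drive_gb=MAX_DRIVE_GB, yearly=7, monthly=12):
    """Return the storage needed (in GB) for live data plus snapshots."""
    live = drive_gb * LIVE_PERCENT // 100
    snapshot = live * SNAPSHOT_PERCENT // 100
    return {
        "live": live,
        "yearly": snapshot * yearly,
        "monthly": snapshot * monthly,
        "total": live + snapshot * (yearly + monthly),
    }


budget = storage_budget_gb()
for name, size in budget.items():
    print(f"{name:>8}: {size} GB")
```

Changing `drive_gb`, `yearly` or `monthly` shows quickly how sensitive the total is to the retention policy.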
What about growth? Looking back over the last few years, we've seen growth of about 10 percent per year on the overall figure.
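To see how long a 1 TB drive would last at that rate, the 670 GB figure can be compounded year by year. This is only a rough sketch: it assumes the 10 percent growth applies to the whole 670 GB and stays constant, and the function name is made up for the example.

```python
def years_until_full(start_gb=670, capacity_gb=1000, growth=0.10):
    """Count how many further years of growth fit after the current year."""
    years = 0
    needed = start_gb
    # Keep adding a year for as long as next year's total still fits.
    while needed * (1 + growth) <= capacity_gb:
        needed *= 1 + growth
        years += 1
    return years


print(years_until_full())
```

With the defaults the requirement grows to roughly 981 GB after four more years and would pass 1 TB in the fifth.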
The plan
So, knowing that we currently need 670 GB of storage and factoring in the 10 percent growth, we should be able to fit everything for this year and the following 4 years on a 1 TB hard drive. Ah, but didn't we say there are problems with using hard drives? Sure, you can't really transport them regularly without risking data loss (raise your hand, everyone who has had a laptop hard drive die in the loud noise sort of way). Right, so if we don't want to move the data off site manually, how else can we ensure that we are getting off-site backups?
Well, in our case we don't generate that much new data on a day-to-day basis, so we figured we could probably push just the new data over the internet every night. Set up a computer at each of two different sites (our home and our office) with a 1 TB hard drive in each, and we are good to go for storage. Add to that some custom software we wrote to back up the contents of all of the machines on the local network (or at least everything we care about), bundle up the changes and sync them between the two servers in both directions over an encrypted tunnel, and presto, you have an instant backup solution.
This gives us roughly the same data protection that you would expect from a good tape-based strategy. The big difference is that our setup cost 2 x 1 TB hard drives (about $280 total right now) instead of several thousand dollars for tapes and a drive. Not to mention that our setup does not require changing tapes or remembering to take them off site every night. We also get instant access to yesterday's data: no finding the right tape and waiting for it to seek.
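To give a feel for the "bundle up the changes" step, here is a minimal Python sketch. It is not the software we run, just an illustration built on the standard library: it assumes a file counts as changed when its modification time is newer than the previous run, and the names `changed_since` and `bundle_changes` are made up for the example. Sending the archive to the other site over the encrypted tunnel is left out entirely.

```python
import tarfile
from pathlib import Path


def changed_since(root, last_run):
    """Yield files under root modified after last_run (a Unix timestamp)."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.stat().st_mtime > last_run:
            yield path


def bundle_changes(root, last_run, archive_path):
    """Pack the changed files into a gzip-compressed tar archive.

    Returns the bundled paths relative to root. The archive should be
    written somewhere outside root so it does not pick itself up.
    """
    root = Path(root)
    bundled = []
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in changed_since(root, last_run):
            rel = path.relative_to(root).as_posix()
            archive.add(path, arcname=rel)
            bundled.append(rel)
    return bundled
```

A nightly job could call `bundle_changes` with the timestamp of the previous night's run and then hand the resulting archive to whatever transfer tool is in use. Note that a modification-time check like this does not notice deleted files, so a real tool needs more bookkeeping.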