Introduction

This is part of a series of articles on backing up computers. The top page is Design for an Archiving Backup System.
The device and type of media used to perform backups are important to consider. At the current time, and probably for the next few years, DVD-R media and DVD writer drives (single and dual-layer) will be the most cost effective per byte; for an analysis of this see Backup Media Costs. In a few years' time, say in 2008 or 2009, Blu-ray or another HD DVD type system might become less expensive per byte than DVD.

Name brand DVD-R media can often be purchased locally in 100 packs for about $0.35 per disk, and no-name blank media can be found for about $0.15 per disk. These prices make the cost per gigabyte $0.03 to $0.08/GB. Compare this to the cost of using a 300GB IDE drive (current price about $110) at about $0.37/GB, and the cost of blank tape media at $0.20/GB (for the 200GB LTO tape) or $0.53/GB for the DAT-72 format. The drive mechanism for DVD is also very inexpensive, especially when compared with tape drives, which can easily exceed $1000. The DVD mechanism is comparable in cost to an external USB enclosure that one might put a large IDE drive into in order to have removable storage.

If you use DVD-RW media you can potentially have several sets of media and rotate among them (erasing the old contents first), so that you reuse the media. However, this raises the cost to somewhere between $0.11/GB and $0.22/GB (no-name versus name brand), so it is about three times as expensive as the DVD-R approach, but less expensive than tape.
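The per-gigabyte figures above are simply price divided by capacity. A minimal sketch of that arithmetic follows; the 4.7 GB single-layer capacity is an assumption (the article does not state which capacity figure it used), so the results land close to, but not exactly on, the rounded numbers quoted above.

```python
# Assumption: a single-layer DVD holds about 4.7 GB (decimal gigabytes).
DVD_CAPACITY_GB = 4.7


def cost_per_gb(price, capacity_gb):
    """Return the media cost in dollars per gigabyte."""
    return price / capacity_gb


# Prices are the ones quoted in the text.
media = {
    "DVD-R (no-name)": cost_per_gb(0.15, DVD_CAPACITY_GB),
    "DVD-R (name brand)": cost_per_gb(0.35, DVD_CAPACITY_GB),
    "300GB IDE drive": cost_per_gb(110.00, 300),
}

# Print the options from cheapest to most expensive per gigabyte.
for name, dollars in sorted(media.items(), key=lambda item: item[1]):
    print(f"{name}: ${dollars:.3f}/GB")
```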
Other things to consider about DVDs are:
Consider the case of a small home LAN, perhaps not your typical home today, but in a few years it might well be. On this LAN there are several Windows workstations, each with a significant amount of local disk space, most of which is unused. There is also some sort of file server device, perhaps a general purpose Linux box or a NAS box, that is powered up all the time and has a large drive with a significant amount of user data on it (such as an MP3 and family photo collection). In this situation one might find that there are about 10GB of data files on each workstation and 100GB of files on the file server (unless there is a video collection, in which case things get much larger). So for the sake of example we are probably looking at needing to back up something like 150GB of files. This would cost about $4.50 to $12.00 to do with DVD-R, and about 3 times as much for DVD-RW (which is still less than the price of one tape of any size!). With prices like this one could even just make a complete copy of everything once every few weeks.
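To make the 150GB example concrete, here is a small sketch that counts whole disks instead of multiplying by a per-gigabyte rate. The 4.7 GB capacity is an assumption, and the function name is only illustrative; it also ignores the fact that files rarely pack perfectly onto disks.

```python
import math


def full_backup_cost(total_gb, disc_capacity_gb, price_per_disc):
    """Return (number of discs, media cost) for one full backup."""
    discs = math.ceil(total_gb / disc_capacity_gb)
    return discs, discs * price_per_disc


# Assumption: 4.7 GB per single-layer DVD-R; prices are from the text.
discs, low = full_backup_cost(150, 4.7, 0.15)   # no-name media
_, high = full_backup_cost(150, 4.7, 0.35)      # name brand media

print(f"{discs} discs, ${low:.2f} to ${high:.2f}")
```

Under these assumptions the result falls inside the $4.50 to $12.00 range given above.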
Next consider how much of that data actually changes over time. You will probably find that while some files are added to the file server every week, almost none are removed or modified. You will also find that while more files on the workstations get changed each week, they are most likely limited to a fraction of the total (for example, the user's email folders are probably the biggest and most frequently changed). With this in mind one might find that in a week 10-20GB is actually modified or added across the total system. This means that with some form of incremental backup there would be an initial cost for storing a complete set of files, and the ongoing cost would then be much smaller: additional media would cost in the range of $0.60 to $1.60 per week (20GB/week on no-name and name brand DVD-R media respectively).
Thinking about this a bit further, one could consider a scheme with a once per year full backup and a large number of incremental runs, say once per week or even once per day over the whole year, costing less than $100.00/year for the media. At this price it may well make little sense to worry about using rewritable media to reduce the long term costs. An interesting point here is that at any point in the year one could potentially restore the version of a file as it was on any previous day in that year.
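The yearly figure can be checked with a rough sketch like the one below. It assumes the example numbers used so far (150GB full backup, at most 20GB of changed data per week) and that the weekly volume of changed data stays about the same whether the incremental runs are daily or weekly; the function name is hypothetical.

```python
def yearly_media_cost(full_gb, weekly_incremental_gb, cost_per_gb, weeks=52):
    """Media cost of one full backup plus a year of incremental backups."""
    total_gb = full_gb + weekly_incremental_gb * weeks
    return total_gb * cost_per_gb


# Per-gigabyte rates are the rounded DVD-R figures from the text.
for label, rate in (("no-name", 0.03), ("name brand", 0.08)):
    cost = yearly_media_cost(full_gb=150, weekly_incremental_gb=20, cost_per_gb=rate)
    print(f"{label}: ${cost:.2f} per year")
```

Even with name brand media, the total under these assumptions stays below the $100 per year mentioned above.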
Again, as DVD media costs are so low it also makes sense to consider ways of improving the system's robustness by storing multiple copies of the files on multiple pieces of media. If this is done, then even if a piece of media is damaged or lost another copy of it is available to restore from. For files that change frequently this could be less of an issue as there might well be a slightly older version that could be used in place of the lost file.
Another thing to consider about a long chain of incremental backups is that the incremental backups are much faster to do than full backups. So a system like this, while initially a bit time consuming because of the time taken to burn the initial 30 or so DVDs, could end up requiring less time in total over the year than a system based on a much more expensive high capacity tape drive. This is especially true because the cost of tapes is going to prevent you from having many tapes to extend the duration of the incremental phase. You'll probably end up with a pair of one week rotations: enough tapes to do a full backup once a week followed by daily incrementals for the rest of the week, and a second set to switch to the next week, so that you don't overwrite the first set until you have a viable backup on the second set. Think about the cost of 14 blank tapes: the least expensive are in the $10 range (but hold only 12GB), and to handle a 150GB system you're going to need tapes in the $40 range.
The issue of overwriting is also important in the DVD backup. Essentially no overwriting is taking place, so wear on the media is very small and all previous versions are available over a long period of time. While one might start a new set periodically (say monthly or yearly), if DVD-R media is used then all the old sets are still available (and some of the data, such as the family photos, might well be nearly current). A possible strategy would be to make a full backup once per month, followed by daily incrementals for the next 30 days. When the next full backup is made, the old set it replaces could be taken off site for extra safety (you're probably keeping the current backup on site so that it is at hand in case you need to restore something quickly). The cost of doing this would be less than about $18/month. With DVD-RW media the cost would be on the order of $55/month, so one might consider having three sets of media and just rotating between them each month. This way one always has two or three complete copies of everything (the current month and the past two months), so there is a lot of serial redundancy.
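The three-set rotation can be pictured with a tiny sketch. The set labels and the function name are hypothetical; the only idea taken from the text is that each month's backup goes onto the set that was written three months earlier, leaving the other two sets untouched.

```python
# Hypothetical labels for the three sets of rewritable media.
SETS = ("A", "B", "C")


def set_for_month(year, month):
    """Pick which media set to erase and reuse for the given month."""
    return SETS[(year * 12 + month) % len(SETS)]


# Show a few consecutive months; a set only comes around again
# after the other two have each been used once.
for month in range(1, 7):
    print(f"2006-{month:02d}: write to set {set_for_month(2006, month)}")
```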
For added robustness the idea of parallel redundancy should be considered. This is the concept of making two copies of each piece of backup media, containing the same data, so that if one piece of media gets damaged the files on it can be restored from the copy. This can be trivially implemented by just creating two copies of each DVD-R disk at the time you burn them. Another approach is to keep one copy on a local hard drive and place the second copy on DVD. This is slightly more risky, since if the hard drive dies you lose all of the "A" copies at once (but still lose no data, as there is a second copy of everything on DVD media). This approach does have two advantages: it is faster (and with the right software design could actually happen for "free"), and it makes for more convenient restores, since the restore can usually be done from the hard drive copy.
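As an illustration of the hard drive variant, here is a minimal sketch that keeps a second copy of a disk image on a local drive and checks that the copy matches. It assumes each DVD is staged as an image file before burning, which the text does not specify; the function names, the mirror directory, and the use of a SHA-256 comparison are all illustrative choices rather than part of the design described here.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def keep_disk_copy(image_path, mirror_dir):
    """Copy a staged image into mirror_dir and verify the copy."""
    mirror_dir = Path(mirror_dir)
    mirror_dir.mkdir(parents=True, exist_ok=True)
    copy_path = mirror_dir / Path(image_path).name
    shutil.copy2(image_path, copy_path)
    if sha256_of(image_path) != sha256_of(copy_path):
        raise IOError(f"copy of {image_path} does not match the original")
    return copy_path
```

A restore would normally read from the copy in the mirror directory and fall back to the DVD only if the hard drive copy is missing or damaged.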
Should additional redundant copies of files that do not change be made on a scheduled basis? The only real benefit to doing this would be to protect against media stability issues, which might be a concern if the archival time (from when the full backup pass is made to when the last incremental is done) is on the order of several years. There have been some studies done on how long CD-R media might last (see this one referenced on Slashdot). There is some anecdotal evidence that some CD-Rs may start to fail within a few years of burning, but there are also CD-Rs (and presumably DVDs) made which should have a life in excess of 100 years. Given this, it seems likely that the risk of data loss due to bit-rot within a shorter time frame (like a year or less) is pretty small, so additional software and data format complexity could be avoided by just planning to keep the incremental cycle to less than a year or so. Then, if you keep a second copy of the current media on a hard drive and you keep one copy of the previous media around (in case you are reusing rewritable media), you should be very safe. In the event your hard drive fills up during the backup cycle you can either stop and restart the cycle, add an additional hard drive, or make a second copy of some of the media held on the hard drive before deleting it from the hard drive to free up enough space to complete the cycle.
Some media has different coatings to provide added resistance to damage from surface scratching; this article shows the sort of additional protection that can be achieved.
If you are using rewritable media such as DVD-RW or an IDE hard drive to store the backup data, it is tempting to ask the question: could some of this media be reused before the end of the backup cycle? The reason this arises is that some of the files being backed up change quite frequently during the backup cycle; in fact these may make up the majority of the incremental data sets. Consider the email archives of most users: the current inbox and some saved mail folders will change each day, and certainly if an incremental backup is run each day a portion of it will be used to back up the same files time and time again. As these files can become quite large (home users with email folders holding over a gigabyte of data are not uncommon), this can have an impact. Is this effect large enough to warrant the additional software complexity to allow for reuse of this lost space, and what about the time required to perform the extra steps of purging and reusing media? If we assume that 1/2 of the incremental files are of this type, then the maximum cost savings that we could achieve would be 1/2 the cost of the media needed to store the incremental backups. In the example above this would be a savings of less than $2 per week of incremental backup (based on needing to do about 20GB of incremental backup per week, hence saving 1/2 of this, using name brand media).
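The upper bound on the savings is a one-line calculation. The sketch below uses the example figures from the text (20GB of incremental data per week, half of it repeatedly changed files, name brand DVD-R at about $0.08/GB); the function name is only illustrative.

```python
def max_reuse_savings(weekly_incremental_gb, reusable_fraction, cost_per_gb):
    """Upper bound on weekly media savings from reusing space."""
    return weekly_incremental_gb * reusable_fraction * cost_per_gb


savings = max_reuse_savings(weekly_incremental_gb=20,
                            reusable_fraction=0.5,
                            cost_per_gb=0.08)
print(f"At most ${savings:.2f} saved per week")
```

With these inputs the bound comes out well under the $2 per week quoted above, which supports the point that the extra complexity is hard to justify on cost alone.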
As a model for pricing full backup plus a long run of incremental backups repeated at intervals here is the formula: