
Backing up Computers

Copyright 2010 by Stephen Vermeulen
Last updated: 2010 Oct 24

There are a few different approaches to the problem of backing up a PC these days:

  1. don't worry, be happy: backing up is hard to do, so just restore everything from scratch after your system crashes. The big problem with this is that you are going to lose a lot of data (that you created or downloaded...) that is not on those install disks, and it's going to take hours or days to restore your system.
  2. use a conventional backup program, perhaps even the one Microsoft supplies. These tools will pretty much write everything to a tape, a CD or some other form of removable media (or perhaps a second hard drive). My ArcvBack program suite is a set of backup and restore utilities that implement a conventional (full plus incremental) backup system; it is particularly suited to disk-to-disk (D2D) and disk-to-DVD backup strategies (a minimal sketch of the full-plus-incremental idea appears after this list).
  3. use an image backup tool (like Drive Image or Ghost or Roxio's Take Two (no longer sold), NovaStor's InstantRecovery, V-Com's DriveImage (part of DriveWorks, which is reviewed here), and perhaps even the open-source PartImage with home page here). NTI's Backup Now also does image backups. Slashdot has an article on imaging tools here.
  4. Clonezilla (mentioned here) is a drive imaging and cloning system based on Linux that can be run from a server, bootable CDROM or bootable flash drive.
  5. Acronis' TrueImage also does image backups. In June 2003 I used version 6.0 of TrueImage to back up an 18GB SCSI drive to files on a spare 40GB IDE drive and then restored this image to a new 36GB SCSI drive while increasing the sizes of some of the partitions at the same time. This worked quite well, even though the partitions were a mixture of FAT16, FAT32, NTFS and Linux ext2 types. It appeared to be about the same speed as Ghost. In fact, Acronis was running a special at the time whereby you could upgrade from Ghost to TrueImage for US$9.99. TrueImage also has the ability to back up to a network drive; it gets its network configuration from DHCP (so if you have a DHCP server this is a breeze) and it allows you to back up to any Windows shared drive you can supply a user name and password for. I was quite impressed with their installer: when it offered to make bootable disks for standalone backup or recovery it detected my 24x CD burner and offered to burn a bootable CDROM. I put in a blank CD and in less than a minute it had burned the disk, and it even works!
  6. The freeware DrvImagerXP does this for FAT and NTFS. These tools will write an exact copy of your hard drive, or selected partitions on it, to some other drive (perhaps across a network), or to some removable media such as tape or CD. The advantage this has is that the restoration process can be a lot faster, since you do not have to first install an operating system, the correct drivers for your system and then the backup utility. How much faster? With an image restore program you can install a brand new hard drive and have the operating system, a couple of gigabytes of user files and a working system back in about 30 minutes, while if you were to use the conventional route it is unlikely that you would have finished installing the new operating system by then. So for a PC used in an office you are looking at a 30 minute repair process instead of a 3-6 hour job.
  7. the new g4u (ghost for unix) is another possibility, especially if you want to keep the backup information on a small UNIX server
  8. get a RAID system. This will not completely remove the need for backups, but it can protect you against the all-too-common single hard drive failure. So for some people, or for some types of data, this might be a viable option.
  9. BootIt, from TeraByte Unlimited is a boot manager, partitioning tool and imaging tool in one.
  10. Second Copy 2000 sounds like it might be a useful approach for backing up files that change often, by copying them to other drives on your network.
  11. HiveCache sounds like it might be a useful tool
  12. StompInc has BackUp MyPC (formerly Backup Exec Desktop by Veritas)
  13. WinImage, a shareware program, also lets you do "image-type" backups; it started with floppy disks but seems to support other types of media these days
  14. Lazy Mirror will copy changed files to an archive on a separate partition
  15. TeraByte Unlimited makes some drive image backup/restore software for Windows, DOS and Linux
I personally favor a combination of the second and third approaches, especially if you are using your PC for business purposes.
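
For the second approach, the full-plus-incremental idea can be sketched in a few lines of Python. This is only an illustration of the general scheme (the file names and the use of modification times here are my own assumptions, not how ArcvBack or any particular product actually works): the first run copies everything, and each later run copies only the files changed since the previous run, into its own dated directory.

    import os
    import shutil
    import time

    def incremental_backup(source_dir, backup_dir, stamp_file="last_backup_time.txt"):
        # Read the time of the previous run; if there is none, everything
        # is copied, which makes the first run a full backup.
        try:
            with open(stamp_file) as f:
                last_run = float(f.read().strip())
        except (IOError, ValueError):
            last_run = 0.0

        # Each run goes into its own dated directory so earlier backups survive.
        dest_root = os.path.join(backup_dir, time.strftime("%Y%m%d-%H%M%S"))

        for root, dirs, files in os.walk(source_dir):
            for name in files:
                src = os.path.join(root, name)
                if os.path.getmtime(src) > last_run:       # changed since last run
                    rel = os.path.relpath(src, source_dir)
                    dst = os.path.join(dest_root, rel)
                    if not os.path.isdir(os.path.dirname(dst)):
                        os.makedirs(os.path.dirname(dst))
                    shutil.copy2(src, dst)                  # copy data and timestamps

        with open(stamp_file, "w") as f:
            f.write(str(time.time()))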

NIST has an article on the storage and lifespan of various backup media. CDs may be more mortal than we currently think.

Here's an article from Tom's Hardware on the available types of tape drives. A big tape drive is nice, but using a small drive for large data sets is possible, so long as you're patient (my 12GB DDS-3 unit takes a day to do a backup and full verify of my 34GB data repository partition). The old DAT-based DDS series of tape drives has recently been extended to 36GB with the DDS-5 (or DDS-72) standard. Sony has demonstrated its new blue-laser based system for storing up to 23GB on a DVD disk; this is the Professional Disc for Data.

The DVD Forum (Feb'04) has approved the HD-DVD specification, which looks like it will hold 20GB on a single-sided disk. When writable versions of these become available this will make for some convenient and cost-effective backup media.

This discussion (and this later one) on Slashdot talks about some of the options, including backing up to removable IDE hard disks (which are almost as inexpensive as an equivalent set of blank tapes). For example, on 12-Dec-01 a 120GB drive could be purchased for US$265, while a 10 pack of DDS3 tapes (4mm, stores 12GB native), which would have a total capacity of 120GB, was US$200 (prices from Dirt Cheap Drives). In theory the drive is a less stable storage mechanism than the tapes, but if you were to have a set of 3 or 4 backup drives and rotate through them the chances of losing much recent work are fairly small - especially if you physically remove them (perhaps using a drive caddy from StarTech, Lian-Li, or DataPort) from the PC when they are not in use. The drives also have a huge speed advantage over tapes, which can be translated into doing a more complete backup each time.

Another way of looking at the pricing is to ask: "how much drive storage could I buy for the cost of a tape drive?" The DDS3 drives seem to be about US$620 and the DDS4 drives (20GB native) are about US$880 currently. So for the price of the tape drive alone one can purchase a set of 2 or 3 120GB backup hard drives, making the hard drive solution more cost-effective right from the start (before even adding the cost of blank media, at least 2 sets of 10 tapes, needed to make the tape drives actually useful).

Another way of looking at this issue is to see how long a backup actually takes to run; this imposes a practical upper limit on the amount of storage you can back up with any solution. In the case of the DDS3 backing up an NT server (the tape drive is in this machine) and one workstation (across a 100baseT network), a backup of 10GB of data takes 3 hours, plus an additional 3 hours to verify the backup. That's about 55MB/min or 925kB/s. A hard drive solution should be capable of a much faster backup, although with a 100Mbit network in the picture getting about 180MB/min across the network will probably be difficult; still, one could back up (and verify) about 30-50GB of networked machines in a single evening, which is not possible with a single DDS3 or DDS4 tape drive. Since the hard drives are relatively inexpensive, if one had several machines with big backup requirements one could just install extra swappable drives in each machine, bypassing the network limit completely.
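
As a back-of-the-envelope check of those numbers (using 1GB = 1000MB, as in the figures above):

    data_mb = 10 * 1000              # 10GB backup job, in MB
    backup_minutes = 3 * 60          # 3 hours to write the tape (verify excluded)

    mb_per_min = data_mb / float(backup_minutes)   # about 55.6 MB/min
    kb_per_sec = mb_per_min * 1000 / 60            # about 926 kB/s

    # The same 10GB at a network-limited 180MB/min disk-to-disk rate:
    print(data_mb / 180.0)           # about 56 minutes instead of 3 hours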

To test this approach I installed swappable drive mountings in two machines (caddies or cartridges might be another name for these). The ones I got were the RH Series Mobile Rack Removable Frame for 3.5" HDD by Lian-Li, the model was the RH-32 for use with up to ATA100 drives. They seem reasonable, partially aluminum and partially plastic in construction, fit well and load and unload pretty smoothly. An 80GB Maxtor 7200RPM drive fits nicely and the included fan keeps the drive pretty cool. Of course one of the machines I wanted to test this with did not have a BIOS that supported IDE drives over 8GB, so I had some issues with it. In the end I got a modern ATA100 PCI card adapter rather than trying to update the BIOS, as for some reason the BIOS did not want to flash the last time I tried.

The next bit of fun was partitioning the swappable drive. Since I wanted to use Norton Ghost to write images to the removable disk I decided to partition it as a single 80GB FAT32 partition. I did this using System Commander 2000 without any apparent problems. However, I soon discovered that although I could use this drive from Windows 98 and Windows 2000 I could not access it from Windows NT4.0. But, you say, NT cannot do FAT32! On my NT box (which I dual boot to Win98 for games etc.) I have installed a FAT32 driver from Winternals which does a good job of this on a 6GB FAT32 partition, but for some reason it did not see the 80GB partition. It turns out that they currently only support up to a 32GB partition size (which is also the maximum size that Windows 2000 will allow you to format as FAT32 - if you want to go beyond this you have to format it as NTFS). So to use FAT32 I would need to split the drive into 3 partitions. It turns out that DriveImage will allow you to write to NTFS partitions, so if I decide to go that route I'll probably just repartition the drive as a single NTFS partition.

My first preliminary tests with Norton Ghost 2002 were showing a backup speed of about 100-130MB/min, which I asked their support about as it seemed a bit on the slow side. They think this is pretty normal. I was able to backup a single partition fine, but when I went to backup an 18GB drive (about 50% full) Ghost aborted the backup shortly after asking for a second output file. This happened on another test, so there seems to be some sort of issue that I need to work out.

As a second approach I dusted off an older copy of DriveImage (version 4) and gave it a try. It ran the job smoothly and was a fair bit faster than Ghost (up to 200M/min on most partitions, though on my digital photo album partition (lots of JPEGs) it dropped to about 90M/min) and it didn't have an issue with the large backups. Both utilities will only write a maximum of 2GB per file and automatically create additional files as needed (except Ghost wants to stop and ask you for the names of the additional files - it's probably got a setting somewhere not to do this); a rough sketch of this file-spanning behaviour follows below.
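
The 2GB-per-file spanning that both utilities do is simple in principle: write the image stream to a sequence of numbered files, rolling over to a new file whenever the current one reaches the size limit. A rough sketch of the idea in Python (the naming scheme here is my own invention, not what Ghost or DriveImage actually produce):

    CHUNK_SIZE = 64 * 1024           # copy in 64kB blocks
    MAX_FILE_SIZE = 2 * 1024 ** 3    # cap each output file at 2GB

    def write_spanned_image(source, base_name):
        # Write the stream 'source' to base_name.001, base_name.002, ...
        # starting a new file whenever the 2GB limit would be exceeded.
        index = 1
        written = 0
        out = open("%s.%03d" % (base_name, index), "wb")
        try:
            while True:
                block = source.read(CHUNK_SIZE)
                if not block:
                    break
                if written + len(block) > MAX_FILE_SIZE:
                    out.close()
                    index += 1
                    written = 0
                    out = open("%s.%03d" % (base_name, index), "wb")
                out.write(block)
                written += len(block)
        finally:
            out.close()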

Note that the above speed measurements were made on a 400MHz Pentium II; the drive being imaged was a 7200RPM Ultra2 SCSI drive and the destination drive was a 7200RPM ATA100 drive. When I did some imaging on another machine (a 600MHz Celeron with ATA66 controllers) that was all IDE based (where the source and destination drives were attached as the masters of two separate IDE channels) I got about 110M/min throughput (with compression enabled). So I would conclude that right now you are looking at about 100M/min backup speed with this sort of setup - which is about the same speed as a DDS-3 tape drive gets (before the verify).

10-Jul-02: I have since employed the same technique on a machine I built for work; this uses a dual Athlon 1800+ TYAN motherboard with integrated Ultra-160 SCSI and ATA-100. The system drive (which is what I am backing up) is a Seagate Cheetah 15K RPM 18GB SCSI and the backup device is an 80GB 7200RPM Maxtor attached via ATA-100. On this machine, using Drive Image 5, with compression enabled (low or high settings) backup speeds are a consistent 560MB/min. With compression turned off it has hit the 1GB/min mark. So CPU performance seems to still be a limiting factor with this type of software.

The marketing material for DriveImage version 5 claims improved backup performance; I'll report on this if I get a copy of it installed - I think I can do a test with their free trial version. Well, I ran a few tests of DriveImage 5 on the all-IDE system, and got the same results (between 90 and 110MB/min depending on no or low compression). One thing I noted is that this version seems to use much larger buffers (possibly all of free memory; it seemed to read about 200MB from the source drive and then write that chunk in one long burst). Anyway, I didn't see any of the "improved performance" their marketing was claiming. Given that these sorts of drives are capable of large transfer rates in the 10-30MB/sec range (and I have seen them do large file copies under Windows NT in the 10MB/s range) the speeds that DriveImage and Ghost are achieving seem very low. Maybe I'll try a drive-to-drive copy and see how fast that goes (it will eliminate the file system aspect, I think).

27-Apr-03: here's an article about using large IDE-RAID arrays as the backup medium for a university campus.

22-Jul-03: here's an article on building your own CD-R changer. A bit bulky, but nice work nonetheless. There must be some sort of commercial "hopper-loaded" auto-feeding system, but buying one will probably get the RIAA to pay you a visit.

30-Oct-03: here's a Slashdot article that discusses ways of distributing a data set across a network. A number of the suggestions get into software to replicate selected directories. One link references Venti, which sounds like a rather interesting approach, and with its hashing of data blocks to determine their uniqueness it might prove to be a good way of speeding up a tape based storage system.
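
The block-hashing idea is easy to illustrate: split the data into blocks, hash each one, and store a block only the first time its hash is seen, so identical blocks (within a file, across files, or across backup runs) are kept once. A minimal sketch in Python (the block size and the choice of SHA-1 here are just assumptions for illustration, not Venti's actual parameters):

    import hashlib

    BLOCK_SIZE = 8 * 1024            # assumed block size for this sketch

    def store_deduplicated(path, block_store):
        # block_store maps hash -> block data; the returned "recipe" is the
        # ordered list of hashes needed to reassemble the file later.
        recipe = []
        with open(path, "rb") as f:
            while True:
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                digest = hashlib.sha1(block).hexdigest()
                if digest not in block_store:
                    block_store[digest] = block   # first time seen: keep it
                recipe.append(digest)             # always reference it
        return recipe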

23-Jan-04: some people are even outsourcing their backup needs. While this may be infeasible for a home environment (even with DSL or cable modem) it might be possible for some companies.

Notes on Recovering Failed Systems

  • Using a Linux live-CD distro to get access to files on a Windows machine that will not boot anymore
  • Bart's Preinstalled Environment - BartPE, is a way to build a live-CD like environment (like the Linux Knoppix distribution) for Windows (XP, 2003 and it sounds like 2000 too), so you can boot a machine from the CDROM image and work on it using tools on the CD rather than needing to install anything on the hard drive. One possibility for this (as it sounds like it gives you read/write access to NTFS) is to boot a PC this way to enable you to do a true stand-alone backup (without any files being opened or used by the many running processes). Bart Lagerweij also has a number of other boot disk related items for older Windows on this site.
  • ERD Commander from Winternals is a bootable system recovery CD; they also have NTFSDOS Professional listed but it only seems to be available as part of a package.
  • Allowing Linux to read and write NTFS file systems via Wine.

Other Articles

Various Products

Why You Should Have Your Own Backups

  • PhotoPoint has finally gone; now there are people who never kept their own copies of their own photos
  • Midwinter had co-located their server. When the company they co-located with collapsed, their servers were seized and sold
  • StorageReview ran into the problem of losing data because of an incomplete or corrupted backup set (StorageReview.pdf)
  • Steve's Digicams ran into the problem of losing data because of no backups (May 2000)
  • And you can't rely on RAID either....
  • Slashdot discusses the case of Journalspace.com who in early Jan'09 lost their entire site due to not having a backup. They were using a RAID-1 drive set to protect their database against drive failure, but something overwrote the whole database (and the RAID-1 just replicated the overwrites, ensuring total data loss).
  • Slashdot discusses the case of Ma.gnolia which in Jan'09 lost its entire user database (about 500GB). This database was on a RAID-1 array, but something corrupted the database files and their backup program just copied the corrupted data to an off-line drive, and in doing so overwrote the last valid backup. A transcript of an interview about this is here.

Other Links

  • 2010-Oct-24: Drive SnapShot is a drive imaging package that supports on the fly imaging while running Windows. [9421]
  • 2010-Jul-15: There are issues with Apple's Time Capsule backup system. [9300]
  • 2010-Apr-07: Opendedup is an open-source project to make a deduplicating file system. [9059]
  • 2010-Mar-31: When Windows Home Server crashes hard you might have some work to do to get your data back. [9044]
  • 2009-Dec-04: Slashdot discusses adding additional headers and error correction codes to files to protect against bit rot. I would prefer an approach that builds this into the file system so that everything on a disk partition that is formatted this way is protected. [8818]
  • 2009-Nov-23: While not a backup technology, SUN's ZFS is getting block-based file content deduplication to make more efficient use of storage media. [8787]
  • 2009-Oct-19: In Oct'09 T-Mobile's Sidekick users lost their data because of a server failure at Microsoft/Danger (I guess a name change to Microsoft/Safe will be on its way soon). This calls into question the practice of trusting your data to the cloud. Slashdot discusses the Sidekick issue here and has another discussion of problems with cloud storage here. It looks like the finger of blame is being pointed at outsourcing. Microsoft may be able to recover this lost data. Looks like most of the data has been recovered. [8618]
  • 2009-Sep-29: Clonezilla is a free hard drive imaging software package, it gets recommended here. [8583]
  • 2009-Sep-01: BACKBLAZE provides an online data backup service with unlimited storage for $5/month. Of course it's effectively limited by the number of GB you can upload per month, but still that could easily be in the range of 300-1000GB per year on some cable internet plans. [8443]
  • 2009-Jul-20: Slashdot has another discussion about the Best Home Backup Strategy. [8305]
  • 2009-Jul-08: can is a simple incremental backup program (similar to tar) written in Python. [8242]
  • 2009-Jun-17: Even if your computer uses RAID-1 and the hard drives don't fail there are other failures that can destroy the data on the drives, like this one that affected a server hosting a number of virtual private server nodes. [8159]
  • 2009-Jun-02: Slashdot discusses data recovery software. Suggestions include: [8061] [1] [2]
  • 2009-May-29: An unexpected side-benefit of an off-site backup program, a laptop thief was caught because the computer he stole (and then used for his own) ran a backup program that copied the crook's photographs to an off-site service. [8055]
  • 2009-May-15: Another case of a web site being hacked and both the server and its backups destroyed. This happened to avsim.com (discussed here on Slashdot) in May'09. Again the lesson being taught is that backups must be external to the servers. It is also important (though this might not have been the issue here) that new backups do not immediately replace old backups, otherwise a few failed backups (or a few successful backups of a corrupted server) will wipe out any useful backup data. [7978]
  • 2008-Dec-14: Slashdot discusses options for long term data storage. This comment lists some of the previous related discussions that have happened on Slashdot. The consensus seems to favor multiple copies on hard drives with periodic testing and migration to new drives before failure takes place. This article gives a formula for calculating the Mean Time To Data Loss (MTTDL) for multi-drive arrays based on the number of drives, their expected mean time to failure, the degree of redundancy (one or two independent parity channels) and the time to replace a failed drive and rebuild (which can be measured in days if you don't notice a failure right away). With this approach, using the Seagate 1.5TB drive with a quoted annual failure rate of 0.34%, less than 2% of these drives should fail in 5 years (the warranty period), so taking 5 years as the mean time between failures should be very, very conservative. Then, if you have a RAID array with 3 data and 1 parity disk (or a 4-drive RAID-5 system) and it takes you a week to detect and replace a failed drive, the MTTDL would be (working in days):
    (5*365)*(5*365)/(4*(4-1)*7) = 39650 days
    
    or about 108 years before you had 2 drives die within the 1 week replacement window and lost your data. Alternatively you might use each drive as a simple redundant copy of some data, so if you have 3 drives you put the same data on each, then once a month you check each to see if it is still fine (perhaps you put more data on it at that time as well); then, using the same conservative 5 year MTBF, you would have:
    (5*365)**3/(3*(3-1)*(3-2)*(31)**2) = 1,054,178 days
    
    or 2888 years before you had all three drives die within the same 1 month window and lost your data. So it looks like just putting your important data on two or three external hard drives which you periodically test and refresh should be safe enough, and the more copies you have, the safer you will be. Of course, with multiple copies you can place some of them in off-site storage which will help protect against fire, theft, flood and other catastrophes. (Both calculations above are reproduced in the short sketch after this list.) [7342]
  • 2008-Oct-20: Optar is a system for encoding data onto printed paper; with it you can print out about 200kB of data on a single sheet of paper with your laser printer and read it back in with your flatbed scanner. [7058]
  • 2008-Oct-04: Slashdot discusses distributed storage and backup. [6982]
  • 2008-Aug-22: pysync is an implementation of the rsync algorithm in Python. [6722] [1]
  • 2008-Jul-25: A cry for a way of doing better backups that asks for a non-proprietary system that would use a NAS drive as the backup media and allow the user to browse back in time for earlier versions of files. This is something arcvback can do. [6580] [1]
  • 2008-Apr-23: Slashdot discusses storing data for the next 1000 years using a hard drive based approach. [5952] [1]
  • 2008-Apr-16: Thoughts on a script to archive critical files. [5828] [1]
  • 2008-Apr-11: A simple Python script to backup files in a directory tree. [5768] [1] [2]
  • 2008-Jan-07: CartBak makes a hard drive cartridge system intended for data backup. These are compatible with the GoVault dock from Quantum. [4596] [1]
  • 2007-Aug-25: ArcvBack is a backup program I wrote for backing up key directories on a small LAN. It runs a very long series of incremental backups on top of an initial full backup; this makes it run very fast on most days, except when the full backup to start a new cycle is done. It has the further advantage that you can restore files from any incremental since the start of the cycle. [580] [1]
  • 2007-Aug-24: Arnie, a simple backup system written in Python [417] [1]
  • 2007-Aug-24: can, a simple incremental backup program in Python, very similar to the UNIX tar utility [418] [1]
  • 2007-Aug-24: Cedar Backup, a backup program written in Python [419] [1]
  • 2007-Aug-24: sync2cd, a tool for incremental archiving to CD/DVD [420] [1]
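
The MTTDL figures quoted in the 2008-Dec-14 entry above can be reproduced with a couple of small functions (a sketch of the formulas as given in that entry, working in days):

    def mttdl_single_parity(mtbf_days, n_drives, repair_days):
        # RAID-5 style array: data is lost if a second drive fails while
        # the first failed drive is still being replaced and rebuilt.
        return mtbf_days ** 2 / (n_drives * (n_drives - 1) * repair_days)

    def mttdl_copies(mtbf_days, n_copies, check_days):
        # n independent full copies, each checked/refreshed every check_days:
        # data is lost only if all copies fail within the same check window.
        denom = 1
        for k in range(2, n_copies + 1):
            denom *= k                       # n * (n-1) * ... * 2
        return mtbf_days ** n_copies / (denom * check_days ** (n_copies - 1))

    mtbf = 5 * 365                           # conservative 5 year MTBF, in days
    print(mttdl_single_parity(mtbf, 4, 7))   # ~39650 days, about 108 years
    print(mttdl_copies(mtbf, 3, 31))         # ~1,054,178 days, about 2888 years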



              back to vermeulen.ca home