new laptop phase 2: rough edges of versioned backup
31 Mar 2007
I've been busy writing a guide on setting up the versioned backup I want. The script to automate it all is very basic right now, and I didn't bother with all the little details. But it's time I do that now.
As a standalone solution, I could automatically back up the directory using that script right away. But the problem in my case is that my home directory is encrypted with EncFS. When I'm not logged in, the script (in my home directory) can't be executed and the data in my home directory can't even be read.
So, when I'm not logged in, the script shouldn't be run.
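A guard for this could look roughly like the sketch below. It assumes the EncFS volume is mounted directly on the home directory and that the system exposes /proc/mounts (Linux). The function name home_is_mounted and its optional second argument are made up for illustration.

```shell
#!/bin/bash
# Sketch only: home_is_mounted is a hypothetical helper, not part of EncFS or bzr.
# Succeeds when directory $1 appears as a mount point in the mounts table.
# $2 optionally names another mounts table (defaults to /proc/mounts).
home_is_mounted() {
    local dir=$1 table=${2:-/proc/mounts}
    # Field 2 of every line in the table is the mount point.
    awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$table"
}

# Possible use at the top of a wrapper script:
#   home_is_mounted "$HOME" || exit 0
```

Since the backup script itself lives inside the encrypted directory, a check like this would have to run from somewhere outside it, for instance from a small wrapper that cron calls.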
Also, I need to keep in mind that the remote site might not be available. There are many ways this can happen.
- I might not be online
- maybe I'm online but don't want to back up because my bandwidth is too small (dial-in, GPRS, broadband connection with capped upload)
- maybe the remote server is down and can't be reached
- maybe it can be reached but I can't log in (missing SSH-key)
- maybe the hostkey changed or someone is trying to do man-in-the-middle
- the disk might be full
- I might not have write permissions
- maybe BZR is not installed on the remote server
- maybe someone removed the .bzr directory on the remote server
Or maybe some other reason I can't think of now.
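Several of the cases above can be probed before the real run starts. What follows is a rough sketch, not the script I ended up with: check_remote is a hypothetical helper, it assumes plain OpenSSH access to the remote account, and it assumes the branch sits in that account's home directory. It does not cover a full disk, and the first step lumps every kind of ssh failure (offline, server down, login refused) together.

```shell
#!/bin/bash
# Sketch only: check_remote is a hypothetical helper.
# Prints the first problem it finds and returns non-zero; silent on success.
check_remote() {
    local target=$1
    # BatchMode makes ssh fail instead of prompting for a password;
    # ConnectTimeout bounds the wait for an unreachable host.
    local opts="-o BatchMode=yes -o ConnectTimeout=10"

    ssh $opts "$target" true ||
        { echo "remote: cannot connect or log in"; return 1; }
    ssh $opts "$target" 'command -v bzr >/dev/null' ||
        { echo "remote: bzr is not installed"; return 1; }
    ssh $opts "$target" 'test -d .bzr' ||
        { echo "remote: .bzr directory is missing"; return 1; }
    ssh $opts "$target" 'test -w .bzr' ||
        { echo "remote: no write permission on .bzr"; return 1; }
}
```

A call such as check_remote guest2@localhost || exit 1 near the top of the script would stop the run early with a readable reason.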
The script should also inform me when something goes wrong.
I'll be running this script from cron at a regular interval, say every 15 minutes. Of course I have no idea whether the script can complete its run in that time. 10 minutes at 100Mbit/s is 60000 Mbit, or about 7.5GB. I hope I never have to sync that much in one run ;) If that does happen, though, I'll want the script to notify me, right?
So my script should self-terminate after 10 minutes, just in case it hangs for some reason. If it didn't self-terminate, the cronjobs would start piling up until I kill them or, worse, my machine crumbles under the load.
To set a timer on the script, I'll just use the trap builtin and wait for a signal sent by a timer I start in the background. More info here...
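The back-of-the-envelope figure can be checked with shell arithmetic. The function name max_transfer_mb is just for illustration.

```shell
#!/bin/bash
# Upper bound on data moved in one run:
# seconds * Mbit/s gives Mbit, divided by 8 bits per byte gives MB.
max_transfer_mb() {
    local seconds=$1 rate_mbit=$2
    echo $(( seconds * rate_mbit / 8 ))
}

max_transfer_mb 600 100   # 10 minutes at 100 Mbit/s, prints 7500
```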
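The idea, roughly, is sketched below. The helper names (on_alarm, start_watchdog, stop_watchdog) and the current_step variable are made up for this example; only trap, sleep and kill are standard.

```shell
#!/bin/bash
# Sketch only: a background timer that signals the script itself.

# Handler: report what the script was doing, then give up.
on_alarm() {
    echo "Timed out while: $current_step"
    exit 1
}

# Send SIGALRM to this script after $1 seconds.
start_watchdog() {
    # Inside the ( ... ) subshell, $$ still holds the main script's PID.
    ( sleep "$1"; kill -ALRM $$ 2>/dev/null ) &
    watchdog_pid=$!
}

# Cancel the timer once the work is done.
stop_watchdog() {
    kill "$watchdog_pid" 2>/dev/null
}

# Possible use:
#   trap on_alarm ALRM
#   start_watchdog 600
#   current_step="committing"; bzr commit ...
#   stop_watchdog
```

One caveat: bash only runs a trap handler after the foreground command it is waiting on has returned, so a child process that hangs forever would still delay the handler.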
[Intermezzo]
I haven't reinstalled my laptop yet and all my tests are on VMware Workstation or Server. I decided it would be a good idea to have a versioned backup of my files on my desktop too, so I created a BZR branch, and tried adding my entire personal directory (shared computer).
I got this message:
ignored 1 file(s).
If you wish to add some of these files, please add them by name.
I wasn't sure what happened, so I looked into it (obviously ;)
BZR has a file called ~/.bazaar/ignore, which contains a list of filename patterns that it will ignore.
My file (which I guess is the default) contains these lines:
guest@berta:~$ cat .bazaar/ignore
*.a
*.o
*.py[co]
*.so
*.sw[nop]
*~
.#*
[#]*#
guest@berta:~$
All these patterns match temporary files, so it's a good default :)
To check which file was being ignored, I reran the command, this time with the verbose flag:
guest@berta:~$ bzr add -v Steven/
ignored Steven/python/box.pyc matching "*.py[co]"
If you wish to add some of these files, please add them by name.
guest@berta:~$
So it appears I have a compiled python script that wasn't added to the branch. Fine by me :)
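To get a feel for what these patterns catch, the same globs can be tried in a shell case statement. This is only an approximation: it assumes the patterns are matched against the last path component, the way the verbose output above suggests, and matching_pattern is a made-up helper, not a bzr command.

```shell
#!/bin/bash
# Sketch only: mimic the default ignore list with shell globs.
# Prints the first pattern that matches the file name part of $1,
# or returns non-zero when none does.
matching_pattern() {
    case "${1##*/}" in
        *.a)        echo '*.a' ;;
        *.o)        echo '*.o' ;;
        *.py[co])   echo '*.py[co]' ;;
        *.so)       echo '*.so' ;;
        *.sw[nop])  echo '*.sw[nop]' ;;
        *\~)        echo '*~' ;;
        .\#*)       echo '.#*' ;;
        [#]*\#)     echo '[#]*#' ;;
        *)          return 1 ;;
    esac
}

matching_pattern Steven/python/box.pyc   # prints *.py[co]
```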
When I tried doing a commit, everything seemed to go fine until my machine totally freaked. I was doing a recode of a TV-show I recorded earlier, which may have caused heavy disk activity. Still, my machine froze for about a minute, then killed the running bzr command.
Now I get this after I rerun "bzr commit":
guest@berta:~$ bzr commit
bzr: ERROR: Could not acquire lock LockDir(file:///home/guest/.bzr/repository/lock)
guest@berta:~$
The lock can be broken with "bzr break-lock".
The reason my machine froze and killed bzr is that I tried adding a 700MB iso to the branch, which is not a good idea. According to the developers, BZR can hold as much as 3 times the entire file in memory, which would be about 2.1GB.
I'll have to make sure that huge files are not added to the repository and that corrupt locks are reported...
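Finding the offenders up front could be as simple as the sketch below. The function name list_large_files and the size limit are hypothetical; find and its -size test are standard.

```shell
#!/bin/bash
# Sketch only: list regular files under directory $1 that are larger
# than $2 kilobytes, so they can be kept out of the branch.
list_large_files() {
    find "$1" -type f -size +"$2"k
}

# Possible use, with an arbitrary 100MB limit:
#   list_large_files "$HOME/Steven" 102400
```

The resulting paths could then be added to the branch's .bzrignore file, or simply reported so I can decide what to do with them.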
[End of intermezzo]
I've added code to the script that makes sure it will terminate itself nicely when it receives a SIGALRM. Bash can't send itself that signal in a clean way, so we have to use an external program like timeout or doalarm, as suggested in the Bash FAQ.
Ubuntu has the "timeout" command in its universe repository, so I'll use that one.
root@berta:~# apt-get install timeout
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
timeout
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 6388B of archives.
After unpacking 53.2kB of additional disk space will be used.
Get:1 http://be.archive.ubuntu.com edgy/universe timeout 1.11-6.2 [6388B]
Fetched 6388B in 0s (17.1kB/s)
Selecting previously deselected package timeout.
(Reading database ... 130989 files and directories currently installed.)
Unpacking timeout (from .../timeout_1.11-6.2_i386.deb) ...
Setting up timeout (1.11-6.2) ...
root@berta:~#
On receiving the SIGALRM signal, the script will print out which part of the script it was in before exiting. This message will be vital for reporting the error to me.
I've also added a check to see whether or not the BZR repository was locked.
Making sure that no one is doing man-in-the-middle can be done by setting the StrictHostKeyChecking option to yes in the SSH config. We can even do it for our remote site only.
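As a sketch, a per-host entry in ~/.ssh/config could look like this. The host name is taken from the test setup in the script (guest2@localhost); it also assumes bzr connects through the OpenSSH client, which is what reads this file.

```
# ~/.ssh/config
# Refuse to connect when the host key is unknown or has changed,
# but only for the backup host.
Host localhost
    StrictHostKeyChecking yes
```

With this setting ssh no longer asks whether to accept an unknown key, so the key has to be in known_hosts before the first unattended run.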
[Several coredumps later]
It appears that my harddisk is about to die (talk about luck...)
So in case things take a turn for the worse, I'll post the current version of the script here:
#!/bin/bash
dirlist=$HOME/.bzr-dirlist
remotesite=sftp://guest2@localhost/home/guest2/
# Don't run this script for too long.
# When running this script from cron, it's not a good idea
# to start a second instance of this script before the
# previous run is over
#
# because bash has no decent way to do an alarm() style signal,
# we'll have to count on an external program to do so
# (either doalarm or timeout as specified in http://wooledge.org/mywiki/BashFaq#faq68)
trap timeup 14
# this variable will store a string that describes what we're doing
# in case the script gets killed.
TimeupReason="Script didn't start yet."
timeup()
{
echo "Killed by SIGALRM while: \"$TimeupReason\""
exit 1
}
# Starting from the homedirectory
TimeupReason="Entering home directory [$HOME]"
cd "$HOME" || exit 1
# check to see if there are locks
TimeupReason="Checking for a BZR lock."
if bzr info | grep -q locked;
then
echo "Can't autocommit: Lock exists."
exit 1
fi
# for every directory we want to keep in the repository...
TimeupReason="Reading directory listing [$dirlist]"
# (one directory per line, so names containing spaces survive)
while IFS= read -r dir
do
# ... add the directory and all the (new) files in it
TimeupReason="Adding files in [$dir] to BZR."
bzr add "$dir"
done < "$dirlist"
# Now commit all this data to BZR with a logmessage that indicates where
# the new versions come from, and the date.
TimeupReason="Committing changes to BZR."
bzr commit --message "Auto-commit $(whoami)@$(hostname) on $(date)"
TimeupReason="Push-and-Update to remote site [$remotesite]"
bzr push-and-update "$remotesite"
TimeupReason="Script is done."
exit 0
To be run with
timeout -14 600 ./autocommit
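Hooked into cron, this could look like the crontab entry below. It is a sketch that assumes the script is saved as autocommit in the home directory; the -x test skips the run when the script is not reachable, for example because the encrypted home directory is not mounted. Where timeout comes from GNU coreutils instead of the package used here, the equivalent call is timeout -s ALRM 600.

```
# m    h  dom mon dow  command
*/15   *  *   *   *    [ -x "$HOME/autocommit" ] && timeout -14 600 "$HOME/autocommit"
```

Cron mails whatever the job prints to the owner of the crontab, which takes care of reporting the error messages.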
Cleanup script:
rm -rf ~/.bzr; cd ; bzr init; cd -
ssh guest2@localhost "rm -rf ~/.bzr; bzr init"