automated installations (11)

This is getting to be one long article.
I'm thinking of making it a static page once I've researched everything and finished the project.
(I've actually created a special category for the articles concerning the automated installations project)

Yesterday I stumbled across the ask_user function. It contains the first reference to plugins I've found so far. For each plugin, the ask_user function executes the "choices" script and uses the output to construct a list of choices for the debconf_select function:


debconf_select () {
    local IFS priority template choices default_choice default x u newchoices code
    priority="$1"
    template="$2"
    choices="$3"
    default_choice="$4"
    default=''
    # Debconf ignores spaces so we have to remove them from $choices
    newchoices=''
    IFS="$NL"
    for x in $choices; do
        local key option
        restore_ifs
        key=$(echo ${x%$TAB*})
        # work around bug #243373
        if [ "$TERM" = xterm -o "$TERM" = bterm ]; then
            debconf_select_lead="$NBSP"
        else
            debconf_select_lead="> "
        fi
        option=$(echo "${x#*$TAB}" | sed 's/ *$//g' | sed "s/^ /$debconf_select_lead/g")
        newchoices="${newchoices}${NL}${key}${TAB}${option}"
        if [ "$key" = "$default_choice" ]; then
            default="$option"
        fi
    done
    choices="$newchoices"
    u=''
    IFS="$NL"
    # escape the commas and leading whitespace but keep them unescaped
    # in $choices
    for x in $choices; do
        u="$u, `echo ${x#*$TAB} | sed 's/,/\\\\,/g' | sed 's/^ /\\\\ /'`"
    done
    u=${u#, }
    if [ -n "$default" ]; then
        db_set $template "$default"
    fi
    db_subst $template CHOICES "$u"
    code=0
    db_input $priority $template || code=1
    db_go || return 255
    db_get $template
    IFS="$NL"
    for x in $choices; do
        if [ "$RET" = "${x#*$TAB}" ]; then
            RET="${x%$TAB*}"
            break
        fi
    done
    return $code
}
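The two tricks doing the heavy lifting here are shell parameter expansion to split the TAB-separated key/option pairs, and sed to escape commas for debconf. A standalone demo of both, with a made-up sample line:

```shell
# Split a TAB-separated "key<TAB>option" pair and escape commas the way
# debconf_select does. The sample line is invented for illustration.
TAB="$(printf '\t')"
line="raid${TAB}Configure software RAID, please"

key=${line%$TAB*}      # strip everything from the last TAB onwards
option=${line#*$TAB}   # strip everything up to the first TAB
escaped=$(echo "$option" | sed 's/,/\\,/g')

echo "$key"      # raid
echo "$escaped"  # Configure software RAID\, please
```

The comma escaping matters because debconf itself uses commas to separate the entries in ${CHOICES}.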


The debconf_select function presents a menu to the user using the given template. In this specific case, it's the template "partman/choose_partition":


Template: partman/choose_partition
Type: select
Choices: ${CHOICES}
Description: This is an overview of your currently configured partitions and mount points.
 Select a partition to modify its settings (file system, mount point, etc.), a free space
 to create partitions, or a device to initialise its partition table.
...


This looks like the following screen:
[Image: 6182_large.png, the partman partition overview menu]


The first line there, "Configure software RAID", is inserted into choose_partition.d by the partman-md udeb.

After the user makes a selection, debconf_select() returns to ask_user, which stores the selection in $dir/default_choice so it can be remembered later.
Then the plugin's "do_option" script is called.
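Pieced together, the ask_user flow looks roughly like the sketch below. This is not the real partman code: the fake menu, the assumption that the choice key matches the plugin directory name, and the structure are all simplifications for illustration.

```shell
#!/bin/sh
# Simplified sketch of ask_user: run every plugin's "choices" script,
# present the combined list, remember the answer in default_choice and
# run the chosen plugin's do_option. NOT the real partman code.

TAB="$(printf '\t')"
NL="
"

# Stand-in for debconf_select: just picks the first key offered.
fake_select () {
    IFS="$NL"
    for x in $1; do
        RET="${x%%$TAB*}"
        break
    done
    unset IFS
}

ask_user () {
    dir="$1"
    choices=''
    for plugin in "$dir"/*; do
        [ -x "$plugin/choices" ] || continue
        choices="${choices}$("$plugin/choices")${NL}"
    done
    fake_select "$choices"
    echo "$RET" > "$dir/default_choice"   # remembered for next time
    # assume the key names the plugin directory (a simplification)
    [ -x "$dir/$RET/do_option" ] && "$dir/$RET/do_option" "$RET"
}
```

The real ask_user does more bookkeeping, but the plugin-dir/choices/do_option shape is the core of it.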

do_option starts by calling confirm_changes(), which seems to do nothing but ask the user to confirm that changed partitions will be committed. It then commits those changes and restarts parted_server. [Actually, that's what the comment says. In fact, the script sends a QUIT to parted_server and then deletes the pidfile. I don't see it restarted here...]

Next, mdcfg is started. This tool resides in the mdcfg-utils udeb.
When mdcfg is finished, all the scripts in /lib/partman/init.d are called again.

Let's have a look at this mdcfg (which is luckily a bash script).
Seems like I'm finally getting to the good stuff.

The mdcfg script loads the MD and RAID modules, detects and starts MD devices with mdrun, installs the mdadm tool in /target and then calls the md_mainmenu() function.


### Main of script ###

# Try to load the necessary modules.
# Supported schemes: RAID 0, RAID 1, RAID 5
depmod -a 1>/dev/null 2>&1
modprobe md 1>/dev/null 2>&1
modprobe raid0 >/dev/null 2>&1
modprobe raid1 1>/dev/null 2>&1
modprobe raid5 >/dev/null 2>&1

# Try to detect MD devices, and start them
/sbin/mdrun

# Make sure that we have md-support
if [ ! -e /proc/mdstat ]; then
    db_set mdcfg/nomd "false"
    db_input high mdcfg/nomd
    db_go
    db_stop
    exit 0
fi

# Force mdadm to be installed on the target system
apt-install mdadm

# We want the "go back" button
#db_capb backup

md_mainmenu

#db_stop
exit 0


md_mainmenu() calls md_createmain(), which calls md_create_raid1(). That last function asks the user a bunch of questions about how to configure the RAID1 and then calls mdadm to create it.
It's important to keep in mind that the partitions already need to exist before a RAID1 can be laid out over them.

The more I read all this code, the more I feel like trimming it down to the bare essentials.
I don't need all this user input to do an automated install. Basically, I can have 2 partitioning schemes: either a simple swap + /boot + / on 1 disk, or 3 RAID 1 devices on 2 disks.
The sizes are pretty much fixed: 64MB for /boot, 2*RAM for swap and the rest for the root disk.
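Those sizing rules are trivial to encode. A sketch (the helper name and the MB units are my own, not partman's):

```shell
# Compute the fixed layout: 64MB for /boot, 2*RAM for swap, the rest
# for /. Hypothetical helper; all sizes in MB.
plan_partitions () {
    ram_mb=$1
    disk_mb=$2
    boot_mb=64
    swap_mb=$((ram_mb * 2))
    root_mb=$((disk_mb - boot_mb - swap_mb))
    printf '/boot %s\nswap %s\n/ %s\n' "$boot_mb" "$swap_mb" "$root_mb"
}

plan_partitions 512 20480
# /boot 64
# swap 1024
# / 19392
```

On a real system ram_mb would come from MemTotal in /proc/meminfo and disk_mb from the disk size as reported by sfdisk.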

To do all of this "My Way" (TM), I need to pre-empt the whole partman business. Which means my script will have to expect the same input (or less specific input), and produce the same output (or more specific output). Reminds me of Liskov :)

I'm gonna finish this journey first though. Understanding partman is essential to rewriting it.
We're pretty deep inside the rabbit hole at the moment. The mdcfg tool was called by the do_option script of the partman-md plugin, which in turn was invoked by ask_user in partman.

The next step is calling all scripts in commit.d, followed by those in finish.d.

At first glance, the scripts in commit.d don't do that much:

10filesystems_changed

Removes /var/lib/partman/filesystems_detected

20remove_backup

Removes /var/lib/partman/backup

30parted

Disables swap and sends a "COMMIT" to parted_server

32update-dev

This calls the program "update-dev" if it exists. The program seems to be part of udevfs (I can't find the package at all, and it doesn't look like it was installed in the debian-installer I'm using)



There is only 1 script in finish.d, which kills the parted_server.
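Both 30parted and that finish.d script talk to parted_server, which partman communicates with over named pipes. The exact command protocol isn't covered here, so the paths and the COMMIT handshake below are made up; the sketch only demonstrates the fifo request/response pattern itself:

```shell
# Toy fifo-based client/server exchange, mimicking the way partman
# scripts talk to parted_server. Paths and "protocol" are invented.
DIR=$(mktemp -d)
mkfifo "$DIR/infifo" "$DIR/outfifo"

# Toy "server": read one command, acknowledge it.
(
    read cmd < "$DIR/infifo"
    echo "OK $cmd" > "$DIR/outfifo"
) &

echo "COMMIT" > "$DIR/infifo"   # client sends a command
read reply < "$DIR/outfifo"     # and waits for the answer
echo "$reply"                   # OK COMMIT
wait
rm -r "$DIR"
```

The opens on each end of a fifo block until the other side shows up, which is what keeps the two processes in lockstep without any extra locking.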

All other scripts are inserted by other udeb packages.

These scripts are added by several udeb packages in /lib/partman/init.d:

01unsupported

03kernelmodules_basicfilesystems

03kernelmodules_ext3

03kernelmodules_jfs

03kernelmodules_reiserfs

03kernelmodules_xfs

All the 03kernelmodules_* scripts seem to check if modules are loaded and load them if required. They also touch a status file in /var/lib/partman, no doubt to indicate the presence of the module.
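A minimal sketch of that check-and-flag pattern (the function name, the grep on /proc/filesystems and the flag file path are my guesses, not the actual script contents):

```shell
# Load a filesystem module if it isn't available yet, and record the
# result with a flag file, roughly like the 03kernelmodules_* scripts.
# Hypothetical helper; the real scripts differ in detail.
ensure_fs_module () {
    fs=$1
    statedir=$2
    if grep -q "[[:space:]]$fs\$" /proc/filesystems 2>/dev/null; then
        :   # already available (built in or loaded)
    else
        modprobe "$fs" >/dev/null 2>&1 || return 1
    fi
    mkdir -p "$statedir"
    touch "$statedir/kernelmodules_$fs"
}
```

Called as, say, ensure_fs_module ext3 /var/lib/partman, the flag file then tells later partman scripts that ext3 is usable.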

10umount_target

30parted

31md-devices

I'm guessing this script looks at existing MD devices and dumps them into /var/lib/partman (the status directory)

35dump

50lvm

The comment at the start of this file says: "This script sets method lvm for all partitions that have the lvm flag set. It also discovers the logical volumes and creates in them a loop partition table and partition."

51md

This script goes over all MD devices and marks them as being RAID

69no_media

70update_partitions

71filesystems_detected

80autouse_swap

Detects and prepares swap partitions

95backup

99initial_auto

Initialisation for partman-auto



It doesn't look like partman-auto can autopartition multiple devices :(
So I think I'll have to provide my own way of partitioning the disks.
Looking at the control file for partman, I find:

Provides: made-filesystems, mounted-partitions, partitioned-harddrives, created-fstab


My guess is that, if I replace partman, I need to partition the harddrives, make the filesystems, mount them and create an /etc/fstab.
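The fstab part at least looks easy to take over. A sketch of a generator for the simple one-disk scheme (device names are examples, the mount options follow Debian's usual defaults; this is a hypothetical helper, not partman's own code):

```shell
# Emit an /etc/fstab for the simple one-disk scheme.
write_fstab () {
    boot_dev=$1; swap_dev=$2; root_dev=$3
    cat <<EOF
# /etc/fstab: static file system information
proc        /proc   proc    defaults                    0 0
$root_dev   /       ext3    defaults,errors=remount-ro  0 1
$boot_dev   /boot   ext3    defaults                    0 2
$swap_dev   none    swap    sw                          0 0
EOF
}

write_fstab /dev/sda1 /dev/sda2 /dev/sda3
```

In the replacement script the output would go to /target/etc/fstab after the filesystems are mounted.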

After a well-deserved break, I decided a practical test was in order. I modified partman so it would do nothing at all (that is: it would check for a file /done every second until it found one, and then exit 0).
Then, I logged into the second terminal, created a partition table with sfdisk, created a filesystem and mounted it under /target. Then I touched /done and the installation continued.
To my surprise, the base system was installed and the machine rebooted. However, it couldn't find init... I screwed something up :)
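For the record, the do-nothing partman boils down to a loop like this (wrapped in a function here, with the marker file as a parameter, so it can be tried outside the installer):

```shell
# Stand-in for partman: do nothing until a marker file appears, then
# report success so the installer carries on.
wait_for_done () {
    donefile=${1:-/done}
    while [ ! -e "$donefile" ]; do
        sleep 1
    done
    return 0
}
```

Inside d-i this ran as the whole partman step with /done as the marker, while I partitioned by hand on the second console.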

/target/etc/fstab contains:

# UNCONFIGURED FSTAB FOR BASE SYSTEM


Looking at the logs, the installer complained that /etc/fstab and /etc/mtab could not be found, /proc was not mounted and it couldn't find mdadm.
All valid errors I suppose.

Let's see how the real install CD creates these files...

/etc/fstab contains:

# /etc/fstab: static file system information
#
#
proc /proc proc defaults 0 0
/dev/sda1 / ext3 defaults,errors=remount-ro 0 1
/dev/sda5 none swap sw 0 0
/dev/hdc /media/cdrom0 iso9660 ro,user,noauto 0 0
/dev/fd0 /media/floppy0 auto rw,user,noauto 0 0
# UNCONFIGURED FSTAB FOR BASE SYSTEM


/etc/mtab contains:

/dev/sda1 / ext3 rw,errors=remount-ro 0 0
proc /proc proc rw 0 0


The Debian install CD also complains about not finding mdadm.

I manually created an /etc/fstab right before touching /done (VMware snapshots are great) and got a little bit further this time. The problem now is that the disk is mounted readonly.

Removing the "errors=remount-ro" option from /etc/fstab fixed the problem, but it's no solution.
I need to figure out why my manually created partition causes an error.

To find out what is so different, I tracked down the reboot command so I can delay it while I look for answers. The script /usr/lib/prebaseconfig.d/99reboot is responsible for that. Adding a sleep in it should delay it a bit.

In /etc/rcS.d, these scripts are executed (among others):
mountall.sh, discover, mountvirtfs

Mountvirtfs is run first, then mountall.sh, discover and mountvirtfs again.
Right before discover, the first error appears, meaning the problem is somewhere in mountall.sh.
Apparently, the errors happen when the filesystem is being cleaned up (/tmp and /var/tmp and such).