automated installations (11)

This is getting to be one long article.
I'm thinking of making it a static page once I've researched everything and finished the project.
(I've actually created a special category for the articles concerning the automated installations project.)

Yesterday I stumbled across the ask_user function. It contains the first reference to plugins I've found so far. For each plugin, the ask_user function executes the "choices" script and uses the output to construct a list of choices for the debconf_select function:

debconf_select () {
    local IFS priority template choices default_choice default x u newchoices code
    # ...
    # Debconf ignores spaces so we have to remove them from $choices
    for x in $choices; do
        local key option
        key=$(echo ${x%$TAB*})
        # work around bug #243373
        if [ "$TERM" = xterm -o "$TERM" = bterm ]; then
            debconf_select_lead="> "
        fi
        option=$(echo "${x#*$TAB}" | sed 's/ *$//g' | sed "s/^ /$debconf_select_lead/g")
        if [ "$key" = "$default_choice" ]; then
            default="$option"
        fi
        # ...
    done
    # escape the commas and leading whitespace but keep them unescaped
    # in $choices
    for x in $choices; do
        u="$u, `echo ${x#*$TAB} | sed 's/,/\\\\,/g' | sed 's/^ /\\\\ /'`"
    done
    u=${u#, }
    if [ -n "$default" ]; then
        db_set $template "$default"
    fi
    db_subst $template CHOICES "$u"
    db_input $priority $template || code=1
    db_go || return 255
    db_get $template
    for x in $choices; do
        if [ "$RET" = "${x#*$TAB}" ]; then
            # ... (map the answer back to its key)
        fi
    done
    return $code
}
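The plugin side of this is worth a sketch: each entry under the choose_partition.d directory ships a "choices" script, and ask_user glues their output together into the menu entries. Roughly like this (my simplification, not the literal partman code, and demonstrated against a throwaway directory rather than /lib/partman):

```shell
#!/bin/sh
# Build a fake plugin directory with two "choices" scripts.
dir=$(mktemp -d)
mkdir -p "$dir/50md" "$dir/70finish"
printf '#!/bin/sh\necho "md\tConfigure software RAID"\n' > "$dir/50md/choices"
printf '#!/bin/sh\necho "finish\tFinish partitioning"\n' > "$dir/70finish/choices"
chmod +x "$dir"/*/choices

# Run every plugin's "choices" script in lexical order; each output line
# becomes one entry in the list handed to debconf_select.
choices=''
for plugin in "$dir"/*/choices; do
    [ -x "$plugin" ] || continue
    choices="$choices$("$plugin")
"
done
printf '%s' "$choices"
```

The numeric prefixes on the plugin directories decide the menu order, which is how partman-md gets its entry to the top of the list.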

The debconf_select function presents a menu to the user using the given template. In this specific case, it's the template "partman/choose_partition":

Template: partman/choose_partition
Type: select
Choices: ${CHOICES}
Description: This is an overview of your currently configured partitions and mount points.
Select a partition to modify its settings (file system, mount point, etc.), a free space
to create partitions, or a device to initialise its partition table.

This looks like the following screen:

The first line there, "Configure software RAID", is inserted into choose_partition.d by the partman-md udeb.

After the user makes a selection, debconf_select() returns to ask_user, which stores the selection in $dir/default_choice so it can be remembered later.
Then the script "do_option" is called.

do_option starts by calling confirm_changes(), which seems to do nothing but ask the user to confirm that changed partitions will be committed. It then commits those changes and restarts parted_server. [Actually, that's what the comment says. In fact, the script sends a QUIT to parted_server and then deletes the pidfile. I don't see it being restarted here...]

Next, mdcfg is started. This tool resides in the mdcfg-utils udeb.
When mdcfg is finished, all the scripts in /lib/partman/init.d are called again.

Let's have a look at this mdcfg (which is luckily a bash script).
Seems like I'm finally getting to the good stuff.

The mdcfg script loads the MD and RAID modules, detects and starts MD devices with mdrun, installs the mdadm tool in /target and then calls the md_mainmenu() function.

### Main of script ###

# Try to load the necessary modules.
# Supported schemes: RAID 0, RAID 1, RAID 5
depmod -a 1>/dev/null 2>&1
modprobe md 1>/dev/null 2>&1
modprobe raid0 >/dev/null 2>&1
modprobe raid1 1>/dev/null 2>&1
modprobe raid5 >/dev/null 2>&1

# Try to detect MD devices, and start them
# ...

# Make sure that we have md-support
if [ ! -e /proc/mdstat ]; then
    db_set mdcfg/nomd "false"
    db_input high mdcfg/nomd
    exit 0
fi

# Force mdadm to be installed on the target system
apt-install mdadm

# We want the "go back" button
#db_capb backup

# ... (main menu elided)

exit 0

md_mainmenu() calls md_createmain(), which calls md_create_raid1(). That last one asks the user a bunch of questions about how to configure the RAID1 and then calls mdadm to create it.
It's important to keep in mind that the partitions already need to exist before a RAID1 can be laid out over them.

The more I read all this code, the more I feel like trimming it down to the bare essentials.
I don't need all this user-input to do an automated install. Basically, I can have 2 partitioning schemes: either a simple swap + /boot + / on 1 disk, or 3 RAID 1 devices on 2 disks.
The sizes are pretty much fixed: 64MB for /boot, 2*RAM for swap and the rest for the root disk.
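Those sizes can be computed up front. A minimal sketch of the single-disk scheme (the device name and the decision to merely print the sfdisk input rather than feed it to a disk are my assumptions, so it can be dry-run safely):

```shell
#!/bin/sh
# Compute the fixed partition sizes: 64MB /boot, 2*RAM for swap,
# and the rest for /.
ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
boot_mb=64
swap_mb=$(( ram_kb * 2 / 1024 ))

# Print the sfdisk input (sizes in MB); a real script would pipe this
# into "sfdisk -uM /dev/sda" instead of just showing it.
cat <<EOF
,${boot_mb},L,*
,${swap_mb},S
,,L
EOF
```

The RAID variant would write the same layout to both disks and mark the partitions type "fd" (Linux raid autodetect) instead of "L".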

To do all of this "My Way" (TM), I need to pre-empt the whole partman business. Which means my script will have to expect the same input (or less specific input), and produce the same output (or more specific output). Reminds me of Liskov :)

I'm gonna finish this journey first though. Understanding partman is essential to rewriting it.
We're pretty deep inside the rabbit hole at the moment. The mdcfg tool was called by the do_option script of the partman-md plugin, which in turn was invoked by ask_user in partman.

The next step is calling all scripts in commit.d, followed by those in finish.d.
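The pattern for all these hook directories is the same: run every executable in lexical order. A rough sketch of the idea (my simplification, demonstrated against a throwaway directory rather than /lib/partman):

```shell
#!/bin/sh
# Create a throwaway hook directory with two numbered scripts.
hookdir=$(mktemp -d)
printf '#!/bin/sh\necho commit-step-1\n' > "$hookdir/10first"
printf '#!/bin/sh\necho commit-step-2\n' > "$hookdir/50second"
chmod +x "$hookdir"/*

# Walk the hooks in lexical order, the way partman walks commit.d
# and finish.d.
for script in "$hookdir"/*; do
    [ -x "$script" ] || continue
    "$script"
done
```

This is also why udebs can drop their own scripts into these directories and have them picked up without partman knowing about them.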

At first glance, the scripts in commit.d don't do that much:

- Removes /var/lib/partman/filesystems_detected
- Removes /var/lib/partman/backup
- Disables swap and sends a "COMMIT" to parted_server
- Calls the program "update-dev" if it exists. The program seems to be part of udevfs (I can't find the package at all, and it doesn't look like it was installed in the debian-installer I'm using).

There is only 1 script in finish.d, which kills the parted_server.

All other scripts are inserted by other udeb packages.

These scripts are added by several udeb packages in /lib/partman/init.d:

- All the 03kernelmodules_* scripts seem to check if modules are loaded and load them if required. They also touch a status file in /var/lib/partman, no doubt to indicate the presence of the module.
- I'm guessing this script looks at existing MD devices and dumps them into /var/lib/partman (the status directory).
- The comment at the start of this file says: "This script sets method lvm for all partitions that have the lvm flag set. It also discovers the logical volumes and creates in them a loop partition table and partition."
- This script goes over all MD devices and marks them as being RAID.
- Detects and prepares swap partitions.
- Initialisation for partman-auto.

It doesn't look like partman-auto can autopartition multiple devices :(
So I think I'll have to provide my own way of partitioning the disks.
Looking at the control file for partman, I find:

Provides: made-filesystems, mounted-partitions, partitioned-harddrives, created-fstab

My guess is that, if I replace partman, I need to partition the harddrives, make the filesystems, mount them and create an /etc/fstab.
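So a replacement has four jobs. A sketch of what that boils down to (device names are assumptions on my part, the destructive steps are left as comments, and the fstab is written to a scratch directory instead of the real /target so this can be tried safely):

```shell
#!/bin/sh
# The four things the Provides line promises:
#
#   sfdisk /dev/sda < layout     # partitioned-harddrives
#   mkfs.ext3 /dev/sda1          # made-filesystems
#   mount /dev/sda1 /target     # mounted-partitions
#
# Only the fourth, created-fstab, is done for real here.
target=$(mktemp -d)              # stand-in for /target
mkdir -p "$target/etc"
cat > "$target/etc/fstab" <<EOF
proc      /proc  proc  defaults                    0 0
/dev/sda1 /      ext3  defaults,errors=remount-ro  0 1
/dev/sda5 none   swap  sw                          0 0
EOF
cat "$target/etc/fstab"
```
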

After a well-deserved break, I decided a practical test was in order. I modified partman so it would do nothing at all (that is: it would check for a file /done every second until it found one, and then exit 0).
Then I logged into the second terminal, created a partition table with sfdisk, created a filesystem and mounted it under /target. Then I touched /done and the installation continued.
To my surprise, the base system was installed and the machine rebooted. However, it couldn't find init... I screwed something up :)
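The do-nothing partman amounted to a few lines; something like this (written as a function here so it can be exercised without blocking on the real /done):

```shell
#!/bin/sh
# Poll once a second for a flag file; return 0 as soon as it exists.
# In the gutted partman this was effectively: wait_for_flag /done
wait_for_flag() {
    while [ ! -e "$1" ]; do
        sleep 1
    done
    return 0
}
```
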

/target/etc/fstab contains:


Looking at the logs, the installer complained that /etc/fstab and /etc/mtab could not be found, /proc was not mounted and it couldn't find mdadm.
All valid errors I suppose.

Let's see how the real install CD creates these files...

/etc/fstab contains:

# /etc/fstab: static file system information
proc       /proc           proc     defaults                    0 0
/dev/sda1  /               ext3     defaults,errors=remount-ro  0 1
/dev/sda5  none            swap     sw                          0 0
/dev/hdc   /media/cdrom0   iso9660  ro,user,noauto              0 0
/dev/fd0   /media/floppy0  auto     rw,user,noauto              0 0

/etc/mtab contains:

/dev/sda1 / ext3 rw,errors=remount-ro 0 0
proc /proc proc rw 0 0

The Debian install CD also complains about not finding mdadm.

I manually created an /etc/fstab right before touching /done (VMware snapshots are great) and the install got a little bit further this time. The problem now is that the disk is mounted read-only.

Removing the "errors=remount-ro" from /etc/fstab fixed the problem, but it's no solution.
I need to figure out why my manually created partition causes an error.

To find out what is so different, I tracked down the reboot command so I can delay it while I look for answers. The script /usr/lib/prebaseconfig.d/99reboot is responsible for that. Adding a sleep or so to it should delay it a bit.
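The edit itself is trivial; shown here against a scratch copy of the script rather than the real /usr/lib/prebaseconfig.d/99reboot:

```shell
#!/bin/sh
# Make a scratch copy of a minimal reboot script, then splice a sleep in
# front of the reboot line, buying time to inspect the system.
script=$(mktemp)
printf '#!/bin/sh\nreboot\n' > "$script"
sed -i '/^reboot/i sleep 600' "$script"
cat "$script"
```
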

In /etc/rcS.d, these scripts are executed (among others): mountvirtfs, discover, mountvirtfs.

Mountvirtfs is done first, then discover, and mountvirtfs again.
Right before discover, the first error appears, meaning the problem is somewhere in mountvirtfs.
Apparently, the errors happen when the filesystem is being cleaned up (/tmp and /var/tmp and such).