How operating systems determine the location of the "system volume" when bootstrapped

You've come to this page because you've asked a question similar to the following:
How does an operating system decide which volume is the "system" volume (a.k.a. "system partition") when it is bootstrapped?

This is the Frequently Given Answer to such questions.

When an operating system is bootstrapped it must decide which volume is to be the system volume. There are two basic ways in which operating systems do this:

Using a standard firmware-supplied configuration setting

This is the approach to locating the system volume that is employed by modern operating system designs. The operating system boot loader, which is invoked by the machine firmware, consults configuration information supplied by the firmware that specifies the locations of the operating system kernel program image file and of the system volume. Locating the system volume is thus a relatively simple operation.

What precise form this configuration information takes depends upon the machine firmware.

The configuration information is created and written by the operating system installation utility when the operating system is installed. Modern machine firmwares provide facilities to allow system administrators to edit the configuration information without needing to actually bootstrap an operating system in order to run a configuration editing utility.

Standard configuration settings supplied by Extensible Firmware Interface (EFI) firmwares

On machines with EFI firmwares, the location of the system volume is determined by the value of a machine firmware variable that is stored in non-volatile RAM. Each entry on the EFI Boot Manager menu is defined by the value of a single NVRAM variable, named BootXXXX, where XXXX is a four-digit hexadecimal number. Each variable's value is a rich binary data structure (the EFI_LOAD_OPTION structure) that holds the entry's whole definition.

This binary data structure contains, amongst other things, an array of EFI Device Paths. (An EFI Device Path is EFI firmware's standard general-purpose mechanism for specifying hardware devices, disc volumes, and files.) The first device path specifies the location of the operating system boot loader program image file itself. This is the program image file that the EFI Boot Manager loads and invokes when the entry is selected from the menu. The second and subsequent paths are for use by the operating system boot loader. They specify the device path of the kernel program image file, the device path of the system volume, and so forth.
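To make the layout concrete, here is a minimal sketch that decodes the value of a BootXXXX variable. It assumes the EFI_LOAD_OPTION layout as we understand it from the UEFI specification: a 32-bit Attributes field, a 16-bit FilePathListLength, a NUL-terminated UCS-2 Description, the packed FilePathList of device paths, and then any OptionalData. Each device path node is assumed to start with a one-byte type, a one-byte subtype, and a 16-bit length that includes that header, with each path ending in an "End Entire Device Path" node (type 0x7F, subtype 0xFF). The function names are hypothetical; this is an illustration, not a firmware implementation.

```python
import struct

# Assumed device path node header: Type (UINT8), SubType (UINT8),
# Length (UINT16, little-endian, counts the 4-byte header too).
END_OF_HARDWARE_DEVICE_PATH = 0x7F
END_ENTIRE_DEVICE_PATH_SUBTYPE = 0xFF


def boot_variable_name(number: int) -> str:
    """Name of the NVRAM variable for boot entry `number`, e.g. Boot0003."""
    return f"Boot{number:04X}"


def split_device_paths(blob: bytes) -> list[list[tuple[int, int, bytes]]]:
    """Split a packed FilePathList into its separate device paths.

    Each path comes back as a list of (type, subtype, payload) nodes; the
    End Entire Device Path node that terminates each path is dropped.
    """
    paths: list[list[tuple[int, int, bytes]]] = []
    current: list[tuple[int, int, bytes]] = []
    offset = 0
    while offset < len(blob):
        if offset + 4 > len(blob):
            raise ValueError("truncated device path node header")
        node_type, sub_type, length = struct.unpack_from("<BBH", blob, offset)
        if length < 4 or offset + length > len(blob):
            raise ValueError("malformed device path node length")
        payload = blob[offset + 4 : offset + length]
        offset += length
        if (node_type, sub_type) == (END_OF_HARDWARE_DEVICE_PATH,
                                     END_ENTIRE_DEVICE_PATH_SUBTYPE):
            paths.append(current)
            current = []
        else:
            current.append((node_type, sub_type, payload))
    if current:
        raise ValueError("last device path is not terminated")
    return paths


def parse_load_option(data: bytes) -> dict:
    """Decode an EFI_LOAD_OPTION value into its component fields."""
    if len(data) < 6:
        raise ValueError("too short for an EFI_LOAD_OPTION header")
    attributes, file_path_list_length = struct.unpack_from("<IH", data, 0)
    # Description: UCS-2 characters up to a 16-bit NUL.
    offset = end = 6
    while True:
        if end + 2 > len(data):
            raise ValueError("unterminated description")
        if data[end : end + 2] == b"\x00\x00":
            break
        end += 2
    description = data[offset:end].decode("utf-16-le")
    offset = end + 2
    if offset + file_path_list_length > len(data):
        raise ValueError("FilePathList runs past the end of the data")
    device_paths = split_device_paths(data[offset : offset + file_path_list_length])
    return {
        "attributes": attributes,
        "description": description,
        # [0] locates the boot loader; [1:] are for the boot loader's own use.
        "device_paths": device_paths,
        "optional_data": data[offset + file_path_list_length :],
    }
```

On Linux, the efivarfs filesystem exposes each such variable as a file whose first four bytes are the variable's attribute flags, so those four bytes would have to be skipped before the remainder is handed to parse_load_option.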

One thus specifies which volume is the system volume by editing the appropriate device path in the boot menu entry. The operating system boot loader takes the device path and passes it to the operating system kernel, translating it if necessary into whatever internal naming scheme the operating system kernel itself uses for specifying disc volumes. The operating system kernel in turn mounts the designated volume as the system volume.
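As an illustration of the kind of translation a boot loader might perform, the following sketch decodes a single "Hard Drive" media device path node, the node that identifies a partition on a disc. It assumes the node layout as we understand it from the UEFI specification: type 0x04, subtype 0x01, a total length of 42 bytes, a 32-bit partition number, 64-bit starting LBA and size, a 16-byte partition signature, a partition format byte, and a signature type byte (0x01 for a 32-bit MBR disc signature, 0x02 for a GPT partition GUID). How a particular kernel names volumes is not shown; the identifier returned here is merely something a hypothetical boot loader could hand on.

```python
import struct
import uuid

MEDIA_DEVICE_PATH = 0x04    # assumed node type for media device paths
MEDIA_HARDDRIVE_DP = 0x01   # assumed subtype for a hard drive partition
HARDDRIVE_NODE_LENGTH = 42


def decode_hard_drive_node(node: bytes) -> dict:
    """Extract the partition identity from a Hard Drive media device path node."""
    if len(node) < HARDDRIVE_NODE_LENGTH:
        raise ValueError("node too short")
    node_type, sub_type, length, part_no, start, size = struct.unpack_from(
        "<BBHIQQ", node, 0)  # 24 bytes of fixed-size header fields
    if (node_type, sub_type) != (MEDIA_DEVICE_PATH, MEDIA_HARDDRIVE_DP) \
            or length != HARDDRIVE_NODE_LENGTH:
        raise ValueError("not a Hard Drive media device path node")
    signature = node[24:40]
    partition_format, signature_type = node[40], node[41]
    if signature_type == 0x02:
        # GPT partition GUID, stored in EFI's mixed-endian GUID byte order.
        ident = str(uuid.UUID(bytes_le=signature))
    elif signature_type == 0x01:
        # 32-bit MBR disc signature in the first four bytes.
        ident = f"{struct.unpack_from('<I', signature, 0)[0]:08x}"
    else:
        ident = None
    return {
        "partition_number": part_no,
        "start_lba": start,
        "size_lba": size,
        "partition_format": partition_format,
        "signature": ident,
    }
```

A GUID-based identifier is attractive because it does not depend upon the order in which the kernel's own drivers enumerate discs.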

Standard configuration settings supplied by Advanced RISC Computing (ARC) firmwares

ARC firmwares are very similar to EFI firmwares. Like EFI firmwares, they have a native general-purpose naming mechanism, ARC Paths, for specifying devices, disc volumes, and files. Like EFI firmwares, they have variables stored in non-volatile RAM that name the program image files of operating system boot loaders and that contain configuration information to pass to those boot loaders, such as the ARC Path of the system volume.

The only significant difference is in the implementation detail. Whilst EFI firmwares store everything in the value of a single variable, ARC firmwares store the ARC Paths of the operating system boot loader, the kernel program image file that it loads, and the system volume, in the values of three separate variables: OSLoader, OSLoadFilename, and OSLoadPartition, respectively. Thus the location of the system volume is the value of the ARC firmware's OSLoadPartition variable, held in NVRAM.

(Confusingly, the ARC firmware itself makes use of a disc volume, the ARC System Partition, and designates its location by the value of the SystemPartition variable. This is not the system volume.)
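A small sketch can make the distinction between OSLoadPartition and SystemPartition explicit. It assumes, and this is an assumption rather than something stated above, that each variable holds a semicolon-separated list with one element per boot selection, so that the Nth element of each variable belongs to the Nth menu entry. The environment values are invented for illustration, and the function names are hypothetical.

```python
# ASSUMPTION: each ARC boot variable is a semicolon-separated list with one
# element per boot selection. The example values are invented.
BOOT_VARIABLES = ("OSLoader", "OSLoadFilename", "OSLoadPartition", "SystemPartition")


def select_boot_entry(env: dict[str, str], index: int) -> dict[str, str]:
    """Return element `index` of each boot-related variable."""
    entry = {}
    for name in BOOT_VARIABLES:
        elements = env[name].split(";")  # KeyError if the variable is absent
        if not 0 <= index < len(elements):
            raise IndexError(f"{name} has no element {index}")
        entry[name] = elements[index]
    return entry


def system_volume(env: dict[str, str], index: int) -> str:
    """The system volume comes from OSLoadPartition, not SystemPartition."""
    return select_boot_entry(env, index)["OSLoadPartition"]


example_env = {
    "SystemPartition": "scsi(0)disk(0)rdisk(0)partition(1);"
                       "scsi(0)disk(0)rdisk(0)partition(1)",
    "OSLoader": "scsi(0)disk(0)rdisk(0)partition(1)\\os\\nt\\osloader.exe;"
                "scsi(0)disk(0)rdisk(0)partition(1)\\os\\other\\loader.exe",
    "OSLoadPartition": "scsi(0)disk(0)rdisk(0)partition(2);"
                       "scsi(0)disk(1)rdisk(0)partition(1)",
    "OSLoadFilename": "\\WINNT;\\other",
}

print(system_volume(example_env, 1))  # scsi(0)disk(1)rdisk(0)partition(1)
```

Note that both example entries share one ARC System Partition (where the boot loaders live) while naming different system volumes.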

As with EFI firmwares, the operating system boot loader has to translate the ARC Path of the system volume into whatever naming scheme the operating system kernel itself uses for specifying disc volumes.

Windows NT needs no translation at all. The Windows NT kernel directly understands ARC Paths, and Windows NT uses ARC Paths as its mechanism for communicating between boot loader and kernel. NTLDR, the Windows NT operating system boot loader, simply passes the ARC Path taken from OSLoadPartition directly to the kernel, which in turn decodes the ARC Path to determine the actual volume that is the system volume.
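Decoding an ARC Path amounts to splitting it into a sequence of type(key) components. The sketch below does only that. It assumes that an empty key, as in scsi(), means 0, and it normalizes component names to lower case; both are assumptions of this illustration, and parse_arc_path is a hypothetical name, not a Windows NT kernel routine.

```python
import re

# One ARC Path component: a type name followed by a (possibly empty) numeric key.
_COMPONENT = re.compile(r"([A-Za-z]+)\((\d*)\)")


def parse_arc_path(path: str) -> list[tuple[str, int]]:
    """Split an ARC Path into (component type, key) pairs.

    ASSUMPTIONS: an empty key means 0; names are compared case-insensitively,
    so they are normalized to lower case.
    """
    components = []
    pos = 0
    while pos < len(path):
        m = _COMPONENT.match(path, pos)
        if m is None:
            raise ValueError(f"bad ARC Path component at offset {pos}: {path[pos:]!r}")
        components.append((m.group(1).lower(), int(m.group(2) or "0")))
        pos = m.end()
    if not components:
        raise ValueError("empty ARC Path")
    return components


print(parse_arc_path("multi(0)disk(0)rdisk(0)partition(1)"))
# [('multi', 0), ('disk', 0), ('rdisk', 0), ('partition', 1)]
```

A kernel that understands ARC Paths natively would then walk these components down its own device tree: adapter, controller, disc, and finally partition.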

Standard configuration settings supplied by IBM PC compatible firmwares

On machines with IBM PC compatible firmwares, there is no service available from the firmware for specifying the location of the system volume to the operating system boot loader. Operating systems thus simply pretend that they are using more capable firmwares, employing shim layers that sit between the operating system boot loader proper and the IBM PC compatible firmware.

This is the approach taken by Windows NT.

In Windows NT up to version 5.2, the boot loader and the kernel behave as if they are running on ARC firmware, and a shim layer is added to NTLDR to emulate the services and the behaviour of such firmware. In particular, NTLDR contains a shim that presents a menu of boot options to the user (something that the firmware itself does on real ARC systems), reading the boot configuration data from the /boot.ini file on the boot volume (rather than from NVRAM, as on real ARC systems). NTLDR also contains a shim that switches the machine into protected mode before running the boot loader proper, and that provides disc volume and console I/O services in protected mode; on ARC firmware and EFI firmware systems, these services are provided by the firmware itself.

The /boot.ini file on the boot volume provides only persistent storage for the boot configuration data that ARC firmwares hold in non-volatile RAM; it is not a full general-purpose NVRAM variable storage service. Each line in the [operating systems] section of /boot.ini comprises the ARC Path of the system volume, the ARC Path of the kernel image directory, and the kernel command line options (plus a description to display in the boot menu). This is exactly what would be stored in the OSLoadPartition, OSLoadFilename, and OSLoadOptions variables on an ARC firmware system.
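The correspondence between a /boot.ini line and the ARC variables can be sketched as follows. It assumes the familiar line shape ARCPATH\DIRECTORY="Description" /options, splits the part before the first "=" at the last closing parenthesis (the end of the ARC Path of the volume), and labels the pieces with the ARC variable names from the paragraph above. The function name is hypothetical, and this is not how NTLDR itself parses the file.

```python
def parse_boot_ini_line(line: str) -> dict[str, str]:
    """Split one [operating systems] entry of boot.ini into its parts.

    ASSUMED shape: ARCPATH\\DIRECTORY="Description" /option1 /option2
    """
    target, sep, rest = line.partition("=")  # only the first '=' separates
    if not sep:
        raise ValueError("no '=' in boot.ini entry")
    close = target.rfind(")")
    if close < 0:
        raise ValueError("no ARC Path before '='")
    rest = rest.strip()
    if rest.startswith('"'):
        end_quote = rest.find('"', 1)
        if end_quote < 0:
            raise ValueError("unterminated description")
        description = rest[1:end_quote]
        options = rest[end_quote + 1 :].split()
    else:
        description, options = "", rest.split()
    return {
        "OSLoadPartition": target[: close + 1],  # ARC Path of the system volume
        "OSLoadFilename": target[close + 1 :],   # kernel image directory on it
        "description": description,             # boot menu text only
        "OSLoadOptions": " ".join(options),     # kernel command line options
    }


entry = parse_boot_ini_line(
    'multi(0)disk(0)rdisk(0)partition(1)\\WINDOWS='
    '"Microsoft Windows XP Professional" /fastdetect')
print(entry["OSLoadPartition"])  # multi(0)disk(0)rdisk(0)partition(1)
```

Splitting only at the first "=" matters because option switches can themselves contain "=" characters.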

The problems with this approach are akin to the problems with the roll-one's-own explicit configuration mechanism described next.

Rolling one's own explicit configuration mechanisms

Some operating systems make no attempt whatever to use the facilities that the machine firmware provides, but instead roll their own non-standard configuration mechanisms for explicitly specifying the location of the system volume, and use that mechanism on all systems.

Linux kernels, for example, make no effort to use the device path information supplied by ARC firmwares and EFI firmwares. Instead, Linux kernels built without the CONFIG_EDD option have the location of the system volume hardwired into the in-memory image of the operating system kernel itself, as a pair of major+minor device numbers.
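As a rough illustration of what "a pair of major+minor device numbers in the kernel image" means, the sketch below reads and patches such a field. It assumes a classic x86 kernel image whose setup header holds a 16-bit root_dev field at offset 0x1FC (as we understand the Linux x86 boot protocol documentation) and the old encoding with an 8-bit major number in the high byte and an 8-bit minor number in the low byte. All function names are hypothetical.

```python
import struct

# ASSUMPTION: offset of the 16-bit root_dev field in the x86 setup header.
ROOT_DEV_OFFSET = 0x1FC


def encode_old_dev(major: int, minor: int) -> int:
    """Pack major+minor into the old 16-bit form: major in the high byte."""
    if not (0 <= major <= 0xFF and 0 <= minor <= 0xFF):
        raise ValueError("old-style device numbers are 8 bits each")
    return (major << 8) | minor


def decode_old_dev(dev: int) -> tuple[int, int]:
    """Unpack the old 16-bit form into (major, minor)."""
    return dev >> 8, dev & 0xFF


def read_root_dev(image: bytes) -> tuple[int, int]:
    """Return the (major, minor) pair hardwired into the kernel image."""
    (dev,) = struct.unpack_from("<H", image, ROOT_DEV_OFFSET)
    return decode_old_dev(dev)


def patch_root_dev(image: bytearray, major: int, minor: int) -> None:
    """Overwrite the hardwired root device in place in an image file's bytes."""
    struct.pack_into("<H", image, ROOT_DEV_OFFSET, encode_old_dev(major, minor))
```

For example, (3, 1) is conventionally the first partition of the first IDE disc (/dev/hda1), and (8, 1) the first partition of the first SCSI disc (/dev/sda1). Nothing in the firmware's own configuration feeds into these numbers.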

This pair of values in the kernel image is set manually by the system administrator in one of two ways:

The problems of this approach are severalfold:

Using the boot volume for the system volume

Some operating systems simply do not separate the concepts of "system volume" and "boot volume". They obtain the location of the boot volume and they use that as the location of the system volume.

This approach is rarely used on EFI firmware or ARC firmware systems, simply because those systems provide well-defined and simple mechanisms for specifying the location of the system volume, making it daft to do anything but use those mechanisms.

This approach is commonly used on IBM PC compatible firmware, however. IBM PC compatible firmwares provide no services to operating system boot loaders for locating the system volume, but they do provide services for locating the boot volume.

When the firmware (or the Master Boot Record) on a machine with IBM PC compatible firmware invokes the first stage of an operating system's boot loader, the Volume Boot Record of the operating system's boot volume, it makes three pieces of information available to the boot loader that the boot loader can use to locate the boot volume:

The boot loader passes these data to the operating system kernel, which in turn uses them to locate the system volume.

Operating systems such as PC-/MS-/DR-DOS use the IBM PC firmware for their I/O to and from the actual disc units. It is therefore a simple matter for them to determine which volume is the system volume given the aforementioned data.

The major problem with this mechanism for determining the system volume is that it requires that the operating system's idea of disc unit numbers exactly match the machine firmware's idea of disc unit numbers. Thus either the operating system must use the machine firmware for all disc I/O, or the operating system's own disc device drivers must exactly mirror the machine firmware's disc device drivers. In particular:

This scheme does not work at all for operating systems that simply do not employ the same disc unit numbering system as the machine's firmware does, but have a very different native disc device naming scheme. This is the case for Linux, for example. There is no straightforward mapping between IBM PC firmware disc unit numbers and the major+minor device number pairs that Linux uses to identify disc volumes.

In order to determine which volume known to the operating system kernel corresponds to the disc unit number supplied by the machine firmware, such operating systems employ a bodge: The operating system boot loader reads data from the boot volume, via the machine firmware, and then the operating system kernel reads data from each volume in turn, via its own device drivers, until it finds a volume with matching data. This volume is then designated the system volume.

This is the mechanism that is employed by Linux kernels built with the CONFIG_EDD option.
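The matching step of this bodge can be sketched as below. It assumes, as is our understanding of Linux's EDD support, that the data compared is the 32-bit disc signature stored at offset 440 (0x1B8) of each disc's Master Boot Record. The boot loader's copy of sector 0 (read via the firmware) is compared with sector 0 of every disc the kernel's own drivers can see. The disc names and function names are hypothetical.

```python
import struct

MBR_SIGNATURE_OFFSET = 0x1B8  # ASSUMED: 32-bit disc signature at byte 440


def mbr_signature(sector0: bytes) -> int:
    """Extract the 32-bit disc signature from a copy of sector 0."""
    if len(sector0) < 512:
        raise ValueError("need a whole 512-byte sector")
    (sig,) = struct.unpack_from("<I", sector0, MBR_SIGNATURE_OFFSET)
    return sig


def match_boot_disc(firmware_sector0: bytes, os_discs: dict[str, bytes]) -> str:
    """Find the kernel's disc whose sector 0 matches the firmware's copy.

    `os_discs` maps a kernel-side disc name to sector 0 as read by the
    kernel's own drivers. Exactly one match is required.
    """
    wanted = mbr_signature(firmware_sector0)
    matches = [name for name, s0 in os_discs.items() if mbr_signature(s0) == wanted]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} discs match signature {wanted:#010x}")
    return matches[0]
```

One obvious failure mode is visible in the code: if two discs carry the same signature (for example, one is a sector-by-sector clone of the other), there is no way to tell which the firmware booted from.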

It is a problematic mechanism for several reasons:


© Copyright 2006–2006 Jonathan de Boyne Pollard. "Moral" rights asserted.
Permission is hereby granted to copy and to distribute this web page in its original, unmodified form as long as its last modification datestamp is preserved.