How to create a toy OS for x86 processors in C: Part 1. Understanding the booting process and implementing bootloaders

This is the first part of building a simple toy OS using C and Assembly. We start by understanding the booting process and then build a simple two-stage bootloader.

Introduction

BIOS, booting, UEFI, MBR, interrupts, kernel, etc. are some of the terms you might hear when you try to understand how a computer works. In this series, we will meet some of them and see what they are and why they matter in computer systems. We will also understand:

  • Why memory is important in computers.
  • What it means to load something into memory.
  • How the screen displays a character, e.g. A, when we press the A key on the keyboard, even though the two are not physically connected to each other.

The final product for this series is a simple toy operating system that has a shell that we can interact with. The series is broken into 3 parts:

  • Understanding the boot process and implementing bootloaders.
  • Entering protected mode and implementing the GDT.
  • Developing the kernel and setting up interrupts.

Prerequisites

  • Working knowledge of C and assembly will help in understanding the code quickly.
  • Desire to understand the basics of computer operating systems.
  • An emulator, e.g. QEMU.

Getting Started

Booting process

Booting is simply the process of starting a computer.

After the user presses the power button, power is delivered to the motherboard and to hardware components such as the processor and the RAM. So how does the computer know what to do next, given that the operating system is not yet loaded? This is where the BIOS (Basic Input/Output System) comes in. The BIOS is startup firmware that is stored in non-volatile memory on the motherboard, which means it persists even after the computer is shut off.

The first task of the BIOS is to check and initialize the hardware, a process commonly referred to as POST (Power-On Self-Test). During POST, the BIOS:

  • Initializes the video display (e.g. VGA).
  • Initializes memory.
  • Scans for attached hardware and identifies attached devices such as hard disks.
  • Catches and reports errors in any of the hardware devices.

After POST, the BIOS selects the device to boot from. As the list above shows, the BIOS identifies the attached devices and maps access to them. It goes through these devices in the configured boot order until it finds a bootable one, and then it boots from that device. You can change the boot order in the BIOS settings. This is a user interface for changing various settings, such as the boot devices and the system date and time. On many machines, you can also choose a one-time boot device by pressing F12 during startup, or whichever key the firmware shows on screen.

How does the BIOS identify a bootable device? This is where a new term comes in: the MBR, or Master Boot Record. The smallest addressable unit on a disk is called a sector. On traditional hard disks, a sector is 512 bytes. Many modern drives use 4096-byte physical sectors but still expose 512-byte logical sectors, as the fdisk output below shows. As the BIOS goes through the list of boot devices, it reads the first sector of each one. This first sector, the MBR, contains information about the device and executable code. The device is considered bootable if the last two bytes of the MBR are 0x55 followed by 0xaa. The 512 bytes of a generic MBR are divided into the following sections:

  • Byte 0 to 445 – bootloader code. (446 bytes)
  • Byte 446 to 461 – first partition information. (16 bytes)
  • Byte 462 to 477 – second partition information. (16 bytes)
  • Byte 478 to 493 – third partition information. (16 bytes)
  • Byte 494 to 509 – fourth partition information. (16 bytes)
  • Byte 510 – 0x55 – 1st byte of magic signature
  • Byte 511 – 0xaa – 2nd byte of magic signature.
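To make the layout above concrete, here is a minimal C sketch. The names struct mbr, mbr_is_bootable and first_bootable are hypothetical and are not part of the series' code. The boot-order scan is a heavily simplified model of what the BIOS does. It assumes that the first sector of each device has already been read into memory, in boot order.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

/* One 16-byte partition entry: status, CHS of first sector, type,
 * CHS of last sector, starting LBA, sector count. Kept as raw bytes here. */
struct partition_entry {
    uint8_t raw[16];
};

/* Classic MBR layout. Every member is made of bytes, so no padding is
 * expected; the static_asserts below check that at compile time. */
struct mbr {
    uint8_t bootstrap[446];               /* bytes 0..445: bootloader code */
    struct partition_entry partitions[4]; /* bytes 446..509 */
    uint8_t signature[2];                 /* bytes 510..511: 0x55, 0xAA */
};

static_assert(sizeof(struct mbr) == 512, "an MBR fills exactly one sector");
static_assert(offsetof(struct mbr, partitions) == 446, "partition table at byte 446");
static_assert(offsetof(struct mbr, signature) == 510, "signature at byte 510");

/* A first sector counts as bootable if byte 510 is 0x55 and byte 511 is 0xAA. */
bool mbr_is_bootable(const struct mbr *m) {
    return m->signature[0] == 0x55 && m->signature[1] == 0xAA;
}

/* Simplified boot-order scan: first_sectors[i] is sector 0 of the i-th device
 * in boot order. Returns the index of the first bootable device, or -1. */
int first_bootable(const struct mbr *first_sectors, size_t count) {
    for (size_t i = 0; i < count; i++) {
        if (mbr_is_bootable(&first_sectors[i]))
            return (int)i;
    }
    return -1;
}
```

Because the offsets are checked at compile time, a mistake in the struct shows up as a build error rather than as a silently misread sector.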

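This classic layout is what the GPT partitioning scheme replaces on modern systems (see the sidenote below). The order of the signature bytes on disk never changes: byte 510 is always 0x55 and byte 511 is always 0xaa. What endianness changes is how those two bytes are read as a single 16-bit number. A little-endian machine such as an x86 PC stores the low byte first, so it reads the pair as 0xaa55. A big-endian machine would read the same pair as 0x55aa. This is why x86 assembly writes the signature as the word 0xaa55. Not every disk is MBR-partitioned, but even GPT disks keep an MBR in their first sector for backward compatibility.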
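A small C sketch shows the difference. The helper names read_le16 and read_be16 are illustrative only. Given the on-disk bytes 0x55 and then 0xaa, reading them low byte first yields 0xaa55, and reading them high byte first yields 0x55aa.

```c
#include <stdint.h>

/* Combine two bytes into a 16-bit value, low byte first (little-endian, as on x86). */
uint16_t read_le16(const uint8_t bytes[2]) {
    return (uint16_t)(bytes[0] | (bytes[1] << 8));
}

/* Combine the same two bytes, high byte first (big-endian). */
uint16_t read_be16(const uint8_t bytes[2]) {
    return (uint16_t)((bytes[0] << 8) | bytes[1]);
}
```

For example, with const uint8_t sig[2] = {0x55, 0xAA};, read_le16(sig) returns 0xAA55. That matches the dw 0xaa55 directive used by the first bootloader in this part.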

Sidenote: What’s new in computers

Both the BIOS and the MBR are old technologies, and they are being replaced by newer ones. The BIOS is being replaced by UEFI (Unified Extensible Firmware Interface), and the MBR partitioning scheme is being replaced by GPT (GUID Partition Table). GPT still keeps a "protective" MBR in the first sector of the disk for limited backward compatibility.

Out of curiosity, I decided to check which partitioning scheme my system uses. I am on Linux Mint, and after some google-fu I found this command, which lists the disks and their partition tables:

sudo fdisk -l

The output was:

Disk /dev/sda: 931.53 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: TOSHIBA
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt

Device       Start        End    Sectors  Size Type
/dev/sda1     2048    1050623    1048576  512M EFI System
/dev/sda2  1050624 1953523711 1952473088  931G Linux filesystem
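The numbers in this output are tied together by simple arithmetic, which helps when you read partition tables by hand. Below is a small C sketch; the helper names are hypothetical. It assumes the 512-byte logical sector size that fdisk reports above, and it relies on fdisk's End column being inclusive.

```c
#include <stdint.h>

/* fdisk's End column is inclusive, so a partition spans end - start + 1 sectors. */
uint64_t sectors_in_range(uint64_t start, uint64_t end) {
    return end - start + 1;
}

/* Size in bytes of a run of sectors, given the logical sector size. */
uint64_t sectors_to_bytes(uint64_t sectors, uint64_t sector_size) {
    return sectors * sector_size;
}
```

For /dev/sda1, sectors_in_range(2048, 1050623) gives 1048576 sectors. At 512 bytes each, that is 536870912 bytes, which is the 512M in the Size column.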

The Disklabel type field says gpt. To see the magic signature in practice, I read the first 512 bytes of my first partition, /dev/sda1, and checked whether the last 2 bytes were the signature bytes. The protective MBR itself is the first sector of the whole disk, so use if=/dev/sda to inspect it instead. I used the dd command:

sudo dd if=/dev/sda1 of=test.bin bs=512 count=1

Any hex editor will show you the bytes stored in test.bin. I used hexdump:

hexdump test.bin -v
0000000 58eb 6d90 666b 2e73 6166 0074 0802 0020
0000010 0002 0000 f800 0000 003f 00ff 0800 0000
0000020 0000 0010 0400 0000 0000 0000 0002 0000
0000030 0001 0006 0000 0000 0000 0000 0000 0000
0000040 0180 6829 f8d8 4ef7 204f 414e 454d 2020
0000050 2020 4146 3354 2032 2020 1f0e 77be ac7c
0000060 c022 0b74 b456 bb0e 0007 10cd eb5e 32f0
0000070 cde4 cd16 eb19 54fe 6968 2073 7369 6e20
0000080 746f 6120 6220 6f6f 6174 6c62 2065 6964
0000090 6b73 202e 5020 656c 7361 2065 6e69 6573
00000a0 7472 6120 6220 6f6f 6174 6c62 2065 6c66
00000b0 706f 7970 6120 646e 0a0d 7270 7365 2073
00000c0 6e61 2079 656b 2079 6f74 7420 7972 6120
00000d0 6167 6e69 2e20 2e2e 0d20 000a 0000 0000
00000e0 0000 0000 0000 0000 0000 0000 0000 0000
00000f0 0000 0000 0000 0000 0000 0000 0000 0000
0000100 0000 0000 0000 0000 0000 0000 0000 0000
0000110 0000 0000 0000 0000 0000 0000 0000 0000
0000120 0000 0000 0000 0000 0000 0000 0000 0000
0000130 0000 0000 0000 0000 0000 0000 0000 0000
0000140 0000 0000 0000 0000 0000 0000 0000 0000
0000150 0000 0000 0000 0000 0000 0000 0000 0000
0000160 0000 0000 0000 0000 0000 0000 0000 0000
0000170 0000 0000 0000 0000 0000 0000 0000 0000
0000180 0000 0000 0000 0000 0000 0000 0000 0000
0000190 0000 0000 0000 0000 0000 0000 0000 0000
00001a0 0000 0000 0000 0000 0000 0000 0000 0000
00001b0 0000 0000 0000 0000 0000 0000 0000 0000
00001c0 0000 0000 0000 0000 0000 0000 0000 0000
00001d0 0000 0000 0000 0000 0000 0000 0000 0000
00001e0 0000 0000 0000 0000 0000 0000 0000 0000
00001f0 0000 0000 0000 0000 0000 0000 0000 aa55
0000200

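As you can see, the sector ends with the magic signature bytes. Two things are worth knowing when reading this dump. First, hexdump's default output groups the bytes into 16-bit words and prints each word in the machine's byte order. The on-disk bytes 0x55 0xaa therefore appear as aa55 on a little-endian x86 machine; hexdump -C prints the individual bytes in order instead: 55 aa. Second, because I read /dev/sda1 rather than /dev/sda, this sector is not the disk's MBR. It is the boot sector of the EFI system partition, a FAT32 volume boot record. Running hexdump -C would reveal this through the mkfs.fat and FAT32 strings and the "This is not a bootable disk" message it contains. Volume boot records end with the same 0x55 0xaa signature.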
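Instead of eyeballing a hex dump, you can script the same check. The minimal C sketch below reads the first sector from an already-opened file and tests bytes 510 and 511 individually, so the answer does not depend on how a tool groups bytes into words. The function name has_boot_signature is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Read the first 512-byte sector of an open file (for example test.bin
 * produced by dd) and report whether byte 510 is 0x55 and byte 511 is 0xAA.
 * Returns false if a full sector cannot be read. */
bool has_boot_signature(FILE *f) {
    uint8_t sector[512];
    if (fseek(f, 0, SEEK_SET) != 0)
        return false;
    if (fread(sector, 1, sizeof sector, f) != sizeof sector)
        return false;
    return sector[510] == 0x55 && sector[511] == 0xAA;
}
```

To use it, open the file with fopen("test.bin", "rb") and pass in the handle. Opening a raw device such as /dev/sda directly would additionally require root privileges.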

The bootloader

The first 446 bytes of the MBR sector contain the bootloader code. By today's standards, 446 bytes is a tiny amount of space, so we will use assembly language to write code this compact.

Why is the bootloader important?

For any program to run on a computer, it must first be loaded into memory, and the operating system is no exception. So what loads the operating system into memory? After the BIOS finds a bootable device, it loads the code in the MBR into memory and the CPU starts executing it. However, that code is limited to 446 bytes, and an operating system cannot fit in 446 bytes; features like graphics alone need far more code than that. This is where the bootloader comes in. The code stored in the MBR loads the operating system, which lives in other sectors of the boot device where the 446-byte limit does not apply. So the bootloader is a small program that loads the operating system into memory after startup.

Many bootloaders work in 2 stages. The first stage is the code in the 446 bytes of the MBR, and the second stage is stored in other sectors. The reason for having two stages is the size limit of the first one. The second stage is not restricted to a single sector, so it can contain more initialization code before loading the operating system. GRUB is a well-known example of a multi-stage bootloader. However, a single-stage bootloader is also possible; it all depends on what the bootloader needs to do.

Real mode

The computer starts in real mode, a 16-bit operating mode. This is because the first x86 processor, the 8086, was a 16-bit processor that could address 1 MB of memory. All later x86 processors start in the same mode to stay backward compatible.
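How does a 16-bit processor reach 1 MB? In real mode, an address is built from a 16-bit segment and a 16-bit offset: physical = segment × 16 + offset, which gives a 20-bit address. A one-line C sketch of that rule follows (the function name is illustrative):

```c
#include <stdint.h>

/* Real-mode address translation: physical = segment * 16 + offset.
 * Two 16-bit values combine into a 20-bit address (1 MB on the 8086). */
uint32_t real_mode_physical(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 4) + offset;
}
```

For instance, real_mode_physical(0x07e0, 0x8000) is 0xfe00, which is where the first bootloader's stack (SS = 0x7e0, SP = 0x8000) starts. Different segment:offset pairs can name the same byte: both 0x0000:0x7c00 and 0x07c0:0x0000 map to 0x7c00. The sketch does not model the wrap-around at 1 MB that the original 8086 performs, so 0xffff:0xffff yields 0x10ffef here.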

Modern operating systems switch from real mode to protected mode early in the boot process; we will see how in the next part.

First bootloader

Our first bootloader does two things:

  • Display a message: “Setting up OS”
  • Load the second bootloader into memory.
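
[org 0x7c00]            ; BIOS loads the MBR at 0x7c00; NASM computes label addresses from here
cli                     ; no interrupts while the segment registers and stack are set up
xor ax,ax
mov ds,ax               ; DS = 0 so that [label] and lodsb use the addresses org assumes
mov es,ax               ; ES = 0: int 0x13 reads into ES:BX
mov ax,0x7e0
mov ss,ax               ; stack segment 0x7e0 = physical 0x7e00, just after the MBR
mov sp,0x8000           ; top of stack at 0x7e0:0x8000 = physical 0xfe00 (grows downward)
mov bp,sp
sti
cld                     ; make lodsb walk forward through strings
mov [boot_drive],dl     ; save the boot drive number the BIOS passed in DL
mov ah,0x00
int 0x13                ; reset the disk system and recalibrate the drive heads
jc disk_error
mov si,welcome
call print_msg          ; print the welcome message
mov bx,boot_two_addr    ; ES:BX = 0x0000:0x9000, destination of the second bootloader
mov dh,0x01             ; number of sectors to read
mov dl,[boot_drive]     ; drive to read from
push bx
mov bx,0x02             ; init_wait repeats a 1-second wait BX times (about 2 s)
call init_wait
pop bx
call load_second_bootloader ; load the second bootloader code into memory
jmp boot_two_addr
%include "routines.asm"
boot_drive: db 0
boot_two_addr equ 0x9000 ; address of the 2nd bootloader in memory
welcome db "Setting up OS",0
times 510-($-$$) db 0   ; pad with zeros up to byte 510
dw 0xaa55               ; stored as 0x55 0xaa: the MBR magic signature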

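The BIOS always loads the MBR at address 0x7c00 and transfers control to it. The [org 0x7c00] directive tells NASM that the code will run from that address, so NASM computes the addresses of labels such as welcome and boot_drive accordingly. This only works if the DS segment register is 0, and the BIOS does not guarantee that, so it is safest to set DS explicitly. It is also important to set up a stack for real mode, and the instructions that load SS and SP do just that. To understand why setting up the stack needs both SS (stack segment) and SP (stack pointer), read about segmentation on the 8086. The top of the stack is at physical address SS × 16 + SP, here 0x7e0 × 16 + 0x8000 = 0xfe00, and the stack grows downward from there.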

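The last two lines make sure that the last 2 bytes are the magic signature bytes used to identify the MBR. The line times 510-($-$$) db 0 pads the output with zeros up to byte 510; here $ is the current address and $$ is the start of the section. Then dw 0xaa55 writes the signature word, which x86's little-endian byte order stores as 0x55 followed by 0xaa. Since this boot sector contains no partition table, the code may use all 510 bytes before the signature rather than just 446.

The code is written for the NASM assembler.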
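If you are more comfortable in C, here is a sketch of what those two directives do to the 512-byte output. The function name make_boot_sector is hypothetical. It copies the code, zero-fills the rest up to byte 510, and stores the signature, refusing code that does not fit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror of `times 510-($-$$) db 0` followed by `dw 0xaa55`.
 * Returns 0 on success, or -1 if the code is longer than 510 bytes
 * (the padding count would be negative). */
int make_boot_sector(const uint8_t *code, size_t len, uint8_t out[512]) {
    if (len > 510)
        return -1;
    memcpy(out, code, len);
    memset(out + len, 0, 510 - len); /* zero padding up to byte 510 */
    out[510] = 0x55;                 /* dw 0xaa55 stores the low byte first */
    out[511] = 0xAA;
    return 0;
}
```

For example, passing the two bytes 0xEB 0xFE (the encoding of jmp $) produces a valid but minimal boot sector that just loops forever.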

The routines it calls are defined in routines.asm:

set_screen_size:        ; set video mode 3 (80x25 colour text); this also clears the screen
    mov ah,0x00
    mov al, 0x03
    int 0x10
    ret
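clear_screen:           ; BIOS scroll-down (AH=0x07) with AL=0 blanks a whole window
    mov ah,0x07
    mov al,0x00         ; 0 lines = clear the entire window
    mov bh,0xe0         ; attribute byte used for the blanked cells
    xor cx,cx           ; CH,CL = row 0, column 0 (upper-left corner)
    mov dh,24           ; DH = last row (25-row text mode)
    mov dl,79           ; DL = last column (80-column text mode)
    int 0x10
    ret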
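set_cursor_center:      ; despite the name, moves the cursor to the top-left corner
    mov ah,0x02         ; BIOS set cursor position
    mov bh,0x00         ; display page 0
    mov dh,0            ; row
    mov dl, 0           ; column
    int 0x10
    ret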
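print_msg:              ; print the zero-terminated string at DS:SI
    lodsb               ; AL = byte at DS:SI, then SI = SI + 1
    or al,al
    jz done             ; stop at the terminating 0 byte
    mov ah,0x0e         ; BIOS teletype output: print AL and advance the cursor
    int 0x10
    jmp print_msg
done:
    ret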
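init_wait:              ; wait about BX seconds (BX must be at least 1)
    pusha
loop_time:
    mov cx,0x0f
    mov dx,0x4240       ; CX:DX = 0x000f4240 = 1,000,000 microseconds
    mov ah,0x86         ; BIOS wait
    int 0x15
    dec bx
    cmp bx,0x00
    jnz loop_time
    popa
    ret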
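load_second_bootloader: ; read DH sectors from drive DL into ES:BX
   push dx              ; keep DH (sectors requested) for the check below
   mov ah, 0x02         ; BIOS read sectors
   mov al, dh           ; number of sectors to read
   mov ch, 0x00         ; cylinder 0
   mov dh, 0x00         ; head 0
   mov cl, 0x02         ; sector 2 (sectors count from 1; sector 1 is the MBR)
   int 0x13
   jc disk_error        ; carry set: the read failed
   pop dx
   cmp dh,al            ; AL = number of sectors actually read
   jne disk_error_header
   ret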
print_hex:              ; print DX as four hex digits (not used by the bootloaders in this part)
    pusha
    mov cx,4            ; a 16-bit value has four hex digits
.next_digit:
    rol dx,4            ; rotate the most significant digit into the low nibble
    mov al,dl
    and al,0x0f
    add al,'0'
    cmp al,'9'
    jbe .print_digit
    add al,'a'-'9'-1    ; map 10..15 to 'a'..'f'
.print_digit:
    mov ah,0x0e         ; BIOS teletype output
    int 0x10
    loop .next_digit
    popa
    ret

disk_error:
    mov si,dsk_err_msg
    call print_msg
    jmp $
disk_error_header:
    mov si,dsk_err_head_msg
    call print_msg
    jmp $


dsk_err_msg db "[DISK] Loading error",0
dsk_err_head_msg db "Header loading error",0
buf dw 0x0031
sec dd 0x000f4240

Of all the routines, load_second_bootloader is the most important one: it loads the second bootloader code into memory. To do this, it tells the BIOS (INT 0x13, AH=0x02) where on the disk the code is located. That means giving the cylinder number, head number and sector number, the number of sectors to load (DH on entry), and the drive to read from (DL). The MBR occupies one sector at cylinder 0, head 0, sector 1; cylinders and heads are numbered from 0, but sectors are numbered from 1. The second bootloader therefore sits on the same cylinder and head, in sector 2. If any of these values is wrong, you will get errors, since the wrong code or no code at all will be loaded into memory.
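The dd commands later in this part write the second bootloader at byte offset 512. That is logical block address (LBA) 1, the same place as cylinder 0, head 0, sector 2. The standard CHS-to-LBA formula makes this correspondence explicit. In the C sketch below, the geometry is an assumption passed in by the caller, because the real values come from the BIOS; the tests use 16 heads and 63 sectors per track. The function names are illustrative.

```c
#include <stdint.h>

/* Cylinder/head/sector -> logical block address.
 * Cylinders and heads count from 0 and sectors from 1, hence "sector - 1". */
uint32_t chs_to_lba(uint32_t cylinder, uint32_t head, uint32_t sector,
                    uint32_t heads_per_cylinder, uint32_t sectors_per_track) {
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1);
}

/* Inverse conversion: split an LBA back into cylinder, head and sector. */
void lba_to_chs(uint32_t lba, uint32_t heads_per_cylinder, uint32_t sectors_per_track,
                uint32_t *cylinder, uint32_t *head, uint32_t *sector) {
    *cylinder = lba / (heads_per_cylinder * sectors_per_track);
    *head = (lba / sectors_per_track) % heads_per_cylinder;
    *sector = lba % sectors_per_track + 1;
}
```

Note that chs_to_lba(0, 0, 2, ...) is 1 for any geometry. The first few sectors of a disk do not depend on the geometry, which is why the hard-coded values in load_second_bootloader work here.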

Note that this routine can load any other code into memory, provided the cylinder, head and sector numbers in it are set correctly. More often than not, whenever something goes wrong in this series (e.g. code is not loaded correctly), this routine is the first thing to check; it will save you many hours of debugging 😌. Try experimenting with it. For example, change the sector number to 1 so that it points at the first bootloader, and see what happens.

The routines use many BIOS interrupts (the lines beginning with int 0x). For details on each interrupt and the registers it expects, see Ralf Brown's Interrupt List.
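As an example of reading those register conventions, init_wait uses INT 0x15 with AH=0x86. According to Ralf Brown's list, this call waits for the number of microseconds given in CX:DX, with CX as the high word and DX as the low word. The arithmetic behind the first bootloader's delay is sketched in C below; the helper names are hypothetical.

```c
#include <stdint.h>

/* INT 0x15, AH=0x86 wait interval: a 32-bit microsecond count split
 * across CX (high word) and DX (low word). */
uint32_t wait_interval_us(uint16_t cx, uint16_t dx) {
    return ((uint32_t)cx << 16) | dx;
}

/* Total delay of init_wait, which repeats the wait BX times.
 * Assumes bx >= 1: with BX = 0, the assembly's `dec bx` would wrap to 0xffff. */
uint64_t init_wait_total_us(uint16_t bx, uint16_t cx, uint16_t dx) {
    return (uint64_t)bx * wait_interval_us(cx, dx);
}
```

With CX = 0x000f and DX = 0x4240, each wait lasts 1,000,000 microseconds (one second), so the call with BX = 2 pauses for about two seconds.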

Second bootloader

The second bootloader should:

  • Display the message: “Welcome to the OS”
  • Loop forever, doing nothing.
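
[org 0x9000]            ; must match boot_two_addr in the first bootloader
Boot2:
    call set_screen_size ; switch to 80x25 text mode
    call clear_screen    ; clear the screen
    mov si, os_msg
    call print_msg
    jmp $                ; jump to this same instruction forever
%include "routines.asm"
os_msg db 0x0a,0x0d,"Welcome to the OS",0
times ((0x200) - ($ - $$)) db 0x00 ; pad to exactly one 512-byte sector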

Note that [org 0x9000] matches boot_two_addr in the first bootloader: the code must be assembled for the address it is loaded at. The jmp $ instruction jumps to its own address, so the CPU loops there forever.

Output

To build the disk image, first assemble both bootloaders with NASM (-f bin produces a flat binary with no headers):

nasm -f bin boot.asm -o boot.com
nasm -f bin boot2.asm -o boot2.com

We then combine the .com files into one .img file in the correct order, i.e. the first bootloader followed by the second. Run these commands once the commands above have completed without errors:

dd if=boot.com of=boot.img bs=512 count=1
dd if=boot2.com of=boot.img bs=512 seek=1
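If you would rather not rely on dd, you can produce roughly the same layout with a few lines of C. The helper below is hypothetical and not part of the series' code. It writes stage 1 into sector 0 and stage 2 into sector 1 of an already-opened image file. It also zero-pads short inputs and rejects any input larger than one sector, a check dd does not perform.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Copy one sector's worth of src into sector `index` of out, zero-padding a
 * short input. Returns 0 on success, -1 on I/O error or oversized input. */
static int write_sector(FILE *out, long index, FILE *src) {
    uint8_t buf[SECTOR_SIZE];
    memset(buf, 0, sizeof buf);
    if (fread(buf, 1, sizeof buf, src) < sizeof buf && ferror(src))
        return -1;                      /* read error (a short read is fine) */
    if (fgetc(src) != EOF)
        return -1;                      /* input is larger than one sector */
    if (fseek(out, index * SECTOR_SIZE, SEEK_SET) != 0)
        return -1;
    return fwrite(buf, 1, sizeof buf, out) == sizeof buf ? 0 : -1;
}

/* Stage 1 goes into sector 0 (the MBR), stage 2 into sector 1. */
int build_image(FILE *image, FILE *stage1, FILE *stage2) {
    if (write_sector(image, 0, stage1) != 0)
        return -1;
    if (write_sector(image, 1, stage2) != 0)
        return -1;
    return fflush(image) == 0 ? 0 : -1;
}
```

To use it, open boot.com and boot2.com with mode "rb" and boot.img with mode "wb", then call build_image on the three handles.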

The result is boot.img, which we boot with QEMU:

qemu-system-x86_64 -drive format=raw,file=boot.img

To inspect the code stored in the .com files and the final .img file, and how it is arranged, you can use a hex editor such as hexcurse. This will help you understand where each bootloader sits on the disk.

Voilà, the bootloaders at work:

[Figure: output of the first bootloader]
[Figure: output of the second bootloader]

Concluding part one

In this first part of the series, we have gone through the booting process, built a two-stage bootloader and seen some assembly code (it was not that bad, I hope) for the OS we are building.

In the second part of the series, we will learn some new concepts and apply them, including protected mode and the Global Descriptor Table (GDT). So stay tuned for the second part.

The code so far is found in this repo.
