BIOS, booting, UEFI, MBR, interrupts, kernel: these are some of the terms you will hear when you try to understand how a computer works. In this series, we will meet some of them and see what they are and why they matter in a computer system. We will also understand:
- Why memory is important in computers.
- What it means when we say load something to memory.
- How the screen displays a character, e.g. A, when we press character A on the keyboard, yet they are not physically connected to each other.
The final product for this series is a simple toy operating system that has a shell that we can interact with. The series is broken into 3 parts:
- Understanding the boot process and implementing bootloaders.
- Entering the protected mode and implementing GDT.
- Developing the Kernel, setting up interrupts.
To follow along, the following will help:
- Working knowledge of C and assembly, which makes the code quicker to understand.
- A desire to understand the basics of computer operating systems.
- An emulator, e.g. QEMU.
Booting is simply the process of starting a computer.
After the user has pressed the power button, power is directed to the motherboard and hardware components such as the processor and the RAM. So how does the computer know what to do next, given that the operating system is not yet loaded? This is where the BIOS, Basic Input Output System, comes in. The BIOS is the startup software that is already stored in the computer system on non-volatile memory. This means that the BIOS is persistent even after the computer is shut off.
The first task of the BIOS is to perform hardware checks and initialization, a process commonly referred to as POST, Power On Self Test. During the POST process, the BIOS performs the following:
- Sets up the video display (VGA).
- Initializes memory.
- Scans for attached hardware and identifies attached devices such as hard disks.
- Catches and reports errors on any of the hardware devices.
After the POST process, a device to boot from is chosen. From the above list, we see that the BIOS identifies the attached devices. Furthermore, it also maps access to those devices. If you press F12 (or the appropriate key for your machine) during startup, you can choose the device to boot from. Otherwise, the BIOS goes through the attached devices in order until it finds a bootable one, then boots from it. Users can alter the boot order via the BIOS settings. The BIOS settings have a user interface for changing various settings on the computer, such as the boot devices and the system time and date.
How does the BIOS identify a bootable device? This is where we introduce a new term: MBR, Master Boot Record. The smallest addressable unit on a hard disk, which is the most common bootable device, is called a sector. On a hard disk, a sector is traditionally 512 bytes in size. As the BIOS goes through the list of devices to boot from, it checks the first sector of each attached device for some information about the device and for executable code. This first sector is the MBR. The device is considered bootable if the last two bytes of this sector are 0x55 and 0xaa. The 512 bytes of the generic MBR are divided into the following sections:
- Byte 0 to 445 – bootloader code. (446 bytes)
- Byte 446 to 461 – first partition information. (16 bytes)
- Byte 462 to 477 – second partition information. (16 bytes)
- Byte 478 to 493 – third partition information. (16 bytes)
- Byte 494 to 509 – fourth partition information. (16 bytes)
- Byte 510 – 0x55 – 1st byte of magic signature
- Byte 511 – 0xaa – 2nd byte of magic signature.
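The layout above can be checked with a few lines of code. The following Python sketch is not part of the series' code: the names `parse_mbr` and `MbrSector` are made up for illustration, and it assumes the generic layout listed above (446 bytes of code, four 16-byte partition entries, 2 signature bytes).

```python
from dataclasses import dataclass
from typing import List

SECTOR_SIZE = 512
CODE_SIZE = 446          # bytes 0..445: bootloader code
ENTRY_SIZE = 16          # each partition entry is 16 bytes
ENTRY_COUNT = 4          # bytes 446..509: four partition entries
SIGNATURE = b"\x55\xaa"  # bytes 510 and 511, in on-disk order


@dataclass
class MbrSector:
    """Hypothetical container for the regions of a generic MBR."""
    code: bytes
    partitions: List[bytes]
    signature: bytes

    @property
    def bootable(self) -> bool:
        # A BIOS treats the device as bootable when the signature matches.
        return self.signature == SIGNATURE


def parse_mbr(sector: bytes) -> MbrSector:
    """Split a 512-byte sector into the generic MBR regions."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR sector must be exactly 512 bytes")
    code = sector[:CODE_SIZE]
    partitions = [
        sector[CODE_SIZE + i * ENTRY_SIZE: CODE_SIZE + (i + 1) * ENTRY_SIZE]
        for i in range(ENTRY_COUNT)
    ]
    return MbrSector(code, partitions, sector[510:512])


# A synthetic sector: zeroed code and partition table, valid signature.
demo = parse_mbr(bytes(510) + SIGNATURE)
print(len(demo.code), len(demo.partitions), demo.bootable)  # 446 4 True
```

To try it on a real sector, pass in the 512 bytes read from the start of a disk image instead of the synthetic data.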
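Modern variants of the MBR subdivide the bootloader code area further (for example, to reserve a few bytes for a disk signature), but the partition table and the magic signature stay where they are. The order of the last two bytes on disk does not depend on the machine: byte 510 is always 0x55 and byte 511 is always 0xaa. What changes is how they look when read as a single 16-bit value: a little-endian machine, such as an x86 PC, reads them as 0xaa55, while a big-endian machine reads them as 0x55aa. The MBR is found on disks partitioned for BIOS-based computers.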
Sidenote: What’s new in computers
Both BIOS and MBR are old technologies, and they are being replaced by newer ones. The BIOS is being replaced by UEFI (Unified Extensible Firmware Interface), and the MBR partitioning system is being replaced by GPT (GUID Partition Table). GPT still reserves the space for the MBR sector for limited backward compatibility.
Out of curiosity, I decided to check which partition system my operating system uses. I am on Linux Mint, so after some google-fu, I found this command to help me see my system details.
sudo fdisk -l
The output was:
Disk /dev/sda: 931.53 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: TOSHIBA
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt

Device       Start        End    Sectors  Size Type
/dev/sda1     2048    1050623    1048576  512M EFI System
/dev/sda2  1050624 1953523711 1952473088  931G Linux filesystem
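The Disklabel type field says gpt. To see whether the magic signature still shows up, I read the first 512 bytes of /dev/sda1 and checked the last 2 bytes. Note that /dev/sda1 is the first partition (the EFI System partition), not the whole disk, so this sector is that partition's FAT32 boot sector rather than the MBR sector itself; the sector that GPT reserves for the MBR is the first sector of the disk, /dev/sda, and can be read the same way by using if=/dev/sda. Both end with the same two signature bytes. I used the dd command: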
sudo dd if=/dev/sda1 of=test.bin bs=512 count=1
Using any hex editor, you can view the bytes stored in test.bin file. I got this using hexdump :
hexdump test.bin -v
0000000 58eb 6d90 666b 2e73 6166 0074 0802 0020
0000010 0002 0000 f800 0000 003f 00ff 0800 0000
0000020 0000 0010 0400 0000 0000 0000 0002 0000
0000030 0001 0006 0000 0000 0000 0000 0000 0000
0000040 0180 6829 f8d8 4ef7 204f 414e 454d 2020
0000050 2020 4146 3354 2032 2020 1f0e 77be ac7c
0000060 c022 0b74 b456 bb0e 0007 10cd eb5e 32f0
0000070 cde4 cd16 eb19 54fe 6968 2073 7369 6e20
0000080 746f 6120 6220 6f6f 6174 6c62 2065 6964
0000090 6b73 202e 5020 656c 7361 2065 6e69 6573
00000a0 7472 6120 6220 6f6f 6174 6c62 2065 6c66
00000b0 706f 7970 6120 646e 0a0d 7270 7365 2073
00000c0 6e61 2079 656b 2079 6f74 7420 7972 6120
00000d0 6167 6e69 2e20 2e2e 0d20 000a 0000 0000
00000e0 0000 0000 0000 0000 0000 0000 0000 0000
00000f0 0000 0000 0000 0000 0000 0000 0000 0000
0000100 0000 0000 0000 0000 0000 0000 0000 0000
0000110 0000 0000 0000 0000 0000 0000 0000 0000
0000120 0000 0000 0000 0000 0000 0000 0000 0000
0000130 0000 0000 0000 0000 0000 0000 0000 0000
0000140 0000 0000 0000 0000 0000 0000 0000 0000
0000150 0000 0000 0000 0000 0000 0000 0000 0000
0000160 0000 0000 0000 0000 0000 0000 0000 0000
0000170 0000 0000 0000 0000 0000 0000 0000 0000
0000180 0000 0000 0000 0000 0000 0000 0000 0000
0000190 0000 0000 0000 0000 0000 0000 0000 0000
00001a0 0000 0000 0000 0000 0000 0000 0000 0000
00001b0 0000 0000 0000 0000 0000 0000 0000 0000
00001c0 0000 0000 0000 0000 0000 0000 0000 0000
00001d0 0000 0000 0000 0000 0000 0000 0000 0000
00001e0 0000 0000 0000 0000 0000 0000 0000 0000
00001f0 0000 0000 0000 0000 0000 0000 0000 aa55
0000200
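As you can see, the last two bytes are the magic signature bytes. By default, hexdump prints the data as 16-bit words in the host's byte order, so the on-disk bytes 0x55 0xaa show up as aa55, which tells me my computer is little endian.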
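The effect of byte order on those two bytes can be reproduced without a disk. This is a small Python sketch using the standard `struct` module; the variable names are only illustrative.

```python
import struct

last_two = b"\x55\xaa"   # bytes at offsets 510 and 511, in on-disk order

little = struct.unpack("<H", last_two)[0]  # read as a little-endian 16-bit word
big = struct.unpack(">H", last_two)[0]     # read as a big-endian 16-bit word

print(f"{little:04x} {big:04x}")  # aa55 55aa
```

The bytes on disk are the same in both cases; only the interpretation as a 16-bit number differs.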
The first 446 bytes of the MBR sector contain the bootloader code. By today's standards, 446 bytes is a very small space, so to write code that compact, we will use assembly language.
Why is the bootloader important?
For any program to run on a computer, it must be loaded into memory, and the operating system is no exception. The question is: what loads the operating system into memory? After the BIOS finds the bootable device, it loads the code found in the MBR into memory, and the CPU starts executing its instructions. However, the code in the MBR is limited to 446 bytes, and we can be sure that an operating system cannot fit in 446 bytes, since features like graphics take a large amount of code. This is where the bootloader comes in. The code stored in the MBR is used to load the operating system, which is stored in other sectors of the bootable device that have no such size constraint. So, the bootloader is a piece of code that loads the operating system into memory after startup.
Most bootloaders come in 2 stages. The first stage is the bootloader found in the 446 bytes of the MBR, and the second stage is stored in other sectors. The reason for having two stages is the size limit on the first one. The second stage is not limited to a single sector, so it can hold more initialization code before loading the operating system. An excellent example of a two-stage bootloader is GRUB. However, you can have a single-stage bootloader; it all depends on the functionality you need.
The computer starts operation in real mode, which is a 16-bit mode. This is because the first x86 processor, the 8086, was 16 bit and could address 1 MB of memory; all subsequent x86 processors start in 16-bit mode to ensure backward compatibility.
However, after the computer has booted in real mode, it is switched to protected mode, which we will see later.
For our operating system, in the first bootloader, we will do two things:
- Display a message: “Setting up OS”
- Load the second bootloader to memory.
[org 0x7c00]                    ; address in memory where the BIOS loads the MBR

    xor ax, ax
    mov ds, ax                  ; data labels below are addressed from segment 0
    mov es, ax                  ; INT 0x13 reads into ES:BX, so ES must be 0 as well

    mov ah, 0x00
    int 0x13                    ; reset the disk system and recalibrate the drive heads
    jc disk_error

    mov [boot_drive], dl        ; save the drive number identified by the BIOS

    mov ax, 0x7e0
    mov ss, ax                  ; stack segment starts just after the MBR (0x7e0 * 16 = 0x7e00)
    mov bp, 0x8000
    mov sp, bp                  ; stack top at SS:SP = 0x7e0:0x8000 (physical 0xfe00)

    mov si, welcome
    call print_msg              ; print the welcome message

    mov bx, boot_two_addr       ; ES:BX = destination of the second bootloader
    mov dh, 0x01                ; number of sectors to read
    xor ax, ax
    mov dl, [boot_drive]

    push bx
    mov bx, 0x02                ; minor delay
    call init_wait
    pop bx

    call load_second_bootloader ; load the second bootloader code into memory
    jmp boot_two_addr

%include "routines.asm"

boot_drive:    db 0
boot_two_addr  equ 0x9000       ; address of the 2nd bootloader in memory
welcome        db "Setting up OS", 0

times 510-($-$$) db 0
dw 0xaa55                       ; make the last two bytes the MBR magic signature

The BIOS always loads the MBR at address 0x7c00 in memory and transfers control to the code at that location; thus, the org directive on the first line tells the assembler to assume the code starts at 0x7c00. DS and ES are cleared first so that the data labels and the disk read buffer are addressed from segment 0. It is important to set up the stack in real mode; the instructions that load SS, BP and SP do just that. To better understand why both the SS (stack segment) and the SP (stack pointer) have to be set, please read about segmentation in the 8086.
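The segmentation arithmetic behind those register values is short enough to sketch. In 8086 real mode, a physical address is the segment value multiplied by 16 plus the offset, wrapped to 20 bits. The Python function below is a hypothetical helper, not part of the bootloader, and the 0x9000 case assumes the segment used for it is 0.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode translation: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


mbr_load = physical_address(0x0000, 0x7C00)      # where the BIOS places the MBR
stack_top = physical_address(0x07E0, 0x8000)     # SS:SP values from the first bootloader
second_stage = physical_address(0x0000, 0x9000)  # boot_two_addr, assuming segment 0

print(hex(mbr_load), hex(stack_top), hex(second_stage))  # 0x7c00 0xfe00 0x9000
```

The stack grows downwards from 0xfe00, so it stays well above the second bootloader's single sector at 0x9000.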
The last two lines pad the code with zeros up to byte 510 and then write the 2 magic signature bytes used to identify the MBR.
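What the padding and signature lines produce can be mimicked outside the assembler. The following Python sketch uses a made-up helper, `make_boot_sector`, to build a 512-byte sector from arbitrary code bytes; it pads to 510 bytes, as our bootloader does, because we do not store a partition table.

```python
def make_boot_sector(code: bytes) -> bytes:
    """Pad code with zeros to 510 bytes and append the 0x55 0xaa signature."""
    if len(code) > 510:
        raise ValueError("boot code does not fit in 510 bytes")
    return code + bytes(510 - len(code)) + b"\x55\xaa"


# 0xeb 0xfe is the machine code for `jmp $`, an endless loop.
sector = make_boot_sector(b"\xeb\xfe")
print(len(sector), sector[510:].hex())  # 512 55aa
```

This mirrors `times 510-($-$$) db 0` followed by `dw 0xaa55`: the word 0xaa55 is stored low byte first, giving 0x55 then 0xaa on disk.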
The code is written for the NASM assembler.
The routines it calls are defined below, in routines.asm:
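set_screen_size:
    mov ah, 0x00                ; INT 0x10, AH=0x00: set video mode
    mov al, 0x03                ; mode 3: 80x25 colour text
    int 0x10
    ret

clear_screen:
    mov ah, 0x07                ; INT 0x10, AH=0x07: scroll window down
    mov al, 0x00                ; 0 lines = blank the whole window
    mov bh, 0xe0                ; attribute used for the blanked cells
    xor cx, cx                  ; upper-left corner: row 0, column 0
    mov dh, 24                  ; lower-right row (mode 3 has rows 0 to 24)
    mov dl, 79                  ; lower-right column
    int 0x10
    ret

set_cursor_center:              ; as written, this moves the cursor to row 0, column 0
    mov ah, 0x02
    mov bh, 0x00
    mov dh, 0
    mov dl, 0
    int 0x10
    ret

print_msg:                      ; print the zero-terminated string at DS:SI
    lodsb
    or al, al
    jz done
    mov ah, 0x0e                ; INT 0x10, AH=0x0e: teletype output
    int 0x10
    jmp print_msg
done:
    ret

init_wait:                      ; wait roughly BX seconds
    pusha
loop_time:
    mov cx, 0x0f
    mov dx, 0x4240              ; CX:DX = 0x000f4240 = 1,000,000 microseconds
    mov ah, 0x86                ; INT 0x15, AH=0x86: wait
    int 0x15
    dec bx
    cmp bx, 0x00
    jnz loop_time
    popa
    ret

load_second_bootloader:         ; read DH sectors from drive DL into ES:BX
    push dx
    mov ah, 0x02                ; INT 0x13, AH=0x02: read sectors
    mov al, dh                  ; number of sectors to read
    mov ch, 0x00                ; cylinder number
    mov dh, 0x00                ; head number
    mov cl, 0x02                ; sector number (sector 1 holds the MBR)
    int 0x13
    jc disk_error
    pop dx
    cmp dh, al                  ; AL = number of sectors actually read
    jne disk_error_header
    ret

print_hex:                      ; print DX as four hex digits, most significant first
    pusha
    mov cx, 4                   ; a 16-bit value has four nibbles
.next_digit:
    rol dx, 4                   ; rotate the highest nibble into the low 4 bits
    mov al, dl
    and al, 0x0f
    cmp al, 0x09
    jbe .to_ascii
    add al, 0x27                ; offset from '9' + 1 to 'a'
.to_ascii:
    add al, 0x30
    mov ah, 0x0e
    int 0x10
    loop .next_digit
    popa
    ret

disk_error:
    mov si, dsk_err_msg
    call print_msg
    jmp $                       ; stop here

disk_error_header:
    mov si, dsk_err_head_msg
    call print_msg
    jmp $                       ; stop here

dsk_err_msg       db "[DISK] Loading error", 0
dsk_err_head_msg  db "Header loading error", 0
buf               dw 0x0031
sec               dd 0x000f4240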
Of all the routines, load_second_bootloader is the most important one. It loads the second bootloader code into memory. This is done by specifying where on the hard disk the code is located, i.e. the cylinder number, head number and sector number, and the number of sectors to load into memory. In our case, the MBR occupies one sector and is found on the first cylinder (index 0), first head (index 0) and first sector (index 1); the second bootloader is therefore on the same cylinder and head, but in the second sector. If any of these is wrongly specified, you will get errors, since the wrong code or no code will be loaded into memory.
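To see how a cylinder/head/sector triple maps to a position in the disk image, the usual CHS-to-LBA formula can be sketched as below. This Python snippet is illustrative only: `chs_to_lba` is a made-up helper, and the geometry defaults (16 heads, 63 sectors per track) are assumptions that do not affect the result for cylinder 0, head 0.

```python
SECTOR_SIZE = 512


def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int = 16, sectors_per_track: int = 63) -> int:
    """Convert a CHS address to a zero-based linear sector number (LBA)."""
    if sector < 1:
        raise ValueError("CHS sector numbers start at 1")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


mbr_lba = chs_to_lba(0, 0, 1)      # the MBR
stage2_lba = chs_to_lba(0, 0, 2)   # the second bootloader

print(mbr_lba, stage2_lba, stage2_lba * SECTOR_SIZE)  # 0 1 512
```

Sector 2 of cylinder 0, head 0 is the second 512-byte block of the disk, which is why the second bootloader must start at byte offset 512 of the image.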
Please note, this routine can be used to load any other code into memory, provided the cylinder number, head number and sector number are correctly specified. Furthermore, more often than not, whenever you have issues in this series, e.g. the code is not loaded correctly, check this routine first; this will save you many hours of debugging 😌. In fact, try playing with it: change the sector number to 1, i.e. point it at the code for bootloader one, and see the outcome.
The routines use a lot of interrupts (the lines beginning with int 0x). To get more information on interrupts and how to use them, visit Ralf Brown's Interrupt List.
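As one example of how register values become an interrupt's parameters, the wait used by init_wait (INT 0x15 with AH=0x86) takes its duration in microseconds from the register pair CX:DX, with CX as the high word. The Python sketch below only does the arithmetic; `wait_microseconds` is a made-up name.

```python
def wait_microseconds(cx: int, dx: int) -> int:
    """Combine CX (high word) and DX (low word) into the wait time in microseconds."""
    return (cx << 16) | dx


per_call = wait_microseconds(0x000F, 0x4240)  # values loaded in init_wait
loops = 0x02                                  # value placed in BX before calling init_wait

print(per_call, loops * per_call / 1_000_000)  # 1000000 2.0
```

So each pass through the loop waits about one second, and the "minor delay" in the first bootloader is roughly two seconds.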
The second bootloader should:
- Display the message: “Welcome to the OS”
- Loop continuously, doing nothing.
[org 0x9000]
Boot2:
    call set_screen_size        ; just set the screen size
    call clear_screen           ; clear the screen
    mov si, os_msg
    call print_msg
    jmp $

%include "routines.asm"

os_msg db 0x0a, 0x0d, "Welcome to the OS", 0

times ((0x200) - ($ - $$)) db 0x00
Please note that the first line sets the address to the same one we used in the first bootloader (boot_two_addr, 0x9000). The jmp $ instruction on line 7 just makes the program loop at the same point continuously.
To get the resulting image, we assemble both bootloaders' code using nasm:
nasm -f bin boot.asm -o boot.com
nasm -f bin boot2.asm -o boot2.com
We then combine the .com files into one .img file in the correct order, i.e. the first bootloader followed by the second. Run this if the commands above completed without errors.
dd if=boot.com of=boot.img bs=512 count=1
dd if=boot2.com of=boot.img bs=512 seek=1
The output is boot.img, which we run with QEMU:
qemu-system-x86_64 -drive format=raw,file=boot.img
As an aside, to check the code stored in the .com files and the final .img file, and how it is arranged, you can use a tool like hexcurse. This will help you understand how the bootloaders' code is laid out on the hard disk.
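The same check can also be scripted. The Python sketch below is an illustration rather than part of the series' tooling: `split_sectors` and `check_image` are made-up helpers, and a synthetic in-memory image stands in for boot.img so that the snippet runs on its own.

```python
SECTOR_SIZE = 512


def split_sectors(image: bytes) -> list:
    """Cut a raw disk image into 512-byte sectors (the last one may be short)."""
    return [image[i:i + SECTOR_SIZE] for i in range(0, len(image), SECTOR_SIZE)]


def check_image(image: bytes) -> dict:
    """Report whether sector 1 carries the signature and sector 2 holds any code."""
    sectors = split_sectors(image)
    return {
        "sector_count": len(sectors),
        "first_sector_bootable": len(sectors) > 0 and sectors[0][510:512] == b"\x55\xaa",
        "has_second_stage": len(sectors) > 1 and any(sectors[1]),
    }


# Stand-in for boot.img: a signed first sector followed by a non-empty second sector.
fake_image = bytes(510) + b"\x55\xaa" + b"\xeb\xfe" + bytes(510)
print(check_image(fake_image))
# {'sector_count': 2, 'first_sector_bootable': True, 'has_second_stage': True}
```

To inspect the real image, read it with `open("boot.img", "rb").read()` and pass the bytes to `check_image`.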
voilà, the bootloaders at work:
Concluding part one
In this first part of the series, we have gone through the booting process, built a two-stage bootloader and seen some assembly code (it was not that bad, I hope) for the OS we are building.
In the second part of the series, we will learn some new terminology and apply it, including protected mode and the Global Descriptor Table. So stay tuned for the second part.
The code so far is found in this repo.