Tuesday, January 31, 2017

SPO600 - Lab 3 - Assembly Lab

In this lab we were tasked with writing code that prints the numbers 0 to 99 in both x86_64 and AArch64 (arm64) assembly. In this write-up I will focus only on the AArch64 version, as the two versions overlap heavily.

In C a task like this is simple, and can be done in 8 lines of code, including formatting.

#include <stdio.h>

int main() {
    int i;
    for(i = 0; i < 100; i++) {
        printf("Loop: %d\n", i);
    }
    return 0;
}

In both x86_64 and AArch64 assembly it is much more complex. Our assembly program does not link against the C standard library (where printf lives), so instead we need to invoke something known as a "syscall" to display our text. A syscall is effectively a function provided by your operating system's kernel, and we use the write syscall to output to the console. In C, the equivalent code would look like:

#include <unistd.h>
#define STDOUT 1
#define ZERO_ASCII 48

int main() {
    int i;
    for(i = 0; i < 100; i++) {
        write(STDOUT, "Loop: ", 6);
        if(i > 9){
            write(STDOUT, &(char){(i / 10) + ZERO_ASCII}, 1); /* write() needs a pointer to the byte */
        }
        write(STDOUT, &(char){(i % 10) + ZERO_ASCII}, 1);
        write(STDOUT, "\n", 1);
    }
    return 0;
}

This code is more complex, as it has to format the number itself, but it is still rather simple at only 15 lines of code. In AArch64 assembly, a program that does the same thing looks like:

.text
.globl _start

start = 0
max = 100

_start:
        /* Set up the initial loop counter */
        mov     x19, start

loop:
/* Start loop here */

        /* Print the Loop string */
        mov     x0, 1        /* file descriptor: 1 is stdout */
        adr     x1, loop_msg /* message location (memory address) */
        mov     x2, loop_msg_len /* message length (bytes) */

        mov     x8, 64       /* write is syscall #64 */
        svc     0            /* invoke syscall */

        mov x20, num_msg_len
        udiv x21, x19, x20

        cmp x19, x20
        b.lt skip

        mov     x0, 1        /* file descriptor: 1 is stdout */
        adr     x1, num_msg  /* message location (memory address) */
        add     x1, x1, x21  /* add the loop count */
        mov     x2, 1        /* message length (bytes) */

        mov     x8, 64
        svc     0
skip:
        msub x21, x20, x21, x19

        mov     x0, 1
        adr     x1, num_msg
        add     x1, x1, x21
        mov     x2, 1

        mov     x8, 64
        svc     0


        /* Print newline */

        mov     x0, 1
        adr     x1, nl_msg
        mov     x2, nl_msg_len

        mov     x8, 64
        svc     0

        /* Increment loop */
        add x19, x19, 1
        /* compare the loop counter (x19) to the max value */
        cmp x19, max
        /* branch if less than the max */
        b.lt loop

        mov     x0, 0           /* status -> 0 */
        mov     x8, 93          /* exit is syscall #93 */
        svc     0               /* invoke syscall */

.data

loop_msg:       .ascii  "Loop: "
loop_msg_len = . - loop_msg

num_msg:        .ascii "0123456789"
num_msg_len = . - num_msg

nl_msg:         .ascii "\n"
nl_msg_len = . - nl_msg


As you can see, far more code is required to complete the same task, so I'll break it down section by section.

_start:
mov x19, start

This section of the code is relatively easy to understand: it simply stores the starting value of the loop counter (start, which is 0) in register x19. You can think of x19 as the 'i' variable from the C program. The next section of code is the body of the loop, but we're only going to look at the first half of it for now.

loop:
        mov     x0, 1        /* file descriptor: 1 is stdout */
        adr     x1, loop_msg /* message location (memory address) */
        mov     x2, loop_msg_len /* message length (bytes) */

        mov     x8, 64       /* write is syscall #64 */
        svc     0            /* invoke syscall */

        mov x20, num_msg_len
        udiv x21, x19, x20

        cmp x19, x20
        b.lt skip

        mov     x0, 1
        adr     x1, num_msg
        add     x1, x1, x21
        mov     x2, 1

        mov     x8, 64
        svc     0

We can then break this section down further, starting with the first part of the code.

        mov     x0, 1        /* file descriptor: 1 is stdout */
        adr     x1, loop_msg /* message location (memory address) */
        mov     x2, loop_msg_len /* message length (bytes) */

        mov     x8, 64       /* write is syscall #64 */
        svc     0            /* invoke syscall */

This section of the code is equivalent to the first call to write() from the C code, where it prints "Loop: ". x0 contains the first parameter (1 is STDOUT), x1 contains the second parameter (the pointer to the block of memory we are printing), and x2 contains the third parameter (the number of bytes we are writing). We then store the number 64 in x8, as it is the ID of the write syscall, which is invoked on the next line by svc.
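
To make the mapping a bit more concrete, here is a minimal C sketch that issues the same syscall by number rather than through write(). It assumes Linux with glibc's syscall() wrapper; raw_write is a name I made up for illustration and is not part of the lab code.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_write: 64 on AArch64 Linux, 1 on x86_64 Linux */
#include <unistd.h>       /* syscall(), pipe(), read() */

/* Hypothetical helper: invoke the write syscall directly by number.
 * On AArch64 Linux the kernel receives fd, buf and len in x0, x1 and x2,
 * and the syscall number in x8, as in the assembly above. */
long raw_write(int fd, const void *buf, unsigned long len) {
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(1, "Loop: ", 6) should behave like the first write() call in the C version; the return value is the number of bytes written, or -1 on error.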

At this point I would like to note that it is possible to print the whole line with one syscall by buffering it ahead of time. Instead we chose to do each print separately, as it was easier to model it after our C example. As an added benefit, we can easily extend our code to write values in bases above 10, such as hexadecimal: all you would have to do is add more characters to the "num_msg" buffer.
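
For comparison, this is roughly what the buffered approach could look like in C. It is only a sketch of the idea, not what we wrote for the lab: format_line and print_line are hypothetical names, and the code assumes the value is between 0 and 99.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: build "Loop: N\n" for 0 <= i <= 99 in buf, which
 * must hold at least 9 bytes. Returns the number of bytes stored. */
size_t format_line(char *buf, int i) {
    static const char digits[] = "0123456789";
    size_t n = 6;

    memcpy(buf, "Loop: ", 6);
    if (i > 9) {
        buf[n++] = digits[i / 10];   /* leading digit, only when non-zero */
    }
    buf[n++] = digits[i % 10];       /* last digit, always present */
    buf[n++] = '\n';
    return n;
}

/* Hypothetical helper: print the whole line with one write() call. */
ssize_t print_line(int i) {
    char buf[10];
    size_t len = format_line(buf, i);
    return write(STDOUT_FILENO, buf, len);
}
```

The trade-off is one syscall per line instead of three or four, at the cost of managing a buffer and an offset by hand, which is more bookkeeping in assembly.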


This next section of the loop body is where we print the first digit of the number, if it exists.

        mov x20, num_msg_len
        udiv x21, x19, x20

        cmp x19, x20
        b.lt skip

        mov     x0, 1
        adr     x1, num_msg
        add     x1, x1, x21
        mov     x2, 1

        mov     x8, 64
        svc     0


We first load the number of characters in our charset into x20, then divide x19 (our current loop count) by x20 (the number of characters) and store the result in x21 (the quotient of the division). We then check whether our current loop count is less than the charset size. If it is, we skip printing the value (as it would be 0); otherwise we print the digit in much the same way as before, only this time with some pointer arithmetic. By adding the digit to the num_msg pointer, we are able to single out an individual character to be printed from our set.
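
The same step can be sketched in C with the charset passed in as a parameter. leading_digit is a hypothetical helper, and it assumes the value is smaller than base * base so that the quotient is always a valid index into the charset.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the leading-digit step.
 * charset plays the role of num_msg and base the role of num_msg_len.
 * Returns a pointer to the single character to print, or NULL when
 * value < base and there is no leading digit to print. */
const char *leading_digit(unsigned long value, const char *charset,
                          unsigned long base) {
    unsigned long quotient = value / base;   /* udiv x21, x19, x20 */

    if (value < base) {                      /* cmp x19, x20 ; b.lt skip */
        return NULL;
    }
    return charset + quotient;               /* adr x1, num_msg ; add x1, x1, x21 */
}
```

With the charset "0123456789" and base 10, a value of 42 yields a pointer to '4'. Swapping in "0123456789abcdef" and base 16 yields '2' for the same value (42 is 2a in hexadecimal), which is the kind of extension mentioned earlier.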


Now we're going to look at the section of the code where we print the last digit. This code will be executed every iteration, no matter what.

skip:
        msub x21, x20, x21, x19

        mov     x0, 1
        adr     x1, num_msg
        add     x1, x1, x21
        mov     x2, 1

        mov     x8, 64
        svc     0

        mov     x0, 1
        adr     x1, nl_msg
        mov     x2, nl_msg_len

        mov     x8, 64
        svc     0

This section is almost the same as the previous one, with one difference: the msub instruction. It computes x19 - (x20 * x21), which is the remainder of dividing the loop counter by the charset size, and that digit is then printed. After that, our code prints the newline.
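
AArch64 has no integer remainder instruction, which is why the quotient from udiv is multiplied back and subtracted. A small C sketch of that arithmetic, using a hypothetical helper name:

```c
/* Hypothetical helper: compute the remainder the way the assembly does,
 * by multiplying the quotient back and subtracting it from the value. */
unsigned long remainder_via_msub(unsigned long value, unsigned long base) {
    unsigned long quotient = value / base;   /* udiv x21, x19, x20 */
    return value - base * quotient;          /* msub x21, x20, x21, x19 */
}
```

For a loop count of 42 and a charset size of 10 the quotient is 4, so the result is 42 - 10 * 4 = 2, the same as 42 % 10 in C.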

The last section of code we need to look at is where we increment the loop counter, and actually check to see if we need to loop again.

        add x19, x19, 1
        cmp x19, max
        b.lt loop

This is relatively easy to understand: we increment the loop counter (x19) by one, then compare it with the max value. If it's less than the max, we jump back to the beginning of the loop and run again.
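
Because the comparison sits at the bottom of the loop, the closest C shape is a do-while rather than the for loop from the original program. The sketch below uses a hypothetical count_iterations function whose body only counts how many times it runs.

```c
/* Hypothetical helper: walk a counter from start up to (but not including)
 * max, testing the condition at the bottom of the loop as the assembly does.
 * Returns the number of times the loop body ran. */
int count_iterations(int start, int max) {
    int i = start;            /* mov x19, start */
    int iterations = 0;

    do {
        iterations++;         /* stands in for the loop body */
        i = i + 1;            /* add x19, x19, 1 */
    } while (i < max);        /* cmp x19, max ; b.lt loop */
    return iterations;
}
```

With start = 0 and max = 100 the body runs 100 times, matching the for loop. The difference only shows up when start is already at or past max: the body still runs once, which does not matter for this lab but is worth knowing.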


To conclude, this lab has been an interesting introduction to assembly, and I look forward to writing more.
